OpenSearch Dashboards stops receiving log lines after 10-15 minutes

OpenSearch and OpenSearch Dashboards version: 2.8.0
OS: CentOS 7

I have a log ingestion pipeline using Fluent Bit, Data Prepper, and OpenSearch. The pipeline sends log lines to OpenSearch, and they show up in OpenSearch Dashboards. But about 10-15 minutes after startup, OpenSearch Dashboards stops receiving new log lines, and I am not sure what the exact issue is. Fluent Bit reads log lines from a log file with the tail input and forwards them with the http output; it has to handle about 100 log lines per second on average. The log lines are filtered, sent to Data Prepper, and then written to the OpenSearch sink.

It would be great if someone could give me some information about the possible cause of this.

Thanks in advance for your help.

@rmstmg Could you share the config files for Fluent Bit and Data Prepper?

Have you noticed any errors in the logs of Fluent Bit, Data Prepper, or OpenSearch?

Hello pablo,

I really appreciate that you are willing to help me out.
I have set up both Fluent Bit and Data Prepper on the same server. I am getting the Data Prepper warning below:

2023-08-01T04:08:57,034 [log-pipeline-sink-worker-2-thread-2] WARN  org.opensearch.dataprepper.plugins.sink.opensearch.BulkRetryStrategy - Bulk Operation Failed. Number of retries 5. Retrying...
java.net.ConnectException: Timeout connecting to [opensearch.serverdomain.com/server_ip:9200]
        at org.opensearch.client.RestClient.extractAndWrapCause(RestClient.java:953) ~[opensearch-rest-client-2.7.0.jar:?]
        at org.opensearch.client.RestClient.performRequest(RestClient.java:332) ~[opensearch-rest-client-2.7.0.jar:?]
        at org.opensearch.client.RestClient.performRequest(RestClient.java:320) ~[opensearch-rest-client-2.7.0.jar:?]
        at org.opensearch.client.transport.rest_client.RestClientTransport.performRequest(RestClientTransport.java:143) ~[opensearch-java-2.5.0.jar:?]
        at org.opensearch.client.opensearch.OpenSearchClient.bulk(OpenSearchClient.java:217) ~[opensearch-java-2.5.0.jar:?]
        at org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink.lambda$doInitializeInternal$1(OpenSearchSink.java:185) ~[opensearch-2.3.1.jar:?]
        at org.opensearch.dataprepper.plugins.sink.opensearch.BulkRetryStrategy.handleRetry(BulkRetryStrategy.java:263) ~[opensearch-2.3.1.jar:?]
        at org.opensearch.dataprepper.plugins.sink.opensearch.BulkRetryStrategy.execute(BulkRetryStrategy.java:191) ~[opensearch-2.3.1.jar:?]
        at org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink.lambda$flushBatch$6(OpenSearchSink.java:292) ~[opensearch-2.3.1.jar:?]
        at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:141) ~[micrometer-core-1.10.5.jar:1.10.5]
        at org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink.flushBatch(OpenSearchSink.java:289) ~[opensearch-2.3.1.jar:?]
        at org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink.doOutput(OpenSearchSink.java:274) ~[opensearch-2.3.1.jar:?]
        at org.opensearch.dataprepper.model.sink.AbstractSink.lambda$output$0(AbstractSink.java:64) ~[data-prepper-api-2.3.1.jar:?]
        at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:141) ~[micrometer-core-1.10.5.jar:1.10.5]
        at org.opensearch.dataprepper.model.sink.AbstractSink.output(AbstractSink.java:64) ~[data-prepper-api-2.3.1.jar:?]
        at org.opensearch.dataprepper.pipeline.Pipeline.lambda$publishToSinks$5(Pipeline.java:336) ~[data-prepper-core-2.3.1.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.net.ConnectException: Timeout connecting to [opensearch.serverdomain.com/server_ip:9200]
        at org.apache.http.nio.pool.RouteSpecificPool.timeout(RouteSpecificPool.java:169) ~[httpcore-nio-4.4.15.jar:4.4.15]
        at org.apache.http.nio.pool.AbstractNIOConnPool.requestTimeout(AbstractNIOConnPool.java:632) ~[httpcore-nio-4.4.15.jar:4.4.15]
        at org.apache.http.nio.pool.AbstractNIOConnPool$InternalSessionRequestCallback.timeout(AbstractNIOConnPool.java:898) ~[httpcore-nio-4.4.15.jar:4.4.15]
        at org.apache.http.impl.nio.reactor.SessionRequestImpl.timeout(SessionRequestImpl.java:198) ~[httpcore-nio-4.4.15.jar:4.4.15]
        at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processTimeouts(DefaultConnectingIOReactor.java:213) ~[httpcore-nio-4.4.15.jar:4.4.15]
        at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:158) ~[httpcore-nio-4.4.15.jar:4.4.15]
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351) ~[httpcore-nio-4.4.15.jar:4.4.15]
        at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221) ~[httpasyncclient-4.1.5.jar:4.1.5]
        at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) ~[httpasyncclient-4.1.5.jar:4.1.5]
        ... 1 more
2023-08-01T04:08:57,034 [log-pipeline-sink-worker-2-thread-4] WARN  org.opensearch.dataprepper.plugins.sink.opensearch.BulkRetryStrategy - Bulk Operation Failed. Number of retries 5. Retrying...
java.net.ConnectException: Timeout connecting to [opensearch.serverdomain.com/server_ip:9200]
        at org.opensearch.client.RestClient.extractAndWrapCause(RestClient.java:953) ~[opensearch-rest-client-2.7.0.jar:?]
        at org.opensearch.client.RestClient.performRequest(RestClient.java:332) ~[opensearch-rest-client-2.7.0.jar:?]
        at org.opensearch.client.RestClient.performRequest(RestClient.java:320) ~[opensearch-rest-client-2.7.0.jar:?]
        at org.opensearch.client.transport.rest_client.RestClientTransport.performRequest(RestClientTransport.java:143) ~[opensearch-java-2.5.0.jar:?]
        at org.opensearch.client.opensearch.OpenSearchClient.bulk(OpenSearchClient.java:217) ~[opensearch-java-2.5.0.jar:?]

In Fluent Bit, I sometimes get a "no upstream connection is available" error message.
Please find the Data Prepper and Fluent Bit configs below:

fluentbit.conf:

[SERVICE]
    Parsers_File /home/opensearch/regex.conf
    Log_File      /home/opensearch/fluentbit.log
    Log_Level     debug
   

[INPUT]
    Name             tail
    refresh_interval 5
    Path             /home/opensearch/vault-opensearch-logs.txt
    Tag              vault_logs
  
[FILTER]
    Name parser
    Match vault_logs*
    Key_Name log
    Parser cloud-01
    Parser cloud-02
    Parser cloud-03
    Parser bigg-01
    Parser bigg-02
    Parser bigg-03
    Parser bigg-04


[OUTPUT]
    name http
    match *
   
    host localhost
    port 2021
    uri /log/ingest
    format json

Data Prepper config:
dataprepper.yaml:

log-pipeline:
  workers: 4
  delay: "100"
  source:
    http:
      ssl: false
     
  buffer:
    bounded_blocking:
      buffer_size: 1024
      batch_size: 256

  sink:
    - opensearch:
        hosts: [ "https://opensearch.serverdomain.com:9200" ]
        insecure: true
        username: user
        password: password
        index: logs-vault
        max_retries: 200
        bulk_size: 4

I think the issue is the high volume of incoming log lines. On average, 100 log lines need to be processed every second, and the pipeline does this without problems for the first 10-15 minutes. But I am not sure what the correct settings are for Fluent Bit and Data Prepper to handle 100 or more log lines per second. Your help is really appreciated. Thanks.
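
For a rough sense of scale, here is a back-of-envelope sketch using only the numbers from the post and the config above (100 lines per second, buffer_size 1024). It assumes a deliberately simplified model in which the buffer fills at the ingest rate minus whatever the sink manages to drain; the function name and the drain_rate parameter are hypothetical, and this is not how Data Prepper itself accounts for records.

```python
# Numbers taken from the post and the Data Prepper config above.
INGEST_RATE = 100    # log lines per second (average, from the post)
BUFFER_SIZE = 1024   # bounded_blocking.buffer_size, in records


def seconds_until_full(buffer_size: int, ingest_rate: float, drain_rate: float = 0.0) -> float:
    """Estimate how long an empty buffer takes to fill.

    Simplified model (an assumption, not Data Prepper's real accounting):
    records arrive at ingest_rate and leave at drain_rate, both per second.
    Returns infinity when the sink keeps up with the source.
    """
    net_rate = ingest_rate - drain_rate
    if net_rate <= 0:
        return float("inf")
    return buffer_size / net_rate


if __name__ == "__main__":
    # Sink fully stalled, e.g. while bulk requests to port 9200 time out.
    stalled = seconds_until_full(BUFFER_SIZE, INGEST_RATE)
    print(f"Sink stalled: buffer full after about {stalled:.2f} s")

    # Sink draining at half the ingest rate (hypothetical figure).
    half = seconds_until_full(BUFFER_SIZE, INGEST_RATE, drain_rate=50)
    print(f"Sink at 50 records/s: buffer full after about {half:.2f} s")
```

Under this model a 1024-record buffer absorbs only a few seconds of a stalled sink at 100 lines per second, so once connections to OpenSearch start timing out, the HTTP source would have nowhere to put new records and Fluent Bit would start seeing failed deliveries.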