Logstash Opensearch connection keeps failing

I’m using:
Opensearch 2.11
Logstash 8.9

My setup has two Logstash servers with identical configuration and several pipelines. My OpenSearch cluster has two data/ingest nodes and three master nodes. The connection from the Logstash servers to one of the data/ingest nodes keeps failing (the other works fine). The connection is restored each time and, as far as I can tell, there is no data loss. This happens several times per minute. The opensearch.service also fails from time to time (roughly every other day). All instances run on Linux machines.
I have been trying to debug this problem for several days and have tried performance tuning, without success. Although the cluster has to ingest a large number of events, I don't think that is the problem, since we have a similar setup with no issues and only one of the two data/ingest nodes keeps failing.

Configuration:
Opensearch:

cluster.name: XXX-cluster
node.name: data01
node.roles: [data,ingest]
path.data: /var/lib/opensearch
path.logs: /var/log/opensearch
bootstrap.memory_lock: true
network.host: XX.XX.XX.XX
http.port: 9200
discovery.seed_hosts: ["master01.xx.xx", "master02.xx.xx", "master03.xx.xx", "data01.xx.xx", "data02.xx.xx"]
cluster.initial_cluster_manager_nodes: ["master01.xx.xx", "master02.xx.xx", "master03.xx.xx"]
plugins.security.ssl.transport.pemcert_filepath: /etc/opensearch/certs/data01.crt
plugins.security.ssl.transport.pemkey_filepath: /etc/opensearch/certs/data01.pem
plugins.security.ssl.transport.pemtrustedcas_filepath: /etc/opensearch/certs/ca.pem
plugins.security.ssl.transport.enforce_hostname_verification: false
plugins.security.ssl.http.enabled: true
plugins.security.ssl.http.pemcert_filepath: /etc/opensearch/certs/data01.crt
plugins.security.ssl.http.pemkey_filepath: /etc/opensearch/certs/data01.pem
plugins.security.ssl.http.pemtrustedcas_filepath: /etc/opensearch/certs/ca.pem
plugins.security.allow_unsafe_democertificates: false
plugins.security.allow_default_init_securityindex: true
plugins.security.authcz.admin_dn:
  - 'XXX'
plugins.security.nodes_dn:
  - 'XXX'
plugins.security.audit.type: internal_opensearch
plugins.security.enable_snapshot_restore_privilege: true
plugins.security.check_snapshot_restore_write_privileges: true
plugins.security.restapi.roles_enabled: ["all_access", "security_rest_api_access"]
plugins.security.system_indices.enabled: true
plugins.security.system_indices.indices: [".plugins-ml-model", ".plugins-ml-task", ".opendistro-alerting-config", ".opendistro-alerting-alert*", ".opendistro-anomaly-results*", ".opendistro-anomaly-detector*", ".opendistro-anomaly-checkpoints", ".opendistro-anomaly-detection-state", ".opendistro-reports-*", ".opensearch-notifications-*", ".opensearch-notebooks", ".opensearch-observability", ".ql-datasources", ".opendistro-asynchronous-search-response*", ".replication-metadata-store", ".opensearch-knn-models"]

http.pipelining.max_events: 100000
knn.algo_param.index_thread_qty: 20

Logstash-Pipeline:

output {
  opensearch {
    hosts => ["ht…data01.xxx.xxx:9200", "ht…data02.xxx.xxx:9200"]
    user => "XXX"
    password => "XXX"
    index => "web-%{+yyyy.MM.dd}"
    ssl => true
    ssl_certificate_verification => true
    cacert => "/etc/logstash/certs/ca.pem"
    timeout => 600
  }
}

Relevant logs:
OpenSearch (shortened for readability):

[2023-12-05T15:02:47,424][ERROR][o.o.s.s.h.n.SecuritySSLNettyHttpServerTransport] [data01.xx.xx] Exception during establishing a SSL connection: null
[2023-12-05T15:02:47,424][WARN ][o.o.h.AbstractHttpServerTransport] [data01.xx.xx] caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/xx.xx.xx.xx:9200, remoteAddress=/xx.xx.xx.xx:59962}
io.netty.handler.codec.compression.DecompressionException: CRC value mismatch. Expected: 1398561884, Got: 1034882459
at java.lang.Thread.run(Thread.java:833) [?:?]
[2023-12-05T15:02:47,426][ERROR][o.o.s.s.h.n.SecuritySSLNettyHttpServerTransport] [data01.xx.xx] Exception during establishing a SSL connection: io.netty.handler.codec.PrematureChannelClosureException: Channel closed while still aggregating message
io.netty.handler.codec.PrematureChannelClosureException: Channel closed while still aggregating message
at java.lang.Thread.run(Thread.java:833) [?:?]
[2023-12-05T15:02:47,426][WARN ][o.o.h.AbstractHttpServerTransport] [data01.xx.xx] caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/xx.xx.xx.xx:9200, remoteAddress=/xx.xx.xx.xx:59962}
io.netty.handler.codec.PrematureChannelClosureException: Channel closed while still aggregating message
at java.lang.Thread.run(Thread.java:833) [?:?]
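
The `CRC value mismatch` above is raised by Netty's compression codec, which suggests the request body arrived compressed and failed its integrity check. As a rough illustration of what such a check does (not a diagnosis of this cluster), the Python sketch below compresses a made-up bulk-style payload with gzip, flips one bit in the CRC-32 trailer, and shows that decompression then rejects it. The payload and the helper names are hypothetical.

```python
import gzip


def flip_crc_bit(gz_bytes: bytes) -> bytes:
    """Return a copy of a gzip member with one bit of its CRC-32 trailer flipped."""
    corrupted = bytearray(gz_bytes)
    # The last 8 bytes of a gzip member are CRC-32 (4 bytes) then ISIZE (4 bytes).
    corrupted[-8] ^= 0x01
    return bytes(corrupted)


def check_gzip(gz_bytes: bytes) -> str:
    """Try to decompress and report whether the integrity check passed."""
    try:
        gzip.decompress(gz_bytes)
        return "ok"
    except OSError as exc:  # gzip.BadGzipFile is a subclass of OSError
        return f"rejected: {exc}"


# Hypothetical payload, only shaped like a bulk request body.
payload = b'{"index": {}}\n{"message": "example event"}\n'
body = gzip.compress(payload)

print(check_gzip(body))                # intact body passes the check
print(check_gzip(flip_crc_bit(body)))  # corrupted trailer is rejected
```

A mismatch like this means the bytes the server decompressed did not match the checksum the client sent. That can point at corruption or truncation somewhere between the two, but the log excerpt alone does not say where.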

Logstash:

[2023-12-05T15:05:56,156][WARN ][logstash.outputs.opensearch][web] Restored connection to OpenSearch instance {:url=>"ht…logstash_user:xxxxxx@data01.xx.xx:9200/"}
[2023-12-05T15:07:19,747][WARN ][logstash.outputs.opensearch][web][2964b1369773f13ee2024ed0ffaddf9cf07364df9e7737ea71148f2051d37781] Marking url as dead. Last error: [LogStash::Outputs::OpenSearch::HttpClient::Pool::HostUnreachableError] OpenSearch Unreachable: [ht…logstash_user:xxxxxx@data01.xx.xx:9200/][Manticore::ClientProtocolException] data01.xx.xx:9200 failed to respond {:url=>ht…logstash_user:xxxxxx@data01.xx.xx:9200/, :error_message=>"OpenSearch Unreachable: [ht…logstash_user:xxxxxx@data01.xx.xx:9200/][Manticore::ClientProtocolException] data01.xx.xx:9200 failed to respond", :error_class=>"LogStash::Outputs::OpenSearch::HttpClient::Pool::HostUnreachableError"}
[2023-12-05T15:07:19,747][ERROR][logstash.outputs.opensearch][web][2964b1369773f13ee2024ed0ffaddf9cf07364df9e7737ea71148f2051d37781] Attempted to send a bulk request but OpenSearch appears to be unreachable or down {:message=>"OpenSearch Unreachable: [ht…logstash_user:xxxxxx@data01xx.xx:9200/][Manticore::ClientProtocolException] data01.xx.xx:9200 failed to respond", :exception=>LogStash::Outputs::OpenSearch::HttpClient::Pool::HostUnreachableError, :will_retry_in_seconds=>2}
[2023-12-05T15:07:21,204][WARN ][logstash.outputs.opensearch][web] Restored connection to OpenSearch instance {:url=>"ht…logstash_user:xxxxxx@data01.xx.xx:9200/"}
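
To put a number on "several times per minute", the Logstash log can be tallied for the two messages seen above. The following Python sketch counts `Marking url as dead` and `Restored connection` lines per minute. It assumes the timestamp format shown in the excerpt; the function name and the abbreviated sample lines are only for illustration.

```python
import re
from collections import Counter

# Captures the timestamp up to the minute, e.g. "2023-12-05T15:07"
# from a line starting with "[2023-12-05T15:07:19,747]".
TIMESTAMP_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2},\d{3}\]")

# Substrings taken from the log excerpt, mapped to short labels.
EVENTS = {
    "Marking url as dead": "dead",
    "Restored connection": "restored",
}


def tally_events(lines):
    """Count 'dead' and 'restored' events per (minute, label)."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP_RE.match(line)
        if not match:
            continue  # skip stack traces and other continuation lines
        minute = match.group(1)
        for needle, label in EVENTS.items():
            if needle in line:
                counts[(minute, label)] += 1
    return counts


# Abbreviated sample lines modeled on the excerpt above.
sample = [
    "[2023-12-05T15:05:56,156][WARN ][logstash.outputs.opensearch][web] Restored connection to OpenSearch instance",
    "[2023-12-05T15:07:19,747][WARN ][logstash.outputs.opensearch][web] Marking url as dead. Last error: ...",
    "[2023-12-05T15:07:19,747][ERROR][logstash.outputs.opensearch][web] Attempted to send a bulk request",
    "[2023-12-05T15:07:21,204][WARN ][logstash.outputs.opensearch][web] Restored connection to OpenSearch instance",
]

for (minute, label), count in sorted(tally_events(sample).items()):
    print(minute, label, count)
```

On a real system, `tally_events` could be fed an open file handle for the Logstash log instead of the sample list; the log location depends on the installation. Comparing the per-minute counts before and after a change makes it easier to tell whether the change actually helped.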

I’m grateful for any help since I’m quite lost as to how to solve this problem…

Is that Logstash’s or OpenSearch’s CA cert?

Thanks for the reply! This is the root certificate used to establish trust with the OpenSearch cluster.
It seems, though, that there was a bug in OpenSearch. Since the update from OpenSearch 2.11.0 to 2.11.1, the connection resets only once or twice per hour, which seems manageable to me.
Although I'm still puzzled why this keeps happening, it is no longer a major problem for our system.

@georg HostUnreachableError would be related to connection issues.
How does Logstash connect to the OpenSearch nodes? Do you use a reverse proxy or a load balancer in front of the cluster?
Do you use a VPN connection between Logstash and the OpenSearch nodes?

How far, physically, is Logstash from the OpenSearch ingest nodes?
How did you deploy the cluster?

What do you mean by that? Is the OpenSearch service unstable or are you referring to the reported disconnect errors in OpenSearch logs?