I’m using:
OpenSearch 2.11
Logstash 8.9
My setup has two Logstash servers with identical configuration and several pipelines. My OpenSearch cluster has two data/ingest nodes and three master nodes. The connection from the Logstash servers to one of the data/ingest nodes keeps failing, while the connection to the other works fine. The connection is restored each time and, as far as I can tell, no data is lost, but this happens several times per minute. The opensearch.service also fails from time to time (about every other day). All instances run on Linux machines.
I have been trying to debug this problem for several days and have tried performance tuning, without success. Although the cluster has to ingest a large volume of events, I don’t think load is the cause: we run a similar setup without any problems, and only one of the two data/ingest nodes keeps failing.
Configuration:
OpenSearch:
cluster.name: XXX-cluster
node.name: data01
node.roles: [data,ingest]
path.data: /var/lib/opensearch
path.logs: /var/log/opensearch
bootstrap.memory_lock: true
network.host: XX.XX.XX.XX
http.port: 9200
discovery.seed_hosts: ["master01.xx.xx", "master02.xx.xx", "master03.xx.xx", "data01.xx.xx", "data02.xx.xx"]
cluster.initial_cluster_manager_nodes: ["master01.xx.xx", "master02.xx.xx", "master03.xx.xx"]
plugins.security.ssl.transport.pemcert_filepath: /etc/opensearch/certs/data01.crt
plugins.security.ssl.transport.pemkey_filepath: /etc/opensearch/certs/data01.pem
plugins.security.ssl.transport.pemtrustedcas_filepath: /etc/opensearch/certs/ca.pem
plugins.security.ssl.transport.enforce_hostname_verification: false
plugins.security.ssl.http.enabled: true
plugins.security.ssl.http.pemcert_filepath: /etc/opensearch/certs/data01.crt
plugins.security.ssl.http.pemkey_filepath: /etc/opensearch/certs/data01.pem
plugins.security.ssl.http.pemtrustedcas_filepath: /etc/opensearch/certs/ca.pem
plugins.security.allow_unsafe_democertificates: false
plugins.security.allow_default_init_securityindex: true
plugins.security.authcz.admin_dn:
- 'XXX'
plugins.security.nodes_dn:
- 'XXX'
- …
- …
plugins.security.audit.type: internal_opensearch
plugins.security.enable_snapshot_restore_privilege: true
plugins.security.check_snapshot_restore_write_privileges: true
plugins.security.restapi.roles_enabled: ["all_access", "security_rest_api_access"]
plugins.security.system_indices.enabled: true
plugins.security.system_indices.indices: [".plugins-ml-model", ".plugins-ml-task", ".opendistro-alerting-config", ".opendistro-alerting-alert*", ".opendistro-anomaly-results*", ".opendistro-anomaly-detector*", ".opendistro-anomaly-checkpoints", ".opendistro-anomaly-detection-state", ".opendistro-reports-*", ".opensearch-notifications-*", ".opensearch-notebooks", ".opensearch-observability", ".ql-datasources", ".opendistro-asynchronous-search-response*", ".replication-metadata-store", ".opensearch-knn-models"]
http.pipelining.max_events: 100000
knn.algo_param.index_thread_qty: 20
Logstash-Pipeline:
output {
  opensearch {
    hosts => ["ht…data01.xxx.xxx:9200", "ht…data02.xxx.xxx:9200"]
    user => "XXX"
    password => "XXX"
    index => "web-%{+yyyy.MM.dd}"
    ssl => true
    ssl_certificate_verification => true
    cacert => "/etc/logstash/certs/ca.pem"
    timeout => 600
  }
}
**Relevant Logs:**
OpenSearch (shortened for readability):
[2023-12-05T15:02:47,424][ERROR][o.o.s.s.h.n.SecuritySSLNettyHttpServerTransport] [data01.xx.xx] Exception during establishing a SSL connection: null
[2023-12-05T15:02:47,424][WARN ][o.o.h.AbstractHttpServerTransport] [data01.xx.xx] caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/xx.xx.xx.xx:9200, remoteAddress=/xx.xx.xx.xx:59962}
io.netty.handler.codec.compression.DecompressionException: CRC value mismatch. Expected: 1398561884, Got: 1034882459
at java.lang.Thread.run(Thread.java:833) [?:?]
[2023-12-05T15:02:47,426][ERROR][o.o.s.s.h.n.SecuritySSLNettyHttpServerTransport] [data01.xx.xx] Exception during establishing a SSL connection: io.netty.handler.codec.PrematureChannelClosureException: Channel closed while still aggregating message
io.netty.handler.codec.PrematureChannelClosureException: Channel closed while still aggregating message
at java.lang.Thread.run(Thread.java:833) [?:?]
[2023-12-05T15:02:47,426][WARN ][o.o.h.AbstractHttpServerTransport] [data01.xx.xx] caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/xx.xx.xx.xx:9200, remoteAddress=/xx.xx.xx.xx:59962}
io.netty.handler.codec.PrematureChannelClosureException: Channel closed while still aggregating message
at java.lang.Thread.run(Thread.java:833) [?:?]
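For context, my reading of the "CRC value mismatch" line (an assumption on my part, not something I have confirmed) is that the request body arrives gzip-compressed and the checksum stored in the gzip trailer does not match the decompressed data, i.e. the body was altered or cut off somewhere between Logstash and data01. The Python sketch below only reproduces that kind of failure locally. The payload and the helper name gzip_is_intact are made up for illustration, and it needs Python 3.8+ for gzip.BadGzipFile.

```python
import gzip
import zlib


def gzip_is_intact(blob: bytes) -> bool:
    """Return True if blob decompresses and passes the gzip trailer checks."""
    try:
        gzip.decompress(blob)
        return True
    except (gzip.BadGzipFile, EOFError, zlib.error):
        # BadGzipFile: CRC or length mismatch; EOFError: truncated stream;
        # zlib.error: corrupted deflate data.
        return False


# Hypothetical bulk-style body, not taken from my real traffic.
payload = b'{"index":{}}\n{"message":"hello"}\n'
good = gzip.compress(payload)

# The last 8 bytes of a gzip member are CRC32 followed by the uncompressed
# size. Flipping one bit in the CRC field leaves the data itself untouched
# but makes the checksum comparison fail.
corrupted = bytearray(good)
corrupted[-8] ^= 0x01
bad = bytes(corrupted)

print(gzip_is_intact(good))  # True
print(gzip_is_intact(bad))   # False
```

A truncated body (for example good[:-4]) is rejected as well, which would fit the "Channel closed while still aggregating message" lines if the connection is being cut mid-request.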
Logstash:
[2023-12-05T15:05:56,156][WARN ][logstash.outputs.opensearch][web] Restored connection to OpenSearch instance {:url=>“ht…logstash_user:xxxxxx@data01.xx.xx:9200/”}
[2023-12-05T15:07:19,747][WARN ][logstash.outputs.opensearch][web][2964b1369773f13ee2024ed0ffaddf9cf07364df9e7737ea71148f2051d37781] Marking url as dead. Last error: [LogStash::Outputs::OpenSearch::HttpClient::Pool::HostUnreachableError] OpenSearch Unreachable: [ht…logstash_user:xxxxxx@data01.xx.xx:9200/][Manticore::ClientProtocolException] data01.xx.xx:9200 failed to respond {:url=>ht…logstash_user:xxxxxx@data01.xx.xx:9200/, :error_message=>“OpenSearch Unreachable: [ht…logstash_user:xxxxxx@data01.xx.xx:9200/][Manticore::ClientProtocolException] data01.xx.xx:9200 failed to respond”, :error_class=>“LogStash::Outputs::OpenSearch::HttpClient::Pool::HostUnreachableError”}
[2023-12-05T15:07:19,747][ERROR][logstash.outputs.opensearch][web][2964b1369773f13ee2024ed0ffaddf9cf07364df9e7737ea71148f2051d37781] Attempted to send a bulk request but OpenSearch appears to be unreachable or down {:message=>“OpenSearch Unreachable: [ht…logstash_user:xxxxxx@data01.xx.xx:9200/][Manticore::ClientProtocolException] data01.xx.xx:9200 failed to respond”, :exception=>LogStash::Outputs::OpenSearch::HttpClient::Pool::HostUnreachableError, :will_retry_in_seconds=>2}
[2023-12-05T15:07:21,204][WARN ][logstash.outputs.opensearch][web] Restored connection to OpenSearch instance {:url=>“ht…logstash_user:xxxxxx@data01.xx.xx:9200/”}
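To put a number on "several times per minute", a small script along these lines could count the "Marking url as dead" events per minute and host in the Logstash log. It is only a sketch that assumes the log format shown above. The function name count_dead_markings and the shortened sample lines are for illustration.

```python
import re
from collections import Counter

# Timestamp prefix of a Logstash log line, captured down to the minute.
MINUTE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2},\d{3}\]")
# Host part of the (redacted) URL, e.g. "...@data01.xx.xx:9200/".
HOST_RE = re.compile(r"@([\w.\-]+):\d+")


def count_dead_markings(lines):
    """Count 'Marking url as dead' events per (minute, host)."""
    counts = Counter()
    for line in lines:
        if "Marking url as dead" not in line:
            continue
        minute = MINUTE_RE.match(line)
        host = HOST_RE.search(line)
        if minute and host:
            counts[(minute.group(1), host.group(1))] += 1
    return counts


# Shortened versions of the log lines above, for illustration only.
SAMPLE = [
    "[2023-12-05T15:05:56,156][WARN ][logstash.outputs.opensearch][web] "
    "Restored connection to OpenSearch instance "
    "{:url=>ht…logstash_user:xxxxxx@data01.xx.xx:9200/}",
    "[2023-12-05T15:07:19,747][WARN ][logstash.outputs.opensearch][web] "
    "Marking url as dead. Last error: OpenSearch Unreachable: "
    "[ht…logstash_user:xxxxxx@data01.xx.xx:9200/]",
]

for (minute, host), n in sorted(count_dead_markings(SAMPLE).items()):
    print(minute, host, n)
```

On a real log, the open file object can be passed in place of SAMPLE, since the function only iterates over lines.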
I’m grateful for any help since I’m quite lost as to how to solve this problem…