Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.2, Ubuntu 20.04
Describe the issue:
Good day!
OpenSearch suddenly stops accepting connections from Logstash. There is absolutely nothing in the OpenSearch logs except ISM messages, which go on and on all day.
Queries through OpenSearch Dashboards still work, and you can get information out of OpenSearch.
At the same time, if you restart Logstash, OpenSearch unfreezes and continues to work. Logstash does not restart normally; it has to be killed with kill -9, which suggests that it has messages in its queue that it could not send to OpenSearch.
Has anyone encountered anything like this, or does anyone have ideas on how to debug it?
This has happened twice in the last two months. Not often, but it makes me nervous when something happens and I don't know what the cause is.
Configuration:
8 servers with Ryzen 5 CPUs and 64 GB RAM: 3 hot nodes with 4 TB SSDs and 5 warm nodes with 12 TB HDDs.
Logstash is installed on every node and receives events from Filebeat instances installed on the infrastructure servers. Every Logstash server IP is listed in the Filebeat config with the loadbalance: true parameter; a sketch is shown below.
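Roughly, the Filebeat output block looks like the following sketch (assuming the standard output.logstash section; the IPs and port are taken from the error messages further down, and the remaining Logstash hosts are omitted):

output.logstash:
  # all eight Logstash servers are listed here; only the IPs visible in the logs below are shown
  hosts:
    - "10.101.0.15:5046"
    - "10.101.0.16:5046"
    - "10.101.0.23:5046"
    - "10.101.0.24:5046"
    - "10.101.0.30:5046"
    # ... remaining Logstash nodes
  # spread events across all listed hosts instead of sticking to the first reachable one
  loadbalance: true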
Relevant Logs or Screenshots:
Filebeat logs while the problem was occurring (connections to Logstash on port 5046 are refused):
Feb 7 13:36:29 server_name filebeat[902892]: 2023-02-07T13:36:29.036Z#011ERROR#011[publisher_pipeline_output]#011pipeline/output.go:180#011failed to publish events: client is not connected
Feb 7 13:36:30 server_name filebeat[902892]: 2023-02-07T13:36:30.564Z#011ERROR#011[publisher_pipeline_output]#011pipeline/output.go:154#011Failed to connect to backoff(async(tcp://10.101.0.30:5046)): dial tcp 10.101.0.30:5046: connect: connection refused
Feb 7 13:36:30 server_name filebeat[902892]: 2023-02-07T13:36:30.874Z#011ERROR#011[publisher_pipeline_output]#011pipeline/output.go:154#011Failed to connect to backoff(async(tcp://opensearch_ip:5046)): dial tcp opensearch_ip:5046: connect: connection refused
Feb 7 13:36:31 server_name filebeat[902892]: 2023-02-07T13:36:31.703Z#011ERROR#011[publisher_pipeline_output]#011pipeline/output.go:154#011Failed to connect to backoff(async(tcp://10.101.0.16:5046)): dial tcp 10.101.0.16:5046: connect: connection refused
Feb 7 13:36:31 server_name filebeat[902892]: 2023-02-07T13:36:31.799Z#011ERROR#011[publisher_pipeline_output]#011pipeline/output.go:154#011Failed to connect to backoff(async(tcp://10.101.0.15:5046)): dial tcp 10.101.0.15:5046: connect: connection refused
Feb 7 13:36:31 server_name filebeat[902892]: 2023-02-07T13:36:31.919Z#011ERROR#011[publisher_pipeline_output]#011pipeline/output.go:154#011Failed to connect to backoff(async(tcp://10.101.0.24:5046)): dial tcp 10.101.0.24:5046: connect: connection refused
Feb 7 13:36:36 server_name filebeat[902892]: 2023-02-07T13:36:36.568Z#011ERROR#011[publisher_pipeline_output]#011pipeline/output.go:154#011Failed to connect to backoff(async(tcp://10.101.0.23:5046)): dial tcp 10.101.0.23:5046: connect: connection refused
Feb 7 13:36:37 server_name filebeat[902892]: 2023-02-07T13:36:37.775Z#011ERROR#011[publisher_pipeline_output]#011pipeline/output.go:154#011Failed to connect to backoff(async(tcp://opensearch_ip:5046)): dial tcp opensearch_ip:5046: connect: connection refused
The OpenSearch logs, as I said, contain only ISM work messages like:
[2023-02-07T00:13:33,459][INFO ][o.o.i.i.ManagedIndexRunner] [servername] Finished executing attempt_transition_step for .ds-***
[2023-02-07T00:13:33,459][INFO ][o.o.i.i.ManagedIndexRunner] [servername] Finished executing attempt_transition_step for .ds-***