Three nodes in a cluster; each node is master-eligible and is also a data node. Elasticsearch 7.9.1, Open Distro build from Amazon.
In addition, we write to Elasticsearch using Apache Metron, and sometimes the writes fail with:
java.io.IOException: listener timeout after waiting for [30000] ms
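For context, this message comes from the Elasticsearch low-level Java REST client: by default it stops waiting on the response listener after 30000 ms. How Metron wires up its writer is its own topic, but as a minimal sketch of which knob the message corresponds to (this is the 5.x/6.x client API; host and port are placeholders):

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

public class EsClientFactory {
    public static RestClient build() {
        // Placeholder host/port; point this at one of the cluster nodes.
        return RestClient.builder(new HttpHost("172.16.22.153", 9200, "http"))
                // "listener timeout after waiting for [30000] ms" is this
                // maxRetryTimeout; it defaults to 30000 ms in the 5.x/6.x client.
                .setMaxRetryTimeoutMillis(90000)
                // Keep the socket timeout in line with the retry timeout.
                .setRequestConfigCallback(b -> b.setSocketTimeout(90000))
                .build();
    }
}

Raising the client timeout only hides the symptom, though; the logs below suggest the cluster itself was struggling.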
We have faced this problem again. I now have logs, but I can't figure out how to upload them to this forum: it seems I can't edit my initial message, and I can't attach anything other than a picture to a new message.
Related logs:
Master node (esnode1)
[2021-03-25T13:58:31,547][INFO ][o.e.c.c.C.CoordinatorPublication] [esnode1] after [10s] publication of cluster state version [4502] is still waiting for {esnode5}{RPzC_iENSOiSynpEvT0zag}{T4D5QV6jRvu43I_puZ2iXA}{172.16.22.191}{172.16.22.191:9300}{dir} [SENT_PUBLISH_REQUEST], {esnode6}{MsHrFuhtR2yp0JGSRsqS5w}{lA8_OIWuQm2gZxdXsEDEdA}{172.16.22.192}{172.16.22.192:9300}{dir} [SENT_PUBLISH_REQUEST], {esnode7}{Yj-61cgOQ--50f8cila67Q}{BSFsIQiSSKSPFUCQ5g65dQ}{172.16.22.193}{172.16.22.193:9300}{dir} [SENT_PUBLISH_REQUEST], {esnode4}{1jH5VI7PQbuLNVWtxhvw8Q}{RpBC6FawS82nBTCjvkh9LQ}{172.16.22.190}{172.16.22.190:9300}{dir} [SENT_PUBLISH_REQUEST]
[2021-03-25T13:58:51,549][WARN ][o.e.c.c.C.CoordinatorPublication] [esnode1] after [30s] publication of cluster state version [4502] is still waiting for {esnode5}{RPzC_iENSOiSynpEvT0zag}{T4D5QV6jRvu43I_puZ2iXA}{172.16.22.191}{172.16.22.191:9300}{dir} [SENT_PUBLISH_REQUEST], {esnode6}{MsHrFuhtR2yp0JGSRsqS5w}{lA8_OIWuQm2gZxdXsEDEdA}{172.16.22.192}{172.16.22.192:9300}{dir} [SENT_PUBLISH_REQUEST], {esnode7}{Yj-61cgOQ--50f8cila67Q}{BSFsIQiSSKSPFUCQ5g65dQ}{172.16.22.193}{172.16.22.193:9300}{dir} [SENT_PUBLISH_REQUEST], {esnode4}{1jH5VI7PQbuLNVWtxhvw8Q}{RpBC6FawS82nBTCjvkh9LQ}{172.16.22.190}{172.16.22.190:9300}{dir} [SENT_PUBLISH_REQUEST]
[2021-03-25T13:58:51,552][INFO ][o.e.c.r.a.AllocationService] [esnode1] updating number_of_replicas to [4] for indices [.opendistro_security]
[2021-03-25T13:58:51,556][INFO ][o.e.c.s.MasterService ] [esnode1] node-left[{esnode8}{HeEjBS5JSCSYeP2zr2MPWA}{nHAb6LFYT6-YjtXO72CO2g}{172.16.22.194}{172.16.22.194:9300}{dir} reason: followers check retry count exceeded], term: 316, version: 4503, delta: removed {{esnode8}{HeEjBS5JSCSYeP2zr2MPWA}{nHAb6LFYT6-YjtXO72CO2g}{172.16.22.194}{172.16.22.194:9300}{dir}}
[2021-03-25T13:59:01,558][INFO ][o.e.c.c.C.CoordinatorPublication] [esnode1] after [10s] publication of cluster state version [4503] is still waiting for {esnode5}{RPzC_iENSOiSynpEvT0zag}{T4D5QV6jRvu43I_puZ2iXA}{172.16.22.191}{172.16.22.191:9300}{dir} [SENT_PUBLISH_REQUEST], {esnode6}{MsHrFuhtR2yp0JGSRsqS5w}{lA8_OIWuQm2gZxdXsEDEdA}{172.16.22.192}{172.16.22.192:9300}{dir} [SENT_PUBLISH_REQUEST], {esnode7}{Yj-61cgOQ--50f8cila67Q}{BSFsIQiSSKSPFUCQ5g65dQ}{172.16.22.193}{172.16.22.193:9300}{dir} [SENT_PUBLISH_REQUEST], {esnode4}{1jH5VI7PQbuLNVWtxhvw8Q}{RpBC6FawS82nBTCjvkh9LQ}{172.16.22.190}{172.16.22.190:9300}{dir} [SENT_PUBLISH_REQUEST]
[2021-03-25T13:59:21,558][INFO ][o.e.c.s.ClusterApplierService] [esnode1] removed {{esnode8}{HeEjBS5JSCSYeP2zr2MPWA}{nHAb6LFYT6-YjtXO72CO2g}{172.16.22.194}{172.16.22.194:9300}{dir}}, term: 316, version: 4503, reason: Publication{term=316, version=4503}
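The node-left reason "followers check retry count exceeded" means the master's periodic follower checks to esnode8 failed three times in a row (7.x defaults: 1s interval, 10s timeout, 3 retries), and the publication warnings show the other data nodes weren't acknowledging cluster state either. If the nodes are actually healthy but temporarily slow (long GC pauses, saturated network), the static fault-detection and publish timeouts can be relaxed in elasticsearch.yml; the values below are illustrative, not recommendations:

# elasticsearch.yml (static settings, require a node restart)
# 7.x defaults: interval 1s, timeout 10s, retry_count 3
cluster.fault_detection.follower_check.timeout: 30s
cluster.fault_detection.follower_check.retry_count: 5
cluster.fault_detection.leader_check.timeout: 30s
# default publish timeout is 30s
cluster.publish.timeout: 60s

Note this only buys headroom; if the root cause is a blocked master (e.g. the security plugin stalling on AD), longer timeouts just delay the node-left.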
Data node (esnode5)
[2021-03-25T14:00:24,436][INFO ][o.e.c.c.Coordinator ] [esnode5] master node [{esnode1}{raDLHjOiTYaY_5ckIjnLVA}{VlAg-gG5Q72y0KTORWm-uQ}{172.16.22.153}{172.16.22.153:9300}{imr}] failed, restarting discovery
org.elasticsearch.ElasticsearchException: node [{esnode1}{raDLHjOiTYaY_5ckIjnLVA}{VlAg-gG5Q72y0KTORWm-uQ}{172.16.22.153}{172.16.22.153:9300}{imr}] failed [3] consecutive checks
at org.elasticsearch.cluster.coordination.LeaderChecker$CheckScheduler$1.handleException(LeaderChecker.java:293) ~[elasticsearch-7.10.2.jar:7.10.2]
Caused by: org.elasticsearch.transport.RemoteTransportException: [esnode1][172.16.22.153:9300][internal:coordination/fault_detection/leader_check]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: rejecting leader check since [{esnode5}{RPzC_iENSOiSynpEvT0zag}{T4D5QV6jRvu43I_puZ2iXA}{172.16.22.191}{172.16.22.191:9300}{dir}] has been removed from the cluster
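For anyone debugging the same pattern: the data node's leader check failing [3] consecutive times is the mirror image of the master's follower checks failing, which usually points at the master (or the network between them) being unresponsive rather than at the data nodes. A few stock APIs worth capturing while it happens (host is a placeholder; add whatever credentials/TLS options your Open Distro security setup requires):

curl -s 'http://172.16.22.153:9200/_cluster/health?pretty'
curl -s 'http://172.16.22.153:9200/_cluster/pending_tasks?pretty'
curl -s 'http://172.16.22.153:9200/_nodes/hot_threads?threads=5'

Long GC pauses would also show up as "gc overhead" warnings in the Elasticsearch log on esnode1.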
Update on this issue: we found another report on GitHub, opendistro security issue #378. It describes the same situation: a user logs in via AD, then a node crashes. See the link below.