Hello, I don’t understand what’s going on. I have a cluster of 3 nodes: es01, es02, es03.
Sometimes a node doesn’t receive the checker messages and, as a result, it leaves the cluster.
es01 master log:
[2021-01-11T11:36:47,630][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] handleWakeUp: checking {h1-es03}{Qshtg7-TQIyxeiccpkmlIA}{irRkTH7XQt63qEIGc-SWjA}{<es03_ip>}{<es03_ip>:9300}{dimr} with FollowerCheckRequest{term=129, sender=
{h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}}
[2021-01-11T11:36:47,633][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] FollowerChecker{discoveryNode={h1-es03}{Qshtg7-TQIyxeiccpkmlIA}{irRkTH7XQt63qEIGc-SWjA}{<es03_ip>}{<es03_ip>:9300}{dimr}, failureCountSinceLastSuccess=0, [cl
uster.fault_detection.follower_check.retry_count]=3} check successful
[2021-01-11T11:36:48,220][TRACE][o.e.c.NodeConnectionsService] [h1-es01] connectDisconnectedTargets: {{h1-es03}{Qshtg7-TQIyxeiccpkmlIA}{irRkTH7XQt63qEIGc-SWjA}{<es03_ip>}{<es03_ip>:9300}{dimr}=ConnectionTarget{discoveryNode={h1
-es03}{Qshtg7-TQIyxeiccpkmlIA}{irRkTH7XQt63qEIGc-SWjA}{<es03_ip>}{<es03_ip>:9300}{dimr}, activityType=IDLE}, {h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}=ConnectionTarget{
discoveryNode={h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}, activityType=IDLE}, {h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr}=
ConnectionTarget{discoveryNode={h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr}, activityType=IDLE}}
[2021-01-11T11:36:48,548][TRACE][o.e.c.c.LeaderChecker ] [h1-es01] handling LeaderCheckRequest{sender={h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr}}
[2021-01-11T11:36:48,622][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] handleWakeUp: checking {h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr} with FollowerCheckRequest{term=129, sender=
{h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}}
[2021-01-11T11:36:48,623][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] FollowerChecker{discoveryNode={h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr}, failureCountSinceLastSuccess=0, [cl
uster.fault_detection.follower_check.retry_count]=3} check successful
[2021-01-11T11:36:48,634][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] handleWakeUp: checking {h1-es03}{Qshtg7-TQIyxeiccpkmlIA}{irRkTH7XQt63qEIGc-SWjA}{<es03_ip>}{<es03_ip>:9300}{dimr} with FollowerCheckRequest{term=129, sender=
{h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}}
[2021-01-11T11:36:48,635][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] FollowerChecker{discoveryNode={h1-es03}{Qshtg7-TQIyxeiccpkmlIA}{irRkTH7XQt63qEIGc-SWjA}{<es03_ip>}{<es03_ip>:9300}{dimr}, failureCountSinceLastSuccess=0, [cl
uster.fault_detection.follower_check.retry_count]=3} check successful
[2021-01-11T11:36:49,550][TRACE][o.e.c.c.LeaderChecker ] [h1-es01] handling LeaderCheckRequest{sender={h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr}}
[2021-01-11T11:36:49,624][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] handleWakeUp: checking {h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr} with FollowerCheckRequest{term=129, sender=
{h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}}
[2021-01-11T11:36:49,625][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] FollowerChecker{discoveryNode={h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr}, failureCountSinceLastSuccess=0, [cl
uster.fault_detection.follower_check.retry_count]=3} check successful
[2021-01-11T11:36:49,636][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] handleWakeUp: checking {h1-es03}{Qshtg7-TQIyxeiccpkmlIA}{irRkTH7XQt63qEIGc-SWjA}{<es03_ip>}{<es03_ip>:9300}{dimr} with FollowerCheckRequest{term=129, sender=
{h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}}
[2021-01-11T11:36:49,637][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] FollowerChecker{discoveryNode={h1-es03}{Qshtg7-TQIyxeiccpkmlIA}{irRkTH7XQt63qEIGc-SWjA}{<es03_ip>}{<es03_ip>:9300}{dimr}, failureCountSinceLastSuccess=0, [cl
uster.fault_detection.follower_check.retry_count]=3} check successful
[2021-01-11T11:36:50,552][TRACE][o.e.c.c.LeaderChecker ] [h1-es01] handling LeaderCheckRequest{sender={h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr}}
[2021-01-11T11:36:50,626][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] handleWakeUp: checking {h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr} with FollowerCheckRequest{term=129, sender=
{h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}}
[2021-01-11T11:36:50,627][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] FollowerChecker{discoveryNode={h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr}, failureCountSinceLastSuccess=0, [cl
uster.fault_detection.follower_check.retry_count]=3} check successful
[2021-01-11T11:36:50,638][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] handleWakeUp: checking {h1-es03}{Qshtg7-TQIyxeiccpkmlIA}{irRkTH7XQt63qEIGc-SWjA}{<es03_ip>}{<es03_ip>:9300}{dimr} with FollowerCheckRequest{term=129, sender=
{h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}}
[2021-01-11T11:36:50,639][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] FollowerChecker{discoveryNode={h1-es03}{Qshtg7-TQIyxeiccpkmlIA}{irRkTH7XQt63qEIGc-SWjA}{<es03_ip>}{<es03_ip>:9300}{dimr}, failureCountSinceLastSuccess=0, [cl
uster.fault_detection.follower_check.retry_count]=3} check successful
[2021-01-11T11:36:51,554][TRACE][o.e.c.c.LeaderChecker ] [h1-es01] handling LeaderCheckRequest{sender={h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr}}
[2021-01-11T11:36:51,627][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] handleWakeUp: checking {h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr} with FollowerCheckRequest{term=129, sender=
{h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}}
[2021-01-11T11:36:51,629][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] FollowerChecker{discoveryNode={h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr}, failureCountSinceLastSuccess=0, [cl
uster.fault_detection.follower_check.retry_count]=3} check successful
[2021-01-11T11:36:51,639][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] handleWakeUp: checking {h1-es03}{Qshtg7-TQIyxeiccpkmlIA}{irRkTH7XQt63qEIGc-SWjA}{<es03_ip>}{<es03_ip>:9300}{dimr} with FollowerCheckRequest{term=129, sender=
{h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}}
[2021-01-11T11:36:51,641][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] FollowerChecker{discoveryNode={h1-es03}{Qshtg7-TQIyxeiccpkmlIA}{irRkTH7XQt63qEIGc-SWjA}{<es03_ip>}{<es03_ip>:9300}{dimr}, failureCountSinceLastSuccess=0, [cl
uster.fault_detection.follower_check.retry_count]=3} check successful
[2021-01-11T11:36:52,557][TRACE][o.e.c.c.LeaderChecker ] [h1-es01] handling LeaderCheckRequest{sender={h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr}}
[2021-01-11T11:36:52,629][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] handleWakeUp: checking {h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr} with FollowerCheckRequest{term=129, sender=
{h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}}
[2021-01-11T11:36:52,630][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] FollowerChecker{discoveryNode={h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr}, failureCountSinceLastSuccess=0, [cl
uster.fault_detection.follower_check.retry_count]=3} check successful
[2021-01-11T11:36:52,641][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] handleWakeUp: checking {h1-es03}{Qshtg7-TQIyxeiccpkmlIA}{irRkTH7XQt63qEIGc-SWjA}{<es03_ip>}{<es03_ip>:9300}{dimr} with FollowerCheckRequest{term=129, sender=
{h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}}
[2021-01-11T11:36:52,643][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] FollowerChecker{discoveryNode={h1-es03}{Qshtg7-TQIyxeiccpkmlIA}{irRkTH7XQt63qEIGc-SWjA}{<es03_ip>}{<es03_ip>:9300}{dimr}, failureCountSinceLastSuccess=0, [cl
uster.fault_detection.follower_check.retry_count]=3} check successful
[2021-01-11T11:36:53,558][TRACE][o.e.c.c.LeaderChecker ] [h1-es01] handling LeaderCheckRequest{sender={h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr}}
[2021-01-11T11:36:53,631][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] handleWakeUp: checking {h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr} with FollowerCheckRequest{term=129, sender=
{h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}}
[2021-01-11T11:36:53,632][TRACE][o.e.c.c.FollowersChecker ] [h1-es01] FollowerChecker{discoveryNode={h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}{1Oc8tIBFR428oBjV_uLjHw}{<es02_ip>}{<es02_ip>:9300}{dimr}, failureCountSinceLastSuccess=0, [cl
uster.fault_detection.follower_check.retry_count]=3} check successful
es03 follower log:
[2021-01-11T11:36:47,633][TRACE][o.e.c.c.FollowersChecker ] [h1-es03] responding to FollowerCheckRequest{term=129, sender={h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}} on fast path
[2021-01-11T11:36:48,635][TRACE][o.e.c.c.FollowersChecker ] [h1-es03] responding to FollowerCheckRequest{term=129, sender={h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}} on fast path
[2021-01-11T11:36:49,637][TRACE][o.e.c.c.FollowersChecker ] [h1-es03] responding to FollowerCheckRequest{term=129, sender={h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}} on fast path
[2021-01-11T11:36:50,638][TRACE][o.e.c.c.FollowersChecker ] [h1-es03] responding to FollowerCheckRequest{term=129, sender={h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}} on fast path
[2021-01-11T11:36:51,640][TRACE][o.e.c.c.FollowersChecker ] [h1-es03] responding to FollowerCheckRequest{term=129, sender={h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}} on fast path
[2021-01-11T11:36:52,642][TRACE][o.e.c.c.FollowersChecker ] [h1-es03] responding to FollowerCheckRequest{term=129, sender={h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}} on fast path
[2021-01-11T11:36:53,644][TRACE][o.e.c.c.FollowersChecker ] [h1-es03] responding to FollowerCheckRequest{term=129, sender={h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}} on fast path
[2021-01-11T11:36:54,646][TRACE][o.e.c.c.FollowersChecker ] [h1-es03] responding to FollowerCheckRequest{term=129, sender={h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr}} on fast path
[2021-01-11T11:36:55,274][DEBUG][o.e.c.c.LeaderChecker ] [h1-es03] 1 consecutive failures (limit [cluster.fault_detection.leader_check.retry_count] is 3) with leader [{h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{192.168.5
7.101}{<es01_ip>:9300}{dimr}]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [h1-es01][<es01_ip>:9300][internal:coordination/fault_detection/leader_check] request_id [117011842] timed out after [10006ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1074) [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:651) [elasticsearch-7.9.1.jar:7.9.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
[2021-01-11T11:36:55,276][TRACE][o.e.c.c.LeaderChecker ] [h1-es03] scheduling next check of {h1-es01}{MT3BSgtaQBWux8BJDBSsHg}{5OJxyuZrR7iXpeL4OqyDiQ}{<es01_ip>}{<es01_ip>:9300}{dimr} for [cluster.fault_detection.leader_check
.interval] = 1s
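So, as you can see, the master answers, but the follower doesn’t get this message. You can also see that the master doesn’t receive the leader checker messages from es03: in the same time window its log only shows LeaderCheckRequest entries from es02.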
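
To make that last point easier to check on a longer log, here is a rough sketch of how the handled leader checks could be counted per sender. It is only an illustration: the function name `leader_check_senders` and the regular expression are my own, and the pattern assumes the TRACE line format shown in the excerpts above (Elasticsearch 7.9.1), with the sample lines shortened.

```python
import re
from collections import Counter

# Assumed line shape, taken from the master log above:
#   [h1-es01] handling LeaderCheckRequest{sender={h1-es02}{...}
LEADER_CHECK = re.compile(
    r"\[(?P<receiver>[^\]]+)\] handling LeaderCheckRequest\{sender=\{(?P<sender>[^}]+)\}"
)


def leader_check_senders(log_text):
    """Count 'handling LeaderCheckRequest' entries per sending node name."""
    return Counter(m.group("sender") for m in LEADER_CHECK.finditer(log_text))


# Two shortened lines in the same shape as the es01 master log.
sample = (
    "[2021-01-11T11:36:48,548][TRACE][o.e.c.c.LeaderChecker ] [h1-es01] "
    "handling LeaderCheckRequest{sender={h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}\n"
    "[2021-01-11T11:36:49,550][TRACE][o.e.c.c.LeaderChecker ] [h1-es01] "
    "handling LeaderCheckRequest{sender={h1-es02}{qgmMV2UbT-ScN9uRr6YM8g}\n"
)

counts = leader_check_senders(sample)
for node in ("h1-es02", "h1-es03"):
    # A node that never shows up as a sender gets a count of 0.
    print(node, counts[node])
```

Run against the full es01 log, a node whose count stays at 0 while the others keep increasing would be the one whose leader checks never reach the master.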