We are seeing frequent node disconnects with the error message "master not discovered yet".
Below are the exceptions we are seeing in the logs.
#1
[2022-09-22T14:15:26,336][WARN ][o.o.c.c.ClusterFormationFailureHelper] [data1] master not discovered yet: have discovered []; discovery will continue using [] from hosts providers and [**] from last-known cluster state; node term 6, last-accepted version 51 in term 6
#2
[2022-09-19T14:09:45,820][DEBUG][o.o.c.c.LeaderChecker ] [] 1 consecutive failures (limit [cluster.fault_detection.leader_check.retry_count] is 3) with leader []
org.opensearch.transport.RemoteTransportException: [][internal:coordination/fault_detection/leader_check]
Caused by: org.opensearch.cluster.coordination.CoordinationStateRejectedException: rejecting leader check since [] has been removed from the cluster
#3
[2022-09-19T14:09:46,825][DEBUG][o.o.c.c.LeaderChecker ] [] 2 consecutive failures (limit [cluster.fault_detection.leader_check.retry_count] is 3) with leader []
org.opensearch.transport.RemoteTransportException: [][internal:coordination/fault_detection/leader_check]
Caused by: org.opensearch.cluster.coordination.CoordinationStateRejectedException: rejecting leader check since [] has been removed from the cluster
#4
[2022-09-19T13:29:00,783][WARN ][o.o.c.c.JoinHelper ] [] last failed join attempt was 5.8s ago, failed to join {} with JoinRequest{sourceNode=, minimumTerm=24, optionalJoin=Optional[Join{term=24, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={shard_indexing_pressure_enabled=true}, targetNode={}
org.opensearch.transport.RemoteTransportException: [][internal:cluster/coordination/join]
Caused by: org.opensearch.transport.ConnectTransportException: [**] general node connection failure
Oh boy, there's a lot to unpack here. Can you share your opensearch.yml configuration?
Also note that when OpenSearch runs as a service, any configuration change (whatever it is) has to be applied to the configuration files on every master node.
Hey @datapal, I was off the grid from the OpenSearch community for quite some time.
By any chance, do you have every node’s IP or DNS name in the discovery.seed_hosts section? (It has to be copied into every host’s opensearch.yml; see "Creating a cluster" in the OpenSearch documentation.)
The catch with running as a service is that every master node (in my case I did it for every node) has to have the IPs or domain names of the nodes it needs to form a cluster with.
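For reference, here is a minimal sketch of what that looks like in opensearch.yml. The cluster name and host names below are placeholders I made up, not values from this thread; swap in your own node IPs, DNS names, or service names. The same discovery.seed_hosts list goes into the file on every node.

```yaml
# opensearch.yml (sketch; all names below are hypothetical placeholders)

# Must be identical on every node that should join the same cluster.
cluster.name: my-cluster

# Addresses this node contacts to discover the rest of the cluster.
# List every master-eligible node and copy the same list to every node.
# Entries can be IPs or resolvable names; if no port is given,
# the transport port (9300 by default) is used.
discovery.seed_hosts:
  - os-master-1
  - os-master-2
  - os-master-3
```

If a node still logs "have discovered []" with a list like this in place, my guess would be that the names are not resolving from that node or the transport port is not reachable, so that is worth checking next.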
I have provided the service names in the discovery.seed_hosts section. In my case, the issue seems to be with the Docker network, as I am setting up the cluster on Swarm.