OpenSearch Security not initialized although it worked normally before

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
latest (the diagnostic output below reports OpenSearch 2.15.0, rpm build)

Describe the issue:
I have 5 nodes (3 master, 2 data), and for some reason the cluster went down with a 503 error: cluster_manager_not_discovered_exception.
I stopped the opensearch service on all nodes and restarted from the master node that initialized the cluster (es8-master-2), but I hit an error when starting the service again. I also added the data role to that master node so that the security indices can be created.
Refer: Open Search Security Not Initialized - #9 by ddodoo
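Since the data role was added to es8-master-2 so that it can hold the security indices, it is worth confirming that the running node actually picked up the change. In the NodesInfoResponse of the diagnostic output, the node reports only cluster_manager and remote_cluster_client, which may simply mean the diagnostic run predates the role change. A minimal Python sketch of that comparison, assuming you paste a trimmed copy of the `_nodes` output into a string (the helper name `missing_roles` is hypothetical):

```python
import json

# Roles from node.roles in the posted opensearch.yml for es8-master-2.
CONFIGURED_ROLES = {"remote_cluster_client", "cluster_manager", "data"}

# Trimmed copy of the NodesInfoResponse from the securityadmin --diagnose output.
NODES_INFO = """
{
  "nodes": {
    "vGI50yBrTYWz7WnG8qY-Cg": {
      "name": "es8-master-2",
      "roles": ["cluster_manager", "remote_cluster_client"]
    }
  }
}
"""


def missing_roles(nodes_info_json: str, node_name: str, configured: set) -> set:
    """Return the configured roles that the named node does not report."""
    nodes = json.loads(nodes_info_json)["nodes"]
    for info in nodes.values():
        if info["name"] == node_name:
            return configured - set(info["roles"])
    raise KeyError(f"node {node_name!r} not found in nodes info")


print(sorted(missing_roles(NODES_INFO, "es8-master-2", CONFIGURED_ROLES)))  # ['data']
```

If `data` still shows up as missing after a restart, the node is not running with the configuration shown further down.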

$ systemctl restart opensearch
[2024-09-16T09:32:03,433][ERROR][o.o.s.a.BackendRegistry  ] [es8-master-2] Not yet initialized (you may need to run securityadmin)

$ /usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh -cd /etc/opensearch/opensearch-security/ -cacert /etc/opensearch/certs/root-ca.pem -cert /etc/opensearch/certs/admin.pem -key /etc/opensearch/certs/admin-key.pem -icl -nhnv -arc --diagnose

Diagnostic securityadmin trace
OpenSearch client version: 2.15.0

Who am i:
{
  "dn" : "CN=es8.mycompany.com,OU=Infra,O=mycompany,L=Ho Chi Minh City,ST=District 3,C=VN",
  "is_admin" : true,
  "is_node_certificate_request" : false
}

ClusterHealthRequest:
OpenSearchStatusException[OpenSearch exception [type=cluster_manager_not_discovered_exception, reason=null]]
	at org.opensearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:210)
	at org.opensearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:2235)
	at org.opensearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:2212)
	at org.opensearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1931)
	at org.opensearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1884)
	at org.opensearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1852)
	at org.opensearch.client.ClusterClient.health(ClusterClient.java:162)
	at org.opensearch.security.tools.SecurityAdmin.generateDiagnoseTrace(SecurityAdmin.java:1261)
	at org.opensearch.security.tools.SecurityAdmin.execute(SecurityAdmin.java:718)
	at org.opensearch.security.tools.SecurityAdmin.main(SecurityAdmin.java:162)
	Suppressed: org.opensearch.client.ResponseException: method [GET], host [https://localhost:9200], URI [/_cluster/health?master_timeout=30s&level=cluster&timeout=30s], status line [HTTP/1.1 503 Service Unavailable]
{"error":{"root_cause":[{"type":"cluster_manager_not_discovered_exception","reason":null}],"type":"cluster_manager_not_discovered_exception","reason":null},"status":503}
		at org.opensearch.client.RestClient.convertResponse(RestClient.java:376)
		at org.opensearch.client.RestClient.performRequest(RestClient.java:346)
		at org.opensearch.client.RestClient.performRequest(RestClient.java:321)
		at org.opensearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1918)
		... 6 more

NodesInfoResponse:
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "es8-infra",
  "nodes" : {
    "vGI50yBrTYWz7WnG8qY-Cg" : {
      "name" : "es8-master-2",
      "transport_address" : "172.21.159.102:9300",
      "host" : "es8-master-2.mycompany.com",
      "ip" : "172.21.159.102",
      "version" : "2.15.0",
      "build_type" : "rpm",
      "build_hash" : "61dbcd0795c9bfe9b81e5762175414bc38bbcadf",
      "total_indexing_buffer" : 73400320,
      "roles" : [ "cluster_manager", "remote_cluster_client" ],
      "attributes" : {
        "shard_indexing_pressure_enabled" : "true"
      },

PendingClusterTasksRequest:
org.opensearch.client.ResponseException: method [GET], host [https://localhost:9200], URI [/_cluster/pending_tasks], status line [HTTP/1.1 503 Service Unavailable]
{"error":{"root_cause":[{"type":"cluster_manager_not_discovered_exception","reason":null}],"type":"cluster_manager_not_discovered_exception","reason":null},"status":503}
	at org.opensearch.client.RestClient.convertResponse(RestClient.java:376)
	at org.opensearch.client.RestClient.performRequest(RestClient.java:346)
	at org.opensearch.client.RestClient.performRequest(RestClient.java:321)
	at org.opensearch.security.tools.SecurityAdmin.generateDiagnoseTrace(SecurityAdmin.java:1285)
	at org.opensearch.security.tools.SecurityAdmin.execute(SecurityAdmin.java:718)
	at org.opensearch.security.tools.SecurityAdmin.main(SecurityAdmin.java:162)

IndicesStatsRequest:
org.opensearch.client.ResponseException: method [GET], host [https://localhost:9200], URI [/_stats], status line [HTTP/1.1 503 Service Unavailable]
{"error":{"root_cause":[{"type":"cluster_block_exception","reason":"blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"}],"type":"cluster_block_exception","reason":"blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"},"status":503}
	at org.opensearch.client.RestClient.convertResponse(RestClient.java:376)
	at org.opensearch.client.RestClient.performRequest(RestClient.java:346)
	at org.opensearch.client.RestClient.performRequest(RestClient.java:321)
	at org.opensearch.security.tools.SecurityAdmin.generateDiagnoseTrace(SecurityAdmin.java:1293)
	at org.opensearch.security.tools.SecurityAdmin.execute(SecurityAdmin.java:718)
	at org.opensearch.security.tools.SecurityAdmin.main(SecurityAdmin.java:162)
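The failing requests in the trace return two different 503 bodies. A small Python sketch (the helper name `root_causes` is hypothetical) that pulls the root-cause type out of each body makes it easier to see that they likely share one cause: no node has been elected cluster manager, so the cluster state is still marked as not recovered and the security index presumably cannot be loaded.

```python
import json

# Error bodies copied verbatim from the securityadmin --diagnose output above.
HEALTH_BODY = '{"error":{"root_cause":[{"type":"cluster_manager_not_discovered_exception","reason":null}],"type":"cluster_manager_not_discovered_exception","reason":null},"status":503}'
STATS_BODY = '{"error":{"root_cause":[{"type":"cluster_block_exception","reason":"blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"}],"type":"cluster_block_exception","reason":"blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"},"status":503}'


def root_causes(body: str) -> list:
    """Return (status, type, reason) for each root cause in an OpenSearch error body."""
    doc = json.loads(body)
    status = doc.get("status")
    return [
        (status, rc.get("type"), rc.get("reason"))
        for rc in doc["error"].get("root_cause", [])
    ]


for body in (HEALTH_BODY, STATS_BODY):
    for status, err_type, reason in root_causes(body):
        print(status, err_type, reason)
```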

Configuration:

---
action.destructive_requires_name: "false"
bootstrap.memory_lock: "false"
cluster.initial_cluster_manager_nodes: ["es8-master-2"]
cluster.name: "es8-infra"
discovery.seed_hosts:
- "es8-master-2.mycompany.com"
- "es8-master-1.mycompany.com"
- "es8-master-3.mycompany.com"
http.port: "9200"
network.host:
- "ip-172-21-159-102.ap-southeast-1.compute.internal"
- "_local_"
network.publish_host: "es8-master-2.mycompany.com"
node.name: "es8-master-2"
node.roles:
- "remote_cluster_client"
- "cluster_manager"
- "data"
path.data:
- "/mnt/b3/es8-infra/opensearch"
path.logs: "/var/log/opensearch"
compatibility.override_main_response_version: true
opendistro.scheduled_jobs.sweeper.period: 1440m
plugins.security.restapi.roles_enabled: ["all_access", "security_rest_api_access"]
plugins.security.audit.type: internal_opensearch
#plugins.security.disabled: true
plugins.security.allow_unsafe_democertificates: false
plugins.security.ssl_cert_reload_enabled: true
plugins.security.allow_default_init_securityindex: true
plugins.security.system_indices.enabled: true
plugins.security.system_indices.indices: [".opendistro-security", ".opensearch-observability"]
plugins.security.ssl.transport.enforce_hostname_verification: false
plugins.security.ssl.transport.enabled: true
plugins.security.ssl.transport.pemcert_filepath: "/etc/opensearch/config/master-2.pem"
plugins.security.ssl.transport.pemkey_filepath: "/etc/opensearch/config/master-2-key.pem"
plugins.security.ssl.transport.pemtrustedcas_filepath: "/etc/opensearch/config/root-ca.pem"
plugins.security.ssl.http.enabled: true
plugins.security.ssl.http.pemcert_filepath: "/etc/opensearch/config/master-2.pem"
plugins.security.ssl.http.pemkey_filepath: "/etc/opensearch/config/master-2-key.pem"
plugins.security.ssl.http.pemtrustedcas_filepath: "/etc/opensearch/config/root-ca.pem"
plugins.security.authcz.admin_dn:
- CN=es8.mycompany.com,OU=Infra,O=mycompany,L=Ho Chi Minh City,ST=District 3,C=VN
plugins.security.nodes_dn:
- 'CN=es8-*.mycompany.com,OU=Infra,O=mycompany,L=Ho Chi Minh City,ST=District 3,C=VN'
- 'CN=es8-master-1.mycompany.com,OU=Infra,O=mycompany,L=Ho Chi Minh City,ST=District 3,C=VN'
- 'CN=es8-master-2.mycompany.com,OU=Infra,O=mycompany,L=Ho Chi Minh City,ST=District 3,C=VN'
- 'CN=es8-master-3.mycompany.com,OU=Infra,O=mycompany,L=Ho Chi Minh City,ST=District 3,C=VN'
- 'CN=es8-data-h-01.mycompany.com,OU=Infra,O=mycompany,L=Ho Chi Minh City,ST=District 3,C=VN'
- 'CN=es8-data-h-02.mycompany.com,OU=Infra,O=mycompany,L=Ho Chi Minh City,ST=District 3,C=VN'
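The wildcard entry in plugins.security.nodes_dn should already cover every node certificate, while the admin certificate (the "Who am i" output reports `is_node_certificate_request: false`) must not match it. A minimal Python sketch of that check, assuming `fnmatchcase` is a close enough approximation of the security plugin's `*` wildcard matching (the plugin's own DN normalization may differ, and `matches_nodes_dn` is a hypothetical helper):

```python
from fnmatch import fnmatchcase

# Common DN suffix shared by all certificates in this cluster.
SUFFIX = ",OU=Infra,O=mycompany,L=Ho Chi Minh City,ST=District 3,C=VN"

# Only the wildcard entry from plugins.security.nodes_dn.
NODES_DN = ["CN=es8-*.mycompany.com" + SUFFIX]

# DN from plugins.security.authcz.admin_dn.
ADMIN_DN = "CN=es8.mycompany.com" + SUFFIX

HOSTS = ["es8-master-1", "es8-master-2", "es8-master-3", "es8-data-h-01", "es8-data-h-02"]


def matches_nodes_dn(dn: str, patterns: list) -> bool:
    """Approximate wildcard check: '*' matches any run of characters."""
    return any(fnmatchcase(dn, p) for p in patterns)


for host in HOSTS:
    print(host, matches_nodes_dn(f"CN={host}.mycompany.com{SUFFIX}", NODES_DN))

# The admin certificate must NOT be treated as a node certificate.
print("admin", matches_nodes_dn(ADMIN_DN, NODES_DN))
```

If all five hosts print `True` and the admin line prints `False`, the explicit per-host entries are redundant but harmless.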

Hi @huynguyenb3, can you test it with the private IP addresses of your master-eligible nodes instead of the FQDNs?

best,
mj

Hi Mantas, thank you very much for looking into this.
I removed all seed hosts except the master node that initialized the cluster, and replaced its FQDN with its private IP address:

discovery.seed_hosts:
- "172.21.159.102"

but I still get 503 Service Unavailable when I run securityadmin.sh.

@huynguyenb3, how many nodes are you starting before running securityadmin.sh?

There are 5 nodes (3 master, 2 data). Currently I have stopped all nodes and am troubleshooting by initializing the cluster from scratch. I can probably accept losing the data and indices if that gets everything working again.

How many nodes are running at the moment (when you run:

all of them, or just es8-master-2.mycompany.com)?

Yeah, only one master node: es8-master-2.mycompany.com.

Can you start another node from the same cluster at the same time and tell us whether it joins the cluster? And what happens when you run securityadmin.sh then?
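Why a second node matters: in OpenSearch's cluster coordination, a cluster manager can only be elected by a majority of the last committed voting configuration, and cluster.initial_cluster_manager_nodes is only used to bootstrap a brand-new cluster, not one whose state already exists on disk. A minimal Python sketch of that majority rule, assuming the on-disk voting configuration still lists the three master nodes (it is not read from the cluster here; `quorum` and `can_elect` are hypothetical helpers):

```python
# Assumed voting configuration: the three master nodes from discovery.seed_hosts.
VOTING_CONFIG = {"es8-master-1", "es8-master-2", "es8-master-3"}


def quorum(voting_config: set) -> int:
    """Smallest majority of the voting configuration."""
    return len(voting_config) // 2 + 1


def can_elect(voting_config: set, running: set) -> bool:
    """True if enough voting nodes are running to elect a cluster manager."""
    return len(voting_config & running) >= quorum(voting_config)


print(quorum(VOTING_CONFIG))                                       # 2
print(can_elect(VOTING_CONFIG, {"es8-master-2"}))                  # False
print(can_elect(VOTING_CONFIG, {"es8-master-2", "es8-master-1"}))  # True
```

Under that assumption, es8-master-2 running alone can never reach a quorum, which would match the cluster_manager_not_discovered_exception above.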