About 12 hours ago, our Kibana instance stopped and restarted. That is when the instability with our Elasticsearch master nodes began. Kibana logged the following error:
"message": "{ [security_exception] Open Distro Security not initialized for indices:admin/get :: {\"path\":\"/.kibana\",\"query\":{},\"statusCode\":503,\"response\":\"{\\\"error\\\":{\\\"root_cause\\\":[{\\\"type\\\":\\\"security_exception\\\",\\\"reason\\\":\\\"Open Distro Security not initialized for indices:admin/get\\\"}],\\\"type\\\":\\\"security_exception\\\",\\\"reason\\\":\\\"Open Distro Security not initialized for indices:admin/get\\\"},\\\"status\\\":503}\"}\n at respond (/usr/share/kibana/node_modules/elasticsearch/src/lib/transport.js:349:15)\n at checkRespForFailure (/usr/share/kibana/node_modules/elasticsearch/src/lib/transport.js:306:7)\n at HttpConnector.<anonymous> (/usr/share/kibana/node_modules/elasticsearch/src/lib/connectors/http.js:173:7)\n at IncomingMessage.wrapper (/usr/share/kibana/node_modules/elasticsearch/node_modules/lodash/lodash.js:4929:19)\n at IncomingMessage.emit (events.js:194:15)\n at endReadableNT (_stream_readable.js:1103:12)\n at process._tickCallback (internal/process/next_tick.js:63:19)\n status: 503,\n displayName: 'ServiceUnavailable',\n message:\n '[security_exception] Open Distro Security not initialized for indices:admin/get',\n path: '/.kibana',\n query: {},\n body:\n { error:\n { root_cause: [Array],\n type: 'security_exception',\n reason: 'Open Distro Security not initialized for indices:admin/get' },\n status: 503 },\n statusCode: 503,\n response:\n '{\"error\":{\"root_cause\":[{\"type\":\"security_exception\",\"reason\":\"Open Distro Security not initialized for indices:admin/get\"}],\"type\":\"security_exception\",\"reason\":\"Open Distro Security not initialized for indices:admin/get\"},\"status\":503}',\n toString: [Function],\n toJSON: [Function] }"
This occurred on two separate indices. While we could still access Kibana, the GUI also showed another error on the main splash page: “Tenant Indices migration failed in Kibana”.
A few hours later, all three of our master nodes either went down or stopped. They could not rejoin the cluster, and they still cannot. We ran the following command on the master nodes:
sh "/usr/share/elasticsearch/plugins/opendistro_security/tools/securityadmin.sh" -cd "/usr/share/elasticsearch/plugins/opendistro_security/securityconfig" -icl -key "/usr/share/elasticsearch/config/kirk-key.pem" -cert "/usr/share/elasticsearch/config/kirk.pem" -cacert "/usr/share/elasticsearch/config/[REDACTED]" -nhnv --accept-red-cluster --diagnose
On Master 2, it returned the following output:
Open Distro Security Admin v7
Will connect to localhost:9300 ... done
Connected as CN=kirk,OU=client,O=client,L=test,C=de
Elasticsearch Version: 7.4.2
Open Distro Security Version: 1.4.0.0
Diagnostic trace written to: /usr/share/elasticsearch/config/securityadmin_diag_trace_2020-Nov-09_16-30-35.txt
Contacting elasticsearch cluster 'elasticsearch' ...
Clustername: [REDACTED]
Clusterstate: RED
Number of nodes: 103
Number of data nodes: 101
ERR: An unexpected MasterNotDiscoveredException occured: null
Trace:
MasterNotDiscoveredException[null]
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:214)
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:325)
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:252)
at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:598)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.lang.Thread.run(Thread.java:835)
On Master 1, a run a few minutes ago reported 27 for “Number of nodes” and 25 for “Number of data nodes”. However, when I ran the same command again just now, I got the following output:
Open Distro Security Admin v7
Will connect to localhost:9300 ... done
ERR: Cannot connect to Elasticsearch. Please refer to elasticsearch logfile for more information
Trace:
NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{REDACTED}{localhost}{REDACTED:9300}]]
at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:352)
at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:248)
at org.elasticsearch.client.transport.TransportProxyClient.execute(TransportProxyClient.java:57)
at org.elasticsearch.client.transport.TransportClient.doExecute(TransportClient.java:394)
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:396)
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:385)
at com.amazon.opendistroforelasticsearch.security.tools.OpenDistroSecurityAdmin.execute(OpenDistroSecurityAdmin.java:520)
at com.amazon.opendistroforelasticsearch.security.tools.OpenDistroSecurityAdmin.main(OpenDistroSecurityAdmin.java:153)
When I tried to get the cluster IDs for these two master nodes, the response was “Open Distro Security not initialized.”
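Because the node counts reported by securityadmin.sh changed between runs (103/101 on Master 2; 27/25 and then no connection at all on Master 1), it may help to save each run's output and compare the summaries side by side. The following is a minimal sketch in Python, assuming the output was captured as plain text. `parse_securityadmin_summary` is a hypothetical helper written for this comparison, not part of Open Distro, and it only recognizes the summary labels that appear in the runs above.

```python
from typing import Dict, Optional, Union

# Maps the summary labels securityadmin.sh printed in the runs above
# to the keys used in the returned dictionary.
_SUMMARY_FIELDS = {
    "Clusterstate": "cluster_state",
    "Number of nodes": "nodes",
    "Number of data nodes": "data_nodes",
}


def parse_securityadmin_summary(output: str) -> Dict[str, Optional[Union[str, int]]]:
    """Hypothetical helper: pull the cluster summary and the first ERR line
    out of captured securityadmin.sh output. Fields that were not printed
    (e.g. when the tool cannot connect at all) stay None."""
    summary: Dict[str, Optional[Union[str, int]]] = {
        "cluster_state": None,
        "nodes": None,
        "data_nodes": None,
        "error": None,
    }
    for raw_line in output.splitlines():
        line = raw_line.strip()
        # securityadmin.sh prefixes fatal errors with "ERR:"; keep the first one.
        if line.startswith("ERR:"):
            if summary["error"] is None:
                summary["error"] = line[len("ERR:"):].strip()
            continue
        label, sep, value = line.partition(":")
        key = _SUMMARY_FIELDS.get(label.strip()) if sep else None
        if key is None:
            continue  # banners, stack frames, and other lines are ignored
        value = value.strip()
        summary[key] = value if key == "cluster_state" else int(value)
    return summary


if __name__ == "__main__":
    # Summary lines copied from the Master 2 run above.
    master2 = """Clusterstate: RED
Number of nodes: 103
Number of data nodes: 101
ERR: An unexpected MasterNotDiscoveredException occured: null"""
    print(parse_securityadmin_summary(master2))
```

Running it against each saved run (for example, the two Master 1 runs) makes it easy to see when the node counts drop or the connection fails outright. It is only a text parser and says nothing about why the masters cannot be discovered.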