3-node OpenSearch cluster fails to initialize

Good afternoon,

I'd like to ask for your help…

I'm stuck trying to get an OpenSearch cluster working.
Describe the issue:
OpenSearch is running within an RKE2 cluster on 3 nodes, so there are 3 pods.
I'm using a custom Docker image deployed as a StatefulSet within RKE2.
After deploying the StatefulSet, all 3 pods are up & running, but with the following errors:

pod: opensearch-cluster-master-0

[2023-04-13T14:41:45,973][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [opensearch-cluster-master-0] Exception while retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
org.opensearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
at org.opensearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:205) ~[opensearch-2.6.0.jar:2.6.0]
at org.opensearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:191) ~[opensearch-2.6.0.jar:2.6.0]
at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:81) ~[opensearch-2.6.0.jar:2.6.0]
at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:58) ~[opensearch-2.6.0.jar:2.6.0]
at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:218) [opensearch-2.6.0.jar:2.6.0]
at org.opensearch.indexmanagement.rollup.actionfilter.FieldCapsFilter.apply(FieldCapsFilter.kt:118) [opensearch-index-management-2.6.0.0.jar:2.6.0.0]
at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) [opensearch-2.6.0.jar:2.6.0]
at org.opensearch.performanceanalyzer.action.PerformanceAnalyzerActionFilter.apply(PerformanceAnalyzerActionFilter.java:78) [opensearch-performance-analyzer-2.6.0.0.jar:2.6.0.0]
at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) [opensearch-2.6.0.jar:2.6.0]
at org.opensearch.security.filter.SecurityFilter.apply0(SecurityFilter.java:232) [opensearch-security-2.6.0.0.jar:2.6.0.0]
at org.opensearch.security.filter.SecurityFilter.apply(SecurityFilter.java:149) [opensearch-security-2.6.0.0.jar:2.6.0.0]
at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) [opensearch-2.6.0.jar:2.6.0]
at org.opensearch.action.support.TransportAction.execute(TransportAction.java:188) [opensearch-2.6.0.jar:2.6.0]
at org.opensearch.action.support.TransportAction.execute(TransportAction.java:107) [opensearch-2.6.0.jar:2.6.0]
at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:110) [opensearch-2.6.0.jar:2.6.0]
at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:97) [opensearch-2.6.0.jar:2.6.0]
at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:465) [opensearch-2.6.0.jar:2.6.0]
at org.opensearch.client.support.AbstractClient.multiGet(AbstractClient.java:581) [opensearch-2.6.0.jar:2.6.0]
at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.loadAsync(ConfigurationLoaderSecurity7.java:208) [opensearch-security-2.6.0.0.jar:2.6.0.0]
at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.load(ConfigurationLoaderSecurity7.java:99) [opensearch-security-2.6.0.0.jar:2.6.0.0]
at org.opensearch.security.configuration.ConfigurationRepository.getConfigurationsFromIndex(ConfigurationRepository.java:372) [opensearch-security-2.6.0.0.jar:2.6.0.0]
at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration0(ConfigurationRepository.java:318) [opensearch-security-2.6.0.0.jar:2.6.0.0]
at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration(ConfigurationRepository.java:303) [opensearch-security-2.6.0.0.jar:2.6.0.0]
at org.opensearch.security.configuration.ConfigurationRepository$1.run(ConfigurationRepository.java:163) [opensearch-security-2.6.0.0.jar:2.6.0.0]
at java.lang.Thread.run(Thread.java:833) [?:?]
[2023-04-13T14:41:48,942][WARN ][o.o.c.c.ClusterFormationFailureHelper] [opensearch-cluster-master-0] cluster-manager not discovered yet, this node has not previously joined a bootstrapped cluster, and [cluster.initial_cluster_manager_nodes] is empty on this node: have discovered [{opensearch-cluster-master-0}{HsYboypaQq-Zk7fipR7Ymw}{PU1iTR7YS_uuhLqU0hsBWg}{10.42.2.238}{10.42.2.238:9300}{dimr}{shard_indexing_pressure_enabled=true}, {opensearch-cluster-master-2}{7gPHTcwsQRWSFyGdanqG7A}{hOB0HJDaRpWSWCHeqOnbew}{10.42.0.215}{10.42.0.215:9300}{dimr}{shard_indexing_pressure_enabled=true}, {opensearch-cluster-master-1}{J8w8fQiaS3GbnLGo5QoQgQ}{smC4fmwqR6G_ixbfrqGk8Q}{10.42.1.91}{10.42.1.91:9300}{dimr}{shard_indexing_pressure_enabled=true}]; discovery will continue using [10.42.1.91:9300, 10.42.0.215:9300] from hosts providers and [{opensearch-cluster-master-0}{HsYboypaQq-Zk7fipR7Ymw}{PU1iTR7YS_uuhLqU0hsBWg}{10.42.2.238}{10.42.2.238:9300}{dimr}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
[2023-04-13T14:41:58,943][WARN ][o.o.c.c.ClusterFormationFailureHelper] [opensearch-cluster-master-0] cluster-manager not discovered yet, this node has not previously joined a bootstrapped cluster, and [cluster.initial_cluster_manager_nodes] is empty on this node: have discovered [{opensearch-cluster-master-0}{HsYboypaQq-Zk7fipR7Ymw}{PU1iTR7YS_uuhLqU0hsBWg}{10.42.2.238}{10.42.2.238:9300}{dimr}{shard_indexing_pressure_enabled=true}, {opensearch-cluster-master-2}{7gPHTcwsQRWSFyGdanqG7A}{hOB0HJDaRpWSWCHeqOnbew}{10.42.0.215}{10.42.0.215:9300}{dimr}{shard_indexing_pressure_enabled=true}, {opensearch-cluster-master-1}{J8w8fQiaS3GbnLGo5QoQgQ}{smC4fmwqR6G_ixbfrqGk8Q}{10.42.1.91}{10.42.1.91:9300}{dimr}{shard_indexing_pressure_enabled=true}]; discovery will continue using [10.42.1.91:9300, 10.42.0.215:9300] from hosts providers and [{opensearch-cluster-master-0}{HsYboypaQq-Zk7fipR7Ymw}{PU1iTR7YS_uuhLqU0hsBWg}{10.42.2.238}{10.42.2.238:9300}{dimr}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0

and repeating…

pod: opensearch-cluster-master-1: similar messages, plus securityadmin errors:

[2023-04-13T14:44:08,053][ERROR][o.o.s.a.BackendRegistry ] [opensearch-cluster-master-1] Not yet initialized (you may need to run securityadmin)
[2023-04-13T14:44:08,056][ERROR][o.o.s.a.BackendRegistry ] [opensearch-cluster-master-1] Not yet initialized (you may need to run securityadmin)
[2023-04-13T14:44:09,748][WARN ][o.o.c.c.ClusterFormationFailureHelper] [opensearch-cluster-master-1] cluster-manager not discovered yet, this node has not previously joined a bootstrapped cluster, and [cluster.initial_cluster_manager_nodes] is empty on this node: have discovered [{opensearch-cluster-master-1}{J8w8fQiaS3GbnLGo5QoQgQ}{smC4fmwqR6G_ixbfrqGk8Q}{10.42.1.91}{10.42.1.91:9300}{dimr}{shard_indexing_pressure_enabled=true}, {opensearch-cluster-master-2}{7gPHTcwsQRWSFyGdanqG7A}{hOB0HJDaRpWSWCHeqOnbew}{10.42.0.215}{10.42.0.215:9300}{dimr}{shard_indexing_pressure_enabled=true}, {opensearch-cluster-master-0}{HsYboypaQq-Zk7fipR7Ymw}{PU1iTR7YS_uuhLqU0hsBWg}{10.42.2.238}{10.42.2.238:9300}{dimr}{shard_indexing_pressure_enabled=true}]; discovery will continue using [10.42.2.238:9300, 10.42.0.215:9300] from hosts providers and [{opensearch-cluster-master-1}{J8w8fQiaS3GbnLGo5QoQgQ}{smC4fmwqR6G_ixbfrqGk8Q}{10.42.1.91}{10.42.1.91:9300}{dimr}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
[2023-04-13T14:44:10,102][ERROR][o.o.s.a.BackendRegistry ] [opensearch-cluster-master-1] Not yet initialized (you may need to run securityadmin)

The same appears on opensearch-cluster-master-2.

Configuration:

opensearch.yml (as a ConfigMap):

cluster.name: opensearch-cluster
network.host: 0.0.0.0
plugins:
  security:
    audit.type: internal_opensearch
    audit.config.index: security-auditlog
    authcz.admin_dn:
      - CN=admin,OU=X,O=X,L=X,ST=X,C=X
    nodes_dn:
      - CN=opensearch-cluster-master-*,OU=X,O=X,L=X,ST=X,C=X
    restapi.roles_enabled: ["all_access", "security_rest_api_access"]
    ssl:
      http:
        enabled: true
        pemcert_filepath: tls-http/tls.crt
        pemkey_filepath: tls-http/tls.key
        pemtrustedcas_filepath: tls-transport/ca.crt
      transport:
        enforce_hostname_verification: false
        pemcert_filepath: tls-transport/${HOSTNAME}.crt
        pemkey_filepath: tls-transport/${HOSTNAME}.key
        pemtrustedcas_filepath: tls-transport/ca.crt
    allow_unsafe_democertificates: false
    allow_default_init_securityindex: false
    system_indices.enabled: true
    system_indices.indices: [".opendistro-alerting-config", ".opendistro-alerting-alert*", ".opendistro-anomaly-results*", ".opendistro-anomaly-detector*", ".opendistro-anomaly-checkpoints", ".opendistro-anomaly-detection-state", ".opendistro-reports-*", ".opendistro-notifications-*", ".opendistro-notebooks", ".opendistro-asynchronous-search-response*"]
gateway.auto_import_dangling_indices: true

In the tls-http and tls-transport paths I have certificates signed with my own CA.
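Since the security plugin matches certificates purely by their subject DN, the certificates can be double-checked like this (a sketch; the paths are taken from the config above and the securityadmin command below, and the printed subjects have to match admin_dn / nodes_dn exactly):

openssl x509 -in /usr/share/opensearch/config/tls-http/admin.crt -noout -subject
openssl x509 -in /usr/share/opensearch/config/tls-transport/${HOSTNAME}.crt -noout -subject
openssl verify -CAfile /usr/share/opensearch/config/tls-transport/ca.crt /usr/share/opensearch/config/tls-transport/${HOSTNAME}.crt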

from pod0:

[opensearch@opensearch-cluster-master-0 ~]$ curl -XGET https://localhost:9200 -u 'admin:xxx' --insecure
OpenSearch Security not initialized.
[opensearch@opensearch-cluster-master-0 ~]$ /usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh -cacert /usr/share/opensearch/config/tls-http/ca.crt -cert /usr/share/opensearch/config/tls-http/admin.crt -key /usr/share/opensearch/config/tls-http/admin.key -cd /usr/share/opensearch/config/opensearch-security/ -h opensearch-cluster-master
**************************************************************************
** This tool will be deprecated in the next major release of OpenSearch **
** https://github.com/opensearch-project/security/issues/1755           **
**************************************************************************
Security Admin v7
Will connect to opensearch-cluster-master:9200 ... done
Connected as "CN=admin,OU=X,O=X,L=X,ST=X,C=X"
OpenSearch Version: 2.6.0
Contacting opensearch cluster 'opensearch' and wait for YELLOW clusterstate ...
Cannot retrieve cluster state due to: 30,000 milliseconds timeout on connection http-outgoing-2 [ACTIVE]. This is not an error, will keep on trying ...
  Root cause: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-2 [ACTIVE] (java.net.SocketTimeoutException/java.net.SocketTimeoutException)
   * Try running securityadmin.sh with -icl (but no -cl) and -nhnv (If that works you need to check your clustername as well as hostnames in your TLS certificates)
   * Make sure that your keystore or PEM certificate is a client certificate (not a node certificate) and configured properly in opensearch.yml
   * If this is not working, try running securityadmin.sh with --diagnose and see diagnose trace log file)
   * Add --accept-red-cluster to allow securityadmin to operate on a red cluster.
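For reference, the variant the tool itself suggests (the same certificates and config directory, just adding -icl to ignore the cluster name and -nhnv to skip hostname verification) would look like the sketch below. With the cluster still not bootstrapped it will most likely keep timing out, but it rules out cluster-name and hostname mismatches:

/usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh -cacert /usr/share/opensearch/config/tls-http/ca.crt -cert /usr/share/opensearch/config/tls-http/admin.crt -key /usr/share/opensearch/config/tls-http/admin.key -cd /usr/share/opensearch/config/opensearch-security/ -h opensearch-cluster-master -icl -nhnv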

Logs from the opensearch-dashboards pod:

{"type":"log","@timestamp":"2023-04-13T14:50:36Z","tags":["info","plugins-system"],"pid":1,"message":"Setting up [50] plugins: [alertingDashboards,usageCollection,opensearchDashboardsUsageCollection,opensearchDashboardsLegacy,mapsLegacy,dataSource,share,opensearchUiShared,legacyExport,embeddable,expressions,data,securityAnalyticsDashboards,home,console,apmOss,management,indexPatternManagement,advancedSettings,savedObjects,reportsDashboards,indexManagementDashboards,anomalyDetectionDashboards,dashboard,visualizations,visTypeVega,visTypeTimeline,timeline,visTypeTable,visTypeMarkdown,visBuilder,tileMap,regionMap,customImportMapDashboards,inputControlVis,ganttChartDashboards,visualize,searchRelevanceDashboards,queryWorkbenchDashboards,notificationsDashboards,charts,visTypeVislib,visTypeTimeseries,visTypeTagcloud,visTypeMetric,observabilityDashboards,discover,savedObjectsManagement,securityDashboards,bfetch]"}
{"type":"log","@timestamp":"2023-04-13T14:50:36Z","tags":["info","plugins","dataSource","data-source-service"],"pid":1,"message":"Created data source client pool of size 5"}
{"type":"log","@timestamp":"2023-04-13T14:50:36Z","tags":["info","plugins","dataSource","data-source-service"],"pid":1,"message":"Created data source aws client pool of size 5"}
{"type":"log","@timestamp":"2023-04-13T14:50:36Z","tags":["info","plugins","dataSource","data-source-service","legacy"],"pid":1,"message":"Created data source client pool of size 5"}
{"type":"log","@timestamp":"2023-04-13T14:50:36Z","tags":["info","plugins","dataSource","data-source-service","legacy"],"pid":1,"message":"Created data source aws client pool of size 5"}
{"type":"log","@timestamp":"2023-04-13T14:50:37Z","tags":["info","savedobjects-service"],"pid":1,"message":"Waiting until all OpenSearch nodes are compatible with OpenSearch Dashboards before starting saved objects migrations..."}
{"type":"log","@timestamp":"2023-04-13T14:50:37Z","tags":["error","opensearch","data"],"pid":1,"message":"[ResponseError]: Response Error"}
{"type":"log","@timestamp":"2023-04-13T14:50:37Z","tags":["error","savedobjects-service"],"pid":1,"message":"Unable to retrieve version information from OpenSearch nodes."}
{"type":"log","@timestamp":"2023-04-13T14:50:39Z","tags":["error","opensearch","data"],"pid":1,"message":"[ResponseError]: Response Error"}
{"type":"log","@timestamp":"2023-04-13T14:50:42Z","tags":["error","opensearch","data"],"pid":1,"message":"[ResponseError]: Response Error"}
{"type":"log","@timestamp":"2023-04-13T14:50:44Z","tags":["error","opensearch","data"],"pid":1,"message":"[ResponseError]: Response Error"}
{"type":"log","@timestamp":"2023-04-13T14:50:47Z","tags":["error","opensearch","data"],"pid":1,"message":"[ResponseError]: Response Error"}

Logs from the Logstash pod:

:exception=>LogStash::Outputs::OpenSearch::HttpClient::Pool::BadResponseCodeError, :message=>"Got response code '503' contacting OpenSearch at URL 'https://opensearch-cluster-master:9200/'"}
[2023-04-13T14:51:15,697][WARN ][logstash.outputs.opensearch][main] Attempted to resurrect connection to dead OpenSearch instance, but got an error {:url=>"https://opensearch-cluster-master:9200/", :exception=>LogStash::Outputs::OpenSearch::HttpClient::Pool::BadResponseCodeError, :message=>"Got response code '503' contacting OpenSearch at URL 'https://opensearch-cluster-master:9200/'"}
[2023-04-13T14:51:20,701][WARN ][logstash.outputs.opensearch][main] Attempted to resurrect connection to dead OpenSearch instance, but got an error {:url=>"https://opensearch-cluster-master:9200/", :exception=>LogStash::Outputs::OpenSearch::HttpClient::Pool::BadResponseCodeError, :message=>"Got response code '503' contacting OpenSearch at URL 'https://opensearch-cluster-master:9200/'"}
[2023-04-13T14:51:25,706][WARN ][logstash.outputs.opensearch][main] Attempted to resurrect connection to dead OpenSearch instance, but got an error {:url=>"https://opensearch-cluster-master:9200/", :exception=>LogStash::Outputs::OpenSearch::HttpClient::Pool::BadResponseCodeError, :message=>"Got response code '503' contacting OpenSearch at URL 'https://opensearch-cluster-master:9200/'"}

security config → config.yml:

_meta:
  type: "config"
  config_version: 2
config:
  dynamic:
    http:
      anonymous_auth_enabled: false
    authc:
      basic_internal_auth_domain:
        description: "Authenticate via local DB"
        http_enabled: true
        transport_enabled: true
        order: "0"
        http_authenticator:
          type: basic
          challenge: false
        authentication_backend:
          type: internal
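Once the security index is initialized, this basic auth domain can be sanity-checked with the security plugin's authinfo endpoint (a sketch, using the same admin user as in the curl above):

curl -XGET https://localhost:9200/_plugins/_security/authinfo -u 'admin:xxx' --insecure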

Do you know what's going on here? I suspect wrong certificates, but they were recreated and there are no indications of that in the logs…

Thanks a lot for any hint…

Well,

the pods were not communicating with each other (according to this),

so I added:

cluster.initial_master_nodes:
  - opensearch-cluster-master-0
  - opensearch-cluster-master-1
  - opensearch-cluster-master-2

into opensearch.yml and rolled out the StatefulSet again.
Then, directly in pod opensearch-cluster-master-0, I ran securityadmin to initialize the security index, since the logs showed errors about it not being initialized:

/usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh -cacert /usr/share/opensearch/config/tls-http/ca.crt -cert /usr/share/opensearch/config/tls-http/admin.crt -key /usr/share/opensearch/config/tls-http/admin.key -cd /usr/share/opensearch/config/opensearch-security/ -h opensearch-cluster-master
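To confirm that the upload actually created and populated the security index, it can be checked afterwards with _cat/indices (a sketch; the index name comes from the error messages above):

curl -XGET 'https://localhost:9200/_cat/indices/.opendistro_security?v' -u 'admin:xxx' --insecure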

After that, the OpenSearch Dashboards UI started working, and the command:

[opensearch@opensearch-cluster-master-0 ~]$ curl -XGET https://localhost:9200/_cluster/health?pretty=true -u 'admin:xxxx' --insecure
{
  "cluster_name" : "opensearch-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "discovered_master" : true,
  "discovered_cluster_manager" : true,
  "active_primary_shards" : 4,
  "active_shards" : 10,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

returned what I wanted.
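One note on the setting above: the warning in the logs refers to cluster.initial_cluster_manager_nodes, which is the OpenSearch 2.x name for this option; cluster.initial_master_nodes still works but is the deprecated alias, so the equivalent would be:

cluster.initial_cluster_manager_nodes:
  - opensearch-cluster-master-0
  - opensearch-cluster-master-1
  - opensearch-cluster-master-2

Either way, this is only a bootstrap setting: once the cluster has formed for the first time, it should be removed from opensearch.yml.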