Apparent deadlock scenario upon bootstrap

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

operator 2.8.0

opensearch 3.3.2

security plugin 3.3.2

various other plugins 3.3.2

Describe the issue:

While attempting to spin up a cluster, I noticed that OpenSearch k8s-operator (the operator) first stands up a bootstrap pod using the same target image for opensearch. In my case, I target a custom image with security plugin enabled.

During the bootstrap process, I observe what seems like a deadlock scenario:

  1. Bootstrap node will crashloop attempting to read from security index: [2026-01-21T05:33:07,396][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [test-bootstrap-0] Failure no such index [.opendistro_security] retrieving configuration for [ACTIONGROUPS, ALLOWLIST, AUDIT, CONFIG, INTERNALUSERS, NODESDN, ROLES, ROLESMAPPING, TENANTS] (index=.opendistro_security)
  2. As a result, the service endpoints never become available:

blah@iad8a-rb35-10a:~$ kubectl -n usp-opensearch-stage describe endpoints test
Name: test
Namespace: usp-opensearch-stage
Labels: opster.io/opensearch-cluster=test
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2026-01-21T05:55:17Z
Subsets:
Addresses: <none>
NotReadyAddresses: 10.166.170.102, blah
^ bootstrap ^
Ports:
Name Port Protocol
---- ---- --------
metrics 9600 TCP
rca 9650 TCP
http 9200 TCP
transport 9300 TCP

Events: <none>

  1. On the other hand, the job that is supposed to initialize the security index will hang waiting for the cluster to become available (which never does)

blah@iad8a-rb35-10a:~$ kubectl --namespace usp-opensearch-stage logs test-securityconfig-update-vzfzq -c updater
Waiting to connect to the cluster
Waiting to connect to the cluster
Waiting to connect to the cluster
Waiting to connect to the cluster
Waiting to connect to the cluster
Waiting to connect to the cluster

I want to understand whether my observations are correct and that this is indeed a deadlock. Additionally, I want to know if this is a common enough problem that a best-practice workaround exists?

@jnh_dbx Did securityconfig-update script ever completed the task? The logic of this pod is to wait until all nodes are up including data nodes and then perform initialization of the security plugin. This will create a .opendistro-security index that holds security plugin configuration.

The message is expected and it is not a deadlock. When the healthcheck is executed against each node and security plugin is not fully operational then this error will appear.

It should go away once the cluster is fully formed and security plugin initialized.