Replicas: 3 Masters (currently trying to recover with 1)
The Issue: My transport and HTTP certificates expired. I attempted to rotate them by deleting the Kubernetes secrets and letting the Operator recreate them. While the secrets were recreated successfully, the cluster is now stuck in a deadlock:
Masters are not Ready: The master pods are running but not “Ready” because the Security Plugin is not initialized.
Quorum Blocked: With only 1 replica active for troubleshooting, the node refuses to elect itself as master because it remembers the old 3-node quorum (requires at least 2 nodes).
Security Initialization Loop: I cannot run securityadmin.sh because the REST API is blocked (“Security not initialized”), and the script times out because the cluster state is RED with no elected master.
Circular Dependency: I can’t initialize security because the cluster isn’t up, and the cluster won’t stay up/ready because security isn’t initialized.
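For reference, this is roughly how the state looks from inside one of the master pods (pod name and namespace are placeholders for my setup; yours will differ):

```
# REST API refuses everything until the security index is initialized:
kubectl exec my-cluster-masters-0 -n opensearch -- \
  curl -sk https://localhost:9200/_cluster/health
# -> "OpenSearch Security not initialized."

# The master log shows why no election happens:
kubectl logs my-cluster-masters-0 -n opensearch --tail=200 \
  | grep -iE "cluster_manager_not_discovered|master not discovered|have discovered"
```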
What I’ve tried:
Deleting secrets to force certificate regeneration.
Setting discovery.type: single-node (rejected by the Operator / caused configuration conflicts).
Running securityadmin.sh manually from within the pod (SocketTimeout/Connection refused).
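Roughly the invocation I used from inside a master pod; the certificate paths are placeholders for my deployment, and securityadmin.sh needs a certificate whose DN is listed in plugins.security.authcz.admin_dn:

```
/usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh \
  -cd /usr/share/opensearch/config/opensearch-security \
  -cacert /usr/share/opensearch/config/admin-cert/ca.crt \
  -cert   /usr/share/opensearch/config/admin-cert/tls.crt \
  -key    /usr/share/opensearch/config/admin-cert/tls.key \
  -h localhost -icl -nhnv
# Fails with SocketTimeoutException / connection refused because no cluster manager is elected.
```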
Request: How can I force the Security Plugin to initialize or bypass the quorum check to let securityadmin.sh apply the new certificates to the .opendistro_security index when the cluster is in this state?
Hi, we are about to release a new operator version. This is one of the issues we have fixed. Will you be able to upgrade your operator to the latest version?
Without seeing the actual logs, etc., we cannot say for sure whether the deadlock will be fixed. The new operator version due to be released soon (3.0) fixes many things that should prevent this from happening in the first place.
I understand that operator 3.0 mainly prevents this from happening in the future, but my issue is with the current production cluster, which is already stuck in a deadlock state.
Before taking any destructive action, could you please advise:
Is there any known recovery procedure to fix an already broken cluster like this without losing existing indexed data?
Can upgrading only the operator help recover a cluster stuck with a single master not ready, or is the fix purely preventive?
Just to confirm: upgrading the operator does not upgrade OpenSearch itself, correct?
What specific logs or outputs would you need from me to better analyze the current situation?
This is a production environment and preserving existing log data is critical.
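In the meantime, here is what I can collect right away if it helps (cluster name, pod names, and namespaces are placeholders for my environment):

```
kubectl get pods -n opensearch -o wide
kubectl logs my-cluster-masters-0 -n opensearch --tail=500
kubectl get opensearchcluster my-cluster -n opensearch -o yaml
kubectl get events -n opensearch --sort-by=.lastTimestamp
# Operator's own logs (deployment name/namespace depend on how it was installed):
kubectl logs deployment/opensearch-operator-controller-manager -n opensearch-operator --tail=500
```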
@v1k1ng0 have you looked at hot-reloading the certificates? See the docs.
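Roughly, the hot-reload flow looks like this, assuming plugins.security.ssl_cert_reload_enabled was already set to true before the rotation and an admin certificate (one whose DN is in plugins.security.authcz.admin_dn) is reachable from where you run curl; the paths are placeholders:

```
# Reload the transport-layer certificates on the node:
curl -sk -XPUT "https://localhost:9200/_plugins/_security/api/ssl/transport/reloadcerts" \
  --cert /path/to/admin.crt --key /path/to/admin.key

# Reload the HTTP-layer certificates on the node:
curl -sk -XPUT "https://localhost:9200/_plugins/_security/api/ssl/http/reloadcerts" \
  --cert /path/to/admin.crt --key /path/to/admin.key
```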
I would also recommend checking whether the new certificates were actually loaded into the pods. Did the masters restart, and is that when this happened? Did the CA also expire, or only the leaf certificates?
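For example, you can compare what is in the regenerated secret with what a master pod is actually running with (secret, pod, and path names below are placeholders for your deployment):

```
# Certificate currently stored in the Kubernetes secret:
kubectl get secret my-cluster-transport-cert -n opensearch -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -fingerprint

# Certificate mounted inside the pod:
kubectl exec my-cluster-masters-0 -n opensearch -- \
  openssl x509 -noout -fingerprint \
  -in /usr/share/opensearch/config/tls-transport/tls.crt
# If the fingerprints differ, the pods never picked up the regenerated certificates.
```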
Yes, I reviewed the hot-reloading documentation. As far as I understand, hot reload must be enabled before certificate rotation. I did not originally deploy this cluster, and I’m not sure whether hot reload was configured at the time.
What likely happened is:
Transport certificates expired
We deleted/regenerated certificates
The masters were restarted
After restart, the cluster entered the current deadlock state (single master not ready, no quorum)
At this point, hot reload no longer seems applicable since the cluster cannot form and security cannot be initialized.
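For completeness, one way to check whether the setting is present on the running nodes (the config path is the standard location in the official images):

```
kubectl exec my-cluster-masters-0 -n opensearch -- \
  grep -i "ssl_cert_reload_enabled" /usr/share/opensearch/config/opensearch.yml
# If there is no output, the setting is absent and hot reload is disabled (it defaults to false).
```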
To clarify:
Only the leaf certificates expired, not the CA
Masters did restart around the time the issue started
Certificates inside the pods are now valid and readable, but the cluster is still stuck
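For reference, this is roughly how I verified them (mount paths are from my deployment and may differ elsewhere):

```
# Leaf certificate: validity window, subject, and issuer:
kubectl exec my-cluster-masters-0 -n opensearch -- \
  openssl x509 -noout -dates -subject -issuer \
  -in /usr/share/opensearch/config/tls-transport/tls.crt

# CA certificate: validity window and subject:
kubectl exec my-cluster-masters-0 -n opensearch -- \
  openssl x509 -noout -dates -subject \
  -in /usr/share/opensearch/config/tls-transport/ca.crt
```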
Given the current state (cluster_manager_not_discovered, security not initialized), is there any supported recovery path to bring the cluster back without wiping data, or is this situation unrecoverable once reached?
If you need specific logs or config to assess this, please let me know exactly what to provide.