OpenSearch Operator cluster stuck after TLS cert renewal – security not initialized

Hello,

I’m running OpenSearch on Kubernetes using the OpenSearch Operator and I’m currently stuck after renewing TLS certificates. I’d appreciate confirmation on the correct recovery procedure.

Environment

  • OpenSearch version: 2.11.1

  • Deployment method: OpenSearch Operator (Helm)

  • Kubernetes managed via Rancher

  • TLS certificates generated by the operator (security.tls.http.generate: true, security.tls.transport.generate: true)

  • Security configuration provided via securityconfig-secret

  • LDAP + internal auth configured

  • Multiple node pools: masters (3), datas, coordinators, mls

What happened

  • One of the TLS certificates expired (PKIX / CertificateExpired errors).

  • Pods started failing SSL handshakes and cluster communication.

  • I deleted the expired TLS secrets (http/transport), and the operator correctly regenerated them with valid dates.

  • After that, all OpenSearch pods start and reach Running, but none of them becomes Ready.

  • Logs on the master show repeatedly:

    Not yet initialized (you may need to run securityadmin)
    ClusterManagerNotDiscoveredException
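
For reference, this is roughly how the validity dates of the regenerated certificates can be checked. It is a minimal sketch, not an operator feature: it assumes the certificate sits under the `tls.crt` key of the secret and that the `notAfter=...` line comes from `openssl x509 -noout -enddate`. The secret name and namespace in the comment are placeholders, and the helper names `parse_not_after` and `is_expired` are made up for illustration.

```python
import ssl
import time
from typing import Optional

# Assumed way to obtain the input line (placeholders in angle brackets):
#   kubectl get secret <tls-secret> -n <namespace> -o jsonpath='{.data.tls\.crt}' \
#     | base64 -d | openssl x509 -noout -enddate
# which prints something like: notAfter=Jan  5 09:34:43 2018 GMT


def parse_not_after(openssl_line: str) -> int:
    """Convert an openssl 'notAfter=...' line to seconds since the epoch (UTC)."""
    _, _, value = openssl_line.strip().partition("=")
    return ssl.cert_time_to_seconds(value)


def is_expired(openssl_line: str, now: Optional[float] = None) -> bool:
    """Return True if the certificate end date is at or before `now`."""
    current = time.time() if now is None else now
    return parse_not_after(openssl_line) <= current
```

Calling `is_expired(line)` on the output for both the http and the transport secret should return `False` if the operator really issued fresh certificates.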

Current state

  • opensearch-operator-controller-manager is running fine.

  • The Job opensearch-cluster-securityconfig-update exists but is Completed (1/1) and does not rerun.

  • No OpenSearch pod ever becomes Ready.

  • Cluster health remains unknown.

Important detail
The security configuration is fully defined in securityconfig-secret and referenced in the OpenSearchCluster CR:

security:
  config:
    adminCredentialsSecret: admin-credentials-secret
    securityConfigSecret: securityconfig-secret

No dynamic security changes were made outside this secret.

Question
Is the correct and safe recovery step to:

  1. Delete the completed Job opensearch-cluster-securityconfig-update

  2. Let the operator recreate and rerun it to re-apply the same security configuration

  3. Allow the cluster to initialize again
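
To make the proposed steps concrete, here is a sketch of the commands I have in mind, wrapped in a small dry-run helper. The Job name is the one from my cluster; the namespace argument is an assumption, and the helper names `recovery_commands` and `run` are hypothetical. Whether the operator actually recreates the Job after deletion is exactly what I would like confirmed.

```python
import subprocess
from typing import List

JOB_NAME = "opensearch-cluster-securityconfig-update"  # Job name from this post


def recovery_commands(namespace: str) -> List[List[str]]:
    """Build the kubectl invocations for the proposed recovery, in order."""
    return [
        # Inspect the completed Job before removing it.
        ["kubectl", "get", "job", JOB_NAME, "-n", namespace, "-o", "yaml"],
        # Step 1: delete the completed Job.
        ["kubectl", "delete", "job", JOB_NAME, "-n", namespace],
        # Step 2: check whether the operator has recreated the Job.
        ["kubectl", "get", "jobs", "-n", namespace],
        # Step 3: check whether the pods become Ready.
        ["kubectl", "get", "pods", "-n", namespace],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

With the default `dry_run=True`, `run(recovery_commands("<namespace>"))` only prints the commands, so nothing is changed until the procedure is confirmed.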

I want to confirm that:

  • Deleting this Job will not wipe users/roles beyond what is defined in securityconfig-secret

  • This is the expected procedure after TLS renewal when the cluster is no longer initialized

Any confirmation or recommended best practice would be very helpful.

Thanks in advance.

Duplicate of Cluster stuck in "Security not initialized" loop after TLS certificate rotation (2.11.1)
