Environment:
- Product: Wazuh Indexer (based on OpenSearch 2.6.0)
- Cluster Setup: 3 nodes (e.g., indexer01.example.com, indexer02.example.com, indexer03.example.com)
- Issue: Nodes are unable to form a cluster due to SSL handshake failures on the transport layer (port 9300). The openssl s_client test consistently shows the server presenting an old, expired CA certificate as the issuer of its new node certificate.
- Authentication: Using PEM-based certificates for nodes, admin access, and CA.
Problem Description:
Despite a complete regeneration of our internal PKI (Root CA, Node Certificates for all indexers, Admin Certificate) and meticulous reconfiguration, all Wazuh Indexer nodes exhibit the following behavior when tested with openssl s_client -connect <node_ip_or_localhost>:9300:
- depth=0 (Node Certificate): Shows the NEW node certificate with its correct future expiry date (e.g., June 2026).
- depth=1 (Issuer CA): Shows the correct Subject Distinguished Name (DN) for our Root CA (e.g., CN=rootCA.example.com,…) but with an OLD, EXPIRED notAfter date (e.g., May 17, 2025, which is now in the past). This results in a verify error:num=10:certificate has expired.
- The “Acceptable client certificate CA names” list presented by the server during the SSL handshake also reflects this old CA’s DN.
- This issue occurs on both the transport (9300) and HTTP (9200) layers.
- Due to this, nodes cannot communicate, and the cluster cannot form, leading to ClusterManagerNotDiscoveredException in the logs of nodes trying to join.
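The depth=N lines in s_client output describe the chain OpenSSL built while verifying. When a server sends only its leaf certificate, OpenSSL completes that chain from the trust store of the machine running the test. The sketch below, assuming bash and the openssl CLI, lists only the certificates the server actually sends. The function names split_pem and show_sent_chain are hypothetical helpers, not part of Wazuh or OpenSearch.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative only.

# Split a PEM stream on stdin into one certificate per file under directory $1
# and print the number of certificates found.
split_pem() {
  local outdir=$1
  awk -v dir="$outdir" '
    /-----BEGIN CERTIFICATE-----/ { n++; out = sprintf("%s/cert%02d.pem", dir, n) }
    out != ""                     { print > out }
    /-----END CERTIFICATE-----/   { close(out); out = "" }
    END                           { print n + 0 }
  '
}

# Print subject, issuer, expiry and SHA-256 fingerprint of every certificate
# that the server at host:port sends during the TLS handshake.
show_sent_chain() {
  local host=$1 port=$2 tmp f
  tmp=$(mktemp -d)
  openssl s_client -connect "${host}:${port}" -showcerts </dev/null 2>/dev/null \
    | split_pem "$tmp" >/dev/null
  for f in "$tmp"/cert*.pem; do
    [ -e "$f" ] || continue
    openssl x509 -in "$f" -noout -subject -issuer -enddate -fingerprint -sha256
    echo
  done
  rm -rf "$tmp"
}
```

For example, show_sent_chain 127.0.0.1 9300. If only the node certificate is listed, the expired depth=1 entry was probably not sent by the indexer at all.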
Certificates Details:
- NEW Root CA:
  - Subject: CN=rootCA.example.com,… (example DN)
  - Expiry: e.g., May 22, 2026 (VALID)
  - Stored in /etc/wazuh-indexer/certs/root-ca.pem (contains only this single new CA certificate).
- NEW Node Certificates (e.g., for indexer01.example.com):
  - Subject: CN=indexer01.example.com,…
  - Expiry: e.g., June 28, 2026 (VALID)
  - Issuer: CN=rootCA.example.com,… (Signed by the NEW Root CA)
  - Stored in /etc/wazuh-indexer/certs/indexer.pem.
  - Includes correct SANs (DNS name and IP address).
- NEW Admin Certificate:
  - Subject: CN=admin,…
  - Expiry: e.g., May 26, 2027 (VALID)
  - Issuer: CN=rootCA.example.com,… (Signed by the NEW Root CA)
  - Stored in /etc/wazuh-indexer/certs/admin.pem.
  - Admin private key converted to PKCS#8.
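Because the old and new Root CA share one Subject DN, the subject line alone cannot tell them apart. The sketch below, assuming bash and OpenSSL 1.1.0 or newer (for -no-CApath), prints the fields that do differ and checks a certificate against exactly one CA file. The function names cert_identity, key_ids and verify_against are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative only.

# Print fields that distinguish two certificates with the same Subject DN.
cert_identity() {
  openssl x509 -in "$1" -noout -serial -enddate -fingerprint -sha256
}

# Print the Subject/Authority Key Identifier extensions, if present.
# A node certificate's Authority Key Identifier should match the
# Subject Key Identifier of the CA that signed it.
key_ids() {
  openssl x509 -in "$1" -noout -text | grep -A1 'Key Identifier'
}

# Succeed only if the certificate in $2 verifies against the CA file in $1,
# with the system-wide CA directory excluded.
verify_against() {
  local ca=$1 cert=$2
  openssl verify -no-CApath -CAfile "$ca" "$cert" >/dev/null 2>&1
}
```

For example, verify_against /etc/wazuh-indexer/certs/root-ca.pem /etc/wazuh-indexer/certs/indexer.pem && echo ok. Running cert_identity on root-ca.pem gives a fingerprint that can be compared with whatever CA certificate shows up elsewhere.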
Troubleshooting Steps Performed (Exhaustively):
- Complete Certificate Regeneration: All certificates (Root CA, all Node certs, Admin cert) were generated fresh using XCA and OpenSSL, ensuring they are signed by the NEW Root CA.
- Verified Certificate Content:
  - openssl x509 -in <cert_path> -noout -text -purpose used on all certs to confirm Subject, Issuer, Validity, Key Usage, Extended Key Usage, and SANs are correct.
  - Confirmed Root CA private key matches its certificate.
  - Confirmed /etc/wazuh-indexer/certs/root-ca.pem contains only the single, new Root CA certificate (grep -c -- "-----BEGIN CERTIFICATE-----" output is 1).
  - Confirmed node .pem files contain only the single node certificate.
- Cleaned Certificate Directories: Old certificate directories were backed up and removed. New directories created with only the new certificates.
- Corrected opensearch.yml Configuration:
  - Paths plugins.security.ssl.[http|transport].pem[cert|key|trustedcas]_filepath correctly point to the new certificate files on each node.
  - plugins.security.nodes_dn correctly lists the Subject DNs of all new node certificates.
  - plugins.security.authcz.admin_dn correctly lists the Subject DN of the new admin certificate.
  - Permissions for opensearch.yml and all certificate files/directories are correct for the wazuh-indexer user.
- Checked Bundled JDK cacerts:
  - The Wazuh Indexer uses a bundled JDK (/usr/share/wazuh-indexer/jdk/).
  - Its lib/security/cacerts file was inspected and found NOT to contain any old/conflicting versions of our Root CA.
  - The NEW Root CA was imported into this bundled cacerts as a precaution.
- Verified OpenSearch Process:
  - Confirmed via ps auxww that the wazuh-indexer Java process uses the correct configuration path (-Dopensearch.path.conf=/etc/wazuh-indexer) and bundled JDK.
  - No Java system properties overriding SSL truststores/keystores were found in the process arguments.
  - The environment file /etc/sysconfig/wazuh-indexer does not exist, so no overrides from there.
- Single-Node Bootstrap and securityadmin.sh:
  - Stopped all other indexer nodes.
  - Configured indexer01 for single-node operation (cluster.initial_master_nodes to itself, discovery.seed_hosts to 127.0.0.1).
  - Started indexer01.
  - Successfully ran securityadmin.sh -cd /etc/wazuh-indexer/opensearch-security/ … -icl -nhnv -h 127.0.0.1 --accept-red-cluster. The tool connected using the new admin/CA certs and reported “Done with success,” updating all 10 configuration types.
  - Attempted securityadmin.sh … -rl (reload and flush caches), which also reported success after resolving an initial hostname verification issue by targeting the node’s IP and including -nhnv.
- Result After All Above Steps: Running openssl s_client -connect 127.0.0.1:9300 (or against the node's IP) on the isolated indexer01 still shows the depth=0 node certificate as NEW, but the depth=1 issuer CA with the OLD expiry date and the “certificate has expired” error. The “Acceptable client certificate CA names” list also reflects this old CA DN. The behavior is the same when the other nodes are brought up with their new certs.
- Checked for API to force reload certs: /plugins/_security/api/ssl/transport/reloadcerts returned “no handler found.”
- Security Plugin Health: /_plugins/_security/health reports {“status”:“UP”} when indexer01 is running as a single node.
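The configuration and file checks above can be tied together with a short script. This is a sketch, assuming bash, the openssl CLI, and one setting per line in opensearch.yml. It treats relative paths as relative to the config directory, which is also an assumption. The function names tls_paths and report_tls_files are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative only.

# Print "setting path" for every active (uncommented) PEM path setting
# of the security plugin found in the given opensearch.yml.
tls_paths() {
  local conf=${1:-/etc/wazuh-indexer/opensearch.yml}
  grep -E '^[[:space:]]*plugins\.security\.ssl\.(http|transport)\.pem(cert|key|trustedcas)_filepath[[:space:]]*:' "$conf" \
    | tr -d "\"'" \
    | sed -E 's/^[[:space:]]*([^:[:space:]]+)[[:space:]]*:[[:space:]]*(.*[^[:space:]])[[:space:]]*$/\1 \2/'
}

# For each configured certificate file (private keys are skipped), print the
# subject, expiry and SHA-256 fingerprint of the first certificate in it.
report_tls_files() {
  local conf=${1:-/etc/wazuh-indexer/opensearch.yml} confdir key path
  confdir=$(dirname "$conf")
  tls_paths "$conf" | while read -r key path; do
    case $key in *pemkey_filepath) continue ;; esac
    # Assumption: relative paths are resolved against the config directory.
    case $path in /*) ;; *) path="$confdir/$path" ;; esac
    echo "== $key -> $path"
    openssl x509 -in "$path" -noout -subject -enddate -fingerprint -sha256
  done
}
```

Running report_tls_files on each node shows which CA file the running configuration points at and when that certificate expires. The fingerprints can then be compared across nodes.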
Current Hypothesis:
The issue seems to be a very persistent internal state or cache within OpenSearch/Security Plugin related to the Distinguished Name CN=rootCA.example.com. It appears to be incorrectly associating this DN with the old CA’s key/expiry data for the transport layer’s SSL context, even when all file-based configurations point exclusively to the new CA.
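One way to test this hypothesis is to repeat the handshake while trusting only the new root-ca.pem. The sketch below assumes bash and the openssl CLI; verify_code and handshake_with_ca are hypothetical names. When -CAfile is given, s_client should build the chain from that file rather than from the default trust locations.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative only.

# Extract the first "Verify return code" number from s_client output on stdin.
verify_code() {
  sed -n 's/^[[:space:]]*Verify return code: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Handshake with host:port, trusting only the CA file given as $3,
# and print the resulting verify return code (0 means the chain verified).
handshake_with_ca() {
  local host=$1 port=$2 ca=$3
  openssl s_client -connect "${host}:${port}" -CAfile "$ca" </dev/null 2>/dev/null \
    | verify_code
}
```

For example, handshake_with_ca 127.0.0.1 9300 /etc/wazuh-indexer/certs/root-ca.pem. A result of 0 would suggest the expired CA was picked up from the testing machine's trust store rather than served by the indexer. A result of 10 would point back at what the node itself presents or trusts.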
Request for Help:
- Has anyone encountered such behavior where OpenSearch (specifically version 2.6.0 or similar) presents old CA information in its SSL chain despite all certificate files and configurations being updated to use a new CA with the same Subject DN?
- Are there any known deeper caching mechanisms for SSL/CA information in the OpenSearch Security plugin that are not cleared by securityadmin.sh -rl or service restarts?
- Are there any specific diagnostic steps, beyond general Java SSL debugging, that can pinpoint where OpenSearch is retrieving this old CA information for the transport layer?
- Could this be a bug, and are there any known workarounds?
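As one possible diagnostic for the third question, the sketch below searches directories for leftover PEM copies of a certificate with the same subject that are expired or about to expire. It assumes bash and the openssl CLI, and find_stale_certs is a hypothetical name. Only the first certificate in each file is inspected, and binary stores such as the JDK cacerts are not covered.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the name is illustrative only.

# Usage: find_stale_certs SUBJECT_SUBSTRING SECONDS DIR...
# Print every file under the given directories whose first PEM certificate
# has a subject containing SUBJECT_SUBSTRING and expires within SECONDS
# (use 0 to match certificates that are already expired).
find_stale_certs() {
  local needle=$1 window=$2 f subject
  shift 2
  find -L "$@" -type f 2>/dev/null | while IFS= read -r f; do
    # Skip anything openssl cannot parse as a PEM certificate.
    subject=$(openssl x509 -in "$f" -noout -subject 2>/dev/null) || continue
    case $subject in *"$needle"*) ;; *) continue ;; esac
    # -checkend exits non-zero when the certificate expires within the window.
    openssl x509 -in "$f" -noout -checkend "$window" >/dev/null 2>&1 || echo "$f"
  done
}
```

For example, find_stale_certs rootCA.example.com 0 /etc/wazuh-indexer /usr/share/wazuh-indexer /etc/pki /etc/ssl. The last two directories are guesses at where a distribution keeps its system trust anchors and may differ on a given host.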
Any insights or suggestions would be greatly appreciated, as we have exhausted standard troubleshooting procedures. We have production data on this cluster and are looking for solutions that avoid data loss.
Thank you!