Leader certificate validation failed when setting CCR between 2 different K8s clusters

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
Dashboard 2.14
OpenSearch 2.11

Describe the issue:

I am trying to set up CCR between two K8s clusters. Each cluster sits behind a Traefik ingress, and I configured the remote connection in proxy mode.
When I try to start replication, I get an error saying that the follower cannot connect to the leader. The logs indicate that validation of the leader certificate failed (see logs below).
The certificate of the CA that signs the leader ingress certificate is provided to the follower via the truststore. Isn’t the truststore used to validate the remote certificate?

Configuration:

Both clusters have security enabled, including transport security.
The transport certificates are self-signed and local to each K8s cluster.
Each cluster has a Traefik ingress with a separate ingress point for transport, mapped to the service on port 9300.
The ingress has a certificate signed by a CA. The CA certificate is provided to the follower deployment in the truststore with the following configuration:

    plugins:
      security:
        disabled: false
        ssl:
          transport:
            enabled: true
            pemcert_filepath: node-certs/node.crt
            pemkey_filepath: node-certs/node.key
            pemtrustedcas_filepath: root-ca-cert/self-signed-ca.crt
            truststore_filepath: /usr/share/opensearch/config/ca-certs/cacerts.jks
            truststore_password: ${TRUSTSTORE_PASSWORD}

The follower deployment has nodes_dn configured as follows:

        nodes_dn:
          - 'CN=opensearch-dev.cluster2.k8s.lab,O=ACME'
          - 'CN=opensearch-master-0'
          - 'CN=opensearch-master-1'
          - 'CN=opensearch-master-2'
          - 'CN=opensearch-dev-ccr.cluster1.k8s.lab'

Replication is configured with

curl -XPUT -k -H 'Content-Type: application/json' -u admin:$ADMINPASS 'https://opensearch-dev.cluster2.k8s.lab/_cluster/settings?pretty' -d '
{
  "persistent": {
    "cluster": {
      "remote": {
        "lab-replication-one-to-two-proxy": {
          "mode": "proxy",
          "proxy_address": "opensearch-dev-ccr.cluster1.k8s.lab:443"
        }
      }
    }
  }
}'

and enabled with

curl -XPUT -k -H 'Content-Type: application/json' -u admin:$ADMINPASS 'https://opensearch-dev.cluster2.k8s.lab/_plugins/_replication/replica-devices/_start?pretty' -d '
{
   "leader_alias": "lab-replication-one-to-two-proxy",
   "leader_index": "devices",
   "use_roles":{
      "leader_cluster_role": "all_access",
      "follower_cluster_role": "all_access"
   }
}'

Relevant Logs or Screenshots:

[2025-01-13T15:55:52,786][WARN ][o.o.t.TcpTransport       ] [opensearch-master-1] exception caught on transport layer [Netty4TcpChannel{localAddress=/10.42.13.132:54172, remoteAddress=opensearch-dev-ccr.cluster1.k8s.lab/10.56.160.122:443}], closing connection
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:499) ~[netty-codec-4.1.100.Final.jar:4.1.100.Final]

Command and error

curl -XPUT -k -H 'Content-Type: application/json' -u admin:$ADMINPASS 'https://opensearch-dev.cluster2.k8s.lab/_plugins/_replication/replica-devices/_start?pretty' -d '
{
   "leader_alias": "lab-replication-one-to-two-proxy",
   "leader_index": "devices",
   "use_roles":{
      "leader_cluster_role": "all_access",
      "follower_cluster_role": "all_access"
   }
}'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_state_exception",
        "reason" : "Unable to open any proxy connections to remote cluster [lab-replication-one-to-two-proxy]"
      }
    ],
    "type" : "illegal_state_exception",
    "reason" : "Unable to open any proxy connections to remote cluster [lab-replication-one-to-two-proxy]"
  },
  "status" : 500
}

I managed to make it work.
The problem was the values provided in nodes_dn in the security configuration: each value must exactly match the subject of the certificate. If the certificate subject includes organization, country, etc., these must also be included in the nodes_dn entry. Otherwise the entry does not match, and the result is the error “unable to find valid certification path to requested target”. I did not find that message helpful for pinpointing the actual problem. A message like “Subject not in allowed nodes’ list” would have been much more helpful.
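For anyone hitting the same problem, here is a minimal sketch of how the certificate subject could be compared with the nodes_dn entries before starting replication. The helper names cert_subject_dn and dn_in_list are hypothetical and not part of OpenSearch. The comparison is a plain exact string match, so it assumes the nodes_dn entries are written out in full in RFC 2253 form and contain no wildcards or regular expressions.

```shell
# Sketch only: helper names are hypothetical, not part of OpenSearch.

# Print the subject of a PEM certificate in RFC 2253 form,
# e.g. CN=opensearch-dev.cluster2.k8s.lab,O=ACME
cert_subject_dn() {
  openssl x509 -in "$1" -noout -subject -nameopt RFC2253 | sed 's/^subject= *//'
}

# Return 0 if the DN in $1 is identical to one of the remaining arguments.
# Assumption: exact string comparison only; wildcard or regex entries are not handled.
dn_in_list() {
  local dn="$1"
  shift
  local entry
  for entry in "$@"; do
    [ "$dn" = "$entry" ] && return 0
  done
  return 1
}

# Example usage (paths and DNs taken from the configuration above):
#   dn="$(cert_subject_dn node-certs/node.crt)"
#   dn_in_list "$dn" 'CN=opensearch-dev.cluster2.k8s.lab,O=ACME' 'CN=opensearch-master-0' \
#     || echo "subject not listed in nodes_dn: $dn"
```

With this check, a subject such as CN=opensearch-dev.cluster2.k8s.lab,O=ACME is reported as missing when nodes_dn only lists CN=opensearch-dev.cluster2.k8s.lab, which is the kind of mismatch described above.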
