OpenSearch CCR not connecting (proxy mode) between Kubernetes clusters

Versions: OpenSearch 3.1.0, OpenSearch Operator Helm chart 2.8.0

Describe the issue:

I’m trying to set up Cross Cluster Replication (CCR) between two OpenSearch clusters running on separate Kubernetes clusters. Both are exposed via LoadBalancer services. The OpenSearch Operator was installed via Helm, but the OpenSearch clusters themselves were deployed using kubectl apply with custom YAML manifests. TLS is enabled and auto-generated (generate: true) on both clusters.

I followed the official documentation from here and adjusted the CN/OU settings in the nodes_dn configuration so that each cluster recognizes the other’s nodes. However, when I try to configure the remote cluster on the follower using this command:

curl -XPUT -k -H 'Content-Type: application/json' -u 'admin:MyPassword' 'https://FOLLOWER-LOADBALANCER-IP:9200/_cluster/settings?pretty' -d '
{
  "persistent": {
    "cluster": {
      "remote": {
        "my-connection-alias": {
          "mode": "proxy",
          "proxy_address": "LEADER-LOADBALANCER-IP:9300"
        }
      }
    }
  }
}'

the remote connection is never established; its status shows:

{
  "my-connection-alias" : {
    "connected" : false,
    "mode" : "proxy",
    "proxy_address" : "LEADER-LOADBALANCER-IP:9300",
    "server_name" : "",
    "num_proxy_sockets_connected" : 0,
    "max_proxy_socket_connections" : 18,
    "initial_connect_timeout" : "30s",
    "skip_unavailable" : false
  }
}

and the pod logs show these errors:

Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

Caused by: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
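
For what it’s worth, the certificate chain the leader presents on its transport port can be inspected from any machine with openssl installed (a quick check I put together, not from the docs):

openssl s_client -connect LEADER-LOADBALANCER-IP:9300 -showcerts </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer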

I’m stuck at this point and would appreciate any guidance or suggestions.

Configuration:

I installed OpenSearch and the OpenSearch Operator on both clusters using the steps below. Because the CN and OU values aren’t the same on the two clusters, I made the corresponding changes in each cluster’s YAML file. I have also altered the CN and OU values here, because I didn’t think it would be appropriate to share the real ones on the forum.

kubectl apply -f my-secrets.yaml -n opensearch

kubectl create namespace opensearch-operator-system

helm install opensearch-operator opensearch-operator/opensearch-operator -n opensearch-operator-system
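
To verify the operator came up before applying the cluster manifests, a quick check (nothing cluster-specific here):

kubectl -n opensearch-operator-system get pods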

leader-values.yaml

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: opensearch-leader-cluster
  namespace: opensearch
spec:
  general:
    serviceName: leader-cluster
    version: 3.1.0
    setVMMaxMapCount: true
    additionalConfig:
      plugins.security.nodes_dn: |
        - "CN=follower-cluster-masters-1,OU=follower-cluster"
        - "CN=follower-cluster-nodes-0,OU=follower-cluster

  dashboards:
    enable: true
    opensearchCredentialsSecret:
      name: dashboards-credentials 
    tls:
      enable: true
      generate: true  # Have the operator generate and sign a certificate
    version: 3.1.0
    replicas: 1
    resources:
      requests:
         memory: "515Mi"
         cpu: "200m"
      limits:
         memory: "515Mi"
         cpu: "200m"
    additionalConfig:
      opensearch_security.multitenancy.enabled: "true"
  nodePools:
    - component: masters
      replicas: 2
      diskSize: "5Gi"
      resources:
         requests:
            memory: "1Gi"
            cpu: "500m"
         limits:
            memory: "2Gi"
            cpu: "1000m"
      roles:
        - "cluster_manager"
        - "remote_cluster_client"
    - component: nodes
      replicas: 2
      diskSize: "10Gi"
      resources:
         requests:
            memory: "2Gi"
            cpu: "1005m"
         limits:
            memory: "3Gi"
            cpu: "2000m"
      roles:
        - "data"
        - "remote_cluster_client"
    - component: coordinators
      replicas: 2
      diskSize: "3Gi"
      resources:
         requests:
            memory: "1Gi"
            cpu: "500m"
         limits:
            memory: "2Gi"
            cpu: "1000m"
      roles:
        - "ingest"
        - "remote_cluster_client"
  security:
    config:
      adminCredentialsSecret:
        name: admin-credentials-secret 
      securityConfigSecret:
        name: securityconfig-secret
    tls:  # Everything related to TLS configuration
      transport:  # Configuration of the transport endpoint
        generate: true  # Have the operator generate and sign certificates
        perNode: true  # Separate certificate per node  
      http:  # Configuration of the http endpoint
          generate: true     

After applying this YAML file, once all the pods were running, I used the following command to expose the service via a LoadBalancer:

kubectl -n opensearch patch svc leader-cluster \
  --type='merge' \
  -p '{
    "spec": {
      "type": "LoadBalancer",
      "ports": [
        {"port": 9200, "targetPort": 9200, "protocol": "TCP", "name": "http"},
        {"port": 9300, "targetPort": 9300, "protocol": "TCP", "name": "transport"},
        {"port": 9600, "targetPort": 9600, "protocol": "TCP", "name": "metrics"},
        {"port": 9650, "targetPort": 9650, "protocol": "TCP", "name": "rca"}
      ]
    }
  }'
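
To confirm the service actually received an external address and answers over HTTPS (EXTERNAL-IP is a placeholder for the assigned address):

kubectl -n opensearch get svc leader-cluster
curl -sk -u 'admin:MyPassword' 'https://EXTERNAL-IP:9200/'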

follower-values.yaml

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: opensearch-follower-cluster
  namespace: opensearch
spec:
  general:
    serviceName: follower-cluster
    version: 3.1.0
    setVMMaxMapCount: true 
    additionalConfig:   
      plugins.security.nodes_dn: |
        - "CN=opensearch-leader-cluster-masters-0,OU=leader-cluster"
        - "CN=opensearch-leader-cluster-masters-1,OU=leader-cluster"
        - "CN=opensearch-leader-cluster-nodes-0,OU=leader-cluster"
        - "CN=opensearch-leader-cluster-nodes-1,OU=leader-cluster"     
  dashboards:
    enable: true
    opensearchCredentialsSecret:
      name: dashboards-credentials 
    tls:
      enable: true  # Configure TLS
      generate: true  # Have the operator generate and sign a certificate
    version: 3.1.0
    replicas: 1
    resources:
      requests:
         memory: "512Mi"
         cpu: "200m"
      limits:
         memory: "512Mi"
         cpu: "200m"
    additionalConfig:
      opensearch_security.multitenancy.enabled: "true"
  nodePools:
    - component: masters
      replicas: 2
      diskSize: "5Gi"
      resources:
         requests:
            memory: "1Gi"
            cpu: "500m"
         limits:
            memory: "2Gi"
            cpu: "1000m"
      roles:
        - "cluster_manager"
        - "remote_cluster_client"
    - component: nodes
      replicas: 2
      diskSize: "10Gi"
      resources:
         requests:
            memory: "2Gi"
            cpu: "1000m"
         limits:
            memory: "3Gi"
            cpu: "2000m"
      roles:
        - "data"
        - "remote_cluster_client"
    - component: coordinators
      replicas: 2
      diskSize: "3Gi"
      resources:
         requests:
            memory: "1Gi"
            cpu: "500m"
         limits:
            memory: "2Gi"
            cpu: "1000m"
      roles:
        - "ingest"
        - "remote_cluster_client"
  security:
    config:
      adminCredentialsSecret:
        name: admin-credentials-secret  # The secret with the admin credentials for the operator to use
      securityConfigSecret:
        name: securityconfig-secret
    tls:  # Everything related to TLS configuration
      transport:  # Configuration of the transport endpoint
        generate: true  # Have the operator generate and sign certificates
        perNode: true  # Separate certificate per node  
      http:  # Configuration of the http endpoint
          generate: true      

The remaining steps are the same as for the leader cluster.

@recoak100 Thank you for the detailed breakdown. Can you confirm that you have concatenated the root CAs in both clusters? This is needed to trust the received certificate: the rootCA.pem file in both clusters needs to contain the root CAs from both cluster 1 and cluster 2.

@Anthony Thank you so much for your answer. As far as I understand, after setting up the follower and leader clusters, I should run kubectl get secret -n opensearch on both clusters.

Then, combine the opensearch-follower-cluster-ca and opensearch-leader-cluster-ca outputs with a command like cat cluster1-root-ca.pem cluster2-root-ca.pem > rootCA.pem, and then mount that rootCA.pem via additionalVolumes on both clusters.
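
In commands, I assume the extraction would look something like this (assuming the operator stores the CA under a ca.crt key in each secret, and running each command against its own cluster’s kubectl context):

kubectl -n opensearch get secret opensearch-leader-cluster-ca \
  -o jsonpath='{.data.ca\.crt}' | base64 -d > cluster1-root-ca.pem
kubectl -n opensearch get secret opensearch-follower-cluster-ca \
  -o jsonpath='{.data.ca\.crt}' | base64 -d > cluster2-root-ca.pem
cat cluster1-root-ca.pem cluster2-root-ca.pem > rootCA.pem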

Then, add the following configuration to the values.yaml of both clusters. Frankly, I’m having a hard time with this part. Any help would be great: where exactly should I add this in the YAML?

plugins.security.ssl.transport.pemtrustedcas_filepath: certs/rootCA.pem
plugins.security.ssl.http.pemtrustedcas_filepath: certs/rootCA.pem

These are the secrets generated by the leader cluster after deployment:

NAME                                        TYPE                DATA
admin-credentials-secret                    Opaque              2
dashboards-credentials                      Opaque              2
opensearch-leader-cluster-admin-cert        kubernetes.io/tls
opensearch-leader-cluster-admin-password    Opaque              2
opensearch-leader-cluster-ca                Opaque              2
opensearch-leader-cluster-dashboards-cert   Opaque              3
opensearch-leader-cluster-http-cert         kubernetes.io/tls   3
opensearch-leader-cluster-transport-cert    Opaque              15
securityconfig-secret                       Opaque              8
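
For completeness, the key names inside the CA secret can be listed to see what there is to extract:

kubectl -n opensearch describe secret opensearch-leader-cluster-ca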

@Anthony @ccr-devs Do you have any idea?

@recoak100 The configuration should be passed via additionalConfig in the general section:

spec:
  general:
    # ...
    additionalConfig:
      some.config.option: somevalue

These will be loaded as environment variables into the container; you can check by connecting to a container and running printenv.
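
For the CA itself, a rough sketch of how the volume mount and config could fit together (assuming the concatenated rootCA.pem is stored in a secret named combined-root-ca, a name I made up here; check the operator docs for the exact additionalVolumes fields):

spec:
  general:
    additionalVolumes:
      - name: combined-root-ca  # hypothetical secret holding the concatenated rootCA.pem
        path: /usr/share/opensearch/config/certs
        secret:
          secretName: combined-root-ca
    additionalConfig:
      plugins.security.ssl.transport.pemtrustedcas_filepath: certs/rootCA.pem
      plugins.security.ssl.http.pemtrustedcas_filepath: certs/rootCA.pem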

@Anthony I’m sorry, I don’t understand what you mean. When I apply the YAML I provided above, I can see the CN and OU entries from inside the pod using printenv. You said that cluster 1 and cluster 2 should contain the same CA, but I don’t know how to do that. What changes should I make to the YAML?

Here is cluster 1’s (leader) printenv output from within the pod:

printenv
discovery.seed_hosts=opensearch-leader-cluster-discovery
plugins.security.nodes_dn=- "CN=opensearch-follow-cluster-coordinators-0,OU=opensearch-follow-cluster"
- "CN=opensearch-follow-cluster-coordinators-1,OU=opensearch-follow-cluster"
- "CN=opensearch-follow-cluster-masters-0,OU=opensearch-follow-cluster"
- "CN=opensearch-follow-cluster-masters-1,OU=opensearch-follow-cluster"
- "CN=opensearch-follow-cluster-nodes-0,OU=opensearch-follow-cluster"
- "CN=opensearch-follow-cluster-nodes-1,OU=opensearch-follow-cluster"

@recoak100 The nodes_dn is not the issue.

If you are using the operator and want to set up CCR, I would recommend creating the certificates manually and uploading them to the clusters. The details on how to configure this are mentioned here.

You can even use the same CA across both clusters, or concatenate the two into one if necessary.
You can of course extract the self-signed certificates generated by the operator from the pods and use them with this configuration. I don’t believe there is a feature available that would allow you to upload a new CA file; I’d recommend raising a feature request for this here.
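
As a rough illustration of the manual route (an openssl sketch; the subjects, filenames, and secret layout are placeholders to adapt to what the operator docs describe):

# One shared root CA for both clusters
openssl genrsa -out rootCA.key 2048
openssl req -x509 -new -key rootCA.key -sha256 -days 365 \
  -subj '/CN=shared-root-ca' -out rootCA.pem
# A transport certificate for one node, signed by that shared CA
openssl genrsa -out node.key 2048
openssl req -new -key node.key \
  -subj '/OU=leader-cluster/CN=opensearch-leader-cluster-masters-0' -out node.csr
openssl x509 -req -in node.csr -CA rootCA.pem -CAkey rootCA.key \
  -CAcreateserial -sha256 -days 365 -out node.pem
# Upload for the cluster to use (key names here are illustrative)
kubectl -n opensearch create secret generic leader-transport-certs \
  --from-file=ca.crt=rootCA.pem \
  --from-file=tls.crt=node.pem \
  --from-file=tls.key=node.key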