Version
2.17.1
Plugins installed:
all-nodes opensearch-alerting 2.17.1.0
all-nodes opensearch-anomaly-detection 2.17.1.0
all-nodes opensearch-asynchronous-search 2.17.1.0
all-nodes opensearch-cross-cluster-replication 2.17.1.0
all-nodes opensearch-custom-codecs 2.17.1.0
all-nodes opensearch-flow-framework 2.17.1.0
all-nodes opensearch-geospatial 2.17.1.0
all-nodes opensearch-index-management 2.17.1.0
all-nodes opensearch-job-scheduler 2.17.1.0
all-nodes opensearch-knn 2.17.1.0
all-nodes opensearch-ml 2.17.1.0
all-nodes opensearch-neural-search 2.17.1.0
all-nodes opensearch-notifications 2.17.1.0
all-nodes opensearch-notifications-core 2.17.1.0
all-nodes opensearch-observability 2.17.1.0
all-nodes opensearch-performance-analyzer 2.17.1.0
all-nodes opensearch-reports-scheduler 2.17.1.0
all-nodes opensearch-security 2.17.1.0
all-nodes opensearch-security-analytics 2.17.1.0
all-nodes opensearch-skills 2.17.1.0
all-nodes opensearch-sql 2.17.1.0
all-nodes opensearch-system-templates 2.17.1.0
all-nodes prometheus-exporter 2.17.1.0
all-nodes query-insights 2.17.1.0
Dear OpenSearch Support,
We are experiencing issues with cross-cluster replication (CCR). We are currently on OpenSearch version 2.17.1.
Cluster Setup
We have two OpenSearch clusters:
- Primary cluster: 15 nodes (3 master, 6 hot, 3 warm, 3 cold)
- Secondary cluster: 5 nodes (1 master, 2 hot, 1 warm, 1 cold)
Each group of node types is on its own VLAN (Master VLAN, Hot VLAN, etc.).
The two clusters are permanently running and reachable over NAT, where:
- Ports 9200 and 9300 are open between corresponding node types:
  - master primary ↔ master secondary
  - hot primary ↔ hot secondary
  - warm primary ↔ warm secondary
  - cold primary ↔ cold secondary
- NAT maps external IPs for primary nodes (e.g. nat-master1-ip:9300)
- Nodes in both clusters use the same internal IPs/hostnames, so NAT is required to resolve between them.
Each node is configured with the same network.publish_host and network.host.
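To make the reachability rules above easier to verify, a check like the following sketch could be run from every node of the secondary cluster. The host names are the placeholder NAT names used in this post, and the helper names are made up for illustration. A successful TCP connect only shows that the port is open through NAT; it does not prove that the OpenSearch transport handshake succeeds.

```python
import socket

# Placeholder NAT names from this post; replace with the real addresses.
TARGETS = [
    ("nat-master1-ip", 9300),
    ("nat-master2-ip", 9300),
    ("nat-master3-ip", 9300),
]


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_targets(targets, timeout: float = 3.0) -> dict:
    """Map each 'host:port' string to the result of a TCP connect attempt."""
    return {f"{host}:{port}": can_connect(host, port, timeout) for host, port in targets}
```

Calling `check_targets(TARGETS)` on a master, hot, warm and cold node of the secondary cluster would show which node types can actually reach the leader's transport port.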
Problem Description
We want to replicate data from the primary cluster to the secondary cluster.
We created appropriate users and roles for replication.
We attempted two configurations on the secondary cluster:
1. Seed mode
PUT /_cluster/settings?pretty
{
  "persistent" : {
    "cluster" : {
      "remote" : {
        "connection-to-primar" : {
          "seeds" : [
            "nat-master1-ip:9300",
            "nat-master2-ip:9300",
            "nat-master3-ip:9300"
          ],
          "transport.compress": true
        }
      }
    }
  }
}
→ Result:
"num_nodes_connected": 0,
"max_connections_per_cluster": 3
2. Proxy mode
PUT /_cluster/settings?pretty
{
  "persistent": {
    "cluster": {
      "remote": {
        "connection-to-primar-proxy": {
          "mode": "proxy",
          "proxy_address": "nat-master1-ip:9300",
          "transport.compress": true
        }
      }
    }
  }
}
→ Result:
"num_proxy_sockets_connected": 18,
"max_proxy_socket_connections": 18
With proxy mode, the connection appears to be established.
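For comparing the two results side by side, a small helper like this sketch could be used. It assumes the figures above come from `GET /_remote/info` and that the response has already been parsed into a Python dict keyed by connection alias. The function name is hypothetical, and the mode is inferred only from the field names shown in this post.

```python
def summarize_remote_info(info: dict) -> dict:
    """Return a one-line connection summary per remote cluster alias."""
    summary = {}
    for alias, details in info.items():
        if "num_proxy_sockets_connected" in details:
            # Proxy mode reports sockets opened to the proxy address.
            used = details["num_proxy_sockets_connected"]
            limit = details.get("max_proxy_socket_connections")
            summary[alias] = f"proxy mode: {used}/{limit} sockets connected"
        else:
            # Seed mode reports how many remote nodes are connected.
            used = details.get("num_nodes_connected", 0)
            limit = details.get("max_connections_per_cluster")
            summary[alias] = f"seed mode: {used}/{limit} nodes connected"
    return summary


# Values taken from the two results reported above.
example = {
    "connection-to-primar": {
        "num_nodes_connected": 0,
        "max_connections_per_cluster": 3,
    },
    "connection-to-primar-proxy": {
        "num_proxy_sockets_connected": 18,
        "max_proxy_socket_connections": 18,
    },
}

for alias, line in summarize_remote_info(example).items():
    print(f"{alias}: {line}")
```

Running the same request against several nodes of the secondary cluster and summarizing each response this way may show whether the connection state differs per node.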
Replication Attempt
When we trigger replication, authentication passes, and in the logs we see the following:
- On primary cluster:
Replication setup - Permissions validation successful for Index
- On secondary cluster:
Failed to trigger replication for xxx-test-000006 - ResourceAlreadyExistsException[task with id {replication:index:xxx-test-000006} already exists]
However, the index is not created, and no data is being replicated.
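The exact start request is not shown above. As a rough sketch, the replication plugin's start and status calls for a follower index could be assembled as below and sent to the secondary cluster with any HTTP client. The helper names are made up for illustration, the leader index name is assumed to match the follower index, and the role names are hypothetical placeholders for the roles created for replication.

```python
import json


def build_start_request(follower_index, leader_alias, leader_index,
                        leader_role=None, follower_role=None):
    """Return (method, path, body) for starting replication on the follower."""
    body = {"leader_alias": leader_alias, "leader_index": leader_index}
    if leader_role and follower_role:
        # Only needed when the security plugin is enabled.
        body["use_roles"] = {
            "leader_cluster_role": leader_role,
            "follower_cluster_role": follower_role,
        }
    return "PUT", f"/_plugins/_replication/{follower_index}/_start", body


def build_status_request(follower_index):
    """Return (method, path, body) for querying replication status."""
    return "GET", f"/_plugins/_replication/{follower_index}/_status", None


method, path, body = build_start_request(
    follower_index="xxx-test-000006",
    leader_alias="connection-to-primar-proxy",
    leader_index="xxx-test-000006",      # assumed to match the follower name
    leader_role="leader_role_name",      # hypothetical placeholder
    follower_role="follower_role_name",  # hypothetical placeholder
)
print(method, path)
print(json.dumps(body, indent=2))
```

The status request built by `build_status_request("xxx-test-000006")` could then be used to see what state the follower reports after the failed trigger.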
My Main Question
Do all nodes in the secondary cluster need to have direct access to nat-master1-ip:9300 (or other seed/proxy nodes on the leader)? Or is it sufficient that only the master nodes are connected?
We appreciate your help and any clarification you can provide.
Kind regards,
Vojtech