Cluster with 3 nodes with multiple private IPs (transport_address error?)

Hi, I have a problem creating a cluster with 3 nodes.

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.13.0

Describe the issue:
I have 3 nodes. Each node has 2 private IPs, each linked to one of the other nodes.

[image: schema-node]

Each node needs to use a different IP to reach each of the other nodes. Here is a summary of my configuration for each node.

Node 1

# Listen on the 3 interfaces
network.bind_host: ['XX.XX.1.1', 'XX.XX.3.2', 'other_private_address']
network.publish_host: ['XX.XX.1.1', 'XX.XX.3.2']

http.bind_host:  ['XX.XX.1.1', 'XX.XX.3.2', 'other_private_address']
http.publish_host: ['XX.XX.1.1', 'XX.XX.3.2']

transport.bind_host:  ['XX.XX.1.1', 'XX.XX.3.2', 'other_private_address']
transport.publish_host: ['XX.XX.1.1', 'XX.XX.3.2']

# Name the cluster
cluster.name: opensearch-xxx-cluster

# Node name
node.name: xxx01.xxx

# Discovery hosts to automatically join the cluster
discovery.seed_hosts: ['XX.XX.1.2', 'XX.XX.3.1']
cluster.initial_cluster_manager_nodes: ['xxx01.xxx']

# Path to directory where to store the data (separate multiple locations by comma):
path.data: /var/lib/opensearch

# Path to log files:
path.logs: /var/log/opensearch

##### Certificates

plugins.security....

##### Tuning

# Disable JVM heap memory swapping
bootstrap.memory_lock: true

When I start Node 1, everything is OK: I can curl my cluster and check that the node is healthy.
When I start Node 2, it works and I can see that the node joins the cluster.
When I start Node 3, it doesn't work and logs these errors.

[2024-05-16T11:22:09,106][WARN ][o.o.d.HandshakingTransportAddressConnector] [xxx03.xxx] [connectToRemoteMasterNode[XX.XX.2.1:9300]] completed handshake with [{xxx02.xxx}{X0GTmdd1R-Wvvo2vFAyYPg}{wRvQxxlUTC-SNHnWFte65g}{XX.XX.1.2}{XX.XX.1.2:9300}{dimr}{shard_indexing_pressure_enabled=true}] but followup connection failed

[2024-05-16T11:22:09,105][WARN ][o.o.d.HandshakingTransportAddressConnector] [xxx03.xxx] [connectToRemoteMasterNode[XX.XX.3.2:9300]] completed handshake with [{xxx01.xxx}{B48Chv2EQbqpdhXHVfNaog}{9UPPR7-jT-auWWtiEfadIA}{XX.XX.1.1}{XX.XX.1.1:9300}{dimr}{shard_indexing_pressure_enabled=true}] but followup connection failed

My guess is that Node 3 takes the transport_address published by each of the other nodes (nodes 1 and 2) for the follow-up connection, and that this fails because their transport_address values are XX.XX.1.1 and XX.XX.1.2 respectively.
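The mismatch described here is visible in the two warnings themselves: the address in `connectToRemoteMasterNode[...]` is the one Node 3 dialed, while the last address in the node description is the one the remote node reported. Below is a minimal sketch that extracts both from the log text and lists the pairs that differ. The regular expression and the function name `mismatched_addresses` are illustrative assumptions based only on the two log lines in this post, not an official log format.

```python
import re

# Assumed layout, inferred from the two warnings in this post:
# connectToRemoteMasterNode[<probed>]] completed handshake with
# [{<name>}{<id>}{<ephemeral id>}{<host>}{<published address>}...
WARNING = re.compile(
    r"connectToRemoteMasterNode\[(?P<probed>[^\]]+)\]\] completed handshake with "
    r"\[\{(?P<node>[^}]+)\}\{[^}]*\}\{[^}]*\}\{[^}]*\}\{(?P<published>[^}]+)\}"
)

# The two warnings from Node 3, copied from the post.
SAMPLE_LOG = (
    "[2024-05-16T11:22:09,106][WARN ][o.o.d.HandshakingTransportAddressConnector] "
    "[xxx03.xxx] [connectToRemoteMasterNode[XX.XX.2.1:9300]] completed handshake with "
    "[{xxx02.xxx}{X0GTmdd1R-Wvvo2vFAyYPg}{wRvQxxlUTC-SNHnWFte65g}{XX.XX.1.2}"
    "{XX.XX.1.2:9300}{dimr}{shard_indexing_pressure_enabled=true}] "
    "but followup connection failed\n"
    "[2024-05-16T11:22:09,105][WARN ][o.o.d.HandshakingTransportAddressConnector] "
    "[xxx03.xxx] [connectToRemoteMasterNode[XX.XX.3.2:9300]] completed handshake with "
    "[{xxx01.xxx}{B48Chv2EQbqpdhXHVfNaog}{9UPPR7-jT-auWWtiEfadIA}{XX.XX.1.1}"
    "{XX.XX.1.1:9300}{dimr}{shard_indexing_pressure_enabled=true}] "
    "but followup connection failed\n"
)


def mismatched_addresses(log_text):
    """Return (node, probed, published) for each warning where the two addresses differ."""
    results = []
    for match in WARNING.finditer(log_text):
        if match["probed"] != match["published"]:
            results.append((match["node"], match["probed"], match["published"]))
    return results


if __name__ == "__main__":
    for node, probed, published in mismatched_addresses(SAMPLE_LOG):
        print(f"{node}: dialed {probed}, but node reports {published}")
```

In both warnings the dialed address and the reported address are on different subnets, which is consistent with the handshake succeeding on the seed address and the follow-up connection to the reported address failing.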

My question is: what am I doing wrong? Why is the transport_address set to the first IP address in the array (I guess from transport.publish_host) when I defined an array and not a single IP?

I checked the transport_address with a curl request that displays the node information:

    "ID": {
      "name": "xxx02.xxx",
      "transport_address": "XX.XX.1.2:9300",
      "host": "XX.XX.1.2",
      "ip": "XX.XX.1.2",
      "version": "2.13.0",
      "build_type": "deb",
      "build_hash": "7ec678d1b7c87d6e779fdef94e33623e1f1e2647",
      "total_indexing_buffer": 13314398617,
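To run the same check across all nodes at once, the node information can be reduced to a name-to-address map. This is a minimal sketch, assuming the response of the nodes info request (for example `GET _nodes`) was saved as text and has a top-level `nodes` object keyed by node ID. The sample payload is trimmed from the fragment above, and the function name `published_transport_addresses` is hypothetical.

```python
import json

# Trimmed sample modeled on the fragment in this post; a real response has more fields.
SAMPLE_NODES_RESPONSE = """
{
  "nodes": {
    "ID": {
      "name": "xxx02.xxx",
      "transport_address": "XX.XX.1.2:9300",
      "host": "XX.XX.1.2",
      "ip": "XX.XX.1.2",
      "version": "2.13.0"
    }
  }
}
"""


def published_transport_addresses(nodes_response_text):
    """Map each node name to the single transport address it reports."""
    payload = json.loads(nodes_response_text)
    return {
        info["name"]: info["transport_address"]
        for info in payload.get("nodes", {}).values()
    }


if __name__ == "__main__":
    for name, address in published_transport_addresses(SAMPLE_NODES_RESPONSE).items():
        print(f"{name} -> {address}")
```

Each node shows a single `transport_address` here, even though `transport.publish_host` was given as a list in the configuration.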

Thanks for your help!

Hi @Kayoku,

Why do you use 2 private IPs for each node?

Hi,

These servers will receive a lot of data, and the subnets/VLANs of each server are separated. But to me that's not really the point here: I don't understand why we can give OpenSearch a list if it just takes one of the IPs ^^

Ensure each node has unique IP configurations for transport, bind, and publish hosts to avoid connection conflicts in your cluster setup.

Hi,

Thanks for your answer, but as I said before, we need to have the two IPs on each server.
If I understand your answer correctly, it doesn't seem possible to handle this with multiple IPs?