Cluster with 3 nodes with multiple private IPs (transport_address error?)

Hi, I have a problem creating a cluster with 3 nodes.

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.13.0

Describe the issue:
I have 3 nodes. Each node has 2 private IPs, each one linked to one of the other nodes.

[Image: schema-node]

Each node needs to use a different IP to reach each of the other nodes. Here is my configuration for each node (summarized):

Node 1

# Listen on the 3 interfaces
network.bind_host: ['XX.XX.1.1', 'XX.XX.3.2', 'other_private_address']
network.publish_host: ['XX.XX.1.1', 'XX.XX.3.2']

http.bind_host: ['XX.XX.1.1', 'XX.XX.3.2', 'other_private_address']
http.publish_host: ['XX.XX.1.1', 'XX.XX.3.2']

transport.bind_host: ['XX.XX.1.1', 'XX.XX.3.2', 'other_private_address']
transport.publish_host: ['XX.XX.1.1', 'XX.XX.3.2']

# Name the cluster
cluster.name: opensearch-xxx-cluster

# Node name
node.name: xxx01.xxx

# Discovery hosts used to automatically join the cluster
discovery.seed_hosts: ['XX.XX.1.2', 'XX.XX.3.1']
cluster.initial_cluster_manager_nodes: ['xxx01.xxx']

# Path to directory where to store the data (separate multiple locations by comma):
path.data: /var/lib/opensearch

# Path to log files:
path.logs: /var/log/opensearch

##### Certificates

plugins.security....

##### Tuning

# Disable JVM heap memory swapping
bootstrap.memory_lock: true

When I start Node 1, everything is OK: I can curl my cluster and check that the node is healthy.
When I start Node 2, it works and I can see that the node joins the cluster.
When I start Node 3, it doesn't work and it logs these errors:

[2024-05-16T11:22:09,106][WARN ][o.o.d.HandshakingTransportAddressConnector] [xxx03.xxx] [connectToRemoteMasterNode[XX.XX.2.1:9300]] completed handshake with [{xxx02.xxx}{X0GTmdd1R-Wvvo2vFAyYPg}{wRvQxxlUTC-SNHnWFte65g}{XX.XX.1.2}{XX.XX.1.2:9300}{dimr}{shard_indexing_pressure_enabled=true}] but followup connection failed

[2024-05-16T11:22:09,105][WARN ][o.o.d.HandshakingTransportAddressConnector] [xxx03.xxx] [connectToRemoteMasterNode[XX.XX.3.2:9300]] completed handshake with [{xxx01.xxx}{B48Chv2EQbqpdhXHVfNaog}{9UPPR7-jT-auWWtiEfadIA}{XX.XX.1.1}{XX.XX.1.1:9300}{dimr}{shard_indexing_pressure_enabled=true}] but followup connection failed

My assumption is that Node 3 takes the transport_address of each of the other two nodes to attempt the connection and fails, because their transport_address values are set to XX.XX.1.1 and XX.XX.1.2 respectively.

My question is: what am I doing wrong? Why is the transport_address set to the first IP address in the array (I guess from transport.publish_host) when I defined an array and not a single IP?

I checked the transport_address with a curl request that displays node information:

    "ID": {
      "name": "xxx02.xxx",
      "transport_address": "XX.XX.1.2:9300",
      "host": "XX.XX.1.2",
      "ip": "XX.XX.1.2",
      "version": "2.13.0",
      "build_type": "deb",
      "build_hash": "7ec678d1b7c87d6e779fdef94e33623e1f1e2647",
      "total_indexing_buffer": 13314398617,

Thanks for your help!
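
The node-information check in the question can also be scripted so the published address of every node is visible at a glance. This is a minimal sketch, assuming the response body has the usual `_nodes` shape (a `nodes` object keyed by node ID, each entry carrying `name` and `transport_address`, as in the fragment above). The helper name `transport_addresses` and the sample body are hypothetical and only serve as illustration.

```python
import json
from typing import Dict


def transport_addresses(nodes_response: str) -> Dict[str, str]:
    """Map node name -> published transport_address from a `_nodes` JSON body."""
    body = json.loads(nodes_response)
    return {
        info["name"]: info["transport_address"]
        for info in body.get("nodes", {}).values()
    }


# Hypothetical, shortened response body (placeholder IDs and addresses).
sample = json.dumps({
    "nodes": {
        "ID-1": {"name": "xxx01.xxx", "transport_address": "XX.XX.1.1:9300"},
        "ID-2": {"name": "xxx02.xxx", "transport_address": "XX.XX.1.2:9300"},
    }
})

for name, address in sorted(transport_addresses(sample).items()):
    print(f"{name} publishes {address}")
```

Each node appears with exactly one `transport_address`, which matches what the question observed even though `transport.publish_host` was given as a list.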

Hi @Kayoku ,

Why do you use 2 private IPs for each node?

Hi,

These servers will receive a lot of data, and the subnets/VLANs of each server are all separated. But to me that's not really the point here: I don't understand why we can give a list to OpenSearch if it just takes one of the IPs ^^

Ensure each node has unique IP configurations for transport, bind, and publish hosts to avoid connection conflicts in your cluster setup.

Hi,

Thanks for your answer, but as I said before, we need to have the two IPs on each server.
If I understand your answer correctly, it doesn't seem possible to handle this with multiple IPs?

Hi @Kayoku ,

Please check the traffic between the nodes. It’s possible that, for example, node1 will use both connections to access node2. Try to either block the traffic between the nodes or disconnect node1 and node2. This will break the loop and create a daisy chain.
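
One way to check the traffic is a plain TCP reachability test, run on each node against the transport addresses the other nodes publish. The warning in the logs says the handshake succeeded but the follow-up connection failed; as far as I understand, that follow-up goes to the address the remote node publishes, so that is the address worth testing. This is a minimal sketch, assuming Python 3 is available on the nodes and the transport port is the 9300 seen in the logs. `can_connect` and `check_peers` are hypothetical helper names, not part of OpenSearch.

```python
import socket
from typing import Dict, Iterable


def can_connect(host: str, port: int = 9300, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable networks
        # and name-resolution failures.
        return False


def check_peers(hosts: Iterable[str], port: int = 9300) -> Dict[str, bool]:
    """Test every host once and report which ones accepted the connection."""
    return {host: can_connect(host, port) for host in hosts}
```

On Node 3, for example, `check_peers([...])` would be called with the published addresses of Node 1 and Node 2 (the values reported as `transport_address`). A `False` entry means that address is not reachable from this node on the transport port, which would be consistent with the "followup connection failed" warning. It only tests TCP connectivity, not TLS or the OpenSearch handshake itself.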