Cluster with 3 nodes with multiple private IP (transport_address error ?)

Kayoku · May 16, 2024, 10:11am

Hi, I have a problem creating a cluster with 3 nodes.

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.13.0

Describe the issue:
I have 3 nodes. Each node has 2 private IPs, linking it to the other nodes.

[Image: schema-node]

Each node needs to use a different IP to reach each of the other nodes. Here is my configuration for each node (summarized):

Node 1

# Listen on the 3 interfaces
network.bind_host: ['XX.XX.1.1', 'XX.XX.3.2', 'other_private_address']
network.publish_host: ['XX.XX.1.1', 'XX.XX.3.2']

http.bind_host:  ['XX.XX.1.1', 'XX.XX.3.2', 'other_private_address']
http.publish_host: ['XX.XX.1.1', 'XX.XX.3.2']

transport.bind_host:  ['XX.XX.1.1', 'XX.XX.3.2', 'other_private_address']
transport.publish_host: ['XX.XX.1.1', 'XX.XX.3.2']

# Name the cluster
cluster.name: opensearch-xxx-cluster

# Node name
node.name: xxx01.xxx

# Discovery hosts used to automatically join the cluster
discovery.seed_hosts: ['XX.XX.1.2', 'XX.XX.3.1']
cluster.initial_cluster_manager_nodes: ['xxx01.xxx']

# Path to directory where to store the data (separate multiple locations by comma):
path.data: /var/lib/opensearch

# Path to log files:
path.logs: /var/log/opensearch

##### Certificates

plugins.security....

##### Tuning

# Disable JVM heap memory swapping
bootstrap.memory_lock: true

When I start Node 1, everything is OK: I can curl my cluster and check that the node is healthy.
When I start Node 2, it works and I can see the node join the cluster.
When I start Node 3, it doesn’t work and logs this error.

[2024-05-16T11:22:09,106][WARN ][o.o.d.HandshakingTransportAddressConnector] [xxx03.xxx] [connectToRemoteMasterNode[XX.XX.2.1:9300]] completed handshake with [{xxx02.xxx}{X0GTmdd1R-Wvvo2vFAyYPg}{wRvQxxlUTC-SNHnWFte65g}{XX.XX.1.2}{XX.XX.1.2:9300}{dimr}{shard_indexing_pressure_enabled=true}] but followup connection failed

[2024-05-16T11:22:09,105][WARN ][o.o.d.HandshakingTransportAddressConnector] [xxx03.xxx] [connectToRemoteMasterNode[XX.XX.3.2:9300]] completed handshake with [{xxx01.xxx}{B48Chv2EQbqpdhXHVfNaog}{9UPPR7-jT-auWWtiEfadIA}{XX.XX.1.1}{XX.XX.1.1:9300}{dimr}{shard_indexing_pressure_enabled=true}] but followup connection failed

My assumption is that Node 3 uses the transport_address of each of the other two nodes for the follow-up connection, and fails because their transport_address values are set to XX.XX.1.1 and XX.XX.1.2 respectively.
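
The mismatch described above can be read directly from the WARN lines. The following is a minimal Python sketch, assuming the log format quoted in this post; the names `LOG_PATTERN` and `address_mismatch` are hypothetical helpers, not part of OpenSearch. It extracts the address Node 3 dialed and the transport address the remote node published, and flags when they differ.

```python
import re

# In the WARN line, the dialed address appears in connectToRemoteMasterNode[...],
# and the fifth {...} group of the node description is the published
# transport address of the remote node.
LOG_PATTERN = re.compile(
    r"connectToRemoteMasterNode\[(?P<probed>[^\]]+)\]\] "
    r"completed handshake with \[\{(?P<name>[^}]*)\}\{[^}]*\}\{[^}]*\}"
    r"\{(?P<host>[^}]*)\}\{(?P<published>[^}]*)\}"
)


def address_mismatch(line):
    """Return the dialed and published addresses found in one WARN line."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None  # not a handshake warning in the expected format
    return {
        "node": match.group("name"),
        "probed": match.group("probed"),
        "published": match.group("published"),
        "mismatch": match.group("probed") != match.group("published"),
    }


# Log line copied from this post (the handshake with xxx02.xxx).
SAMPLE = (
    "[2024-05-16T11:22:09,106][WARN ][o.o.d.HandshakingTransportAddressConnector] "
    "[xxx03.xxx] [connectToRemoteMasterNode[XX.XX.2.1:9300]] completed handshake with "
    "[{xxx02.xxx}{X0GTmdd1R-Wvvo2vFAyYPg}{wRvQxxlUTC-SNHnWFte65g}{XX.XX.1.2}"
    "{XX.XX.1.2:9300}{dimr}{shard_indexing_pressure_enabled=true}] "
    "but followup connection failed"
)

if __name__ == "__main__":
    print(address_mismatch(SAMPLE))
```

For the line above, the dialed address (XX.XX.2.1:9300) and the published address (XX.XX.1.2:9300) differ, which is consistent with the follow-up connection being attempted on an address Node 3 cannot reach.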

My question is: what am I doing wrong? Why is the transport_address set to the first IP address in the array (I guess from transport.publish_host) when I defined an array and not a single IP?

I checked the transport_address with a curl request to display node information

    "ID": {
      "name": "xxx02.xxx",
      "transport_address": "XX.XX.1.2:9300",
      "host": "XX.XX.1.2",
      "ip": "XX.XX.1.2",
      "version": "2.13.0",
      "build_type": "deb",
      "build_hash": "7ec678d1b7c87d6e779fdef94e33623e1f1e2647",
      "total_indexing_buffer": 13314398617,
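
To compare the published address of every node at once, the same information can be pulled out of the response programmatically. This is a minimal Python sketch, assuming the response comes from the nodes info API (`GET _nodes`) and has a top-level `nodes` object keyed by node ID; the post does not say which request was used, and `transport_addresses` and `SAMPLE` are hypothetical names.

```python
import json


def transport_addresses(nodes_response):
    """Map each node name to the transport_address it publishes."""
    payload = json.loads(nodes_response)
    return {
        info["name"]: info["transport_address"]
        for info in payload.get("nodes", {}).values()
    }


# Hypothetical sample shaped like the fragment above (only the fields used here).
SAMPLE = """
{
  "nodes": {
    "ID": {
      "name": "xxx02.xxx",
      "transport_address": "XX.XX.1.2:9300",
      "host": "XX.XX.1.2",
      "ip": "XX.XX.1.2"
    }
  }
}
"""

if __name__ == "__main__":
    for name, address in transport_addresses(SAMPLE).items():
        print(name, address)
```

Each node appears with exactly one transport_address, which matches what the output above shows even though transport.publish_host was given as a list.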

Thanks for your help!

Eugene7 · May 20, 2024, 4:04pm

Hi @Kayoku ,

Why do you use 2 private IPs for each node?

Kayoku · May 21, 2024, 7:26am

Hi,

These servers will receive a lot of data, and the subnets/VLANs of each server are all separate. But to me that’s not really the point here: I don’t understand why we can give OpenSearch a list if it just takes one of the IPs ^^

KateWinslet · May 22, 2024, 1:38pm

Ensure each node has unique IP configurations for transport, bind, and publish hosts to avoid connection conflicts in your cluster setup.

Kayoku · May 22, 2024, 2:41pm

Hi,

Thanks for your answer, but as I said before, we need to have both IPs on each server.
If I understand your answer correctly, it doesn’t seem possible to handle this with multiple IPs?

Eugene7 · June 5, 2024, 10:35am

Hi @Kayoku ,

Please check the traffic between the nodes. It’s possible that, for example, node1 will use both connections to access node2. Try to either block the traffic between the nodes or disconnect node1 and node2. This will break the loop and create a daisy chain.
