transport.publish_host option doesn't work as expected

I am setting up a cluster with 3 VDS nodes:

  • Each OpenSearch node runs inside a Docker container on a Docker (not host) network, with port forwarding from the host's IP to the internal one.
  • TLS is in use, so I had to set the publish_host settings to hostnames instead of IPs, so that the node name can be matched against the certificate CN during the TLS handshake.
  • Node names are resolved inside the Docker containers via /etc/hosts, populated not directly but through the extra_hosts option of docker-compose.

First I installed a single-node cluster with this docker-compose config:
services:
  opensearch:
    image: opensearchproject/opensearch:1.2.3
    container_name: opensearch-prod-1
    environment:
      - cluster.name=cluster-prod
      - node.name=opensearch-prod-1
      - network.host=0.0.0.0
      - network.publish_host=opensearch-prod-1
      - http.publish_host=opensearch-prod-1
      - transport.publish_host=opensearch-prod-1
      - transport.bind_host=0.0.0.0
      - transport.publish_port=9300
      - discovery.seed_hosts=opensearch-prod-1
      - cluster.initial_master_nodes=opensearch-prod-1
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms1024m -Xmx1024m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - ./data:/usr/share/opensearch/data
      - ./tls:/usr/share/opensearch/config/tls:ro
      - ./configs/node-1.yaml:/usr/share/opensearch/config/opensearch.yml:ro
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      - basenet
    extra_hosts:
      - "opensearch-prod-2:192.168.82.32"
      - "opensearch-prod-3:192.168.82.33"

I issued certificates matching the hostnames, and initialized the security configuration with securityadmin.
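For context, a minimal sketch of what the mounted configs/node-1.yaml could contain; the certificate file names and DNs below are placeholders, while the setting names are the standard security plugin ones. enforce_hostname_verification is what ties the publish hostname to the certificate:

# Hypothetical configs/node-1.yaml; file names and DNs are placeholders.
plugins.security.ssl.transport.pemcert_filepath: tls/opensearch-prod-1.pem
plugins.security.ssl.transport.pemkey_filepath: tls/opensearch-prod-1-key.pem
plugins.security.ssl.transport.pemtrustedcas_filepath: tls/root-ca.pem
# Hostname verification: the peer's name must match the CN in its certificate,
# which is why publish_host is set to a hostname rather than an IP.
plugins.security.ssl.transport.enforce_hostname_verification: true
plugins.security.ssl.http.enabled: true
plugins.security.ssl.http.pemcert_filepath: tls/opensearch-prod-1.pem
plugins.security.ssl.http.pemkey_filepath: tls/opensearch-prod-1-key.pem
plugins.security.ssl.http.pemtrustedcas_filepath: tls/root-ca.pem
plugins.security.nodes_dn:
  - 'CN=opensearch-prod-*'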
I started the node and can see the node configuration:

curl -s -k -u 'admin:admin' https://localhost:9200/_nodes/_all/settings | jq
{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "cluster-prod",
  "nodes": {
    "Dzy5J4D9TpKizBsbbjQQ2g": {
      "name": "opensearch-prod-1",
      "transport_address": "10.10.1.2:9300",
      "host": "opensearch-prod-1",
      "ip": "10.10.1.2",
      "version": "1.2.3",
      "build_type": "tar",
      "build_hash": "8a529d77c7432bc45b005ac1c4ba3b2741b57d4a",
      "roles": [
        "data",
        "ingest",
        "master",
        "remote_cluster_client"
      ],
      "attributes": {
        "shard_indexing_pressure_enabled": "true"
      },
      "settings": {
        "cluster": {
          "initial_master_nodes": "opensearch-prod-1",
          "name": "cluster-prod"
        },
        "node": {
          "attr": {
            "shard_indexing_pressure_enabled": "true"
          },
          "name": "opensearch-prod-1"
        },
        "path": {
          "logs": "/usr/share/opensearch/logs",
          "home": "/usr/share/opensearch"
        },
        "discovery": {
          "seed_hosts": "opensearch-prod-1"
        },
        "client": {
          "type": "node"
        },
        "http": {
          "compression": "false",
          "type": "org.opensearch.security.http.SecurityHttpServerTransport",
          "publish_host": "opensearch-prod-1",
          "type.default": "netty4"
        },
        "bootstrap": {
          "memory_lock": "true"
        },
        "transport": {
          "publish_port": "9300",
          "bind_host": "0.0.0.0",
          "type": "org.opensearch.security.ssl.http.netty.SecuritySSLNettyTransport",
          "publish_host": "opensearch-prod-1",
          "type.default": "netty4"
        },
        "network": {
          "host": "0.0.0.0",
          "publish_host": "opensearch-prod-1"
        }
      }
    }
  }
}

The transport settings look correct.
Then I installed the second node with this docker-compose config:

services:
  opensearch:
    image: opensearchproject/opensearch:1.2.3
    container_name: opensearch-prod-2
    environment:
      - cluster.name=cluster-prod
      - node.name=opensearch-prod-2
      - network.host=0.0.0.0
      - network.publish_host=opensearch-prod-2
      - http.publish_host=opensearch-prod-2
      - transport.publish_host=opensearch-prod-2
      - transport.bind_host=0.0.0.0
      - transport.publish_port=9300
      - discovery.seed_hosts=opensearch-prod-1
      - cluster.initial_master_nodes=opensearch-prod-1
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms1024m -Xmx1024m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - ./data:/usr/share/opensearch/data
      - ./tls:/usr/share/opensearch/config/tls:ro
      - ./configs/node-2.yaml:/usr/share/opensearch/config/opensearch.yml:ro
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      basenet:
        ipv4_address: 10.10.1.200
    extra_hosts:
      - "opensearch-prod-1:192.168.82.31"
      - "opensearch-prod-3:192.168.82.33"

(There is an unusual ipv4_address setting in this configuration, but it is only there to make the internal IPs of the OpenSearch containers differ, because I use the same Docker network CIDR on the other hosts.)
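The basenet network referenced by both compose files is defined separately; a sketch under the assumption of a 10.10.1.0/24 subnet (inferred from the 10.10.1.x addresses above), since a fixed ipv4_address requires an explicit IPAM subnet:

# Assumed definition of the shared "basenet" network; the subnet is
# inferred from the 10.10.1.x addresses seen in this post.
networks:
  basenet:
    driver: bridge
    ipam:
      config:
        - subnet: 10.10.1.0/24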
I started the container and ran tcpdump on the container interface at the same time. A repeating error appears in the OpenSearch log:

[2022-01-21T05:41:21,968][INFO ][o.o.n.Node               ] [opensearch-prod-2] initialized
[2022-01-21T05:41:21,968][INFO ][o.o.n.Node               ] [opensearch-prod-2] starting ...
[2022-01-21T05:41:22,158][INFO ][o.o.t.TransportService   ] [opensearch-prod-2] publish_address {opensearch-prod-2/10.10.1.200:9300}, bound_addresses {0.0.0.0:9300}
[2022-01-21T05:41:22,345][INFO ][o.o.b.BootstrapChecks    ] [opensearch-prod-2] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2022-01-21T05:41:22,355][INFO ][o.o.c.c.ClusterBootstrapService] [opensearch-prod-2] skipping cluster bootstrapping as local node does not match bootstrap requirements: [opensearch-prod-1]
[2022-01-21T05:41:25,826][WARN ][o.o.d.HandshakingTransportAddressConnector] [opensearch-prod-2] [connectToRemoteMasterNode[192.168.82.31:9300]] completed handshake with [{opensearch-prod-1}{vsTRCCaRSX2Yx_dkmW-dfw}{CuMH11lORXiBVQopVeJVNw}{opensearch-prod-1}{10.10.1.2:9300}{dimr}{shard_indexing_pressure_enabled=true}] but followup connection failed
org.opensearch.transport.ConnectTransportException: [opensearch-prod-1][10.10.1.2:9300] connect_exception
        at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1064) ~[opensearch-1.2.3.jar:1.2.3]
        at org.opensearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:213) ~[opensearch-1.2.3.jar:1.2.3]
        at org.opensearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:55) ~[opensearch-core-1.2.3.jar:1.2.3]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2152) ~[?:?]
        at org.opensearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:70) ~[opensearch-core-1.2.3.jar:1.2.3]
        at org.opensearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:81) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:620) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:583) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[?:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: opensearch-prod-1/10.10.1.2:9300
Caused by: java.net.NoRouteToHostException: No route to host
        at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
        at sun.nio.ch.Net.pollConnectNow(Net.java:660) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:875) ~[?:?]
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[?:?]
        ... 7 more

At the top of that message there is the correct transport publish IP of the opensearch-prod-1 master node: “[connectToRemoteMasterNode[192.168.82.31:9300]] completed handshake with [{opensearch-prod-1}”.
But at the bottom it is wrong: “opensearch-prod-1/10.10.1.2:9300” (the internal Docker IP of opensearch-prod-1). The handshake succeeds against the seed address 192.168.82.31:9300, but the follow-up connection targets the publish address that opensearch-prod-1 advertises about itself, and that node has already resolved its publish_host hostname to its own internal IP (consistent with the transport_address “10.10.1.2:9300” in the _nodes output above).
The same repeating pattern appears in tcpdump:

[root@backing-node2 opensearch]# tcpdump -nn -i veth027e42c
...
12:45:41.461303 IP 10.10.1.200.54402 > 192.168.82.31.9300: Flags [S], seq 2584892031, win 29200, options [mss 1460,sackOK,TS val 489883527 ecr 0,nop,wscale 7], length 0
12:45:41.462153 IP 192.168.82.31.9300 > 10.10.1.200.54402: Flags [S.], seq 3232814081, ack 2584892032, win 28960, options [mss 1460,sackOK,TS val 1698019293 ecr 489883527,nop,wscale 7], length 0
12:45:41.462183 IP 10.10.1.200.54402 > 192.168.82.31.9300: Flags [.], ack 1, win 229, options [nop,nop,TS val 489883528 ecr 1698019293], length 0
12:45:41.468187 IP 10.10.1.200.54402 > 192.168.82.31.9300: Flags [P.], seq 1:353, ack 1, win 229, options [nop,nop,TS val 489883534 ecr 1698019293], length 352
12:45:41.468860 IP 192.168.82.31.9300 > 10.10.1.200.54402: Flags [.], ack 353, win 235, options [nop,nop,TS val 1698019299 ecr 489883534], length 0
12:45:41.488333 IP 192.168.82.31.9300 > 10.10.1.200.54402: Flags [P.], seq 1:1555, ack 353, win 235, options [nop,nop,TS val 1698019319 ecr 489883534], length 1554
12:45:41.488436 IP 10.10.1.200.54402 > 192.168.82.31.9300: Flags [.], ack 1555, win 253, options [nop,nop,TS val 489883554 ecr 1698019319], length 0
12:45:41.497892 IP 10.10.1.200.54402 > 192.168.82.31.9300: Flags [P.], seq 353:1661, ack 1555, win 253, options [nop,nop,TS val 489883563 ecr 1698019319], length 1308
12:45:41.501752 IP 192.168.82.31.9300 > 10.10.1.200.54402: Flags [P.], seq 1555:3612, ack 1661, win 258, options [nop,nop,TS val 1698019332 ecr 489883563], length 2057
12:45:41.501906 IP 10.10.1.200.54402 > 192.168.82.31.9300: Flags [.], ack 3612, win 285, options [nop,nop,TS val 489883567 ecr 1698019332], length 0
12:45:41.504572 IP 10.10.1.200.54402 > 192.168.82.31.9300: Flags [P.], seq 1661:1841, ack 3612, win 285, options [nop,nop,TS val 489883570 ecr 1698019332], length 180
12:45:41.506583 IP 192.168.82.31.9300 > 10.10.1.200.54402: Flags [P.], seq 3612:4062, ack 1841, win 278, options [nop,nop,TS val 1698019337 ecr 489883570], length 450
12:45:41.508582 IP 10.10.1.200.54402 > 192.168.82.31.9300: Flags [P.], seq 1841:1881, ack 4062, win 308, options [nop,nop,TS val 489883574 ecr 1698019337], length 40
12:45:41.508715 IP 10.10.1.200.54402 > 192.168.82.31.9300: Flags [F.], seq 1881, ack 4062, win 308, options [nop,nop,TS val 489883574 ecr 1698019337], length 0
12:45:41.509577 IP 192.168.82.31.9300 > 10.10.1.200.54402: Flags [F.], seq 4062, ack 1882, win 278, options [nop,nop,TS val 1698019340 ecr 489883574], length 0
12:45:41.509600 IP 10.10.1.200.54402 > 192.168.82.31.9300: Flags [.], ack 4063, win 308, options [nop,nop,TS val 489883575 ecr 1698019340], length 0
12:45:41.820174 ARP, Request who-has 10.10.1.2 tell 10.10.1.200, length 28
12:45:42.845180 ARP, Request who-has 10.10.1.2 tell 10.10.1.200, length 28
12:45:43.868421 ARP, Request who-has 10.10.1.2 tell 10.10.1.200, length 28
12:45:44.461912 IP 10.10.1.200.54430 > 192.168.82.31.9300: Flags [S], seq 1457013135, win 29200, options [mss 1460,sackOK,TS val 489886527 ecr 0,nop,wscale 7], length 0
12:45:44.462650 IP 192.168.82.31.9300 > 10.10.1.200.54430: Flags [S.], seq 1796279173, ack 1457013136, win 28960, options [mss 1460,sackOK,TS val 1698022293 ecr 489886527,nop,wscale 7], length 0
12:45:44.462685 IP 10.10.1.200.54430 > 192.168.82.31.9300: Flags [.], ack 1, win 229, options [nop,nop,TS val 489886528 ecr 1698022293], length 0
12:45:44.467873 IP 10.10.1.200.54430 > 192.168.82.31.9300: Flags [P.], seq 1:353, ack 1, win 229, options [nop,nop,TS val 489886533 ecr 1698022293], length 352

There is successful communication with 192.168.82.31:9300 (the transport publish host and port of the master node), followed by attempts to connect to 10.10.1.2, the internal IP of opensearch-prod-1 (this triggers ARP requests because 10.10.1.2 looks like an address on the same local network).

How can I force the new node to use the master node's external IP 192.168.82.31:9300 for transport communication with the master OpenSearch node?
Is destination NAT the only option?
I thought transport.publish_host was exactly the setting intended for this.
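(A sketch of one possible alternative, not verified with this setup: host networking. With network_mode: host the container shares the host's network stack, so the node would bind and publish on the host's own 192.168.82.x address instead of an internal Docker IP.)

# Sketch only: host-networking variant for node 2. With network_mode: host
# the ports/networks sections are dropped, and opensearch-prod-2 must resolve
# to the host's own address (e.g. via an /etc/hosts entry).
services:
  opensearch:
    image: opensearchproject/opensearch:1.2.3
    network_mode: host
    extra_hosts:
      - "opensearch-prod-1:192.168.82.31"
      - "opensearch-prod-3:192.168.82.33"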

publish_host is for the result of a sniff request from outside the cluster.

Did you ever make progress on this issue?