The index .opendistro_security remains yellow after adding a new pod

Hi folks,

I added a new physical machine to the k8s cluster (managed by Rancher) to increase the OpenSearch data pod count. The new pod (#104) is running on the new physical machine.

From the OpenSearch console, the .opendistro_security index stays yellow forever.

green  open .opendistro-reports-instances                 ENyTQKMNRtivUx4qYfynWA  1   2           0  0    624b    208b
yellow open .opendistro_security                          FjgR2YEdRYmNlNCKF9FnxA  1 104           9  3  13.6mb  56.7kb
green  open filebeat-swift-v1-account-server-2022.03.01   5MDHHTV-THqmLRbl88ghvQ  1   0     2168709  0 538.9mb 538.9mb

In the logs, the existing pods complain that the connection to the new pod failed, and the new pod complains the same about the existing pods. Is there a known scale-out problem?
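For anyone debugging the same symptom: the allocation explain API should say exactly why the replica of .opendistro_security stays unassigned. (The endpoint, port, and admin credentials below are placeholders for your own setup.)

```shell
# Ask the cluster why replica shard 0 of .opendistro_security is unassigned.
# Adjust host, port, and credentials to match your deployment.
curl -sk -u admin:admin \
  -H 'Content-Type: application/json' \
  -X GET 'https://localhost:9200/_cluster/allocation/explain?pretty' \
  -d '{"index": ".opendistro_security", "shard": 0, "primary": false}'
```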

# logs from an existing pod
[2022-03-30T13:22:33,242][WARN ][o.o.c.NodeConnectionsService] [opensearch-cluster-data-14] failed to connect to {opensearch-cluster-data-104}{i16bRv4uSmyDIwOxSuYUAA}{gm_eG2IzTsG1wpegZtTerg}{10.42.7.28}{10.42.7.28:9300}{d}{shard_indexing_pressure_enabled=true} (tried [181] times)
org.opensearch.transport.ConnectTransportException: [opensearch-cluster-data-104][10.42.7.28:9300] general node connection failure
at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.lambda$onResponse$2(TcpTransport.java:1052) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.action.ActionListener$1.onFailure(ActionListener.java:86) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.transport.TransportHandshaker$HandshakeResponseHandler.handleLocalException(TransportHandshaker.java:199) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.transport.TransportHandshaker.lambda$sendHandshake$0(TransportHandshaker.java:77) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.action.ActionListener.lambda$wrap$0(ActionListener.java:147) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:78) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:211) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:52) ~[opensearch-core-1.2.4.jar:1.2.4]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2137) ~[?:?]
at org.opensearch.common.concurrent.CompletableContext.complete(CompletableContext.java:74) ~[opensearch-core-1.2.4.jar:1.2.4]
at org.opensearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:74) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:605) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104) ~[?:?]
at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84) ~[?:?]
at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:1182) ~[?:?]
at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:773) ~[?:?]
at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:749) ~[?:?]
at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:620) ~[?:?]
at io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1352) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:622) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:606) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:472) ~[?:?]
at io.netty.handler.ssl.SslUtils.handleHandshakeFailure(SslUtils.java:445) ~[?:?]
at io.netty.handler.ssl.SslHandler$7.run(SslHandler.java:2116) ~[?:?]
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[?:?]
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) ~[?:?]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[?:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: org.opensearch.transport.TransportException: handshake failed because connection reset
... 37 more
[2022-03-30T13:22:53,253][WARN ][o.o.t.OutboundHandler ] [opensearch-cluster-data-14] send message failed [channel: Netty4TcpChannel{localAddress=/10.42.0.104:57028, remoteAddress=10.42.7.28/10.42.7.28:9300}]
io.netty.handler.ssl.SslHandshakeTimeoutException: handshake timed out after 10000ms
at io.netty.handler.ssl.SslHandler$7.run(SslHandler.java:2112) [netty-handler-4.1.72.Final.jar:4.1.72.Final]
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [netty-common-4.1.72.Final.jar:4.1.72.Final]
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [netty-common-4.1.72.Final.jar:4.1.72.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.72.Final.jar:4.1.72.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) [netty-common-4.1.72.Final.jar:4.1.72.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) [netty-transport-4.1.72.Final.jar:4.1.72.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-common-4.1.72.Final.jar:4.1.72.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.72.Final.jar:4.1.72.Final]
at java.lang.Thread.run(Thread.java:832) [?:?]
# logs from new pod
{"log":"io.netty.handler.ssl.SslHandshakeTimeoutException: handshake timed out after 10000ms\n","stream":"stdout","time":"2022-03-30T12:12:13.063329995Z"}
{"log":"\u0009at io.netty.handler.ssl.SslHandler$7.run(SslHandler.java:2112) [netty-handler-4.1.72.Final.jar:4.1.72.Final]\n","stream":"stdout","time":"2022-03-30T12:12:13.063333411Z"}
{"log":"\u0009at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [netty-common-4.1.72.Final.jar:4.1.72.Final]\n","stream":"stdout","time":"2022-03-30T12:12:13.063336597Z"}
{"log":"\u0009at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [netty-common-4.1.72.Final.jar:4.1.72.Final]\n","stream":"stdout","time":"2022-03-30T12:12:13.063339613Z"}
{"log":"\u0009at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.72.Final.jar:4.1.72.Final]\n","stream":"stdout","time":"2022-03-30T12:12:13.063342659Z"}
{"log":"\u0009at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) [netty-common-4.1.72.Final.jar:4.1.72.Final]\n","stream":"stdout","time":"2022-03-30T12:12:13.063345684Z"}
{"log":"\u0009at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) [netty-transport-4.1.72.Final.jar:4.1.72.Final]\n","stream":"stdout","time":"2022-03-30T12:12:13.06334881Z"}
{"log":"\u0009at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-common-4.1.72.Final.jar:4.1.72.Final]\n","stream":"stdout","time":"2022-03-30T12:12:13.063351826Z"}
{"log":"\u0009at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.72.Final.jar:4.1.72.Final]\n","stream":"stdout","time":"2022-03-30T12:12:13.063355052Z"}
{"log":"\u0009at java.lang.Thread.run(Thread.java:832) [?:?]\n","stream":"stdout","time":"2022-03-30T12:12:13.063358108Z"}
{"log":"[2022-03-30T12:12:13,061][WARN ][o.o.c.NodeConnectionsService] [opensearch-cluster-data-104] failed to connect to {opensearch-cluster-data-20}{rpypK-kNR5-VPixJhtS1ng}{bpxKxi--Qo-oBcHBRBv1xQ}{10.42.6.224}{10.42.6.224:9300}{d}{shard_indexing_pressure_enabled=true} (tried [1] times)\n","stream":"stdout","time":"2022-03-30T12:12:13.063733992Z"}
{"log":"org.opensearch.transport.ConnectTransportException: [opensearch-cluster-data-20][10.42.6.224:9300] general node connection failure\n","stream":"stdout","time":"2022-03-30T12:12:13.063741275Z"}
{"log":"\u0009at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.lambda$onResponse$2(TcpTransport.java:1052) ~[opensearch-1.2.4.jar:1.2.4]\n","stream":"stdout","time":"2022-03-30T12:12:13.063746856Z"}
{"log":"\u0009at org.opensearch.action.ActionListener$1.onFailure(ActionListener.java:86) ~[opensearch-1.2.4.jar:1.2.4]\n","stream":"stdout","time":"2022-03-30T12:12:13.06374891Z"}
{"log":"\u0009at org.opensearch.transport.TransportHandshaker$HandshakeResponseHandler.handleLocalException(TransportHandshaker.java:199) ~[opensearch-1.2.4.jar:1.2.4]\n","stream":"stdout","time":"2022-03-30T12:12:13.063750874Z"}
{"log":"\u0009at org.opensearch.transport.TransportHandshaker.lambda$sendHandshake$0(TransportHandshaker.java:77) ~[opensearch-1.2.4.jar:1.2.4]\n","stream":"stdout","time":"2022-03-30T12:12:13.063752847Z"}
{"log":"\u0009at org.opensearch.action.ActionListener.lambda$wrap$0(ActionListener.java:147) ~[opensearch-1.2.4.jar:1.2.4]\n","stream":"stdout","time":"2022-03-30T12:12:13.063754781Z"}
{"log":"\u0009at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:78) ~[opensearch-1.2.4.jar:1.2.4]\n","stream":"stdout","time":"2022-03-30T12:12:13.063756705Z"}
{"log":"\u0009at org.opensearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:211) ~[opensearch-1.2.4.jar:1.2.4]\n","stream":"stdout","time":"2022-03-30T12:12:13.063758628Z"}
{"log":"\u0009at org.opensearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:52) ~[opensearch-core-1.2.4.jar:1.2.4]\n","stream":"stdout","time":"2022-03-30T12:12:13.063760552Z"}
{"log":"\u0009at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]\n","stream":"stdout","time":"2022-03-30T12:12:13.063762506Z"}
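Since both sides report an SSL handshake timeout on the transport port, one sanity check is whether a TLS handshake to port 9300 completes at all from a pod on the other machine. (A rough sketch; the IP is the new pod's address taken from the logs above, and reachability from wherever you run this is an assumption.)

```shell
# Attempt a TLS handshake against the new pod's transport port and print
# the certificate subject and validity dates if the handshake succeeds.
openssl s_client -connect 10.42.7.28:9300 </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -dates
```

If this hangs or prints nothing, the problem is below OpenSearch (network policy, MTU, or the overlay network between the two physical machines) rather than the security plugin itself.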

Thanks // Hugo

The new pod was connected at 00:00. Does anyone know why it happened at exactly 00:00?
I just spawned a new one and am waiting for it.
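As an aside, the "1 104" in the index listing (1 primary, 104 replicas) most likely comes from the security index being created with auto_expand_replicas set to 0-all, so each data node must host a copy before the index can turn green. This can be confirmed from the index settings (endpoint and credentials are placeholders):

```shell
# Show the settings of the security index; look for
# index.auto_expand_replicas in the output.
curl -sk -u admin:admin \
  'https://localhost:9200/.opendistro_security/_settings?pretty'
```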

root@s8k-sjc3-c08-sup-0001:~# ls /mnt/nvme2n1/nodes/0/indices/FjgR2YEdRYmNlNCKF9FnxA/0/index/  -alh
total 8.0K
drwxrwsr-x 2 1000 1000 4.0K Mar 31 01:02 .
drwxrwsr-x 4 1000 1000 4.0K Mar 31 01:02 ..
-rw-rw-r-- 1 1000 1000    0 Mar 30 06:53 write.lock
root@s8k-sjc3-c08-sup-0001:~# ls /mnt/nvme4n1/nodes/0/indices/FjgR2YEdRYmNlNCKF9FnxA/0/index/  -alh
total 160K
drwxrwsr-x 2 1000 1000 4.0K Mar 31 00:02 .
drwxrwsr-x 5 1000 1000 4.0K Mar 31 00:02 ..
-rw-rw-r-- 1 1000 1000  158 Mar 31 00:02 _n.fdm
-rw-rw-r-- 1 1000 1000 8.8K Mar 31 00:02 _n.fdt
-rw-rw-r-- 1 1000 1000   83 Mar 31 00:02 _n.fdx
-rw-rw-r-- 1 1000 1000 2.1K Mar 31 00:02 _n.fnm
-rw-rw-r-- 1 1000 1000  127 Mar 31 00:02 _n.kdd
-rw-rw-r-- 1 1000 1000   68 Mar 31 00:02 _n.kdi
-rw-rw-r-- 1 1000 1000  143 Mar 31 00:02 _n.kdm
-rw-rw-r-- 1 1000 1000  169 Mar 31 00:02 _n.nvd
-rw-rw-r-- 1 1000 1000  391 Mar 31 00:02 _n.nvm
-rw-rw-r-- 1 1000 1000  583 Mar 31 00:02 _n.si
-rw-rw-r-- 1 1000 1000 2.1K Mar 31 00:02 _n_2.fnm
-rw-rw-r-- 1 1000 1000   91 Mar 31 00:02 _n_2_Lucene80_0.dvd
-rw-rw-r-- 1 1000 1000  160 Mar 31 00:02 _n_2_Lucene80_0.dvm
-rw-rw-r-- 1 1000 1000  520 Mar 31 00:02 _n_Lucene80_0.dvd
-rw-rw-r-- 1 1000 1000 1.1K Mar 31 00:02 _n_Lucene80_0.dvm
-rw-rw-r-- 1 1000 1000   82 Mar 31 00:02 _n_Lucene84_0.doc
-rw-rw-r-- 1 1000 1000  133 Mar 31 00:02 _n_Lucene84_0.pos
-rw-rw-r-- 1 1000 1000  14K Mar 31 00:02 _n_Lucene84_0.tim
-rw-rw-r-- 1 1000 1000   85 Mar 31 00:02 _n_Lucene84_0.tip
-rw-rw-r-- 1 1000 1000 4.4K Mar 31 00:02 _n_Lucene84_0.tmd
-rw-rw-r-- 1 1000 1000  479 Mar 31 00:02 _q.cfe
-rw-rw-r-- 1 1000 1000 4.1K Mar 31 00:02 _q.cfs
-rw-rw-r-- 1 1000 1000  370 Mar 31 00:02 _q.si
-rw-rw-r-- 1 1000 1000  479 Mar 31 00:02 _t.cfe
-rw-rw-r-- 1 1000 1000  17K Mar 31 00:02 _t.cfs
-rw-rw-r-- 1 1000 1000  370 Mar 31 00:02 _t.si
-rw-rw-r-- 1 1000 1000  534 Mar 31 00:02 segments_e
-rw-rw-r-- 1 1000 1000    0 Mar 30 08:16 write.lock