Hi folks,
I added a new physical machine to the k8s cluster (managed by Rancher) in order to increase the pod count of OpenSearch-data. The new pod (#104) is running on the new physical machine.
In the OpenSearch console, the .opendistro_security index stays yellow forever:
green open .opendistro-reports-instances ENyTQKMNRtivUx4qYfynWA 1 2 0 0 624b 208b
yellow open .opendistro_security FjgR2YEdRYmNlNCKF9FnxA 1 104 9 3 13.6mb 56.7kb
green open filebeat-swift-v1-account-server-2022.03.01 5MDHHTV-THqmLRbl88ghvQ 1 0 2168709 0 538.9mb 538.9mb
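For readers less used to this output, here is a minimal sketch of how the lines above can be read. It assumes they come from `_cat/indices` with the default column order (health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size). The helper names `parse_cat_indices_line` and `total_shard_copies` are made up for illustration.

```python
from typing import NamedTuple


class IndexRow(NamedTuple):
    health: str
    status: str
    index: str
    uuid: str
    pri: int  # number of primary shards
    rep: int  # number of replicas configured per primary


def parse_cat_indices_line(line: str) -> IndexRow:
    """Parse one whitespace-separated line of default _cat/indices output."""
    fields = line.split()
    return IndexRow(fields[0], fields[1], fields[2], fields[3],
                    int(fields[4]), int(fields[5]))


def total_shard_copies(row: IndexRow) -> int:
    """Primaries plus all replicas, i.e. how many shard copies need a node."""
    return row.pri * (1 + row.rep)


# The three lines pasted above.
lines = [
    "green open .opendistro-reports-instances ENyTQKMNRtivUx4qYfynWA 1 2 0 0 624b 208b",
    "yellow open .opendistro_security FjgR2YEdRYmNlNCKF9FnxA 1 104 9 3 13.6mb 56.7kb",
    "green open filebeat-swift-v1-account-server-2022.03.01 5MDHHTV-THqmLRbl88ghvQ 1 0 2168709 0 538.9mb 538.9mb",
]

for line in lines:
    row = parse_cat_indices_line(line)
    print(row.index, row.health, "copies wanted:", total_shard_copies(row))
```

Read this way, .opendistro_security wants 1 primary plus 104 replicas, 105 copies in total. Yellow means at least one replica is unassigned. My guess (not confirmed) is that the unassigned copy is the one intended for the new pod, which would fit the connection failures in the logs.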
In the logs, an existing pod reports that it failed to connect to the new pod (the SSL handshake times out after 10000ms), and the new pod reports the same failure when connecting to an existing pod. Is there a known scale-out problem?
# logs from an existing pod
[2022-03-30T13:22:33,242][WARN ][o.o.c.NodeConnectionsService] [opensearch-cluster-data-14] failed to connect to {opensearch-cluster-data-104}{i16bRv4uSmyDIwOxSuYUAA}{gm_eG2IzTsG1wpegZtTerg}{10.42.7.28}{10.42.7.28:9300}{d}{shard_indexing_pressure_enabled=true} (tried [181] times)
org.opensearch.transport.ConnectTransportException: [opensearch-cluster-data-104][10.42.7.28:9300] general node connection failure
at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.lambda$onResponse$2(TcpTransport.java:1052) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.action.ActionListener$1.onFailure(ActionListener.java:86) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.transport.TransportHandshaker$HandshakeResponseHandler.handleLocalException(TransportHandshaker.java:199) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.transport.TransportHandshaker.lambda$sendHandshake$0(TransportHandshaker.java:77) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.action.ActionListener.lambda$wrap$0(ActionListener.java:147) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:78) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:211) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:52) ~[opensearch-core-1.2.4.jar:1.2.4]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2137) ~[?:?]
at org.opensearch.common.concurrent.CompletableContext.complete(CompletableContext.java:74) ~[opensearch-core-1.2.4.jar:1.2.4]
at org.opensearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:74) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:605) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104) ~[?:?]
at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84) ~[?:?]
at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:1182) ~[?:?]
at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:773) ~[?:?]
at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:749) ~[?:?]
at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:620) ~[?:?]
at io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1352) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:622) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:606) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:472) ~[?:?]
at io.netty.handler.ssl.SslUtils.handleHandshakeFailure(SslUtils.java:445) ~[?:?]
at io.netty.handler.ssl.SslHandler$7.run(SslHandler.java:2116) ~[?:?]
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[?:?]
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) ~[?:?]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[?:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: org.opensearch.transport.TransportException: handshake failed because connection reset
... 37 more
[2022-03-30T13:22:53,253][WARN ][o.o.t.OutboundHandler ] [opensearch-cluster-data-14] send message failed [channel: Netty4TcpChannel{localAddress=/10.42.0.104:57028, remoteAddress=10.42.7.28/10.42.7.28:9300}]
io.netty.handler.ssl.SslHandshakeTimeoutException: handshake timed out after 10000ms
at io.netty.handler.ssl.SslHandler$7.run(SslHandler.java:2112) [netty-handler-4.1.72.Final.jar:4.1.72.Final]
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [netty-common-4.1.72.Final.jar:4.1.72.Final]
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [netty-common-4.1.72.Final.jar:4.1.72.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.72.Final.jar:4.1.72.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) [netty-common-4.1.72.Final.jar:4.1.72.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) [netty-transport-4.1.72.Final.jar:4.1.72.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-common-4.1.72.Final.jar:4.1.72.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.72.Final.jar:4.1.72.Final]
at java.lang.Thread.run(Thread.java:832) [?:?]
# logs from new pod
{"log":"io.netty.handler.ssl.SslHandshakeTimeoutException: handshake timed out after 10000ms\n","stream":"stdout","time":"2022-03-30T12:12:13.063329995Z"}
{"log":"\u0009at io.netty.handler.ssl.SslHandler$7.run(SslHandler.java:2112) [netty-handler-4.1.72.Final.jar:4.1.72.Final]\n","stream":"stdout","time":"2022-03-30T12:12:13.063333411Z"}
{"log":"\u0009at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [netty-common-4.1.72.Final.jar:4.1.72.Final]\n","stream":"stdout","time":"2022-03-30T12:12:13.063336597Z"}
{"log":"\u0009at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [netty-common-4.1.72.Final.jar:4.1.72.Final]\n","stream":"stdout","time":"2022-03-30T12:12:13.063339613Z"}
{"log":"\u0009at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.72.Final.jar:4.1.72.Final]\n","stream":"stdout","time":"2022-03-30T12:12:13.063342659Z"}
{"log":"\u0009at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) [netty-common-4.1.72.Final.jar:4.1.72.Final]\n","stream":"stdout","time":"2022-03-30T12:12:13.063345684Z"}
{"log":"\u0009at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) [netty-transport-4.1.72.Final.jar:4.1.72.Final]\n","stream":"stdout","time":"2022-03-30T12:12:13.06334881Z"}
{"log":"\u0009at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-common-4.1.72.Final.jar:4.1.72.Final]\n","stream":"stdout","time":"2022-03-30T12:12:13.063351826Z"}
{"log":"\u0009at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.72.Final.jar:4.1.72.Final]\n","stream":"stdout","time":"2022-03-30T12:12:13.063355052Z"}
{"log":"\u0009at java.lang.Thread.run(Thread.java:832) [?:?]\n","stream":"stdout","time":"2022-03-30T12:12:13.063358108Z"}
{"log":"[2022-03-30T12:12:13,061][WARN ][o.o.c.NodeConnectionsService] [opensearch-cluster-data-104] failed to connect to {opensearch-cluster-data-20}{rpypK-kNR5-VPixJhtS1ng}{bpxKxi--Qo-oBcHBRBv1xQ}{10.42.6.224}{10.42.6.224:9300}{d}{shard_indexing_pressure_enabled=true} (tried [1] times)\n","stream":"stdout","time":"2022-03-30T12:12:13.063733992Z"}
{"log":"org.opensearch.transport.ConnectTransportException: [opensearch-cluster-data-20][10.42.6.224:9300] general node connection failure\n","stream":"stdout","time":"2022-03-30T12:12:13.063741275Z"}
{"log":"\u0009at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.lambda$onResponse$2(TcpTransport.java:1052) ~[opensearch-1.2.4.jar:1.2.4]\n","stream":"stdout","time":"2022-03-30T12:12:13.063746856Z"}
{"log":"\u0009at org.opensearch.action.ActionListener$1.onFailure(ActionListener.java:86) ~[opensearch-1.2.4.jar:1.2.4]\n","stream":"stdout","time":"2022-03-30T12:12:13.06374891Z"}
{"log":"\u0009at org.opensearch.transport.TransportHandshaker$HandshakeResponseHandler.handleLocalException(TransportHandshaker.java:199) ~[opensearch-1.2.4.jar:1.2.4]\n","stream":"stdout","time":"2022-03-30T12:12:13.063750874Z"}
{"log":"\u0009at org.opensearch.transport.TransportHandshaker.lambda$sendHandshake$0(TransportHandshaker.java:77) ~[opensearch-1.2.4.jar:1.2.4]\n","stream":"stdout","time":"2022-03-30T12:12:13.063752847Z"}
{"log":"\u0009at org.opensearch.action.ActionListener.lambda$wrap$0(ActionListener.java:147) ~[opensearch-1.2.4.jar:1.2.4]\n","stream":"stdout","time":"2022-03-30T12:12:13.063754781Z"}
{"log":"\u0009at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:78) ~[opensearch-1.2.4.jar:1.2.4]\n","stream":"stdout","time":"2022-03-30T12:12:13.063756705Z"}
{"log":"\u0009at org.opensearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:211) ~[opensearch-1.2.4.jar:1.2.4]\n","stream":"stdout","time":"2022-03-30T12:12:13.063758628Z"}
{"log":"\u0009at org.opensearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:52) ~[opensearch-core-1.2.4.jar:1.2.4]\n","stream":"stdout","time":"2022-03-30T12:12:13.063760552Z"}
{"log":"\u0009at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]\n","stream":"stdout","time":"2022-03-30T12:12:13.063762506Z"}
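Since both sides report an SSL handshake timeout rather than a refused connection, one thing that might help narrow this down is checking separately whether the TCP connection and the TLS handshake succeed on the transport port. Below is a rough sketch using only the Python standard library. The function name `probe_transport` is made up, and certificate verification is deliberately disabled because the probe only looks at whether the handshake makes progress. It also presents no client certificate, so a node that requires one may still reject the probe; even then a fast rejection would look different from a 10 s timeout.

```python
import socket
import ssl


def probe_transport(host: str, port: int = 9300, timeout: float = 10.0) -> str:
    """Report whether TCP connect and TLS handshake to host:port succeed.

    Returns a string starting with 'tcp_failed', 'tls_failed' or 'tls_ok'.
    """
    # Stage 1: plain TCP connect.
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return f"tcp_failed: {exc!r}"

    # Stage 2: TLS handshake on the connected socket.
    # Verification is switched off on purpose: this is a reachability probe,
    # not a trust check. check_hostname must be disabled before verify_mode.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with ctx.wrap_socket(sock) as tls:
            return f"tls_ok: {tls.version()}"
    except (ssl.SSLError, OSError) as exc:
        # Covers handshake timeouts, resets and TLS alerts.
        return f"tls_failed: {exc!r}"
    finally:
        sock.close()
```

For example, `probe_transport("10.42.7.28")` run from inside an existing pod (assuming Python is available there) would target the new pod's transport address from the logs above, and `probe_transport("10.42.6.224")` run from the new pod would test the opposite direction. A result of `tls_failed` with a timeout while TCP connects fine would point at something on the network path between the hosts rather than at OpenSearch itself, but that is only a guess on my side.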
Thanks // Hugo