OpenSearch goes down after getting errors

Hi! I’m getting errors from OpenSearch that I’m not sure what to do with:
2023-12-04T18:40:04.368575000Z [2023-12-04T18:40:04,363][WARN ][r.suppressed ] [opensearch-research-coordinating-1] path: /daily-/_search, params: {ignore_unavailable=true, preference=1701714973014, index=daily-, timeout=30000ms, track_total_hits=true}
2023-12-04T18:40:04.368885000Z org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
2023-12-04T18:40:04.369108000Z at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:664) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.369295000Z at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:372) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.369453000Z at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:699) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.369618000Z at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:472) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.369783000Z at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:294) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.369980000Z at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.370154000Z at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:74) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.370326000Z at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:755) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.370558000Z at org.opensearch.transport.TransportService.sendRequest(TransportService.java:824) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.370732000Z at org.opensearch.transport.TransportService.sendChildRequest(TransportService.java:890) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.370895000Z at org.opensearch.transport.TransportService.sendChildRequest(TransportService.java:878) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.371112000Z at org.opensearch.action.search.SearchTransportService.sendExecuteQuery(SearchTransportService.java:248) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.371341000Z at org.opensearch.action.search.SearchQueryThenFetchAsyncAction.executePhaseOnShard(SearchQueryThenFetchAsyncAction.java:133) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.371549000Z at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$3(AbstractSearchAsyncAction.java:281) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.371719000Z at org.opensearch.action.search.AbstractSearchAsyncAction$PendingExecutions.tryRun(AbstractSearchAsyncAction.java:800) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.371961000Z at org.opensearch.action.search.AbstractSearchAsyncAction$PendingExecutions.finishAndRunNext(AbstractSearchAsyncAction.java:794) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.372156000Z at org.opensearch.action.search.AbstractSearchAsyncAction$2.doRun(AbstractSearchAsyncAction.java:350) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.372329000Z at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.372614000Z at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.372795000Z at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.372995000Z at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.373250000Z at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:806) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.373422000Z at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.373594000Z at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
2023-12-04T18:40:04.373764000Z at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
2023-12-04T18:40:04.373973000Z at java.lang.Thread.run(Thread.java:833) [?:?]
2023-12-04T18:40:04.374155000Z Caused by: org.opensearch.tasks.TaskCancelledException: The parent task was cancelled, shouldn’t start any child tasks
2023-12-04T18:40:04.374316000Z at org.opensearch.tasks.TaskManager$CancellableTaskHolder.registerChildNode(TaskManager.java:632) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.374472000Z at org.opensearch.tasks.TaskManager.registerChildNode(TaskManager.java:311) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.374741000Z at org.opensearch.transport.TransportService.sendRequest(TransportService.java:783) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.374900000Z at org.opensearch.transport.TransportService.sendChildRequest(TransportService.java:890) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.375100000Z at org.opensearch.transport.TransportService.sendChildRequest(TransportService.java:878) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.375274000Z at org.opensearch.action.search.SearchTransportService.sendExecuteQuery(SearchTransportService.java:248) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.375537000Z at org.opensearch.action.search.SearchQueryThenFetchAsyncAction.executePhaseOnShard(SearchQueryThenFetchAsyncAction.java:133) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.375713000Z at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$3(AbstractSearchAsyncAction.java:281) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.375893000Z at org.opensearch.action.search.AbstractSearchAsyncAction$PendingExecutions.tryRun(AbstractSearchAsyncAction.java:800) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.376145000Z at org.opensearch.action.search.AbstractSearchAsyncAction.performPhaseOnShard(AbstractSearchAsyncAction.java:322) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.376359000Z at org.opensearch.action.search.AbstractSearchAsyncAction.run(AbstractSearchAsyncAction.java:252) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.376547000Z at org.opensearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:427) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.376714000Z at org.opensearch.action.search.AbstractSearchAsyncAction.start(AbstractSearchAsyncAction.java:218) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.376880000Z at org.opensearch.action.search.TransportSearchAction$5.run(TransportSearchAction.java:1160) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.377053000Z at org.opensearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:427) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.377199000Z at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:421) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.377370000Z at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:699) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.377593000Z at org.opensearch.action.search.AbstractSearchAsyncAction.successfulShardExecution(AbstractSearchAsyncAction.java:581) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.377750000Z at org.opensearch.action.search.AbstractSearchAsyncAction.onShardResultConsumed(AbstractSearchAsyncAction.java:568) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.377925000Z at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$onShardResult$9(AbstractSearchAsyncAction.java:551) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.378078000Z at org.opensearch.action.search.CanMatchPreFilterSearchPhase$CanMatchSearchPhaseResults.consumeResult(CanMatchPreFilterSearchPhase.java:228) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.378232000Z at org.opensearch.action.search.CanMatchPreFilterSearchPhase$CanMatchSearchPhaseResults.consumeResult(CanMatchPreFilterSearchPhase.java:212) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.378411000Z at org.opensearch.action.search.AbstractSearchAsyncAction.onShardResult(AbstractSearchAsyncAction.java:551) [opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.378558000Z at org.opensearch.action.search.AbstractSearchAsyncAction$1.innerOnResponse(AbstractSearchAsyncAction.java:285) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.378756000Z at org.opensearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:59) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.378922000Z at org.opensearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:44) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.379150000Z at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:69) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.379288000Z at org.opensearch.transport.TransportService$6.handleResponse(TransportService.java:788) ~[opensearch-2.7.0.jar:2.7.0]
2023-12-04T18:40:04.379411000Z at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleResponse(SecurityInterceptor.java:306) ~[?:?]
2023-12-04T18:40:04.379583000Z at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1404) ~[opensearch-2.7.0.jar:2.7.0]

After getting these errors, OpenSearch goes red and we see lag in our Kafka.
Thanks!
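
For anyone following along, a red status can be narrowed down with the cluster health API. Below is a minimal sketch, not a definitive tool: the helper names fetch_json and summarize_health and the base URL in the comment are assumptions for illustration, and authentication/TLS options (which a cluster running the security plugin would likely need) are left out. The response fields used ("status", "number_of_nodes", "unassigned_shards") are the ones returned by GET /_cluster/health.

```python
import json
import urllib.request


def fetch_json(base_url, path):
    """GET base_url + path and decode the JSON body.

    No auth or TLS settings are applied here; add them for a secured cluster.
    """
    with urllib.request.urlopen(base_url + path, timeout=10) as resp:
        return json.load(resp)


def summarize_health(health):
    """Turn a /_cluster/health response (a dict) into a one-line summary."""
    status = health.get("status", "unknown")
    nodes = health.get("number_of_nodes", 0)
    unassigned = health.get("unassigned_shards", 0)
    summary = f"status={status} nodes={nodes} unassigned_shards={unassigned}"
    if status == "red":
        # Red means at least one primary shard is not allocated.
        summary += " (at least one primary shard is not allocated)"
    return summary


# Example call against a hypothetical endpoint (not run here):
#   print(summarize_health(fetch_json("http://localhost:9200", "/_cluster/health")))
```

If the summary shows unassigned shards, GET /_cluster/allocation/explain may give the reason a shard could not be allocated.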


@taltsafrir could you please check whether there is another exception in the logs that precedes this one? Thank you.
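
One way to look for an earlier exception is to scan the log for exception lines shortly before the first failure. This is a rough sketch only: it assumes each line carries the same timestamp prefix as the lines pasted above, and the function names, the 10-minute window, and the "Exception"/"Error" keyword match are illustrative choices, not anything OpenSearch provides.

```python
import re
from datetime import datetime

# Matches the timestamp prefix seen in the pasted logs,
# e.g. "2023-12-04T18:40:04.368575000Z ..."
TS_PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d+)Z\s+(.*)$")


def parse_line(line):
    """Return (timestamp, rest_of_line), or None if the prefix is missing."""
    m = TS_PREFIX.match(line)
    if not m:
        return None
    base, frac, rest = m.groups()
    # datetime holds microseconds at most, so drop the extra digits.
    ts = datetime.strptime(base + "." + frac[:6], "%Y-%m-%dT%H:%M:%S.%f")
    return ts, rest


def exceptions_before(lines, cutoff, window_seconds=600):
    """Collect exception-looking lines logged within the window before cutoff."""
    hits = []
    for line in lines:
        parsed = parse_line(line.rstrip("\n"))
        if parsed is None:
            continue
        ts, rest = parsed
        if rest.lstrip().startswith("at "):
            continue  # skip stack frames, keep only the exception headers
        age = (cutoff - ts).total_seconds()
        if 0 < age <= window_seconds and ("Exception" in rest or "Error" in rest):
            hits.append((ts, rest))
    return hits
```

Calling exceptions_before(open(path), cutoff) with cutoff set to the timestamp of the first "all shards failed" line would list candidate root causes; path here stands for wherever the node log is stored.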

Exception during establishing a SSL connection: java.net.SocketException: Connection reset
2023-12-10T15:43:25.177281000Z java.net.SocketException: Connection reset
2023-12-10T15:43:25.201705000Z at sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:394) ~[?:?]
2023-12-10T15:43:25.201848000Z at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:426) ~[?:?]
2023-12-10T15:43:25.201908000Z at org.opensearch.transport.CopyBytesSocketChannel.readFromSocketChannel(CopyBytesSocketChannel.java:155) ~[transport-netty4-client-2.7.0.jar:2.7.0]
2023-12-10T15:43:25.212446000Z at org.opensearch.transport.CopyBytesSocketChannel.doReadBytes(CopyBytesSocketChannel.java:140) ~[transport-netty4-client-2.7.0.jar:2.7.0]
2023-12-10T15:43:25.212639000Z at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151) [netty-transport-4.1.91.Final.jar:4.1.91.Final]
2023-12-10T15:43:25.212698000Z at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.91.Final.jar:4.1.91.Final]
2023-12-10T15:43:25.244362000Z at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.91.Final.jar:4.1.91.Final]
2023-12-10T15:43:25.244600000Z at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.91.Final.jar:4.1.91.Final]
2023-12-10T15:43:25.244711000Z at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.91.Final.jar:4.1.91.Final]
2023-12-10T15:43:25.244879000Z at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.91.Final.jar:4.1.91.Final]
2023-12-10T15:43:25.244961000Z at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.91.Final.jar:4.1.91.Final]
2023-12-10T15:43:25.245044000Z at java.lang.Thread.run(Thread.java:833) [?:?]

This error is from one of the data servers.