Hi,
I'm using OpenSearch version 2.9.
We have enabled cross-cluster replication with security enabled, and we are trying to delete indices from the follower cluster after stopping replication for each index. The data pod restarts with the error below. The sequence is:
- Install the leader and the follower as Helm charts in two different k8s clusters (1 data, 1 cluster_manager).
- Indices matching the pattern "test*" are replicated from the leader to the follower on a regular basis using the autofollow API.
- On the follower, we try to delete the replicated indices:
a) Stop the replication using the stop API.
b) Delete the indices using Curator, configured to delete all indices with the prefix test.
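For reference, steps a) and b) correspond roughly to the two HTTP requests sketched below. This is only an illustration under some assumptions: the host name `follower` and the helper function names are hypothetical, authentication headers and TLS settings required by the security plugin are left out, and in our setup the delete is actually issued by Curator rather than by hand. The requests are only built here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_stop_replication_request(base_url: str, index: str) -> urllib.request.Request:
    """Build the POST request that stops replication for one follower index."""
    # Stop API of the cross-cluster replication plugin:
    # POST /_plugins/_replication/<follower-index>/_stop with an empty JSON body.
    quoted = urllib.parse.quote(index, safe="")
    url = f"{base_url.rstrip('/')}/_plugins/_replication/{quoted}/_stop"
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_delete_index_request(base_url: str, index: str) -> urllib.request.Request:
    """Build the DELETE request that removes the (now unreplicated) index."""
    quoted = urllib.parse.quote(index, safe="")
    url = f"{base_url.rstrip('/')}/{quoted}"
    return urllib.request.Request(url, method="DELETE")


if __name__ == "__main__":
    # "https://follower:9200" is a placeholder for the follower cluster endpoint.
    for req in (
        build_stop_replication_request("https://follower:9200", "test-2023.09.20"),
        build_delete_index_request("https://follower:9200", "test-2023.09.20"),
    ):
        print(req.get_method(), req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it, but credentials and certificate handling would need to be added first.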
We see the error below when it tries to delete the index, and the data pod restarts:
{"type":"log","level":"INFO","time": "2023-09-20T11:04:32.271Z","logger":"o.o.r.t.i.IndexReplicationTask","marker":"[sa-indexsearch-data-0] [test-2023.09.20] ","log":{"message":"In restoring state for test-2023.09.20"}}
{"type":"log","level":"INFO","time": "2023-09-20T11:04:32.294Z","logger":"o.o.r.t.i.IndexReplicationTask","marker":"[sa-indexsearch-data-0] [test-2023.09.20] ","log":{"message":"Verifying task details - currentTask={isAssigned=true,executorNode=_IVFFZnnQv6HCTJ3hIAS1w}"}}
{"type":"log","level":"INFO","time": "2023-09-20T11:04:32.296Z","logger":"o.o.r.t.i.IndexReplicationTask","marker":"[sa-indexsearch-data-0] [test-2023.09.20] ","log":{"message":"Replication stopped before restore could finish, so removing partial restore.."}}
{"type":"log","level":"INFO","time": "2023-09-20T11:04:32.305Z","logger":"o.o.r.s.RemoteClusterRetentionLeaseHelper","marker":"[sa-indexsearch-data-0] ","log":{"message":"Removed retention lease with id - replication:sanjay-sa:V4SSQstZRvO_JQUYbKwADg:[test-2023.09.20][0]"}}
{"type":"log","level":"INFO","systemid":"BSSC-1234","system":"BSSC","time": "2023-09-20T11:04:32.305Z","logger":"o.o.r.t.i.IndexReplicationTask","timezone":"UTC","marker":"[sa-indexsearch-data-0] [test-2023.09.20] ","log":{"message":"Deleting the index test-2023.09.20"}}
{"type":"log","level":"ERROR","time": "2023-09-20T11:04:32.369Z","logger":"o.o.b.OpenSearchUncaughtExceptionHandler",,"marker":"[sa-indexsearch-data-0] ","log":{"message":"fatal error in thread [opensearch[sa-indexsearch-data-0][replication_follower][T#1]], exiting"}}
java.lang.NoSuchMethodError: 'java.util.List java.util.stream.Stream.toList()'
at org.opensearch.replication.task.index.IndexReplicationTask.doesValidIndexExists(IndexReplicationTask.kt:894) ~[opensearch-cross-cluster-replication-2.9.0.0.jar:2.9.0.0]
at org.opensearch.replication.task.index.IndexReplicationTask.waitForRestore(IndexReplicationTask.kt:860) ~[opensearch-cross-cluster-replication-2.9.0.0.jar:2.9.0.0]
at org.opensearch.replication.task.index.IndexReplicationTask.execute$suspendImpl(IndexReplicationTask.kt:189) ~[opensearch-cross-cluster-replication-2.9.0.0.jar:2.9.0.0]
at org.opensearch.replication.task.index.IndexReplicationTask$execute$1.invokeSuspend(IndexReplicationTask.kt) ~[opensearch-cross-cluster-replication-2.9.0.0.jar:2.9.0.0]
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) [kotlin-stdlib-1.6.0.jar:1.6.0-release-798(1.6.0)]
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106) [kotlinx-coroutines-core-jvm-1.6.0.jar:?]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849) [opensearch-2.9.0.jar:2.9.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
fatal error in thread [opensearch[sa-indexsearch-data-0][replication_follower][T#1]], exiting
java.lang.NoSuchMethodError: 'java.util.List java.util.stream.Stream.toList()'
at org.opensearch.replication.task.index.IndexReplicationTask.doesValidIndexExists(IndexReplicationTask.kt:894)
at org.opensearch.replication.task.index.IndexReplicationTask.waitForRestore(IndexReplicationTask.kt:860)
at org.opensearch.replication.task.index.IndexReplicationTask.execute$suspendImpl(IndexReplicationTask.kt:189)
at org.opensearch.replication.task.index.IndexReplicationTask$execute$1.invokeSuspend(IndexReplicationTask.kt)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
The error above occurred while deleting indices: the pod restarted abruptly during the deletion of the test-2023.09.20 index, after a few other test* indices had already been deleted successfully.
We expect the pod not to stop abruptly and restart with a NoSuchMethodError.
What is the root cause of this, and what is the impact? What happens after the restart? Will there be any issue with replication?
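One detail that may be relevant: `java.util.stream.Stream.toList()` was added in Java 16, so this NoSuchMethodError could mean the JVM inside the data pod is older than that. I have not confirmed this for our image. A minimal sketch for checking a Java version string (for example, as reported by `java -version` in the pod) is below; the function names are hypothetical helpers, not part of any OpenSearch tooling.

```python
import re


def java_feature_version(version: str) -> int:
    """Return the Java feature release (8, 11, 17, ...) from a version string."""
    match = re.match(r"\s*(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised Java version string: {version!r}")
    major = int(match.group(1))
    # Legacy scheme: "1.8.0_382" means Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def has_stream_to_list(version: str) -> bool:
    """Stream.toList() exists only on Java 16 and later."""
    return java_feature_version(version) >= 16


if __name__ == "__main__":
    for v in ("1.8.0_382", "11.0.20", "17.0.8"):
        print(v, has_stream_to_list(v))
```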