Hi, we have a indexing heaving cluster that for the last few weeks since upgrading to 1.3.0 has been experiencing timeouts a few times a week. During the period where indexing stops I see this in the logs:
org.elasticsearch.transport.RemoteTransportException: [master][1.1.1.1:9300][cluster:admin/ism/update/managedindexmetadata], Caused by: org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (opendistro-ism) within 30s, at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$0(MasterService.java:134) ~[elasticsearch-7.3.2.jar:7.3.2], at java.util.ArrayList.forEach(ArrayList.java:1540) ~[?:?], at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$1(MasterService.java:133) ~[elasticsearch-7.3.2.jar:7.3.2], at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688) ~[elasticsearch-7.3.2.jar:7.3.2], at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?], at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?], at java.lang.Thread.run(Thread.java:835) ~[?:?]
Not sure if this is anything important but the state of the managed ism indices was “running”.
For a test I removed all policies and managed indices from ISM to see if there is any improvement.
Has anyone else seen issues like this?