Possible issue?

jasonrojas · January 16, 2020, 2:42pm

Hi, we have an indexing-heavy cluster that has been experiencing timeouts a few times a week for the last few weeks, since upgrading to 1.3.0. During the periods when indexing stops, I see this in the logs:

org.elasticsearch.transport.RemoteTransportException: [master][1.1.1.1:9300][cluster:admin/ism/update/managedindexmetadata]
Caused by: org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (opendistro-ism) within 30s
    at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$0(MasterService.java:134) ~[elasticsearch-7.3.2.jar:7.3.2]
    at java.util.ArrayList.forEach(ArrayList.java:1540) ~[?:?]
    at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$1(MasterService.java:133) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688) ~[elasticsearch-7.3.2.jar:7.3.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:835) ~[?:?]

Not sure if this is important, but the state of the ISM managed indices was “running”.
As a test, I removed all policies and managed indices from ISM to see if there is any improvement.
Has anyone else seen issues like this?

dbbaughe · January 16, 2020, 5:14pm

Hi @jasonrojas,

How many managed indices do you have in ISM?
When you removed the policies/managed indices from ISM did you see the timeouts stop or were they still happening?

Thanks,
Drew

jasonrojas · January 17, 2020, 2:28pm

I think I had manually applied the policy to about 30 indices.

The cluster had no issues last night so only time will tell if this was the issue.

dbbaughe · January 17, 2020, 6:48pm

@jasonrojas

30 indices wouldn’t be enough to cause the master’s cluster state queue to back up purely from ISM.
If you could also answer these questions, it might help us pinpoint and replicate the problem:

How much data do you have in the cluster?
How many indices/shards do you have?
What type of instances are you using?
What ingestion throughput do you average?

Also, are the timeout exceptions occurring only for ISM cluster events, or for other cluster events too?
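
One rough way to check this is to tally the timeouts in the master's log by event source. The sketch below assumes every timeout uses the same message format as the log line quoted above ("failed to process cluster event (...) within 30s"). The function name and the sample lines are made up for illustration and are not part of ISM or Elasticsearch.

```python
import re
from collections import Counter

# Assumed message format, taken from the log line quoted in this thread:
#   "failed to process cluster event (opendistro-ism) within 30s"
TIMEOUT_RE = re.compile(
    r"failed to process cluster event \((?P<source>.+?)\) within \S+"
)


def count_timeouts_by_source(lines):
    """Count cluster event timeouts per event source (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("source")] += 1
    return counts


if __name__ == "__main__":
    # Illustrative sample lines only; in practice, iterate over the log file.
    sample = [
        "ProcessClusterEventTimeoutException: failed to process cluster event (opendistro-ism) within 30s",
        "ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping) within 30s",
        "ProcessClusterEventTimeoutException: failed to process cluster event (opendistro-ism) within 30s",
        "some unrelated log line",
    ]
    for source, count in count_timeouts_by_source(sample).most_common():
        print(source, count)
```

If only opendistro-ism shows up in the tally, that points at ISM; if other sources time out as well, the master is more likely backed up in general.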

Thanks,
Drew
