I have a simple index management policy that deletes an index after 3 days. Over the weekend, I noticed that this policy failed for a small number of the indices it applies to, but worked for several others. In the Kibana Index Management panel, the Managed Indices page shows the status of the affected indices as “failed”. When I click the link in the INFO column, the dialog shows “message”: “Previous action was not able to update IndexMetaData.”
Looking through the Elasticsearch logs, I see clusters of messages like the following, which I think correspond to the failed transition attempts:
[2020-03-23T01:16:13,519][WARN ][c.a.o.i.ManagedIndexRunner] [odfe-opendistro-es-data-2] Operation failed. Retrying in 250ms.
org.elasticsearch.transport.RemoteTransportException: [odfe-opendistro-es-master-1][10.254.2.161:9300][cluster:admin/ism/update/managedindexmetadata]
Caused by: org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (opendistro-ism) within 30s
    at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$0(MasterService.java:143) ~[elasticsearch-7.4.2.jar:7.4.2]
    at java.util.ArrayList.forEach(ArrayList.java:1540) ~[?:?]
    at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$1(MasterService.java:142) ~[elasticsearch-7.4.2.jar:7.4.2]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) ~[elasticsearch-7.4.2.jar:7.4.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:835) ~[?:?]
Is it possible to increase the time allotted for processing this cluster event, since the current 30-second limit doesn’t appear to be sufficient?
Will the Index Management plugin continue trying? The log message indicates that it will, and I see multiple clusters of these messages, but some of the indices remain. So, does it “give up” at some point, or should I assume it will eventually “catch up”?
The list of indices shown in the plugin truncates the index name field, so it is hard for me to be sure which index is which, but the affected indices appear to be “small” (a few hundred MB), so it’s unclear why the request would be timing out. Is there additional logging that can be enabled that might be helpful?
I have been successful with manually retrying the action by clicking the RETRY POLICY button on the Managed Indices page of the IM plugin. But, obviously, I defined a policy so this would all be automated.
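As a stopgap, the manual retry could presumably be scripted against the ISM REST API instead of clicking through the UI. The following is only a rough sketch, not something I have running: it assumes the plugin's `_opendistro/_ism/explain/<index>` and `_opendistro/_ism/retry/<index>` endpoints, a hypothetical cluster URL in `ES_URL`, no authentication, and that each entry in the explain response carries an `action.failed` flag (that shape may differ between plugin versions). The helper names are my own.

```python
import json
import urllib.request

# Assumption: placeholder cluster URL; a secured cluster also needs auth/TLS setup.
ES_URL = "http://localhost:9200"


def failed_indices(explain_response):
    """Return the names of managed indices whose current action is marked failed.

    Assumes each entry looks like {"action": {"failed": true, ...}, ...}.
    """
    names = []
    for name, meta in explain_response.items():
        if not isinstance(meta, dict):
            continue  # skip any top-level field that is not per-index metadata
        action = meta.get("action") or {}
        if action.get("failed"):
            names.append(name)
    return sorted(names)


def retry_request(index_name, base_url=ES_URL):
    """Build (but do not send) the POST request that retries the failed action."""
    url = f"{base_url}/_opendistro/_ism/retry/{index_name}"
    return urllib.request.Request(url, method="POST")


def retry_failed(pattern="*", base_url=ES_URL):
    """Fetch the explain output for `pattern` and retry every failed index."""
    explain_url = f"{base_url}/_opendistro/_ism/explain/{pattern}"
    with urllib.request.urlopen(explain_url) as resp:
        explain = json.load(resp)
    for name in failed_indices(explain):
        with urllib.request.urlopen(retry_request(name, base_url)) as resp:
            print(name, resp.status)
```

Running `retry_failed("my-index-prefix-*")` from a cron job would approximate the RETRY POLICY button, but it only works around the timeout rather than explaining it.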
Can anyone offer some helpful advice/guidance?
Thanks!