The above comments are very interesting as I am facing a similar issue. Let me first summarise what I’ve understood, to check I’m thinking about things correctly, and then I’ll move on to my problem.
My understanding is that the message “Previous action was not able to update IndexMetaData” indicates that the outcome of the last operation performed on the index is unknown: the operation may have succeeded or failed, but all that’s known is that something (e.g. a timeout) prevented the metadata from being written. When a policy is in this state, the index will no longer transition automatically.
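For what it’s worth, this is roughly how I check whether an index is stuck in that state (a rough sketch assuming the standard _opendistro/_ism explain endpoint; the endpoint URL, credentials and index name are placeholders for my setup):

```python
import requests

ES = "https://localhost:9200"   # cluster endpoint; adjust for your setup
AUTH = ("admin", "admin")       # placeholder credentials

# Ask ISM why the policy stopped progressing for an index; the stuck state
# shows up in the per-index info/message of the explain response.
index = "logs-app-000017"       # illustrative index name
resp = requests.get(f"{ES}/_opendistro/_ism/explain/{index}",
                    auth=AUTH, verify=False)  # self-signed certs in my setup
print(resp.json())
```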
My situation: I have just moved from a single, very large vanilla Elasticsearch cluster to a few smaller OpenDistro clusters, and I’m seeing transitions that aren’t occurring. It’s serious enough that, left unchecked, it could cause real problems.
Each cluster has ~1.5k indices and ingests a few TB of data a day. Some indices are very quiet while others are very busy. Indices have between 1 and 20 primary shards, and I’ve configured ISM so that when an index rolls over each shard should be around 30GB, i.e. a 1-shard index rolls at 30GB and a 10-shard one at 300GB. A simplified sketch of one of these policies is below.
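For context, the rollover part of my policies looks roughly like this (a simplified, illustrative sketch: the policy id, endpoint and credentials are placeholders, and since min_size counts total primary storage, a 10-shard index gets 300gb to target ~30GB per shard):

```python
import requests

ES = "https://localhost:9200"   # cluster endpoint; adjust for your setup
AUTH = ("admin", "admin")       # placeholder credentials

# Simplified ISM policy: roll over once total primary storage hits 300GB
# (10 primaries x ~30GB each). The 1-shard indices get an equivalent policy
# with min_size set to "30gb".
policy = {
    "policy": {
        "description": "Rollover at ~30GB per primary shard (illustrative)",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [{"rollover": {"min_size": "300gb"}}],
                "transitions": [],
            }
        ],
    }
}

resp = requests.put(f"{ES}/_opendistro/_ism/policies/rollover-10-shard",
                    json=policy, auth=AUTH, verify=False)
print(resp.status_code, resp.json())
```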
What I’ve been finding is that some indices are not rolling over, and the most common message I’m seeing is “Previous action was not able to update IndexMetaData”. I have been manually rolling these over using the API described here. I’ve left some for a while to see if they would eventually roll over, but they haven’t, even when they were 3x the rollover size.
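When I retry one by hand it looks roughly like this (a sketch assuming the _opendistro/_ism retry endpoint; the index name, endpoint and credentials are placeholders):

```python
import requests

ES = "https://localhost:9200"   # cluster endpoint; adjust for your setup
AUTH = ("admin", "admin")       # placeholder credentials

index = "logs-app-000017"       # illustrative name of a stuck index

# Tell ISM to retry the failed step so the policy can attempt the rollover again.
resp = requests.post(f"{ES}/_opendistro/_ism/retry/{index}",
                     auth=AUTH, verify=False)
print(resp.json())
```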
Manually retrying isn’t going to work for us given the number of indices, and we can’t just leave things as they are because it wouldn’t take long for some of these indices to fill a disk. At the moment I’m looking for advice on how to resolve the issue.
Do you think the clusters are too large and that this (for some reason) is causing timeouts when writing the metadata? I’ve run into another issue, posted here, so I’m not sure whether clusters of this size have been tested much. If you think this is likely to be the case, what is the maximum cluster size you would recommend?
Sorry for the long post