Index State Management cannot acquire locks

Hello,

I have an issue with ISM in one of our clusters. For a while now nothing has been working, for no apparent reason. The UI gives no indication whatsoever of what is going on; all I see in Kibana is that the job status is “Running”.

After finally figuring out how to enable debug logging for Index State Management, I see a bunch of these messages for each index:

[2022-01-05T12:50:08,155][DEBUG][c.a.o.i.i.ManagedIndexRunner] [es-master-0] Could not acquire lock for <index-name-here>
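A single debug line doesn't tell you whether a failure is a one-off or has been going on for hours. As a rough Python sketch, assuming the default log pattern shown above and that the log is available as plain text lines, the messages can be grouped per index to see how long each index has been failing. The one-hour cutoff and the index names in the example are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches the ManagedIndexRunner debug line above (default log pattern assumed):
# [2022-01-05T12:50:08,155][DEBUG][c.a.o.i.i.ManagedIndexRunner] [es-master-0] Could not acquire lock for <index>
LOCK_FAILURE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[DEBUG\]\[c\.a\.o\.i\.i\.ManagedIndexRunner\]\s*"
    r"\[(?P<node>[^\]]+)\]\s*Could not acquire lock for (?P<index>\S+)"
)

# Hypothetical cutoff: failures spanning longer than this deserve a closer look.
TOO_LONG = timedelta(hours=1)


def lock_failure_spans(lines):
    """Return {index: (first_seen, last_seen, count)} for lock-failure lines."""
    spans = {}
    for line in lines:
        m = LOCK_FAILURE.match(line.strip())
        if not m:
            continue  # not a lock-failure line
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S,%f")
        index = m.group("index")
        first, last, count = spans.get(index, (ts, ts, 0))
        spans[index] = (min(first, ts), max(last, ts), count + 1)
    return spans


def persistent_failures(spans, threshold=TOO_LONG):
    """Indices whose lock failures span more than `threshold`, sorted by name."""
    return sorted(i for i, (first, last, _) in spans.items() if last - first > threshold)


if __name__ == "__main__":
    log = [
        "[2022-01-05T12:50:08,155][DEBUG][c.a.o.i.i.ManagedIndexRunner] [es-master-0] Could not acquire lock for logs-a",
        "[2022-01-05T12:55:08,160][DEBUG][c.a.o.i.i.ManagedIndexRunner] [es-master-0] Could not acquire lock for logs-b",
        "[2022-01-05T15:10:02,001][DEBUG][c.a.o.i.i.ManagedIndexRunner] [es-master-0] Could not acquire lock for logs-a",
    ]
    print(persistent_failures(lock_failure_spans(log)))
```

Here logs-a has been failing for over two hours, so it is reported, while logs-b has only a single failure.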

For whoever might be googling this: run the following in Dev Tools to enable debug logs for ISM.

PUT /_cluster/settings
{
  "persistent" : {
    "logger.com.amazon.opendistroforelasticsearch.indexmanagement" : "DEBUG"
  }
}
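If you script this instead of typing it into Dev Tools, the same body can be built in Python. Setting the logger to null (None in Python) removes the persistent override again once you are done debugging, so the logs don't stay noisy. A minimal sketch; ism_log_level_body is a made-up helper name, and sending the request is left to whatever client you use:

```python
import json

# Logger namespace taken from the request above (Open Distro package name).
ISM_LOGGER = "logger.com.amazon.opendistroforelasticsearch.indexmanagement"


def ism_log_level_body(level):
    """Build the body for PUT /_cluster/settings.

    Pass "DEBUG" to enable debug logging, or None to drop the persistent
    override so the logger falls back to its default level.
    """
    return {"persistent": {ISM_LOGGER: level}}


enable = ism_log_level_body("DEBUG")
reset = ism_log_level_body(None)

print(json.dumps(enable, indent=2))
print(json.dumps(reset))  # None is serialized as JSON null
```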

The message should be coming from this part here:

Essentially, the runner is trying to acquire a lock on the job so that it can run it (and to ensure that no other node runs it at the same time). Usually a failure here is temporary: some previous lock was not released successfully, and the runner has to wait for that lock's TTL to expire.
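To make the TTL part concrete, here is a heavily simplified, in-memory model of that locking behaviour in Python. It is not the job scheduler's actual code: Lock, LockStore and their fields are hypothetical names, and the real plugin keeps its locks as documents in the lock index rather than in a dict. It only shows why a lock that was never released blocks the job until its duration runs out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lock:
    # Hypothetical lock record: which job holds it, when it was taken,
    # how long it stays valid, and whether it was released explicitly.
    job_id: str
    lock_time: float
    duration_s: float
    released: bool = False

    def expired(self, now: float) -> bool:
        return now >= self.lock_time + self.duration_s


class LockStore:
    """In-memory stand-in for the lock index (one record per job)."""

    def __init__(self) -> None:
        self._locks: Dict[str, Lock] = {}

    def acquire(self, job_id: str, now: float, duration_s: float) -> Optional[Lock]:
        current = self._locks.get(job_id)
        # A live, unreleased lock exists: the "Could not acquire lock" case.
        if current is not None and not current.released and not current.expired(now):
            return None
        # No lock, a released lock, or an expired lock: take it over.
        lock = Lock(job_id, now, duration_s)
        self._locks[job_id] = lock
        return lock

    def release(self, lock: Lock) -> None:
        lock.released = True


store = LockStore()
first = store.acquire("my-index", now=0, duration_s=60)
# The runner "crashes" without calling release(): the job stays blocked ...
print(store.acquire("my-index", now=10, duration_s=60))
# ... until the lock's duration has passed.
print(store.acquire("my-index", now=61, duration_s=60) is not None)
```

In a real cluster, acquiring a lock also means writing to the lock index, so if that write keeps failing, waiting for the TTL never helps.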

If it keeps failing for a long period of time (hours), though, I would instead assume that something is wrong with writing to the internal job scheduler lock index. Either:
a) There is no lock index, and the plugin is trying to create one but can't because of some issue in the cluster. You can check with cat indices (GET _cat/indices) whether there is a job scheduler lock index in the cluster.
b) There is an issue with writing to the job scheduler lock index. Check for any index blocks, and try writing a dummy doc to the index yourself to see whether that fails.
c) Look for any other logs related to the lock index that could help pinpoint the problem further.
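Checks (a) and (b) can be partly scripted against the JSON returned by GET _cat/indices?format=json and GET <lock-index>/_settings. The Python sketch below only analyses responses you have already fetched (from Dev Tools, curl, or a client). The lock index name .opendistro-job-scheduler-lock and the helper names are assumptions, so adjust them to what your cluster actually shows. One common source of a write block is read_only_allow_delete, which Elasticsearch applies automatically when a node crosses the flood-stage disk watermark. The dummy-doc write test from (b) still has to be done by hand.

```python
# Assumption: the lock index name contains this substring
# (on Open Distro it is usually ".opendistro-job-scheduler-lock").
LOCK_INDEX_HINT = "job-scheduler-lock"


def find_lock_indices(cat_indices):
    """(a) Rows of GET _cat/indices?format=json that look like the lock index."""
    return [row for row in cat_indices if LOCK_INDEX_HINT in row.get("index", "")]


def active_blocks(settings_response):
    """(b) Map index -> sorted index.blocks.* settings that are switched on.

    settings_response is the parsed body of GET <index>/_settings, where
    setting values come back as strings such as "true".
    """
    result = {}
    for index, body in settings_response.items():
        blocks = body.get("settings", {}).get("index", {}).get("blocks", {})
        on = sorted(name for name, value in blocks.items() if str(value).lower() == "true")
        if on:
            result[index] = on
    return result


def diagnose(cat_indices, settings_response):
    """Turn the two API responses into human-readable findings."""
    lock_rows = find_lock_indices(cat_indices)
    if not lock_rows:
        return ["no job scheduler lock index found; check the logs for why it cannot be created"]
    findings = []
    for row in lock_rows:
        # A red or closed lock index cannot accept the lock writes.
        if row.get("health") == "red" or row.get("status") != "open":
            findings.append(f"{row['index']} is {row.get('status')}/{row.get('health')}")
    for index, blocks in active_blocks(settings_response).items():
        findings.append(f"{index} has blocks: {', '.join(blocks)}")
    return findings


cat = [{"health": "green", "status": "open", "index": ".opendistro-job-scheduler-lock"}]
settings = {
    ".opendistro-job-scheduler-lock": {
        "settings": {"index": {"blocks": {"read_only_allow_delete": "true"}}}
    }
}
print(diagnose(cat, settings))
```

An empty list from diagnose() doesn't prove the index is writable; it only rules out the missing-index, red/closed, and index-block cases.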