Snapshot locking index files

johnthom · August 14, 2020, 1:52pm

I have a single-node 1.7.0 cluster with approximately 50 indices, totaling about 250 GB, running on a Windows 2012 R2 server. Hourly snapshots are scheduled for the entire cluster (including cluster state).

Every few days, during a snapshot, one of the indices turns red and reports that a Lucene commit has failed because a file is “in use”. Normally I can clear this condition by closing and reopening the index. Today, however, that did not work, and my only recourse was to delete the index and lose a week's worth of log data.
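For reference, the close-and-reopen workaround described above can be scripted against the REST API with the `_close` and `_open` index endpoints. This is only a minimal sketch: the base URL `http://localhost:9200` and the index name `logs-2020.08.07` are placeholders, not values from this cluster, and the helper names are hypothetical.

```python
import urllib.request


def build_reopen_requests(base_url, index):
    """Build the two POST requests that close and then reopen an index."""
    base = base_url.rstrip("/")
    return [
        urllib.request.Request(f"{base}/{index}/_close", method="POST"),
        urllib.request.Request(f"{base}/{index}/_open", method="POST"),
    ]


def close_and_reopen(base_url, index):
    """Send the close and open requests in order, printing each HTTP status."""
    for req in build_reopen_requests(base_url, index):
        with urllib.request.urlopen(req) as resp:
            print(req.full_url, resp.status)


# Example call (placeholder URL and index name; requires a reachable node):
# close_and_reopen("http://localhost:9200", "logs-2020.08.07")
```

Building the requests separately from sending them makes it possible to inspect the URLs before anything touches the cluster.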

What can I do to debug this issue and identify the root cause? Is there anything else I can try in order to save the index instead of deleting it?
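As a first debugging step, it may help to record exactly which index goes red and when. The sketch below assumes the JSON body returned by `GET /_cluster/health?level=indices` has already been fetched and parsed; the sample payload and index names are made up for illustration, and `red_indices` is a hypothetical helper.

```python
import json


def red_indices(health):
    """Return the sorted names of indices whose reported status is "red"."""
    indices = health.get("indices", {})
    return sorted(
        name for name, info in indices.items() if info.get("status") == "red"
    )


# Made-up sample shaped like a cluster health response with level=indices.
sample = json.loads("""
{
  "status": "red",
  "indices": {
    "logs-a": {"status": "green"},
    "logs-b": {"status": "red"},
    "logs-c": {"status": "yellow"}
  }
}
""")

print(red_indices(sample))  # prints ['logs-b']
```

Logging this output on a schedule, alongside timestamps of snapshots and any other jobs that touch the data directory, could help correlate the failure with whatever is holding the file.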

johnthom · August 17, 2020, 1:49pm

Update: I have stopped taking snapshots on this cluster and so far have not seen the issue return.

johnthom · August 24, 2020, 1:05pm

Update #2: I spoke too soon. A shard failed again, even with snapshots disabled. There are 50 indices on this node, and the issue only happens to one index.

johnthom · September 2, 2020, 2:15pm

Update #3: My issue was caused by backup software (CommVault) that was locking large index segment files while Elasticsearch was attempting to commit them. Issue resolved.
