I have a single-node Elasticsearch 1.7.0 cluster with approximately 50 indices and a total size of 250 GB, running on Windows Server 2012 R2. Hourly snapshots are scheduled for the entire cluster (including the cluster state).
Every few days, during a snapshot, one of the indices turns red and reports that a Lucene commit has failed due to a file being “in use”. Normally I can clear this condition by closing and reopening the index. Today, however, that did not clear the condition, and my only recourse was to delete the index and lose a week's worth of log data.
What can I do to debug this issue and identify the root cause? Is there anything else I can try to save the index, so that I do not have to delete it?
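
For reference, the close/reopen workaround mentioned above can be sketched roughly as follows. This is only an illustration, not the exact script in use: it assumes the node listens on http://localhost:9200, and the helper names (`index_action_url`, `red_indices`, `cycle_index`) are hypothetical. It relies on the open/close index API (`POST /<index>/_close`, `POST /<index>/_open`) and on the shape of the `GET /_cluster/health?level=indices` response.

```python
import json
import urllib.request

# Assumption: the single node is reachable on the default HTTP port.
ES_URL = "http://localhost:9200"


def index_action_url(index, action, base_url=ES_URL):
    """Build the URL for the open/close index API, e.g. <base>/<index>/_close."""
    if action not in ("_close", "_open"):
        raise ValueError("unsupported action: %s" % action)
    return "%s/%s/%s" % (base_url.rstrip("/"), index, action)


def red_indices(health):
    """Return the names of red indices from a parsed
    GET /_cluster/health?level=indices response."""
    return sorted(
        name
        for name, info in health.get("indices", {}).items()
        if info.get("status") == "red"
    )


def cycle_index(index, base_url=ES_URL):
    """Close and then reopen one index. Needs a reachable node."""
    for action in ("_close", "_open"):
        req = urllib.request.Request(
            index_action_url(index, action, base_url), data=b"", method="POST"
        )
        with urllib.request.urlopen(req) as resp:
            print(action, json.loads(resp.read().decode("utf-8")))
```

With a live node, `cycle_index("logs-example")` (a made-up index name) would issue the two requests in order; `red_indices` only inspects an already-parsed health response and makes no network calls.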