Recovering from false positive corrupted shard

doyle · October 26, 2023, 2:13pm

Hi everyone,

My team is experiencing a situation similar to the one described here.

During bulk indexing, the index goes red because of an unassigned shard. In the shard's directory I find a corrupted_* file that prevents the shard from being assigned. However, running CheckIndex on the Lucene index reports no problems. If I remove the corrupted_* file and restart the cluster, the shard gets assigned and the index returns to green.
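To make the "find the corrupted_* file" step concrete, here is a minimal sketch that lists marker files below a data directory. The function name `find_corruption_markers` and the example path are hypothetical, and the only assumption about the files is the `corrupted_` name prefix mentioned above. It only reads the directory tree and deletes nothing.

```python
from pathlib import Path


def find_corruption_markers(data_path):
    """Return the sorted paths of all files named corrupted_* below data_path."""
    root = Path(data_path)
    # rglob walks the whole tree, so this works without knowing the
    # exact node/index/shard directory layout in advance.
    return sorted(p for p in root.rglob("corrupted_*") if p.is_file())


# Example usage (the path is an assumption; use your node's data path):
# for marker in find_corruption_markers("/var/lib/opensearch"):
#     print(marker)
```

Listing the markers first makes it easier to run CheckIndex against exactly the affected shard directories before deciding whether to remove anything.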

I believe this is caused by disk latency (the data is on sshfs). If anyone has experience with heavy indexing on sshfs, I’d be interested in any tips for improving performance and preventing this altogether.

Is there a way to recover this index without restarting the cluster? What happens at cluster initialization that triggers the assignment? Can I trigger it myself without taking the cluster down?
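One thing that may be worth trying, as an untested sketch rather than a confirmed fix: the cluster APIs include `POST /_cluster/reroute?retry_failed=true`, which asks the cluster to retry shards whose allocation has failed too many times, and `_cluster/allocation/explain`, which reports why a shard is unassigned. Whether a retry is enough once the corrupted_* file has been removed is an assumption here. The base URL, the helper names, and the lack of authentication or TLS handling are all illustrative.

```python
import json
import urllib.request


def build_retry_failed_request(base_url):
    """Build a POST to /_cluster/reroute?retry_failed=true (not sent yet)."""
    url = base_url.rstrip("/") + "/_cluster/reroute?retry_failed=true"
    return urllib.request.Request(url, data=b"", method="POST")


def build_allocation_explain_request(base_url, index, shard, primary=True):
    """Build a request asking why a specific shard is unassigned."""
    body = json.dumps({"index": index, "shard": shard, "primary": primary})
    return urllib.request.Request(
        base_url.rstrip("/") + "/_cluster/allocation/explain",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (assumes an unsecured node on localhost:9200 and a
# hypothetical index name):
# print(send(build_allocation_explain_request("http://localhost:9200", "my-index", 0)))
# print(send(build_retry_failed_request("http://localhost:9200")))
```

Running the explain call first shows the recorded failure reason, which helps confirm that the unassigned shard is the one with the marker file before retrying.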

Thanks,
Derek

ckristo · January 3, 2024, 9:22am

Hi,

I have the same issue, but without sshfs. I run a single-node OpenSearch cluster (for Graylog) in a Docker container. Checking the red index with Lucene's CheckIndex reports a clean index, and removing the corrupted_* file and restarting the cluster works for me too. I have hit this three times in the last two weeks, after running Graylog for six months without ever seeing it. Could this be a bug introduced in the last update?

Cheers,
Chris

ckristo · January 3, 2024, 9:25am

If anyone knows how to downgrade an OpenSearch cluster, I could test this. I briefly tried switching back to the old version, but that did not work.
