Index corruption due to missing .kdi file (intermittent red status)

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch: 2.5.0
OS: Rocky Linux 8.6

Describe the issue:
Some of our OpenSearch indices intermittently and unexpectedly turn red.

Each time, the logs show the same error about a missing .kdi file:

org.apache.lucene.index.CorruptIndexException: Problem reading index from store
Caused by: java.io.FileNotFoundException: No sub-file with id .kdi found in compound file "_1nf.cfs"

The exception is thrown during a scheduled refresh. The engine is then failed and the shard is marked as failed, which is when the index turns red.
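The file list printed in the exception can be checked mechanically. Below is a minimal Python sketch (a hypothetical helper, not part of OpenSearch or Lucene) that extracts the sub-file names from the FileNotFoundException message and reports which Lucene 9 points-format extensions (.kdm metadata, .kdi index, .kdd data) are absent. The stack trace shows that Lucene90PointsReader is the component requesting .kdi. The sketch assumes the message format shown in our logs.

```python
import re

# Lucene 9.x points (BKD) format file extensions: metadata, index, data.
POINTS_EXTENSIONS = {".kdm", ".kdi", ".kdd"}


def compound_sub_files(message: str) -> list[str]:
    """Return the sub-file ids listed in the 'files: [...]' part of a
    Lucene90CompoundReader FileNotFoundException message."""
    match = re.search(r"files: \[([^\]]*)\]", message)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def extension(name: str) -> str:
    # Sub-file ids look like ".fnm" or "_Lucene90_0.dvm"; keep the part from the last dot.
    idx = name.rfind(".")
    return name[idx:] if idx >= 0 else ""


def missing_points_files(message: str) -> set[str]:
    """Points-format extensions that do not appear among the compound file's sub-files."""
    present = {extension(name) for name in compound_sub_files(message)}
    return POINTS_EXTENSIONS - present


if __name__ == "__main__":
    msg = ('No sub-file with id .kdi found in compound file "_1nf.cfs" '
           '(fileName=_1nf.kdi files: [_Lucene90_0.dvm, _Lucene90_0.tip, _Lucene90_0.tmd, '
           '.fnm, _Lucene90_0.doc, _Lucene90_0.tim, .fdm, _Lucene90_0.dvd, .fdx, .fdt])')
    print(missing_points_files(msg))
```

For the message in our logs, `missing_points_files` returns the set {".kdm", ".kdi", ".kdd"}. In other words, `_1nf.cfs` contains no points files at all, yet the segment reader tried to open one.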

Additional context:

  • The issue occurs sporadically, and we cannot reproduce it reliably.
  • Field “A” receives unstructured content, such as emojis and incomplete or malformed JSON.
  • This field is mapped as a string, and we would not normally expect such content to cause problems.
  • The system is not under heavy load when this occurs.
  • OpenSearch version: 2.5

We’re trying to understand:

  • Could such content cause index corruption, and if so, why?
  • Is this related to Lucene compound file handling (.cfs/.kdi)?
  • Are there known issues with storing malformed JSON or emoji characters in string fields?
  • How can we prevent this kind of corruption in the future?

Any insights or recommendations (e.g., mappings, ingestion pipeline sanitization, disabling compound files) would be greatly appreciated.
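As an illustration of the client-side sanitization mentioned above, here is a minimal Python sketch. It has not been shown to prevent the missing .kdi error. Whether field “A” content is related to the corruption at all is still an open question. The function names are hypothetical. The sketch drops lone UTF-16 surrogates (for example, half of an emoji that was split during truncation) and control characters other than tab, newline, and carriage return, while keeping valid emoji. It also flags whether a value parses as JSON.

```python
import json
import unicodedata


def sanitize_text(value: str) -> str:
    """Hypothetical pre-indexing cleanup for free-form text such as field "A"."""
    cleaned = []
    for ch in value:
        category = unicodedata.category(ch)
        if category == "Cs":  # unpaired surrogate code point
            continue
        if category == "Cc" and ch not in "\t\n\r":  # other control characters
            continue
        cleaned.append(ch)  # valid emoji (e.g. category "So") are kept
    return "".join(cleaned)


def is_valid_json(value: str) -> bool:
    """True if the value parses as JSON; malformed fragments return False."""
    try:
        json.loads(value)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


if __name__ == "__main__":
    print(sanitize_text("ok \U0001F600\ud83d\x00"))
    print(is_valid_json('{"a": 1'))
```

For example, `sanitize_text("ok \U0001F600\ud83d\x00")` keeps the complete emoji and drops the lone surrogate and the NUL character, and `is_valid_json('{"a": 1')` returns False. Such a flag could be indexed alongside the raw text instead of rejecting the document.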

Configuration:

Relevant Logs or Screenshots:

[2025-05-09T13:28:14,663][WARN ][o.o.i.e.Engine           ] [node-10.125.184.124] [nodelog_20250509][0] failed engine [refresh failed source[schedule]]
org.apache.lucene.index.CorruptIndexException: Problem reading index from store(ByteSizeCachingDirectory(HybridDirectory@/test/data/opensearch/nodes/0/indices/zXAu40TjR0uqBcVoOqpwRg/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@82076de)) (resource=store(ByteSizeCachingDirectory(HybridDirectory@/test/data/opensearch/nodes/0/indices/zXAu40TjR0uqBcVoOqpwRg/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@82076de)))
	at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:164) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:91) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:179) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.ReadersAndUpdates.getLatestReader(ReadersAndUpdates.java:244) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.SoftDeletesRetentionMergePolicy.keepFullyDeletedSegment(SoftDeletesRetentionMergePolicy.java:81) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.FilterMergePolicy.keepFullyDeletedSegment(FilterMergePolicy.java:118) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.FilterMergePolicy.keepFullyDeletedSegment(FilterMergePolicy.java:118) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.FilterMergePolicy.keepFullyDeletedSegment(FilterMergePolicy.java:118) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.ReadersAndUpdates.keepFullyDeletedSegment(ReadersAndUpdates.java:826) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.isFullyDeleted(IndexWriter.java:5974) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.publishFlushedSegment(IndexWriter.java:2868) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.lambda$publishFlushedSegments$24(IndexWriter.java:5815) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:117) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriterFlushQueue.tryPurge(DocumentsWriterFlushQueue.java:148) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriter.purgeFlushTickets(DocumentsWriter.java:189) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.publishFlushedSegments(IndexWriter.java:5791) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter$1.afterSegmentsFlushed(IndexWriter.java:431) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:539) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:672) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:570) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:381) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:355) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:345) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:73) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:53) ~[opensearch-2.5.0.jar:2.5.0]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:431) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:411) ~[opensearch-2.5.0.jar:2.5.0]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:213) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1844) [opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.engine.InternalEngine.maybeRefresh(InternalEngine.java:1823) [opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.shard.IndexShard.scheduledRefresh(IndexShard.java:3928) [opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.IndexService.maybeRefreshEngine(IndexService.java:974) [opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.IndexService$AsyncRefreshTask.runInternal(IndexService.java:1107) [opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:159) [opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-2.5.0.jar:2.5.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.FileNotFoundException: No sub-file with id .kdi found in compound file "_1nf.cfs" (fileName=_1nf.kdi files: [_Lucene90_0.dvm, _Lucene90_0.tip, _Lucene90_0.tmd, .fnm, _Lucene90_0.doc, _Lucene90_0.tim, .fdm, _Lucene90_0.dvd, .fdx, .fdt])
	at org.apache.lucene.codecs.lucene90.Lucene90CompoundReader.openInput(Lucene90CompoundReader.java:170) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.codecs.lucene90.Lucene90PointsReader.<init>(Lucene90PointsReader.java:62) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.codecs.lucene90.Lucene90PointsFormat.fieldsReader(Lucene90PointsFormat.java:74) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:151) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	... 42 more
[2025-05-09T13:28:14,679][WARN ][o.o.i.IndexService       ] [node-10.125.184.124] [nodelog_20250509] failed to run task refresh - suppressing re-occurring exceptions unless the exception changes
org.opensearch.index.engine.RefreshFailedEngineException: Refresh failed
	at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1864) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.engine.InternalEngine.maybeRefresh(InternalEngine.java:1823) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.shard.IndexShard.scheduledRefresh(IndexShard.java:3928) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.IndexService.maybeRefreshEngine(IndexService.java:974) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.IndexService$AsyncRefreshTask.runInternal(IndexService.java:1107) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:159) [opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-2.5.0.jar:2.5.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: org.apache.lucene.index.CorruptIndexException: Problem reading index from store(ByteSizeCachingDirectory(HybridDirectory@/test/data/opensearch/nodes/0/indices/zXAu40TjR0uqBcVoOqpwRg/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@82076de)) (resource=store(ByteSizeCachingDirectory(HybridDirectory@/test/data/opensearch/nodes/0/indices/zXAu40TjR0uqBcVoOqpwRg/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@82076de)))
	at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:164) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:91) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:179) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.ReadersAndUpdates.getLatestReader(ReadersAndUpdates.java:244) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.SoftDeletesRetentionMergePolicy.keepFullyDeletedSegment(SoftDeletesRetentionMergePolicy.java:81) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.FilterMergePolicy.keepFullyDeletedSegment(FilterMergePolicy.java:118) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.FilterMergePolicy.keepFullyDeletedSegment(FilterMergePolicy.java:118) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.FilterMergePolicy.keepFullyDeletedSegment(FilterMergePolicy.java:118) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.ReadersAndUpdates.keepFullyDeletedSegment(ReadersAndUpdates.java:826) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.isFullyDeleted(IndexWriter.java:5974) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.publishFlushedSegment(IndexWriter.java:2868) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.lambda$publishFlushedSegments$24(IndexWriter.java:5815) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:117) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriterFlushQueue.tryPurge(DocumentsWriterFlushQueue.java:148) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriter.purgeFlushTickets(DocumentsWriter.java:189) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.publishFlushedSegments(IndexWriter.java:5791) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter$1.afterSegmentsFlushed(IndexWriter.java:431) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:539) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:672) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:570) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:381) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:355) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:345) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:73) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:53) ~[opensearch-2.5.0.jar:2.5.0]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:431) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:411) ~[opensearch-2.5.0.jar:2.5.0]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:213) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1844) ~[opensearch-2.5.0.jar:2.5.0]
	... 9 more
Caused by: java.io.FileNotFoundException: No sub-file with id .kdi found in compound file "_1nf.cfs" (fileName=_1nf.kdi files: [_Lucene90_0.dvm, _Lucene90_0.tip, _Lucene90_0.tmd, .fnm, _Lucene90_0.doc, _Lucene90_0.tim, .fdm, _Lucene90_0.dvd, .fdx, .fdt])
	at org.apache.lucene.codecs.lucene90.Lucene90CompoundReader.openInput(Lucene90CompoundReader.java:170) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.codecs.lucene90.Lucene90PointsReader.<init>(Lucene90PointsReader.java:62) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.codecs.lucene90.Lucene90PointsFormat.fieldsReader(Lucene90PointsFormat.java:74) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:151) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:91) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:179) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.ReadersAndUpdates.getLatestReader(ReadersAndUpdates.java:244) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.SoftDeletesRetentionMergePolicy.keepFullyDeletedSegment(SoftDeletesRetentionMergePolicy.java:81) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.FilterMergePolicy.keepFullyDeletedSegment(FilterMergePolicy.java:118) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.FilterMergePolicy.keepFullyDeletedSegment(FilterMergePolicy.java:118) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.FilterMergePolicy.keepFullyDeletedSegment(FilterMergePolicy.java:118) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.ReadersAndUpdates.keepFullyDeletedSegment(ReadersAndUpdates.java:826) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.isFullyDeleted(IndexWriter.java:5974) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.publishFlushedSegment(IndexWriter.java:2868) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.lambda$publishFlushedSegments$24(IndexWriter.java:5815) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:117) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriterFlushQueue.tryPurge(DocumentsWriterFlushQueue.java:148) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriter.purgeFlushTickets(DocumentsWriter.java:189) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.publishFlushedSegments(IndexWriter.java:5791) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter$1.afterSegmentsFlushed(IndexWriter.java:431) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:539) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:672) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:570) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:381) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:355) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:345) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:73) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:53) ~[opensearch-2.5.0.jar:2.5.0]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:431) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:411) ~[opensearch-2.5.0.jar:2.5.0]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:213) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1844) ~[opensearch-2.5.0.jar:2.5.0]
	... 9 more
[2025-05-09T13:28:14,682][WARN ][o.o.i.c.IndicesClusterStateService] [node-10.125.184.124] [nodelog_20250509][0] marking and sending shard failed due to [shard failure, reason [refresh failed source[schedule]]]
org.apache.lucene.index.CorruptIndexException: Problem reading index from store(ByteSizeCachingDirectory(HybridDirectory@/test/data/opensearch/nodes/0/indices/zXAu40TjR0uqBcVoOqpwRg/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@82076de)) (resource=store(ByteSizeCachingDirectory(HybridDirectory@/test/data/opensearch/nodes/0/indices/zXAu40TjR0uqBcVoOqpwRg/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@82076de)))
	at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:164) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:91) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:179) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.ReadersAndUpdates.getLatestReader(ReadersAndUpdates.java:244) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.SoftDeletesRetentionMergePolicy.keepFullyDeletedSegment(SoftDeletesRetentionMergePolicy.java:81) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.FilterMergePolicy.keepFullyDeletedSegment(FilterMergePolicy.java:118) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.FilterMergePolicy.keepFullyDeletedSegment(FilterMergePolicy.java:118) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.FilterMergePolicy.keepFullyDeletedSegment(FilterMergePolicy.java:118) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.ReadersAndUpdates.keepFullyDeletedSegment(ReadersAndUpdates.java:826) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.isFullyDeleted(IndexWriter.java:5974) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.publishFlushedSegment(IndexWriter.java:2868) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.lambda$publishFlushedSegments$24(IndexWriter.java:5815) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:117) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriterFlushQueue.tryPurge(DocumentsWriterFlushQueue.java:148) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriter.purgeFlushTickets(DocumentsWriter.java:189) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.publishFlushedSegments(IndexWriter.java:5791) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter$1.afterSegmentsFlushed(IndexWriter.java:431) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:539) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:672) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:570) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:381) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:355) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:345) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:73) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:53) ~[opensearch-2.5.0.jar:2.5.0]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:431) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:411) ~[opensearch-2.5.0.jar:2.5.0]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:213) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1844) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.engine.InternalEngine.maybeRefresh(InternalEngine.java:1823) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.shard.IndexShard.scheduledRefresh(IndexShard.java:3928) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.IndexService.maybeRefreshEngine(IndexService.java:974) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.index.IndexService$AsyncRefreshTask.runInternal(IndexService.java:1107) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:159) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-2.5.0.jar:2.5.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.FileNotFoundException: No sub-file with id .kdi found in compound file "_1nf.cfs" (fileName=_1nf.kdi files: [_Lucene90_0.dvm, _Lucene90_0.tip, _Lucene90_0.tmd, .fnm, _Lucene90_0.doc, _Lucene90_0.tim, .fdm, _Lucene90_0.dvd, .fdx, .fdt])
	at org.apache.lucene.codecs.lucene90.Lucene90CompoundReader.openInput(Lucene90CompoundReader.java:170) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.codecs.lucene90.Lucene90PointsReader.<init>(Lucene90PointsReader.java:62) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.codecs.lucene90.Lucene90PointsFormat.fieldsReader(Lucene90PointsFormat.java:74) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:151) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	... 42 more
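The `Caused by` line above also lists every sub-file that the compound file `_1nf.cfs` actually contains. None of them has a points-format extension. Lucene's `Lucene90PointsFormat` writes `.kdm` (metadata), `.kdi` (index) and `.kdd` (data) for point fields, which OpenSearch uses for numeric, date and IP fields. Lucene only opens a points reader when the segment's field infos report point values, so the error suggests that the segment metadata expects points data that is not in the compound file. The Python sketch below extracts that list from the log line and reports which points files are absent. It assumes the message format shown above. The helper names `compound_subfiles` and `missing_points_files` are hypothetical and not part of any OpenSearch tool.

```python
import re

# Extensions written by Lucene 9.x Lucene90PointsFormat for point (BKD) fields.
POINTS_EXTENSIONS = {".kdm", ".kdi", ".kdd"}

# Matches the "files: [...]" list appended to the FileNotFoundException message.
_FILES_RE = re.compile(r"files: \[([^\]]*)\]")


def compound_subfiles(message):
    """Return the sub-file names listed in a 'No sub-file with id ...' message."""
    match = _FILES_RE.search(message)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def extensions(names):
    """Collect extensions, e.g. '_Lucene90_0.dvm' -> '.dvm', '.fnm' -> '.fnm'."""
    result = set()
    for name in names:
        dot = name.rfind(".")
        if dot != -1:
            result.add(name[dot:])
    return result


def missing_points_files(message):
    """Return the points-format extensions absent from the compound file listing."""
    return sorted(POINTS_EXTENSIONS - extensions(compound_subfiles(message)))


if __name__ == "__main__":
    log_line = (
        'No sub-file with id .kdi found in compound file "_1nf.cfs" '
        "(fileName=_1nf.kdi files: [_Lucene90_0.dvm, _Lucene90_0.tip, "
        "_Lucene90_0.tmd, .fnm, _Lucene90_0.doc, _Lucene90_0.tim, .fdm, "
        "_Lucene90_0.dvd, .fdx, .fdt])"
    )
    print(missing_points_files(log_line))
```

Run against the exception from this thread, the listing contains stored fields (`.fdt`/`.fdx`/`.fdm`), terms (`.tim`/`.tip`/`.tmd`), postings and doc values, but no points files at all. The problem therefore looks like a whole missing points section in the segment, not a single damaged `.kdi` entry.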

@diqmwl Do you monitor the Java heap size on each node?
How did you deploy the cluster?
How many nodes are in the cluster, and how much RAM and CPU is assigned to each node?
What storage are you using?

@pablo

Thank you for your reply.

Response to Environment & Deployment Questions

1. Do you monitor the Java heap size on each node?

Yes.
Each node is configured with a 16 GB heap (-Xms16g / -Xmx16g), and we monitor usage via OpenSearch APIs and system metrics.
So far, we have seen no sign of memory pressure or GC issues during the incidents.
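To correlate heap and GC activity with the refresh failures, one option is to poll the nodes stats API (`GET _nodes/stats/jvm`) and keep a per-node summary. The Python sketch below does this. It assumes an unauthenticated HTTP endpoint at `http://localhost:9200`; with the security plugin enabled you would need HTTPS and credentials. The 75% threshold and the function names `fetch_jvm_stats` and `heap_report` are illustrative choices, not OpenSearch defaults.

```python
import json
import urllib.request


def fetch_jvm_stats(base_url="http://localhost:9200"):
    """GET _nodes/stats/jvm (assumes plain HTTP without authentication)."""
    with urllib.request.urlopen(f"{base_url}/_nodes/stats/jvm") as resp:
        return json.load(resp)


def heap_report(stats, threshold_percent=75):
    """Summarize heap usage and old-generation GC totals for each node.

    `threshold_percent` is an arbitrary alerting level chosen for this sketch.
    """
    report = []
    for node_id, node in stats.get("nodes", {}).items():
        mem = node["jvm"]["mem"]
        # Cumulative old-gen GC counters since node start; compare successive polls.
        old_gc = node["jvm"].get("gc", {}).get("collectors", {}).get("old", {})
        report.append({
            "node": node.get("name", node_id),
            "heap_used_percent": mem["heap_used_percent"],
            "old_gc_count": old_gc.get("collection_count", 0),
            "old_gc_millis": old_gc.get("collection_time_in_millis", 0),
            "over_threshold": mem["heap_used_percent"] >= threshold_percent,
        })
    return report


if __name__ == "__main__":
    for row in heap_report(fetch_jvm_stats()):
        print(row)
```

Logging this output on a schedule next to the `failed engine [refresh failed ...]` warnings would show whether old-generation GC time jumps around the incidents. If it stays flat, that helps rule out memory pressure.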


2. How did you deploy the cluster?

We are operating two independent environments, each with a single-node cluster:

  • Environment A: OpenStack
    • 8 vCPU / 64 GB RAM
    • OpenSearch runs directly on the
    • Encountered frequent index corruption / red status
  • Environment B: Bare-metal server (Physical Host)
    • 16 CPU cores / 64 GB RAM
    • OpenSearch runs directly on the host OS
    • Only one corruption incident in total over long-term usage

3. How many nodes are in the cluster, and how much RAM and CPU is assigned to each node?

  • Each cluster has 1 node only
  • Node specs:
    • Heap: 16 GB
    • CPU: 8 cores (OpenStack), 16 cores (Physical)
    • RAM: 64 GB in both
  • No other services share the host; OpenSearch is the sole workload on each node.

4. What is your storage?

Both environments use HDD-based local storage.


@diqmwl Thank you for sharing your environment details.
Do you monitor your HDDs? Are they 5400 RPM NAS drives or 7200 RPM desktop drives?
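One common way to monitor local HDDs is the SMART attribute table printed by smartmontools' `smartctl -A /dev/sdX`. The Python sketch below parses that table and flags the raw counters most often associated with failing media: `Reallocated_Sector_Ct`, `Current_Pending_Sector` and `Offline_Uncorrectable`. It assumes a directly attached ATA/SATA drive that uses the classic column layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). Drives behind a hardware RAID controller, or SAS/NVMe drives, print a different format. The function names are hypothetical.

```python
import subprocess

# Raw counters commonly watched for failing magnetic media (ATA drives assumed).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Parse `smartctl -A` attribute rows into {attribute_name: raw_value}."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]  # first token of RAW_VALUE, e.g. "35" in "35 (Min/Max 20/45)"
            if raw.isdigit():
                attrs[fields[1]] = int(raw)
    return attrs


def media_warnings(attrs):
    """Return the watched attributes whose raw value is non-zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


def read_smart(device):
    """Run smartctl (needs root and smartmontools installed).

    smartctl's exit status is a bitmask that can be non-zero even when the
    table was printed, so the return code is not treated as a failure here.
    """
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    print(media_warnings(parse_smart_attributes(read_smart("/dev/sda"))))
```

A non-zero and growing count in any of these attributes, together with kernel I/O errors for the same device, would point toward the disk and away from the ingested content as the cause of the missing segment files.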