java.lang.InternalError: a fault occurred in an unsafe memory access operation

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

OpenSearch v2.13.

Describe the issue:
We are currently running two OpenSearch clusters on the same Kubernetes (AKS) cluster. The first cluster, with three master nodes and two data nodes, works perfectly. The second cluster, with three master nodes and one data node, works fine at first, but after a few hours of a reindex operation the nodes crash with the error:

[2024-05-10T16:05:09,493][ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [opensearch-cluster-ops-master-1] fatal error in thread [opensearch[opensearch-cluster-ops-master-1][warmer][T#69]], exiting
java.lang.InternalError: a fault occurred in an unsafe memory access operation
	at org.apache.lucene.codecs.lucene90.IndexedDISI.advanceBlock(IndexedDISI.java:486) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
	at org.apache.lucene.codecs.lucene90.IndexedDISI.advance(IndexedDISI.java:443) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
	at org.apache.lucene.codecs.lucene90.IndexedDISI.nextDoc(IndexedDISI.java:531) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
	at org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$SparseNumericDocValues.nextDoc(Lucene90DocValuesProducer.java:458) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
	at org.apache.lucene.util.BitSet.or(BitSet.java:110) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
	at org.apache.lucene.util.FixedBitSet.or(FixedBitSet.java:326) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
	at org.apache.lucene.util.BitSet.of(BitSet.java:42) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
	at org.opensearch.index.cache.bitset.BitsetFilterCache.bitsetFromQuery(BitsetFilterCache.java:127) ~[opensearch-2.13.0.jar:2.13.0]
	at org.opensearch.index.cache.bitset.BitsetFilterCache.lambda$getAndLoadIfNotPresent$1(BitsetFilterCache.java:173) ~[opensearch-2.13.0.jar:2.13.0]
fatal error in thread [opensearch[opensearch-cluster-ops-master-1][warmer][T#69]], exiting
java.lang.InternalError: a fault occurred in an unsafe memory access operation

Configuration:

Both clusters use the same Helm chart, so I don’t know why the nodes crash after some hours. The only difference is that the failing cluster has half the resources of the healthy one:

resources:
  requests:
    cpu: 4
    memory: 10Gi
  limits:
    cpu: 4
    memory: 10Gi

We are reindexing about 20M documents over 6 days, and some of the nodes crash during the process. There is plenty of free space left on the volumes.
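For reference, this is roughly the shape of the kind of _reindex call involved, with slicing and throttling that can be tuned if this turns out to be a resource problem (index names, batch size and throttle values below are placeholders, not our real ones):

# sketch only: adjust names and numbers; returns a task id because wait_for_completion=false
curl -s -X POST "localhost:9200/_reindex?slices=auto&requests_per_second=500&wait_for_completion=false" \
  -H 'Content-Type: application/json' -d'
{
  "source": { "index": "source-index-v1", "size": 1000 },
  "dest":   { "index": "dest-index-v2" }
}'
# progress can then be followed with: curl -s "localhost:9200/_tasks/<task_id>"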

Relevant Logs or Screenshots:

(Same fatal error and stack trace as shown in the description above.)

Upgrading to v2.14 didn’t fix this issue…

@Ivan.A apologies for the delayed reply. Could you please share which JDK version is being used by your OpenSearch installation? (It is printed at startup time.) Also, if possible, could you share the process memory usage (from the top command), specifically the RSS? Thank you.

Thanks for your reply.

We are using the OpenJDK VM, 21.0.3 (21.0.3+9-LTS). Regarding the “top” command: we are using the out-of-the-box Docker image from the Helm chart on Artifact Hub, which does not come with “top”, “dmesg”, “htop”… It’s really bare.
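Even without “top” in the image, the resident set size can be read from /proc inside the container, e.g. (the pod name is from the logs above; the container name “opensearch” and PID 1 being the JVM are assumptions about the chart and image):

# RSS of the main process; grep runs locally, so the image only needs cat
kubectl exec opensearch-cluster-ops-master-1 -c opensearch -- cat /proc/1/status | grep VmRSS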

Any ideas?

Got it. How do you monitor the memory consumption of the container? The OOM killer does not kick in (which seems like a good sign), but I have a hard time understanding what conditions may lead to “a fault occurred in an unsafe memory access operation”; one hypothesis is running low on free memory.
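If metrics-server is available on the AKS cluster (an assumption on my side), something like this would already give a rough picture and also rule out silent OOM kills:

# per-container CPU / memory usage
kubectl top pod opensearch-cluster-ops-master-1 --containers
# shows "OOMKilled" if the previous container instance was killed by the OOM killer
kubectl get pod opensearch-cluster-ops-master-1 -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'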

At the moment we don’t have any monitoring system in place; we are working on it… Yeah, it doesn’t look like an OOM. By the way, a few days ago we increased the resources to:

opensearchJavaOpts: "-Xmx10g -Xms10g"

resources:
  requests:
    cpu: 5
    memory: 24Gi
  limits:
    cpu: 5
    memory: 24Gi

We still see some pod restarts, but the indices no longer get corrupted since we increased the number of primary shards and replicas.
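Until proper monitoring is in place, the nodes stats API at least gives the heap and OS memory figures on demand, e.g.:

# heap and OS memory per node
curl -s "localhost:9200/_nodes/stats/jvm,os?filter_path=nodes.*.name,nodes.*.jvm.mem,nodes.*.os.mem"
# or the compact _cat view
curl -s "localhost:9200/_cat/nodes?v&h=name,heap.current,heap.max,ram.current,ram.max"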

Oh, thank you for the insights. It seems like there is a correlation with available memory; maybe once you are able to scrape the process / container metrics, we can find out the circumstances under which it happens.

Hi, we changed the StorageClass after noticing we were using an Azure Disk class backed by HDDs… We switched to the default one, which is SSD-backed, and all the pods have stayed up without any restart :slight_smile: I will update this thread if anything changes, but it looks like the volumes were the problem.
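For anyone hitting the same issue, the class actually bound to the data volumes can be checked quickly (replace the namespace placeholder with your own):

# which StorageClass each PVC uses, and what classes exist on the cluster
kubectl get pvc -n <namespace> -o custom-columns=NAME:.metadata.name,CLASS:.spec.storageClassName
kubectl get storageclass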
