OpenSearch Insufficient Memory Regression after 2.19 Upgrade

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

2.19; regression introduced between 2.16 and 2.19.

Describe the issue:

After upgrading from 2.16 to 2.19, our OpenSearch cluster has been restarting roughly once a day with the following error:

# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 2097152 bytes. Error detail: AllocateHeap
# An error report file with more information is saved as:
# /usr/share/opensearch/hs_err_pid10.log

JVM memory is within normal bounds, so we suspect this failure is due to the max map count being reached. We have been setting vm.max_map_count to 262144 via sysctl, which has not caused issues before. Have there been any changes between 2.16 and 2.19 that could have introduced a regression here?
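For reference, a quick way to compare the live mapping count against the configured limit (assuming, as in the crash report above, that the OpenSearch JVM runs as PID 10 inside the container):

sysctl vm.max_map_count          # configured limit (262144 in our case)
wc -l /proc/10/maps              # current number of mappings held by the process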

Configuration:

Cluster running 4 x {64 CPU, 512 GB RAM} nodes backed by GCP persistent disks. 102 GB heap.

Relevant Logs or Screenshots:

Kubernetes memory usage is well below the limit. We have been unable to retrieve the hs_err_pid log file that is dumped at crash time.
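As a possible workaround for capturing the crash report (untested on our side), the JVM error file could be pointed at the persisted data volume via config/jvm.options; the path below just assumes the default data mount of the official image:

# config/jvm.options: write the crash report to a persisted location
-XX:ErrorFile=/usr/share/opensearch/data/hs_err_pid%p.log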

Let me know if there’s any other information that could help diagnose the issue.

Digging further: according to cat /proc/10/maps, roughly 246K of the 252K memory mappings point at deleted .dvd (Lucene doc values) files. Is this indicative of a Lucene issue?
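For reference, this is roughly how the counts above were obtained (again assuming PID 10):

wc -l /proc/10/maps                          # total memory mappings (~252K)
grep -c '\.dvd (deleted)' /proc/10/maps      # mappings of deleted .dvd files (~246K)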

I am running into the same issue after updating from 2.17.1 to 2.19.1. I was running Java 17 and upgraded to Java 21; this stabilized my cluster somewhat, but the problem still recurs after a few hours.

I also found that most of the open maps are deleted files, so I set vm.max_map_count far higher than the default to "solve" the issue for now.
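For completeness, this is the kind of change I applied (the value is just an example with plenty of headroom, not a recommendation):

sysctl -w vm.max_map_count=1048576                                    # apply immediately
echo 'vm.max_map_count=1048576' > /etc/sysctl.d/99-opensearch.conf    # persist across reboots

Note that on Kubernetes this is a host-level kernel setting, so it has to be applied on the worker nodes (or via a privileged init container), not inside the OpenSearch pod.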

I asked on the Lucene mailing list here: https://lists.apache.org/thread/4lqh5w9mxm4ffr5kxlxhh06d9gdv3gto. I will try what was suggested there and report back to this thread.
