OpenSearch warm nodes failing

Versions: OpenSearch 3.1.0, warm nodes

Describe the issue: Warm nodes are running out of memory. The warm nodes are fine until a search is executed that leverages searchable snapshots; then they crash, both nodes at the same time.

Is there a limit to the number of snapshots?

Do I need more warm nodes?

Configuration: 21-node cluster with 2 warm nodes

6 physical nodes with 500 GB of RAM:
MiB Mem : 515078.9 total, 8385.6 free, 146265.0 used, 365610.9 buff/cache
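For a rough sense of scale, here is the per-host memory budget implied by the JVM flags in the crash log below (the 4 GiB overhead figure for threads/metaspace/GC structures and the even spread of nodes across hosts are assumptions, not measurements):

```shell
# -Xms60g/-Xmx60g heap plus -XX:MaxDirectMemorySize=32212254720 (30 GiB),
# plus an estimated ~4 GiB of threads/metaspace/GC overhead per JVM
heap=60; direct=30; overhead=4                       # GiB
per_jvm=$((heap + direct + overhead))
# 21 OpenSearch nodes over 6 physical hosts, assuming ~4 JVMs per host
echo "per JVM: ~${per_jvm} GiB; 4 JVMs/host: ~$((4 * per_jvm)) GiB of 503 GiB"
```

That leaves little headroom once the OS page cache (the 365 GB of buff/cache above) is competing for the same RAM.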

Relevant Logs or Screenshots:

There is insufficient memory for the Java Runtime Environment to continue.

# Native memory allocation (malloc) failed to allocate 1048576 bytes. Error detail: AllocateHeap

# Possible reasons:

# The system is out of physical RAM or swap space

# Possible solutions:

# Reduce memory load on the system

# Increase physical memory or swap space

# Check if swap backing store is full

# Decrease Java heap size (-Xmx/-Xms)

# Decrease number of Java threads

# Decrease Java thread stack sizes (-Xss)

# Set larger code cache with -XX:ReservedCodeCacheSize=

# This output file may be truncated or incomplete.

#

# Out of Memory Error (allocation.cpp:44), pid=745370, tid=745686

#

# JRE version: OpenJDK Runtime Environment Temurin-21.0.7+6 (21.0.7+6) (build 21.0.7+6-LTS)

# Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.7+6 (21.0.7+6-LTS, mixed mode, sharing, tiered, compressed class ptrs, g1 gc, linux-amd64)

# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" (or dumping to /usr/share/opensearch/core.745370)

#

--------------- S U M M A R Y ------------

Command Line: -Xshare:auto -Dopensearch.networkaddress.cache.ttl=60 -Dopensearch.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -XX:+ShowCodeDetailsInExceptionMessages -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.locale.providers=SPI,CLDR -Xms60g -Xmx60g -Djava.net.preferIPv4Stack=true -XX:+UseG1GC -XX:G1ReservePercent=25 -XX:InitiatingHeapOccupancyPercent=30 -Djava.io.tmpdir=/tmp/opensearch-8874490911065750154 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/opensearch -XX:ErrorFile=/var/log/opensearch/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/opensearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m --add-modules=jdk.incubator.vector -javaagent:agent/opensearch-agent.jar --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED -Dclk.tck=100 -Djdk.attach.allowAttachSelf=true -Djava.security.policy=file:///etc/opensearch/opensearch-performance-analyzer/opensearch_security.policy --add-opens=jdk.attach/sun.tools.attach=ALL-UNNAMED -XX:MaxDirectMemorySize=32212254720 -Dopensearch.path.home=/usr/share/opensearch -Dopensearch.path.conf=/etc/opensearch-warm -Dopensearch.distribution.type=rpm -Dopensearch.bundled_jdk=true org.opensearch.bootstrap.OpenSearch -p /var/run/opensearch-warm/opensearch.pid --quiet

Host: Intel(R) Xeon(R) 6740E, 192 cores, 503G, AlmaLinux release 9.6 (Sage Margay)

Time: Tue Apr 7 11:08:20 2026 EDT elapsed time: 275.028502 seconds (0d 0h 4m 35s)

--------------- T H R E A D ---------------

@GuyS Can you provide the opensearch.yml file for the warm node? Can you also confirm that you have set vm.max_map_count as described here?

cat /etc/sysctl.conf

# sysctl settings are defined through files in

# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.

#

# Vendors settings live in /usr/lib/sysctl.d/.

# To override a whole file, create a new file with the same name in

# /etc/sysctl.d/ and put new settings there. To override

# only specific settings, add a file with a lexically later

# name in /etc/sysctl.d/ and put new settings there.

#

# For more information, see sysctl.conf(5) and sysctl.d(5).

vm.max_map_count = 262144

L ~ % cat opensearch.yml

cluster.name: mccoy

node.name: mccoy05-warm

node.roles: [ warm ]

node.attr.rack: cnl

node.attr.zone: cnl

path.data: /opensearch_warm

path.logs: /var/log/opensearch-warm

path.repo: ["/mnt/opensearch_snapshots"]

network.host: mccoy05.bc.edu

http.port: 9230

transport.port: 9330

discovery.seed_hosts: ["mccoy01.bc.edu:9300", "mccoy02.bc.edu:9310", "mccoy03.bc.edu:9310", "mccoy04.bc.edu:9310", "mccoy05.bc.edu:9310", "mccoy06.bc.edu:9310"]

cluster.initial_cluster_manager_nodes: ["mccoy01-manager", "mccoy02-manager", "mccoy04-manager"]

cluster.routing.allocation.awareness.attributes: rack

cluster.routing.allocation.enable: all

bootstrap.system_call_filter: false

plugins.security.disabled: false

plugins.security.ssl.transport.pemcert_filepath: /etc/opensearch-warm/node5.pem

plugins.security.ssl.transport.pemkey_filepath: /etc/opensearch-warm/node5-key.pem

plugins.security.ssl.transport.pemtrustedcas_filepath: /etc/opensearch-warm/root-ca.pem

plugins.security.ssl.http.enabled: true

plugins.security.ssl.http.pemcert_filepath: /etc/opensearch-warm/node5.pem

plugins.security.ssl.http.pemkey_filepath: /etc/opensearch-warm/node5-key.pem

plugins.security.ssl.http.pemtrustedcas_filepath: /etc/opensearch-warm/root-ca.pem

plugins.security.allow_default_init_securityindex: true

plugins.security.authcz.admin_dn:

  - 'CN=A,OU=UNIT,O=ORG,L=BRIGHTON,ST=MASSACHUSETTS,C=US'

plugins.security.nodes_dn:

  - 'CN=mccoy0*.bc.edu,OU=UNIT,O=ORG,L=BRIGHTON,ST=MASSACHUSETTS,C=US'

plugins.security.audit.type: internal_opensearch

plugins.security.enable_snapshot_restore_privilege: true

plugins.security.check_snapshot_restore_write_privileges: true

plugins.security.restapi.roles_enabled: ["all_access", "security_rest_api_access"]

Thank you for the help

I increased vm.max_map_count from 262144 to 1048576.

The queries that were crashing my warm servers are no longer crashing them, so I think that fixed it.
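In case it helps anyone else hitting this: a quick way to check what the running kernel is actually using, and how to persist the higher value (the sysctl.d filename below is just an example):

```shell
# read the value the running kernel is actually using
echo "vm.max_map_count = $(cat /proc/sys/vm/max_map_count)"

# to raise it immediately (requires root) and persist it across reboots:
#   sysctl -w vm.max_map_count=1048576
#   echo 'vm.max_map_count = 1048576' > /etc/sysctl.d/99-opensearch.conf
#   sysctl --system
```

Worth verifying on every warm host, since each OpenSearch node needs the limit set on its own kernel.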