High JVM memory pressure [Data nodes]

Versions OpenSearch: 2.19 / Elasticsearch 7.10 / Lucene: 9.12.1

Describe the issue: High JVM usage

Hello everyone,

I’d really appreciate the community’s help in stabilizing the health of my OpenSearch cluster.

Currently, I have 3 data nodes and a total of 746 shards. I’m experiencing JVM usage spikes reaching up to 94%, and I’m trying to bring this under control. I’ve already been working on reducing the number of shards, but it hasn’t resolved the issue so far.

Has anyone faced a similar situation or can suggest a more definitive approach to address this problem?

Configuration: 3 master nodes (m7g.medium.search, 2 vCPU, 4 GB RAM) and 3 data nodes (8 vCPU, 32 GB RAM)

Relevant Logs or Screenshots:

@aidevelop46 Is there any pattern in those spikes? Do you know when they started?
How frequently do they appear? Do you see more spikes at a specific time of day?
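
If you already have heap samples exported from your monitoring, a quick way to answer the time-of-day question is to bucket the high readings by hour. This is only a rough sketch: the (timestamp, heap percent) sample format, the function name, and the 85% threshold are my assumptions, so adjust them to whatever your metrics actually look like.

```python
from collections import Counter
from datetime import datetime


def spikes_by_hour(samples, threshold=85):
    """Count samples at or above `threshold`, grouped by hour of day (0-23).

    `samples` is an iterable of (iso_timestamp, heap_used_percent) pairs.
    Both the input shape and the default threshold are illustrative.
    """
    counts = Counter()
    for timestamp, heap_percent in samples:
        if heap_percent >= threshold:
            counts[datetime.fromisoformat(timestamp).hour] += 1
    # Sort by hour so the result reads like a daily timeline.
    return dict(sorted(counts.items()))
```

If most of the counts land in one or two hours, that usually points at a scheduled job (bulk indexing, snapshots, heavy reports) rather than steady query load.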

@aidevelop46

When JVM usage spikes to 94%, how long does the heap stay at that level? Are GCs triggered frequently during these spikes in an attempt to reclaim memory?

You can check the GC logs at /usr/share/opensearch/logs/gc.log to verify this.
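
If you cannot get at the log file on the nodes, the node stats API (`GET _nodes/stats/jvm`) reports heap usage and GC counters per node. Below is a minimal sketch that summarizes that response. The `http://localhost:9200` endpoint and the lack of authentication are assumptions on my side; your cluster will most likely need TLS and credentials added to the request.

```python
import json
import urllib.request


def fetch_jvm_stats(base_url="http://localhost:9200"):
    """Fetch JVM stats for all nodes.

    base_url is a placeholder; add auth/TLS handling for a secured cluster.
    """
    with urllib.request.urlopen(f"{base_url}/_nodes/stats/jvm") as resp:
        return json.load(resp)


def summarize_jvm(stats):
    """Return one row per node, highest heap usage first."""
    rows = []
    for node in stats["nodes"].values():
        jvm = node["jvm"]
        # The old-generation collector is the one to watch under heap pressure.
        old = jvm["gc"]["collectors"].get("old", {})
        rows.append({
            "name": node["name"],
            "heap_used_percent": jvm["mem"]["heap_used_percent"],
            "old_gc_count": old.get("collection_count", 0),
            "old_gc_time_ms": old.get("collection_time_in_millis", 0),
        })
    return sorted(rows, key=lambda r: r["heap_used_percent"], reverse=True)
```

Running `summarize_jvm(fetch_jvm_stats())` twice, about a minute apart during a spike, and comparing `old_gc_count` and `old_gc_time_ms` between the two runs shows whether old-generation GCs are firing repeatedly while the heap stays high.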