Snapshots Tipping Over Data Nodes

BurntBaboon · May 15, 2024, 11:58pm

Howdy!

We’ve noticed a peculiar issue where, rather sporadically, data nodes drop out of our cluster when our nightly snapshot policy executes. The interesting tidbit is that this only happens in one of a handful of clusters.

So far, we haven’t gleaned much from the OpenSearch logs. The only noticeable errors are "GC did not bring memory usage down" and "failed to list shard". If these are contributing to the issue, we’re not sure how to rectify the problem, or why it only happens in one of our clusters.
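To see whether these two errors cluster around the nightly snapshot window, it can help to count them per log file first. Below is a minimal sketch that assumes plain-text node logs and matches only on the message substrings quoted above. The function name `count_errors` and the pattern keys are hypothetical, and the exact wording in your logs may differ.

```python
import re
from collections import Counter
from typing import Iterable

# Substrings taken from the errors quoted in this thread. This is an
# assumption: adjust them if your OpenSearch version words the messages differently.
PATTERNS = {
    "gc": re.compile(r"GC did not bring memory usage down"),
    "list_shard": re.compile(r"failed to list shard"),
}


def count_errors(lines: Iterable[str]) -> Counter:
    """Count how many log lines match each known error pattern."""
    counts: Counter = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Usage: pass an open log file, e.g. `count_errors(open("opensearch.log", encoding="utf-8", errors="replace"))`, then compare the counts from the affected cluster against a healthy one.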

From,
DH

AmiSMB · May 24, 2024, 1:25pm

We are seeing the same thing: the backup starts, data nodes drop out, and then some of our indexes become unavailable.

Nilushan · July 4, 2024, 11:20am

@BurntBaboon ,
I have also experienced this issue with snapshots. In my case it happened because the Java heap ran out of memory while taking snapshots. To confirm whether you are hitting the same issue, could you please provide the following information?

Can you share the response body of the Get Snapshot API call? I need to check the failures and shards fields, as they will provide a clue.
Do you see "java.lang.OutOfMemoryError: Java heap space" errors in the OpenSearch data node logs? They should be printed after the "GC did not bring memory usage down" errors.
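For the first question, a small script can pull the failures and shards fields out of the Get Snapshot response and group failures by node. This is useful here because failures concentrated on a single node would point at the one that ran out of heap. This is a sketch assuming the standard `GET _snapshot/<repository>/<snapshot>` endpoint and a response with a top-level `snapshots` array whose entries carry `state`, `shards` and `failures` (with `node_id`, `index`, `shard_id`, `reason`). The base URL, repository and snapshot names, and the helper names are hypothetical, and it does no authentication or TLS handling, which a cluster running the security plugin would need.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter


def fetch_snapshot(base_url: str, repository: str, snapshot: str) -> dict:
    """Call GET _snapshot/<repository>/<snapshot> (no auth/TLS: an assumption)."""
    url = (f"{base_url}/_snapshot/{urllib.parse.quote(repository)}"
           f"/{urllib.parse.quote(snapshot)}")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def summarize(response: dict) -> list:
    """Extract state, shard counts and failures for each snapshot in the response."""
    summaries = []
    for snap in response.get("snapshots", []):
        shards = snap.get("shards", {})
        summaries.append({
            "snapshot": snap.get("snapshot"),
            "state": snap.get("state"),
            "shards_total": shards.get("total", 0),
            "shards_failed": shards.get("failed", 0),
            "failures": [
                (f.get("index"), f.get("shard_id"), f.get("node_id"), f.get("reason"))
                for f in snap.get("failures", [])
            ],
        })
    return summaries


def failures_by_node(response: dict) -> Counter:
    """Count shard failures per node_id across all snapshots in the response."""
    return Counter(
        f.get("node_id")
        for snap in response.get("snapshots", [])
        for f in snap.get("failures", [])
    )
```

If one node_id dominates the failure counts and that node is also the one logging heap errors, that is a strong hint the two symptoms are the same problem.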
