Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch/Dashboard 3.1.0
Describe the issue:
While a snapshot is being taken, cluster performance becomes very slow.
During this time, heap.percent stays around 50% to 60%;
however, ram.percent is always at 99% or 100%.
The pod logs show many records related to insufficient memory.
This issue does not occur in OpenSearch 2.19.
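For reference, here is a minimal sketch of how the heap.percent and ram.percent figures can be collected from `GET _cat/nodes?format=json&h=name,heap.percent,ram.percent`. The function names, the thresholds, and the sample node names are made up for illustration and are not from my cluster. The fetch helper assumes an endpoint reachable without authentication, which a cluster with the security plugin enabled would not allow as written.

```python
import json
from urllib.request import urlopen


def fetch_cat_nodes(base_url: str) -> str:
    """Fetch per-node heap/RAM usage as JSON text.

    Assumption: base_url is reachable without authentication or custom TLS
    settings; a secured cluster needs credentials and CA configuration.
    """
    url = f"{base_url}/_cat/nodes?format=json&h=name,heap.percent,ram.percent"
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_cat_nodes(payload: str) -> list[dict]:
    """Convert the _cat/nodes JSON rows (string values) into typed dicts."""
    rows = json.loads(payload)
    return [
        {
            "name": row["name"],
            "heap": int(row["heap.percent"]),
            "ram": int(row["ram.percent"]),
        }
        for row in rows
    ]


def high_ram_low_heap(nodes: list[dict], ram_threshold: int = 95,
                      heap_threshold: int = 70) -> list[str]:
    """Return names of nodes whose OS RAM is nearly full while heap is not.

    The thresholds are arbitrary illustration values, not recommendations.
    """
    return [
        n["name"]
        for n in nodes
        if n["ram"] >= ram_threshold and n["heap"] <= heap_threshold
    ]


# Hypothetical sample payload in the shape returned by format=json.
SAMPLE_PAYLOAD = json.dumps([
    {"name": "example-data-0", "heap.percent": "55", "ram.percent": "99"},
    {"name": "example-data-1", "heap.percent": "85", "ram.percent": "99"},
    {"name": "example-manager-0", "heap.percent": "40", "ram.percent": "60"},
])

if __name__ == "__main__":
    nodes = parse_cat_nodes(SAMPLE_PAYLOAD)
    print(high_ram_low_heap(nodes))
```

Running the same check before, during, and after a snapshot would show whether the ram.percent reading actually changes with the snapshot or is already at that level beforehand.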
Configuration:
snapshotRepositories:
  # ceph
  - name: CCEE_EUDE1_CEPH_S3_ISMPOLICY
    type: s3
    settings:
      bucket: ssdl-logging-opensearch-s3interface-snapshot-ismpolicy
      region: eu-de-1
      client: eude1ceph
      disable_chunked_encoding: "true"
      compress: "true"
      storage_class: "standard"
Relevant Logs or Screenshots:
[2025-07-15T03:45:21,689][WARN ][o.o.t.NativeMessageHandler] [ssdl-app-logging-opensearch-data-2] handling inbound transport message [InboundMessage{Header{NATIVE}{121388}{3.1.0}{860586}{true}{false}{false}{false}{indices:data/write/bulk[s]}}] took [15337ms] which is above the warn threshold of [5000ms]
[2025-07-15T03:45:48,699][WARN ][o.o.t.TransportService ] [ssdl-app-logging-opensearch-data-2] Received response for a request that has timed out, sent [25013ms] ago, timed out [0ms] ago, action [internal:coordination/fault_detection/leader_check], node [{ssdl-app-logging-opensearch-manager-0}{tE7UkyCwTqqBihmjPC96zQ}{yI78U3BATJyXgMn8zSkM9w}{100.104.8.73}{100.104.8.73:9300}{m}{shard_indexing_pressure_enabled=true}], id [3640105]
[2025-07-15T03:45:48,720][INFO ][o.o.c.s.ClusterApplierService] [ssdl-app-logging-opensearch-data-2] removed {{ssdl-app-logging-opensearch-data-3}{F1Nk9N8BSqayno0AioJmRg}{Sj8eCMaCR1C70Dd75uXaFg}{100.104.10.129}{100.104.10.129:9300}{d}{shard_indexing_pressure_enabled=true}}, term: 99, version: 116005, reason: ApplyCommitRequest{term=99, version=116005, sourceNode={ssdl-app-logging-opensearch-manager-0}{tE7UkyCwTqqBihmjPC96zQ}{yI78U3BATJyXgMn8zSkM9w}{100.104.8.73}{100.104.8.73:9300}{m}}
[2025-07-15T03:46:46,719][WARN ][i.n.c.AbstractChannelHandlerContext] [ssdl-app-logging-opensearch-data-2] An exception 'OpenSearchSecurityException[The provided TCP channel is invalid.]; nested: DecoderException[javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)]; nested: SSLHandshakeException[Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)]; nested: BadPaddingException[Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)];' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception: