IndexCorruption and the size of /dev/shm

huckebein79 · April 16, 2024, 1:24pm

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
2.4.1

Describe the issue:

Hi

I created an OpenSearch cluster across several VMs.
Each node runs in a Docker container inside its VM.
I also set up hot nodes on local SSDs and cold nodes on HDDs.

When data was moved from hot to cold nodes, or rebalanced among the cold nodes, some shards were occasionally corrupted with checksum failures and the cluster status became RED.

I tried to resolve the problem by changing various configuration settings. When I increased the size of /dev/shm in the Docker container (from 64M to 3G), the corruption disappeared and everything returned to normal.
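If you run into the same symptom, it may help to first confirm the actual size of /dev/shm inside the running container. Docker uses 64M unless `--shm-size` (or `shm_size` in Docker Compose) is set. Below is a minimal Python sketch that assumes Python 3 is available inside the container. The helper names `parse_size` and `mount_size_bytes` and the "3g" target are hypothetical. The script only reports the tmpfs size. It does not show that /dev/shm is what causes the corruption.

```python
import os

# Multipliers for single-letter, Docker-style size suffixes (binary units).
# Assumption: only b/k/m/g suffixes are handled, e.g. "64m" or "3g".
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text: str) -> int:
    """Hypothetical helper: convert a size string like '64m' or '3g' to bytes."""
    text = text.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)  # a bare number is treated as bytes


def mount_size_bytes(path: str = "/dev/shm") -> int:
    """Hypothetical helper: total size of the filesystem mounted at `path`."""
    st = os.statvfs(path)
    return st.f_frsize * st.f_blocks


if __name__ == "__main__":
    expected = parse_size("3g")  # hypothetical target; match your shm size setting
    if os.path.exists("/dev/shm"):
        actual = mount_size_bytes("/dev/shm")
        print(f"/dev/shm: {actual / 1024**2:.0f} MiB "
              f"(expected {expected / 1024**2:.0f} MiB)")
        if actual < expected:
            print("warning: /dev/shm is smaller than expected")
    else:
        print("/dev/shm is not mounted here")
```

When the container still has the default size, the script reports about 64 MiB. After a restart with the larger setting, it should report about 3072 MiB.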

What is the relationship between shard corruption and the size of /dev/shm?

Configuration:

VMs (CPU/RAM/DISK)

master nodes (x3): 4 CPUs / 4 GB RAM / 50 GB disk (3 GB allocated to Docker)
hot nodes (x2): 4 CPUs / 8 GB RAM / 200 GB SSD (6 GB allocated to Docker)
cold nodes (x3): 2 CPUs / 8 GB RAM / 100 GB SSD + 1.5 TB HDD (6 GB allocated to Docker)

Swap is disabled on all nodes with swapoff -a.

Relevant Logs or Screenshots:

Gsmitt · April 24, 2024, 4:05am

Hey

From what I understand, the fix is to increase the shared memory (SHM) allocation for the container running the workload. The default is 64MB, and you needed more. Basically, you were running out of resources. Like any virtual machine, container, pod, or Kubernetes workload, the container shares memory with the host node.
