Index corruption and the size of /dev/shm

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
2.4.1

Describe the issue:

Hi

I created an OpenSearch cluster with several VMs.
Each node runs in a Docker container on its VM.
I also set up hot nodes with local SSDs and cold nodes with HDDs.

When data was moved from hot → cold or rebalanced among cold nodes, some shards were occasionally corrupted with checksum failures, and the cluster status became RED.

I tried to resolve the problem by changing various configuration settings. When I increased the size of /dev/shm in the Docker container (from 64M to 3G), the corruption disappeared and everything went back to normal.

What is the relationship between shard corruption and the size of /dev/shm?

Configuration:

VMs (CPU/RAM/DISK)

  • master nodes: 4CPUs/4GB/50GB * 3 (3GB is allocated in docker)
  • hot nodes: 4CPUs/8GB/200GB (SSD) * 2 (6GB is allocated in docker)
  • cold nodes: 2CPUs/8GB/100GB (SSD) and 1.5TB (HDD) * 3 (6GB is allocated in docker)

Swap is disabled on all nodes with swapoff -a

Relevant Logs or Screenshots:

Hey

From what I understand, the fix is to increase the shared memory (SHM) allocation for the container running the workload. The default is 64MB, so you needed more. Basically, you were running out of resources. Like any virtual machine, container, or Kubernetes pod, it shares memory with the host node.
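
If it helps, one way to confirm how much shared memory a container actually sees is to check the size of /dev/shm from inside it. Below is a minimal Python sketch, assuming a Linux container with Python available. The function name shm_report and the 64 MiB threshold are only illustrative (the threshold mirrors the 64M default mentioned above), not an OpenSearch recommendation. With plain Docker, the size itself is normally set when the container is started, for example with the --shm-size option of docker run.

```python
import os
import shutil

# Illustrative threshold only: the 64M default mentioned in this thread.
DEFAULT_SHM_BYTES = 64 * 1024**2


def shm_report(path="/dev/shm", min_bytes=DEFAULT_SHM_BYTES):
    """Return size information for the filesystem mounted at `path`.

    `ok` is True when the total size is strictly larger than `min_bytes`,
    i.e. the mount has been enlarged beyond the assumed default.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "total": usage.total,
        "used": usage.used,
        "free": usage.free,
        "ok": usage.total > min_bytes,
    }


if __name__ == "__main__":
    # Only report when /dev/shm exists (it may be absent on non-Linux hosts).
    if os.path.exists("/dev/shm"):
        report = shm_report()
        print(f"/dev/shm total: {report['total'] / 1024**2:.0f} MiB")
        print(f"/dev/shm free:  {report['free'] / 1024**2:.0f} MiB")
        print("larger than 64 MiB default:", report["ok"])
```

Running this inside each container before and after the change should show whether the new size really took effect on every node.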