Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
2.4.1
Describe the issue:
Hi
I created an OpenSearch cluster on several VMs.
Each node runs in a Docker container on its own VM.
I also configured the hot nodes with local SSDs and the cold nodes with HDDs.
When data was moved from hot to cold nodes, or rebalanced among the cold nodes, some shards were occasionally corrupted (checksum failures) and the cluster status became RED.
I tried to resolve the problem by changing the configuration. When I increased the size of /dev/shm in the Docker containers (from 64M to 3G), the corruption disappeared and everything went back to normal.
I'm wondering what the relationship is between shard corruption and the size of /dev/shm.
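To confirm that a container actually sees the enlarged /dev/shm (64M is Docker's default, 3G is the value used above), a minimal Python sketch like the following could be run inside each container. It assumes Python is available in the container, which may not be the case for the stock OpenSearch image; the helper names `parse_size`, `shm_total_bytes`, and `shm_is_at_least` are hypothetical, and binary units (1K = 1024 bytes) are assumed for the size strings.

```python
import os
import shutil

# Hypothetical helper: binary multipliers assumed for suffixes like "64M" or "3G".
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text: str) -> int:
    """Convert a size string such as '64M', '3G' or '512' to bytes."""
    text = text.strip().upper()
    if text.endswith("B"):  # accept '64MB' as well as '64M'
        text = text[:-1]
    unit = text[-1] if text and text[-1] in "KMG" else ""
    number = text[:-1] if unit else text
    return int(float(number) * _UNITS[unit])


def shm_total_bytes(path: str = "/dev/shm") -> int:
    """Return the total size of the filesystem mounted at `path` (tmpfs for /dev/shm)."""
    return shutil.disk_usage(path).total


def shm_is_at_least(required: str, path: str = "/dev/shm") -> bool:
    """True if the mount at `path` is at least as large as `required`."""
    return shm_total_bytes(path) >= parse_size(required)


if __name__ == "__main__":
    if os.path.isdir("/dev/shm"):
        total = shm_total_bytes()
        print(f"/dev/shm total: {total / 1024**2:.0f} MiB")
        print("at least 3G:", shm_is_at_least("3G"))
    else:
        print("/dev/shm not found on this system")
```

The check reads the mount's total size rather than free space, since the question is whether the container was started with the larger limit, not how much of it is currently in use.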
Configuration:
VMs (CPU/RAM/DISK)
- master nodes: 3 × (4 CPUs / 4 GB / 50 GB), 3 GB allocated to the Docker container
- hot nodes: 2 × (4 CPUs / 8 GB / 200 GB SSD), 6 GB allocated to the Docker container
- cold nodes: 3 × (2 CPUs / 8 GB / 100 GB SSD + 1.5 TB HDD), 6 GB allocated to the Docker container
Swap is disabled on all nodes (swapoff -a).
Relevant Logs or Screenshots: