Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch version 2.6.0
Describe the issue:
I launched a single-node OpenSearch cluster on an EC2 instance, and the service stopped after logging the following WARN and ERROR messages:
[2023-10-16T08:40:21,922][WARN ][o.o.m.f.FsHealthService ] [<node_name>] health check of [/var/lib/opensearch/nodes/0] took [223936ms] which is above the warn threshold of [5s]
[2023-10-16T08:40:22,067][ERROR][o.o.m.f.FsHealthService ] [<node_name>] health check of [/var/lib/opensearch/nodes/0] failed, took [223936ms] which is above the healthy threshold of [1m]
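To make the durations easier to compare, here is a minimal Python sketch that pulls the level, path, and elapsed time out of lines like the two above. The regular expression is an assumption based only on these two samples, and `parse_fs_health` is a hypothetical helper name, not part of OpenSearch.

```python
import re

# Assumed layout, inferred from the two FsHealthService lines quoted above:
# [timestamp][LEVEL][logger] [node] health check of [path] ... took [NNNms] ...
LOG_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\[(?P<level>\w+)\s*\]\[o\.o\.m\.f\.FsHealthService\s*\]"
    r".*?health check of \[(?P<path>[^\]]+)\].*?took \[(?P<ms>\d+)ms\]"
)


def parse_fs_health(line):
    """Return timestamp, level, path, and duration in ms, or None if the line does not match."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    return {
        "timestamp": match["ts"],
        "level": match["level"],
        "path": match["path"],
        "duration_ms": int(match["ms"]),
    }


if __name__ == "__main__":
    sample = (
        "[2023-10-16T08:40:21,922][WARN ][o.o.m.f.FsHealthService ] [<node_name>] "
        "health check of [/var/lib/opensearch/nodes/0] took [223936ms] "
        "which is above the warn threshold of [5s]"
    )
    info = parse_fs_health(sample)
    # Convert to minutes to compare against the 1m healthy threshold in the ERROR line.
    print(f"{info['level']}: {info['duration_ms'] / 60000:.1f} min on {info['path']}")
```

In both lines the check took 223936 ms, which is about 3.7 minutes, well above the 5s warn threshold and the 1m healthy threshold.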
After I restarted the EC2 instance, OpenSearch started working again.
Looking at this Elasticsearch forum page, where someone seems to have had a similar problem, it seems that the disks are performing very poorly and are perhaps overloaded.
Based on the OpenSearch documentation for the CAT allocation API, I checked the disk space allocation with this command:
GET _cat/allocation?v=true&h=node,shards,disk.*
and received this result
node        shards disk.indices disk.used disk.avail disk.total disk.percent
<node_name>    230        7.9gb    39.1gb     40.8gb     79.9gb           48
UNASSIGNED     209
The node seems to be using 48% of its disk space.
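As a cross-check of that reading, here is a small Python sketch that parses the output above and recomputes the used percentage from the `disk.used` and `disk.total` columns. The helper names `parse_cat_table` and `gb` are hypothetical, and the size conversion only handles the `gb` suffix that appears in this output.

```python
def parse_cat_table(text):
    """Split whitespace-separated _cat output into one dict per row, keyed by the header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    # zip() stops at the shorter sequence, so the UNASSIGNED row keeps only node and shards.
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def gb(value):
    """Convert a size such as '39.1gb' to a float; other units are not handled in this sketch."""
    if not value.endswith("gb"):
        raise ValueError(f"unsupported size unit: {value!r}")
    return float(value[:-2])


OUTPUT = """\
node shards disk.indices disk.used disk.avail disk.total disk.percent
<node_name> 230 7.9gb 39.1gb 40.8gb 79.9gb 48
UNASSIGNED 209
"""

rows = parse_cat_table(OUTPUT)
node = rows[0]
used_pct = 100 * gb(node["disk.used"]) / gb(node["disk.total"])
print(f"{node['node']}: {used_pct:.1f}% used, reported {node['disk.percent']}%")
```

The recomputed value is a little under 49%, close to the 48 in `disk.percent`; the small gap presumably comes from the sizes being rounded to one decimal place. The second row shows 209 shards that are not assigned to any node.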
First, is this the right way to find the cause of the problem? Second, what steps do I need to take to keep the error from happening again?
Any help or direction is appreciated. Thank you in advance.