Health check of [/var/lib/opensearch/nodes/0] failed, took [223936ms] which is above the healthy threshold of [1m]

damarmh · October 18, 2023, 7:33am

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch version 2.6.0

Describe the issue:
I launched a single-node OpenSearch on an EC2 instance, and the service stopped after logging these WARN and ERROR messages:

[2023-10-16T08:40:21,922][WARN ][o.o.m.f.FsHealthService  ] [<node_name>] health check of [/var/lib/opensearch/nodes/0] took [223936ms] which is above the warn threshold of [5s]
[2023-10-16T08:40:22,067][ERROR][o.o.m.f.FsHealthService  ] [<node_name>] health check of [/var/lib/opensearch/nodes/0] failed, took [223936ms] which is above the healthy threshold of [1m]

After I restarted the EC2 instance, OpenSearch started working again.

Judging by this Elasticsearch forum page, where someone seems to have had a similar problem, the disks may be performing very poorly and are perhaps overloaded.

Following the OpenSearch CAT allocation documentation, I checked disk space allocation with this command:

GET _cat/allocation?v=true&h=node,shards,disk.*

and received this result:

node           shards disk.indices disk.used disk.avail disk.total disk.percent
<node_name>    230        7.9gb    39.1gb     40.8gb     79.9gb           48
UNASSIGNED     209

The node seems to be using 48% of its disk space.
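For anyone who wants to run this check from a script rather than by eye, here is a minimal sketch that parses the table above. It assumes the plain-text output has been saved or fetched as a string; the function name parse_cat_allocation is made up for illustration and is not part of any OpenSearch client. Note that this only reports how full the disk is, while the log lines above report how long the health check took.

```python
def parse_cat_allocation(text):
    """Parse whitespace-separated `_cat/allocation?v=true` output into dicts.

    Hypothetical helper for illustration; not an OpenSearch API.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    # zip() stops at the shorter sequence, so the UNASSIGNED row
    # (node and shards only) simply has no disk.* keys.
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


# The output pasted in the post above, unchanged.
SAMPLE = """\
node           shards disk.indices disk.used disk.avail disk.total disk.percent
<node_name>    230        7.9gb    39.1gb     40.8gb     79.9gb           48
UNASSIGNED     209
"""

rows = parse_cat_allocation(SAMPLE)
for row in rows:
    if "disk.percent" in row:
        print(f"{row['node']}: {row['disk.percent']}% of disk used, "
              f"{row['shards']} shards")
    else:
        print(f"{row['node']}: {row['shards']} shards")
```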

First, is this the right way to find the cause of the problem? Second, what steps do I need to take to keep the error from happening again?

Any help or direction is appreciated. Thank you in advance.

gaobinlong · October 23, 2023, 6:07am

You can check the IO load of the disk. The IO utilization may be so high that the OpenSearch fs health check fails. Run the iostat command on the EC2 instance, or check the monitoring metrics of the attached EBS volume in the AWS console.
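If iostat is not installed, a rough equivalent of its utilization figure can be computed on Linux from /proc/diskstats. The sketch below is an approximation under stated assumptions, not a replacement for iostat or the EBS metrics: it reads the "milliseconds spent doing I/O" counter twice and divides the difference by the elapsed time. The function names and the device name in the usage note are hypothetical.

```python
import time


def read_io_ticks(diskstats_text, device):
    """Return the 'milliseconds spent doing I/O' counter for one device.

    In /proc/diskstats the device name is the 3rd whitespace-separated
    field and this counter is the 13th.
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 13 and fields[2] == device:
            return int(fields[12])
    raise ValueError(f"device {device!r} not found in diskstats")


def util_percent(ticks_before, ticks_after, elapsed_ms):
    """Share of the interval during which the device was busy, in percent."""
    return 100.0 * (ticks_after - ticks_before) / elapsed_ms


def sample_util(device, interval_s=5.0, path="/proc/diskstats"):
    """Sample the counter twice, interval_s apart, and return utilization."""
    with open(path) as f:
        before = read_io_ticks(f.read(), device)
    start = time.monotonic()
    time.sleep(interval_s)
    with open(path) as f:
        after = read_io_ticks(f.read(), device)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return util_percent(before, after, elapsed_ms)
```

Calling sample_util with the device that backs /var/lib/opensearch (for example "nvme0n1", an assumed name; check lsblk for the real one) returns a percentage. Values that stay close to 100 over several samples suggest the volume is saturated.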
