Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OS - RHEL8
name            version node.role
pescold01-spc   2.8.0   dr
pescold02-spc   2.8.0   dr
pescold03-spc   2.8.0   dr
peshot01-spc    2.8.0   dir
peshot02-spc    2.8.0   dir
peshot03-spc    2.8.0   dir
peshot04-spc    2.8.0   dir
peshot05-spc    2.8.0   dir
peshot06-spc    2.8.0   dir
pesmaster01-spc 2.8.0   mr
pesmaster02-spc 2.8.0   mr
pesmaster03-spc 2.8.0   mr
peswarm01-spc   2.8.0   dr
peswarm02-spc   2.8.0   dr
peswarm03-spc   2.8.0   dr
Describe the issue:
I need to fix an issue on my cold nodes. They use old HDDs, and when I delete a big index stored on these disks, the filesystem health check runs slowly and the cold nodes then disconnect from the cluster.
health check of [/usr/share/opensearch/data/nodes/0] took [5202ms] which is above the warn threshold of [5s]
It happens only on the COLD nodes, because their disk utilization reaches 100% when the ISM policy deletes a 50 GB index.
My question is: can I increase this threshold to 10 seconds or more?
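To make the question concrete, here is a rough sketch of the change I have in mind. It assumes that the "warn threshold of [5s]" in the message corresponds to the `monitor.fs.health.slow_path_logging_threshold` setting and that it can be updated through `PUT _cluster/settings`. The helper name `threshold_update` is only for illustration, and nothing is sent to the cluster here. I have not confirmed that raising this value would stop the nodes from dropping out, since it may only change when the warning is logged.

```python
import json

# Assumption: this is the setting behind "warn threshold of [5s]" in the log.
SETTING = "monitor.fs.health.slow_path_logging_threshold"


def threshold_update(value="10s", persistent=True):
    """Build the request body for PUT _cluster/settings (no request is sent)."""
    scope = "persistent" if persistent else "transient"
    return {scope: {SETTING: value}}


# Print the body that would be sent to the cluster settings API.
print(json.dumps(threshold_update("10s"), indent=2))
```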
Configuration:
ISM policy
{
  "policy_id": "HOT-WARM-COLD - 180d",
  "description": "BIG data - 7 day rollover, 50 GB\nHOT 1-10 day, warm 10-30day, cold 30-180 day",
  "last_updated_time": 1692691334023,
  "schema_version": 13,
  "error_notification": null,
  "default_state": "hot",
  "states": [
    {
      "name": "hot",
      "actions": [
        {
          "retry": {
            "count": 3,
            "backoff": "exponential",
            "delay": "1m"
          },
          "index_priority": {
            "priority": 50
          }
        },
        {
          "retry": {
            "count": 3,
            "backoff": "exponential",
            "delay": "1m"
          },
          "rollover": {
            "min_index_age": "7d",
            "min_primary_shard_size": "50gb"
          }
        }
      ],
      "transitions": [
        {
          "state_name": "warm",
          "conditions": {
            "min_index_age": "10d"
          }
        }
      ]
    },
    {
      "name": "warm",
      "actions": [
        {
          "retry": {
            "count": 3,
            "backoff": "exponential",
            "delay": "1m"
          },
          "index_priority": {
            "priority": 25
          }
        },
        {
          "retry": {
            "count": 3,
            "backoff": "exponential",
            "delay": "1m"
          },
          "allocation": {
            "require": {
              "temp": "warm"
            },
            "include": {},
            "exclude": {},
            "wait_for": false
          }
        }
      ],
      "transitions": [
        {
          "state_name": "cold",
          "conditions": {
            "min_index_age": "30d"
          }
        }
      ]
    },
    {
      "name": "cold",
      "actions": [
        {
          "retry": {
            "count": 3,
            "backoff": "exponential",
            "delay": "1m"
          },
          "index_priority": {
            "priority": 10
          }
        },
        {
          "retry": {
            "count": 3,
            "backoff": "exponential",
            "delay": "1m"
          },
          "allocation": {
            "require": {
              "temp": "cold"
            },
            "include": {},
            "exclude": {},
            "wait_for": false
          }
        }
      ],
      "transitions": [
        {
          "state_name": "delete",
          "conditions": {
            "min_index_age": "180d"
          }
        }
      ]
    },
    {
      "name": "delete",
      "actions": [
        {
          "retry": {
            "count": 3,
            "backoff": "exponential",
            "delay": "1m"
          },
          "delete": {}
        }
      ],
      "transitions": []
    }
  ],
  "ism_template": [
    {
      "index_patterns": [
        SECRET
      ],
      "priority": 10,
      "last_updated_time": 1689940479010
    }
  ]
}
Relevant Logs or Screenshots:
pescold02-elastic[7214]: [2023-12-30T16:10:02,030][WARN ][o.o.m.f.FsHealthService ] [pescold02-spc] health check of [/usr/share/opensearch/data/nodes/0] took [5202ms] which is above the warn threshold of [5s]
pescold02-elastic[7214]: [2023-12-30T16:10:12,959][INFO ][o.o.c.c.Coordinator ] [pescold02-spc] cluster-manager node [{pesmaster03-spc}{9jX0k5J6Q5muN5DPn4vu1Q}{LzacVARJRgWVhs0mytyRtA}{10.xx.xx.xx}{10.xx.xx.xx:9300}{mr}{shard_indexing_pressure_enabled=true}] failed, restarting discovery
pescold02-elastic[7214]: org.opensearch.OpenSearchException: node [{pesmaster03-spc}{9jX0k5J6Q5muN5DPn4vu1Q}{LzacVARJRgWVhs0mytyRtA}{10.xx.xx.xx}{10.xx.xx.xx:9300}{mr}{shard_indexing_pressure_enabled=true}] failed [3] consecutive checks
pescold01-elastic[6944]: [2023-12-30T16:08:47,486][WARN ][o.o.m.f.FsHealthService ] [pescold01-spc] health check of [/usr/share/opensearch/data/nodes/0] took [11405ms] which is above the warn threshold of [5s]
pescold01-elastic[6944]: [2023-12-30T16:09:53,189][WARN ][o.o.m.f.FsHealthService ] [pescold01-spc] health check of [/usr/share/opensearch/data/nodes/0] took [5803ms] which is above the warn threshold of [5s]
Dec 30 17:10:12 pescold01-spc pescold01-elastic[6944]: [2023-12-30T16:10:12,859][INFO ][o.o.c.c.Coordinator ] [pescold01-spc] cluster-manager node [{pesmaster03-spc}{9jX0k5J6Q5muN5DPn4vu1Q}{LzacVARJRgWVhs0mytyRtA}{10.xx.xx.xx}{10.xx.xx.xx:9300}{mr}{shard_indexing_pressure_enabled=true}] failed, restarting discovery
pescold01-elastic[6944]: org.opensearch.OpenSearchException: node [{pesmaster03-spc}{9jX0k5J6Q5muN5DPn4vu1Q}{LzacVARJRgWVhs0mytyRtA}{10.xx.xx.xx}{10.xx.xx.xx:9300}{mr}{shard_indexing_pressure_enabled=true}] failed [3] consecutive checks
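To see how far over the threshold the checks go, the slow health checks can be pulled out of the logs with a small script. This is only a sketch: the regular expression is written against the `FsHealthService` lines pasted above, and the names `SLOW_CHECK` and `slow_checks` are made up for this example.

```python
import re

# Matches the FsHealthService warning in the form shown in the logs above.
SLOW_CHECK = re.compile(
    r"\[(?P<node>[^\]]+)\] health check of \[(?P<path>[^\]]+)\] "
    r"took \[(?P<ms>\d+)ms\] which is above the warn threshold"
)


def slow_checks(lines):
    """Return (node, path, duration_ms) for every slow health check warning."""
    results = []
    for line in lines:
        match = SLOW_CHECK.search(line)
        if match:
            results.append(
                (match.group("node"), match.group("path"), int(match.group("ms")))
            )
    return results


# Sample input: the three warnings from the logs in this post.
sample = [
    "[2023-12-30T16:10:02,030][WARN ][o.o.m.f.FsHealthService ] [pescold02-spc] "
    "health check of [/usr/share/opensearch/data/nodes/0] took [5202ms] "
    "which is above the warn threshold of [5s]",
    "[2023-12-30T16:08:47,486][WARN ][o.o.m.f.FsHealthService ] [pescold01-spc] "
    "health check of [/usr/share/opensearch/data/nodes/0] took [11405ms] "
    "which is above the warn threshold of [5s]",
    "[2023-12-30T16:09:53,189][WARN ][o.o.m.f.FsHealthService ] [pescold01-spc] "
    "health check of [/usr/share/opensearch/data/nodes/0] took [5803ms] "
    "which is above the warn threshold of [5s]",
]

for node, path, ms in slow_checks(sample):
    print(node, path, ms)
```

On the lines above, the slowest check is the 11405 ms one on pescold01-spc, so a 10-second threshold would still have been exceeded at least once.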