Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.15.0
Describe the issue:
We have deployed an OpenSearch cluster on an AWS Kubernetes cluster. The OpenSearch data is stored on Persistent Volumes backed by AWS Elastic File System (EFS).
I noticed the following output from the _cluster/allocation/explain API call (a sketch of how the call can be reproduced follows the output):
{
  "index": "security-auditlog-2024.10.01",
  "shard": 0,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "NODE_LEFT",
    "at": "2024-10-04T04:03:10.177Z",
    "details": "node_left [Ahacm3NMQui2-2bKSrv6nw]",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "throttled",
  "allocate_explanation": "allocation temporarily throttled",
  "node_allocation_decisions": [
    {
      "node_id": "Ahacm3NMQui2-2bKSrv6nw",
      "node_name": "opensearch-data-0",
      "transport_address": "172.17.21.169:9300",
      "node_attributes": {
        "shard_indexing_pressure_enabled": "true"
      },
      "node_decision": "throttled",
      "store": {
        "matching_size_in_bytes": 1425343
      },
      "deciders": [
        {
          "decider": "throttling",
          "decision": "THROTTLE",
          "explanation": "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
        }
      ]
    },
    {
      "node_id": "WS2pQ1DoT-eNVFwelyWn3g",
      "node_name": "opensearch-data-2",
      "transport_address": "172.17.21.43:9300",
      "node_attributes": {
        "shard_indexing_pressure_enabled": "true"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "disk_threshold",
          "decision": "NO",
          "explanation": "the node has fewer free bytes remaining than the total size of all incoming shards: free space [-11136401408B], relocating shards [0B]"
        }
      ]
    },
    {
      "node_id": "ugn6My5SST6Bf6vOJfAvqQ",
      "node_name": "opensearch-data-1",
      "transport_address": "172.17.22.108:9300",
      "node_attributes": {
        "shard_indexing_pressure_enabled": "true"
      },
      "node_decision": "no",
      "store": {
        "matching_size_in_bytes": 1425848
      },
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "a copy of this shard is already allocated to this node [[security-auditlog-2024.10.01][0], node[ugn6My5SST6Bf6vOJfAvqQ], [P], s[STARTED], a[id=4SLCrYkZRS-5sLmKEscL1A]]"
        },
        {
          "decider": "throttling",
          "decision": "THROTTLE",
          "explanation": "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
        }
      ]
    }
  ]
}
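For reference, the explain call above can be reproduced roughly like this (a minimal sketch: the endpoint is a placeholder for a port-forwarded service, and the security plugin's TLS/authentication handling is omitted):

# Sketch of reproducing the allocation explain call for this specific shard.
# BASE_URL is a placeholder endpoint; TLS/auth handling is omitted for brevity.
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # placeholder, adjust for your setup

body = json.dumps({
    "index": "security-auditlog-2024.10.01",
    "shard": 0,
    "primary": False,
}).encode("utf-8")

req = urllib.request.Request(
    BASE_URL + "/_cluster/allocation/explain",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.dumps(json.loads(resp.read()), indent=2))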
The strange part of the explain output is the disk_threshold explanation for opensearch-data-2: "the node has fewer free bytes remaining than the total size of all incoming shards: free space [-11136401408B], relocating shards [0B]".
What could cause the free space to be detected as negative?
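For cross-checking, the free and available bytes that each node reports (the filesystem stats the disk_threshold decider relies on) can be pulled like this (again a sketch with a placeholder endpoint and no TLS/auth handling):

# Sketch of listing per-node filesystem totals to compare against the
# negative free-space figure in the explain output above.
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # placeholder, adjust for your setup

with urllib.request.urlopen(BASE_URL + "/_nodes/stats/fs") as resp:
    stats = json.loads(resp.read())

for node in stats["nodes"].values():
    total = node["fs"]["total"]
    print(
        node["name"],
        "total:", total["total_in_bytes"],
        "free:", total["free_in_bytes"],
        "available:", total["available_in_bytes"],
    )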
Configuration:
3 master nodes
3 data nodes
Relevant Logs or Screenshots: