Deadlock with 'disk usage exceeded flood-stage watermark'

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): 2.10

Describe the issue:

I’m using OpenSearch through Wazuh. Single node.
I ran out of space on this node. I then stopped it and expanded the partition from 30 GB to 100 GB, leaving the server with about 70 GB of free disk space.

[o.o.e.NodeEnvironment    ] [node-1] using [1] data paths, mounts [[/ (/dev/sda1)]], net usable_space [72.5gb], net total_space [89.1gb], types [ext4]

Unfortunately, I’m not able to restart the server.

I’m getting this error:

[INFO ][o.o.s.c.ConfigurationRepository] [node-1] Wait for cluster to be available ...
[INFO ][o.o.c.s.ClusterSettings  ] [node-1] updating [plugins.index_state_management.template_migration.control] from [0] to [-1]
[INFO ][o.o.a.c.HashRing         ] [node-1] Node added: [p_LMT0O-TOmzGaTlOwEbBg]
[INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [node-1] Detected cluster change event for destination migration
[INFO ][o.o.a.c.HashRing         ] [node-1] Add data node to AD version hash ring: p_LMT0O-TOmzGaTlOwEbBg
[INFO ][o.o.a.c.HashRing         ] [node-1] All nodes with known AD version: {p_LMT0O-TOmzGaTlOwEbBg=ADNodeInfo{version=2.10.0, isEligibleDataNode=true}}
[INFO ][o.o.a.c.HashRing         ] [node-1] Rebuild AD hash ring for realtime AD with cooldown, nodeChangeEvents size 0
[INFO ][o.o.a.c.HashRing         ] [node-1] Build AD version hash ring successfully
[INFO ][o.o.a.c.ADDataMigrator   ] [node-1] Start migrating AD data
[INFO ][o.o.a.c.ADDataMigrator   ] [node-1] AD job index doesn't exist, no need to migrate
[INFO ][o.o.a.c.ADClusterEventListener] [node-1] Init AD version hash ring successfully
[ERROR][o.o.s.l.LogTypeService   ] [node-1] Failed creating LOG_TYPE_INDEX
org.opensearch.cluster.block.ClusterBlockException: index [.opensearch-sap-log-types-config] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];
at org.opensearch.cluster.block.ClusterBlocks.indicesBlockedException(ClusterBlocks.java:243) ~[opensearch-2.10.0.jar:2.10.0]

When I searched online, the suggestions all involve sending commands to the cluster (for reference, they look like the sketch below); unfortunately that is not possible in my case because the node stops right after this error.
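For reference, the suggestions typically boil down to two REST calls like the ones below (a sketch only; it assumes the API is reachable on https://localhost:9200 with admin credentials, which in my case it is not, since the node dies first):

# Stop the flood-stage block from being re-applied by the disk threshold decider
curl -k -u admin:<password> -X PUT "https://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"cluster.routing.allocation.disk.threshold_enabled": false}}'

# Remove the existing read-only-allow-delete block from all indices
curl -k -u admin:<password> -X PUT "https://localhost:9200/_all/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.read_only_allow_delete": null}'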

Is there something we can do to make it happy and start again? :pray:

Configuration:

network.host: 127.0.0.1
node.name: node-1
discovery.type: single-node

cluster.name: wazuh

http.port: 9200-9299
transport.tcp.port: 9300-9399
node.max_local_storage_nodes: "3"
path.data: /var/lib/wazuh-indexer
path.logs: /var/log/wazuh-indexer

...

plugins.security.authcz.admin_dn:
- "CN=admin,OU=Wazuh,O=Wazuh,L=California,C=US"
plugins.security.check_snapshot_restore_write_privileges: true
plugins.security.enable_snapshot_restore_privilege: true
plugins.security.nodes_dn:
- "CN=node-1,OU=Wazuh,O=Wazuh,L=California,C=US"
plugins.security.restapi.roles_enabled:
- "all_access"
- "security_rest_api_access"

plugins.security.system_indices.enabled: true
plugins.security.system_indices.indices: [".opendistro-alerting-config", ".opendistro-alerting-alert*", ".opendistro-anomaly-results*", ".opendistro-anomaly-detector*", ".opendistro-anomaly-checkpoints", ".opendistro-anomaly-detection-state", ".opendistro-reports-*", ".opendistro-notifications-*", ".opendistro-notebooks", ".opensearch-observability", ".opendistro-asynchronous-search-response*", ".replication-metadata-store"]

### Option to allow Filebeat-oss 7.10.2 to work ###
compatibility.override_main_response_version: true

Hi @Mathieu,

Have you tried adjusting the settings below? Note that all of them are dynamic, so they can be changed at runtime (see the sketch after the list):

cluster.routing.allocation.disk.watermark.low
cluster.routing.allocation.disk.watermark.high
cluster.routing.allocation.disk.watermark.flood_stage
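A minimal sketch of applying them at runtime, assuming the REST API is reachable on https://localhost:9200 with admin credentials (the percentage values here are only placeholders, pick ones that match your disk size):

curl -k -u admin:<password> -X PUT "https://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{
        "persistent": {
          "cluster.routing.allocation.disk.watermark.low": "90%",
          "cluster.routing.allocation.disk.watermark.high": "95%",
          "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
        }
      }'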

More details on these settings are in the OpenSearch documentation; please also see there how to configure OpenSearch (in your case, via opensearch.yml).

Yes, I tried, even cluster.routing.allocation.disk.threshold_enabled: false.

The documentation mentions:

This will also remove any existing index.blocks.read_only_allow_delete index blocks when disabled

Unfortunately it has no effect on my side :sob:

Is there a way to "clean up" an index that has the read-only-allow-delete block?
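If the node could be kept up long enough, the per-index call I would want to run is something like this (a sketch; the credentials are assumptions, and the security plugin may require the admin certificate for a system index like .opensearch-sap-log-types-config):

curl -k -u admin:<password> -X PUT "https://localhost:9200/.opensearch-sap-log-types-config/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.read_only_allow_delete": null}'

My full opensearch.yml, for reference: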

network.host: 127.0.0.1
node.name: node-1
discovery.type: single-node

cluster.name: wazuh

http.port: 9200-9299
transport.tcp.port: 9300-9399
node.max_local_storage_nodes: "3"
path.data: /var/lib/wazuh-indexer
path.logs: /var/log/wazuh-indexer

cluster.routing.allocation.disk.threshold_enabled: false

cluster.routing.allocation.disk.watermark.low: 86%
cluster.routing.allocation.disk.watermark.high: 91%
cluster.routing.allocation.disk.watermark.flood_stage: 96%

plugins.security.ssl.http.pemcert_filepath: /etc/wazuh-indexer/certs/node-1.pem
plugins.security.ssl.http.pemkey_filepath: /etc/wazuh-indexer/certs/node-1-key.pem
plugins.security.ssl.http.pemtrustedcas_filepath: /etc/wazuh-indexer/certs/root-ca.pem
plugins.security.ssl.transport.pemcert_filepath: /etc/wazuh-indexer/certs/node-1.pem
plugins.security.ssl.transport.pemkey_filepath: /etc/wazuh-indexer/certs/node-1-key.pem
plugins.security.ssl.transport.pemtrustedcas_filepath: /etc/wazuh-indexer/certs/root-ca.pem
plugins.security.ssl.http.enabled: true
plugins.security.ssl.transport.enforce_hostname_verification: false
plugins.security.ssl.transport.resolve_hostname: false

plugins.security.authcz.admin_dn:
- "CN=admin,OU=Wazuh,O=Wazuh,L=California,C=US"
plugins.security.check_snapshot_restore_write_privileges: true
plugins.security.enable_snapshot_restore_privilege: true
plugins.security.nodes_dn:
- "CN=node-1,OU=Wazuh,O=Wazuh,L=California,C=US"
plugins.security.restapi.roles_enabled:
- "all_access"
- "security_rest_api_access"

plugins.security.system_indices.enabled: true
plugins.security.system_indices.indices: [".opendistro-alerting-config", ".opendistro-alerting-alert*", ".opendistro-anomaly-results*", ".opendistro-anomaly-detector*", ".opendistro-anomaly-checkpoints", ".opendistro-anomaly-detection-state", ".opendistro-reports-*", ".opendistro-notifications-*", ".opendistro-notebooks", ".opensearch-observability", ".opendistro-asynchronous-search-response*", ".replication-metadata-store"]

### Option to allow Filebeat-oss 7.10.2 to work ###
compatibility.override_main_response_version: true

OK, I finally managed to fix this issue.

For information, I moved all the plugins into a backup directory and commented out the related configuration; then I managed to start the server :ok_hand:

I ran the command to remove the read-only block, then shut the server down and put the plugins and configuration settings back (rough sequence below).

It seems one of the plugins was not happy about that block.
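For anyone hitting the same problem, the whole sequence looked roughly like this (a sketch only; the install path and service name are assumptions based on a default Wazuh indexer package and may differ on your system):

# Move the plugins out of the way (path assumed for a default Wazuh indexer install)
mv /usr/share/wazuh-indexer/plugins /usr/share/wazuh-indexer/plugins.bak
mkdir /usr/share/wazuh-indexer/plugins

# Comment out the plugin-related settings in opensearch.yml, then start the node
systemctl start wazuh-indexer

# With the security plugin removed, the API is plain HTTP and unauthenticated;
# remove the read-only-allow-delete block from all indices
curl -X PUT "http://localhost:9200/_all/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.read_only_allow_delete": null}'

# Stop the node, put the plugins and the original configuration back, then start again
systemctl stop wazuh-indexer
rmdir /usr/share/wazuh-indexer/plugins
mv /usr/share/wazuh-indexer/plugins.bak /usr/share/wazuh-indexer/plugins
systemctl start wazuh-indexer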

Thanks!
M.
