Cluster down after typo on search backpressure cluster setting

spapadop · July 24, 2023, 3:51pm

Important: Don’t try this on a production cluster.

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch v2.7.0

Describe the issue:
I wanted to try out search backpressure in a test cluster, so to enable it, instead of doing

PUT _cluster/settings
{
  "persistent": {
    "search_backpressure.mode": "enforced"
  }
}

I did (note the typo in enforcedd):

PUT _cluster/settings
{
  "persistent": {
    "search_backpressure.mode": "enforcedd"
  }
}

just to see how it would behave. Instead of rejecting it, as happens with an invalid value for any other cluster setting, e.g.:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "blah"
  }
}

that gives

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Illegal allocation.enable value [BLAH]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Illegal allocation.enable value [BLAH]"
  },
  "status": 400
}

it brought the cluster down, because the nodes cannot apply the value enforcedd.
Now the cluster is completely confused (can’t blame it), as it can’t apply the value to the nodes.
Restarts won’t help because I stored the value in the persistent settings.
Nodes won’t talk to each other, as they are busy trying to apply the invalid value.

Is there any way to remove this persistent setting to allow the cluster to come up again?
Or shall I say goodbye?

Relevant Logs or Screenshots:

[2023-07-24T17:12:51,082][INFO ][o.o.c.s.ClusterSettings  ] [osarally101-sokratis1_master] updating [search_backpressure.mode] from [monitor_only] to [enforcedd]
[2023-07-24T17:12:51,082][WARN ][o.o.c.s.ClusterSettings  ] [osarally101-sokratis1_master] failed to apply settings
java.lang.IllegalArgumentException: Invalid SearchBackpressureMode: enforcedd
        at org.opensearch.search.backpressure.settings.SearchBackpressureMode.fromName(SearchBackpressureMode.java:50) ~[opensearch-2.7.0.jar:2.7.0]
        at org.opensearch.search.backpressure.settings.SearchBackpressureSettings.lambda$new$0(SearchBackpressureSettings.java:117) ~[opensearch-2.7.0.jar:2.7.0]
        at org.opensearch.common.settings.Setting$Updater.apply(Setting.java:1241) ~[opensearch-2.7.0.jar:2.7.0]
        at org.opensearch.common.settings.AbstractScopedSettings$SettingUpdater.lambda$updater$0(AbstractScopedSettings.java:696) ~[opensearch-2.7.0.jar:2.7.0]
        at org.opensearch.common.settings.AbstractScopedSettings.applySettings(AbstractScopedSettings.java:232) [opensearch-2.7.0.jar:2.7.0]
        at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:556) [opensearch-2.7.0.jar:2.7.0]
        at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:484) [opensearch-2.7.0.jar:2.7.0]
        at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:186) [opensearch-2.7.0.jar:2.7.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-2.7.0.jar:2.7.0]
        at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282) [opensearch-2.7.0.jar:2.7.0]
        at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245) [opensearch-2.7.0.jar:2.7.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
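
A toy model of what the trace suggests, not OpenSearch's actual code: the mode string appears to be parsed only when the settings are applied (SearchBackpressureMode.fromName is reached from Setting$Updater.apply), so the bad value is already persisted by the time parsing fails. The Python sketch below contrasts that with validating before persisting. The function names and the validate_first flag are hypothetical; the three mode names are the values I know for search_backpressure.mode.

```python
from enum import Enum


class SearchBackpressureMode(Enum):
    MONITOR_ONLY = "monitor_only"
    ENFORCED = "enforced"
    DISABLED = "disabled"


def parse_mode(name: str) -> SearchBackpressureMode:
    """Turn a string into a mode, failing on unknown names."""
    try:
        return SearchBackpressureMode(name)
    except ValueError:
        raise ValueError(f"Invalid SearchBackpressureMode: {name}") from None


def update_setting(persisted: dict, key: str, value: str, validate_first: bool) -> None:
    """Hypothetical update path: optionally validate, then persist."""
    if validate_first:
        # Fails here, before anything is written to the persisted state.
        parse_mode(value)
    persisted[key] = value


# Without up-front validation the typo is stored, and every later
# attempt to apply the persisted state fails in parse_mode.
state = {}
update_setting(state, "search_backpressure.mode", "enforcedd", validate_first=False)
try:
    parse_mode(state["search_backpressure.mode"])
except ValueError as err:
    print("apply failed:", err)
```

With validate_first=True the same call raises before the dictionary is touched, which is the behaviour the cluster.routing.allocation.enable example above shows.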

pablo · July 24, 2023, 5:53pm

@spapadop This setting could normally be reverted by setting it to null. However, as you’ve noticed, the nodes are stuck in a loop on the reported error, which prevents any modification of the cluster settings.
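
For reference, on a cluster that still accepts settings updates, the null reset would be sent as the body of PUT _cluster/settings. A minimal Python sketch that builds that body; the helper name reset_setting_body is made up for illustration:

```python
import json


def reset_setting_body(setting: str, scope: str = "persistent") -> str:
    """Build a _cluster/settings body that resets one setting to its default.

    Python's None is serialised as JSON null, which is what clears the value.
    """
    return json.dumps({scope: {setting: None}}, indent=2)


print(reset_setting_body("search_backpressure.mode"))
```

This only helps while the cluster can still process settings updates, which is not the case in the situation described here.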

An OpenSearch node applies settings in the following order of precedence:

1. Transient settings
2. Persistent settings
3. opensearch.yml
4. Default settings

So even if you set the reported cluster setting in opensearch.yml, it wouldn’t override the persistent setting.
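
The precedence above can be sketched as a lookup that stops at the first source defining the key. This is only an illustration in Python, not how OpenSearch implements it; resolve_setting and the example values are hypothetical (monitor_only is taken from the log line in the first post).

```python
from typing import Mapping, Optional, Sequence, Tuple

Source = Tuple[str, Mapping[str, str]]


def resolve_setting(key: str, sources: Sequence[Source]) -> Optional[Tuple[str, str]]:
    """Return (source name, value) from the first source that defines key.

    sources must be ordered from highest to lowest precedence.
    """
    for name, values in sources:
        if key in values:
            return name, values[key]
    return None


sources = [
    ("transient", {}),
    ("persistent", {"search_backpressure.mode": "enforcedd"}),
    ("opensearch.yml", {"search_backpressure.mode": "monitor_only"}),
    ("default", {"search_backpressure.mode": "monitor_only"}),
]

# The persistent value wins over opensearch.yml, so the typo stays in effect.
print(resolve_setting("search_backpressure.mode", sources))
```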

Take a look at this reported bug: OpenSearch handling for invalid setting value instead of corrupting the state · Issue #7598 · opensearch-project/OpenSearch · GitHub

The issue also describes a method for removing an unwanted setting. However, please be aware that the tool describes itself as “A CLI tool to do unsafe cluster and index manipulations on current node”.

github.com/opensearch-project/OpenSearch

OpenSearch handling for invalid setting value instead of corrupting the state

opened 05:19AM - 17 May 23 UTC

shwetathareja

enhancement distributed framework

**Is your feature request related to a problem? Please describe.** In case, an invalid setting gets persisted in OpenSearch cluster state, it causes exception and prevents cluster from coming up. There is no way to salvage the cluster. The OpenSearch process would continue to crash and never come up. It happened recently when Search BackPressure mode setting didn't have proper validation for supported values. https://github.com/opensearch-project/OpenSearch/issues/6832

**Describe the solution you'd like** The proposal is to provide node scope setting or system property via JVM options to ignore specific setting parsing exceptions as the mitigation to avoid full cluster downtime. It would fall back to the settings default value. Also, this should allow updating the setting to a new valid value via _cluster/settings API.

**Alternative solution** Provide a tool which can fix the cluster state persisted on disk by first terminating the OpenSearch process. This could be more risky as any setting could be updated without any validation. But, we can evaluate if this would be a better solution to override any setting in the cluster state. This tool could help fix not just setting but any other part of the cluster state as well. I guess it would depend if OpenSearch users have faced situations where they needed a tool like this.

spapadop · July 25, 2023, 8:03am

Many thanks @pablo for the prompt response and solution. I did bring the cluster back to life, and I’m happy this bug is being followed up.

rlevitsky · May 14, 2024, 5:24pm

How did you manage to get the cluster back?

spapadop · May 22, 2024, 5:17pm

Hi @rlevitsky, I had to run this:

OPENSEARCH_PATH_CONF=/etc/path/to/my/conf /usr/share/opensearch/bin/opensearch-node remove-settings search_backpressure.mode
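
If you need to repeat this on several nodes, the same invocation can be assembled in a small script. A rough Python sketch: build_remove_settings_command is a hypothetical helper, the default bin directory is simply the one from my command above, and the config path is the same placeholder, so adjust both to your installation.

```python
import os
from typing import Dict, List, Tuple


def build_remove_settings_command(
    conf_dir: str,
    setting: str,
    bin_dir: str = "/usr/share/opensearch/bin",
) -> Tuple[Dict[str, str], List[str]]:
    """Return the environment and argv for opensearch-node remove-settings."""
    # Copy the current environment and point the tool at the node's config.
    env = dict(os.environ, OPENSEARCH_PATH_CONF=conf_dir)
    argv = [f"{bin_dir}/opensearch-node", "remove-settings", setting]
    return env, argv


env, argv = build_remove_settings_command(
    "/etc/path/to/my/conf", "search_backpressure.mode"
)
print(" ".join(argv))

# To actually run it (deliberately left commented out):
# import subprocess
# subprocess.run(argv, env=env, check=True)
```

As far as I know the tool expects the node to be stopped and asks for confirmation before it changes anything, so I ran it by hand on each node rather than unattended.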
