Disable Shard Allocation between Data Nodes but not for newly created shards

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

Ubuntu 22.04 LTS
OpenSearch 2.13

Describe the issue:

Hi,

I tried to answer this myself, but I'm not well versed in interpreting the shard allocation settings.

We have four data nodes. Two in each data center.

DC1:
data1
data2

DC2:
data3
data4

I have configured the cluster so that a shard and its replica are never in the same data center. This works well.
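Roughly, this separation is done with forced allocation awareness; a minimal sketch of the setup (the attribute name `dc` is just an example, not necessarily the name we use):

```
# opensearch.yml on data1/data2 (use dc2 on data3/data4)
node.attr.dc: dc1
```

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "dc",
    "cluster.routing.allocation.awareness.force.dc.values": "dc1,dc2"
  }
}
```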

At the moment, shard allocation is enabled by default ("cluster.routing.allocation.enable": "all"). What I want is that no automatic allocation happens when a data node fails. Normally I would say "cluster.routing.allocation.enable": "none" is the answer, but that also stops the allocation of newly created shards.

How can I keep the basic functionality (automatic allocation of newly created shards) but prevent reallocation when a data node fails?

Regards,
Steffen

Hey Steffen,

It looks like you want a new index's shards to be assigned to a node once and then never be reassigned.

Previously you tried to achieve this by setting "cluster.routing.allocation.enable": "none", but that resulted in new shards never getting assigned at all.

I think the setting you're after is new_primaries:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "new_primaries"
  }
}

Setting this ensures that new index shards get assigned on creation but are never reassigned afterwards, even when you take down a node. However, it does not apply to replicas: they behave the same as with "none" and end up never getting assigned.
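You can check which shards remain unassigned, and whether they are primaries or replicas, with the _cat shards API (prirep shows p for a primary, r for a replica):

```
GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason
```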

Are you using replicas, or only primaries? If the latter, this might be enough.

Leeroy.

Hey Leeroy,

thanks for your reply. Yes, all indices have replicas set to 1. I think I tried the new_primaries parameter and saw some unallocated shards. I didn't notice that they could have been only replica shards, but that sounds logical.

So yes, "setting this ensures that new index shards get assigned on creation but are never reassigned afterwards" is exactly what I want, but also for replicas.

Regards

Hey Steffen,

As far as I know, there is currently no simple setting that does this for replicas as well.

However, I have thought of a workaround you can try:

index.unassigned.node_left.delayed_timeout will probably work well for your use case: it delays the timer that starts when a node leaves the cluster. Shards remain unassigned until the timer expires, and only then does the cluster attempt to reassign them to another node. You can set it as long as needed; if the node comes back online within that window, its shards start back up without ever being reassigned.

curl --insecure --cert ./config/kirk.pem --key ./config/kirk-key.pem \
  --cacert ./config/root-ca.pem -H "Content-Type: application/json" \
  -XPUT https://localhost:9200/_all/_settings \
  -d '{"settings": {"index.unassigned.node_left.delayed_timeout": "150m"}}'
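Note that the _all call above only updates indices that already exist. If you also want newly created indices to pick up the same timeout, one option is an index template; a sketch, where the template name and the index pattern are placeholders to adjust:

```
PUT _index_template/delayed-timeout
{
  "index_patterns": ["*"],
  "template": {
    "settings": {
      "index.unassigned.node_left.delayed_timeout": "150m"
    }
  }
}
```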

Hope this is what you’re after,

Leeroy.


Hi Leeroy,

I tested it and it looks good. I set the parameter to 30d; under normal conditions, problems in one data center should be fixed within that time :slight_smile:

Thank you for your help.

Steffen
