Ideal OpenSearch Cluster?

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
I have OpenSearch 1.3.8 - I will update this cluster to OpenSearch 2 after some optimization.

Describe the issue:
Disk allocation is at roughly 70% on the hot nodes and 90% on the warm nodes. We need to expand.

Configuration:
We have 3 warm nodes, 3 hot nodes and 3 master nodes.
On the data nodes we have a 32 GB heap inside Docker, 50 GB RAM allocated to the Docker container, and 64 GB on the host OS.
I think we have no problem with heap or GC.

  1. But I need to expand storage. We have 1 TB for hot and 6 TB for warm.
    Can you give me some tips? I need at least 6 TB on SSD (hot) and 30 TB on warm, maybe even 50 TB.
    Should I add 3 more hot and 3 more warm nodes with the same configuration, or can I extend the storage from 1 to 2 TB and from 6 to 12 TB?
  2. I have 3 datacenters with one node in each datacenter. What happens when I have maintenance in one datacenter and have to shut that datacenter down? If I had 6 hot nodes, 2 of them would be there and I would shut them down. I use 1 primary and 2 replica shards. So if only 4 nodes are up, shards may relocate to the other nodes, which would cause a lot of traffic and maybe leave no space. Can I somehow make sure that one copy of each shard is in every datacenter? Then, if I shut down 2 nodes in the same datacenter, I would still have 2 copies of every index available. Thanks

Can anyone help me please?

If CPU utilization is moderate, you can just add storage.
You can disable allocation during maintenance to avoid shard relocation.
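
For example, something like this via the cluster settings API (just a sketch, assuming the cluster is reachable at http://localhost:9200 without auth - adjust host, TLS and credentials for your setup):

```python
import requests

OS = "http://localhost:9200"  # assumption: plain HTTP, no auth

# Before stopping the nodes: allow only primary allocation, so copies
# from the stopped DC are not rebuilt on the remaining nodes.
r = requests.put(
    f"{OS}/_cluster/settings",
    json={"transient": {"cluster.routing.allocation.enable": "primaries"}},
)
r.raise_for_status()

# ... stop the nodes, do the maintenance, bring them back ...

# After the nodes rejoin: re-enable normal allocation.
r = requests.put(
    f"{OS}/_cluster/settings",
    json={"transient": {"cluster.routing.allocation.enable": "all"}},
)
r.raise_for_status()
```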

Judging by the graphs you shared, you don’t need 32 GB of heap; I assume 10 GB is enough. In fact, 32 GB isn’t a great idea because at that size the JVM can no longer use compressed object pointers, so you have less usable heap than with e.g. 30 GB.

And that’s for the warm nodes; I assume the hot nodes are OK with the same amount of heap, too.
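
If you want to double-check before shrinking the heap, a quick sketch like this (again assuming http://localhost:9200 without auth) prints the configured heap and current usage per node:

```python
import requests

OS = "http://localhost:9200"  # assumption: plain HTTP, no auth

# Per-node JVM stats: configured max heap and how much of it is in use.
stats = requests.get(f"{OS}/_nodes/stats/jvm").json()

for node in stats["nodes"].values():
    mem = node["jvm"]["mem"]
    heap_max_gb = mem["heap_max_in_bytes"] / 1024 ** 3
    print(f"{node['name']}: heap max {heap_max_gb:.1f} GB, used {mem['heap_used_percent']}%")
```

If usage stays well below the configured max even after GC, that’s a good sign you can drop the heap comfortably under the ~32 GB compressed-oops threshold.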

I also assume that IO isn’t that much of an issue (maybe you have local SSDs?) because you have much more data than OS cache space.

That leaves us with CPU, as @mkhl suggested. If that’s averaging under 30% or so, you could probably safely squeeze 3x the data onto the same number of nodes. But you need 6x, so probably double the nodes?

The above is just a rule of thumb; maybe you have other bottlenecks. I’m not sure which metrics you currently collect, but I’d recommend our OpenSearch monitoring for a comprehensive solution.

If you have 3 datacenters and you only need to support one DC/node failing at a time, I would only have one replica per shard (i.e. two copies, primary + replica). I would use forced awareness to make sure shards don’t migrate to other DCs when one DC is down for maintenance. The downside is that if something else fails during that maintenance window, your cluster will be mostly unavailable.
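
Roughly, forced awareness could look like this (just a sketch: the attribute name `dc` and the values dc1/dc2/dc3 are placeholders for your real datacenter names, and I’m again assuming http://localhost:9200 without auth):

```python
import requests

OS = "http://localhost:9200"  # assumption: plain HTTP, no auth

# Each node first needs a datacenter attribute in its opensearch.yml, e.g.:
#   node.attr.dc: dc1      # dc2 / dc3 on the nodes in the other datacenters
# "dc" and dc1/dc2/dc3 are placeholders for your real names.

# Then spread shard copies across that attribute and force awareness, so
# copies from a DC that is down are not piled onto the surviving DCs.
r = requests.put(
    f"{OS}/_cluster/settings",
    json={
        "persistent": {
            "cluster.routing.allocation.awareness.attributes": "dc",
            "cluster.routing.allocation.awareness.force.dc.values": "dc1,dc2,dc3",
        }
    },
)
r.raise_for_status()
```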

The upside is that with one replica instead of two you can store 50% more data on the same hardware :slight_smile: So maybe if your CPU usage is around 20%, you can squeeze all the data onto the existing nodes. Or maybe that’s only the case for the warm tier and you only have to expand the hot tier, or something like that.