OpenSearch resource requirements

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch version 2.8

Describe the issue:
I am trying to calculate the resource requirements for an OpenSearch cluster that will be deployed on k8s.

Configuration:
The total data will be 4 TB and I will use 1 replica. Are there equations for the required number of shards and the required vCPUs, given that I will use hot shards for all data and will set the shard size to 50 GB?

If your shard size is 50GB (sounds quite large), 4TB of data means you’ll have 80 primary shards. With 1 replica, that’s 160 shards in the cluster.
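To make the arithmetic explicit, here is a minimal sketch of the shard-count estimate. It assumes 4 TB is taken as 4000 GB and that the on-disk index size equals the raw data size (no indexing overhead or compression); the function and parameter names are made up for illustration.

```python
import math

def shard_counts(total_data_gb, shard_size_gb, replicas):
    """Return (primary_shards, total_shards) for a target shard size.

    Assumption: index size on disk equals total_data_gb.
    """
    # Round up so that no shard exceeds the target size.
    primaries = math.ceil(total_data_gb / shard_size_gb)
    # Each replica adds one full copy of every primary shard.
    total = primaries * (1 + replicas)
    return primaries, total

primaries, total = shard_counts(total_data_gb=4000, shard_size_gb=50, replicas=1)
print(primaries, total)  # 80 160
```

A smaller target shard size raises the shard count proportionally, e.g. 25 GB shards would give 160 primaries and 320 shards in total with the same inputs.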

I’m not sure what you mean by “hot shards for all data”. I assume you’ll query all of it at once and that it’s not the log data that many people use? Because if it is logs, you may be able to make do with two pods of 8vCPUs and 64GB of RAM with that size, assuming you have local SSDs (hard to pull off in k8s, but not impossible). You’ll also need a tie-breaker master node that can be tiny (2vCPUs, 4GB of RAM).

If you need to query all the data and you have many queries going on, you’ll need fast(er) IO and more CPU. If you have fast local SSDs, memory won’t matter that much, otherwise it will (for caches). But CPUs will depend on how complex/expensive your queries are and how many of them you run at once.

It’s best to try with a slice of data early and see how it goes. You’ll want to monitor your cluster and see what the bottlenecks are. I’m biased, of course, but I’d recommend our tool: OpenSearch Monitoring Integration

It doesn’t only monitor your OpenSearch metrics (it can get your OpenSearch logs, too); you’ll also see tips next to many significant charts that tell you what to look for and what to do about it. You’ll see an example here: https://sematext.com/product-updates/#/2023/opensearch-monitoring-integration-now-available

I mean active shards. The following guide suggests 1.5 vCPU per shard, which comes to 240 vCPUs. That makes my system quite large, since it also increases the total number of pods, the ephemeral storage and the RAM.

Sorry, I forgot to share the links. Please see “Operational best practices for Amazon OpenSearch Service” and “Sizing Amazon OpenSearch Service domains”.

1.5vCPU per shard is just a rule of thumb. It will depend a lot on the use-case. If it’s logs, for sure it will be less. We’re running a log management service that exposes most of the OpenSearch API and we have many thousands of shards in a cluster that definitely doesn’t have many thousands of cores.
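As a rough illustration of how much the result swings with the assumed ratio, the sketch below compares the 1.5 vCPU per shard rule of thumb against the ratio implied by the two-pods-of-8-vCPUs suggestion for logs earlier in the thread. The helper names are hypothetical, and neither number is a measured requirement.

```python
def vcpu_estimate(active_shards, vcpu_per_shard):
    """Total vCPUs for a given per-shard ratio (a rule of thumb, not a measurement)."""
    return active_shards * vcpu_per_shard

def implied_ratio(total_vcpus, active_shards):
    """vCPUs per shard implied by a fixed amount of CPU."""
    return total_vcpus / active_shards

ACTIVE_SHARDS = 160  # 80 primaries + 80 replicas, from the thread

# Rule of thumb from the sizing guide: 1.5 vCPU per active shard.
print(vcpu_estimate(ACTIVE_SHARDS, 1.5))  # 240.0

# Logs scenario from the thread: two data pods with 8 vCPUs each.
print(implied_ratio(2 * 8, ACTIVE_SHARDS))  # 0.1
```

The gap between the two figures is the reason to test with a slice of real data and real queries before committing to either end.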