Data node and shard sizing recommendation

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

  • AWS self-managed OpenSearch 2.19

Describe the issue:

We recently started indexing more fields, which pushed the index size up to about 500 GB per index, configured with 20 shards per index.

One day after that increase, the search rate per minute spiked from an average of 100k to 120k, driving all data nodes to ~97% CPU usage; 2 data nodes crashed and then auto-recovered.

With a total of 570 shards (primary and replica), does this mean 6 data nodes with 16 vCPUs each are actually not enough based on the recommended sizing?

(The recommended sizing is ~1.5 vCPUs per shard.) AOSPERF01-BP02 Check shard-to-CPU ratio - Amazon OpenSearch Service Lens

I temporarily increased to 10 data nodes to stop the issue and am still monitoring, but my questions are:

  • What data node sizing should I configure? More data nodes with more CPU would follow the recommendation, but that is costly, and the recommended sizing seems far too large to be the right allocation.
  • Is a smaller shard size, e.g. 15 GB per shard, better than 25 GB per shard? My assumption is that smaller shards mean smaller, cheaper per-shard searches and less CPU usage, but with requests spread across more shards.
  • Should I make each index smaller, which would also mean smaller shards? For example, weekly indices rather than monthly indices?

Configuration:

  • 6 data nodes - r7g.4xlarge.search
  • 3 master nodes
  • the 4 most recent monthly indices are ~500 GB each with 20 shards to balance them out
  • the older monthly indices (last 3 years) are ~80 GB each with 3 shards (see the verification sketch below)
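For reference, the actual shard layout can be double-checked with the _cat/shards API. A minimal sketch in Python, assuming the requests library; the endpoint URL and credentials are placeholders, not real values:

```python
import requests
from collections import Counter

OPENSEARCH_URL = "https://your-domain-endpoint"  # placeholder endpoint
AUTH = ("master-user", "master-password")        # placeholder credentials

# One row per shard copy: index, shard number, primary/replica, size, node.
resp = requests.get(
    f"{OPENSEARCH_URL}/_cat/shards",
    params={"format": "json", "bytes": "gb", "h": "index,shard,prirep,store,node"},
    auth=AUTH,
)
resp.raise_for_status()
shards = resp.json()

print(f"total shard copies (primary + replica): {len(shards)}")
per_node = Counter(s.get("node") or "UNASSIGNED" for s in shards)
for node, count in per_node.most_common():
    print(f"{node}: {count} shards")
```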

Relevant Logs or Screenshots:

@kitkit Since this is a search-heavy use case, I would recommend keeping primaries at ~25–30 GiB (don't shrink to 15 GiB), as shrinking them would increase the CPU load.
I would also tie the rollover to shard size rather than to a fixed period.
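A minimal sketch of what tying rollover to shard size could look like as an ISM policy, assuming the requests library; the policy id, index pattern, 25 GiB threshold, endpoint, and credentials are placeholders to adapt, and min_primary_shard_size needs a reasonably recent ISM version:

```python
import requests

OPENSEARCH_URL = "https://your-domain-endpoint"  # placeholder endpoint
AUTH = ("master-user", "master-password")        # placeholder credentials

policy = {
    "policy": {
        "description": "Roll over when any primary shard reaches ~25 GiB",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [
                    # Older ISM versions only support min_size / min_index_age / min_doc_count.
                    {"rollover": {"min_primary_shard_size": "25gb"}}
                ],
                "transitions": [],
            }
        ],
        # Auto-attach the policy to new indices matching this (assumed) pattern.
        "ism_template": [{"index_patterns": ["logs-*"], "priority": 100}],
    }
}

resp = requests.put(
    f"{OPENSEARCH_URL}/_plugins/_ism/policies/rollover-by-shard-size",
    json=policy,
    auth=AUTH,
)
resp.raise_for_status()
print(resp.json())
```

This also assumes writes go through an alias with is_write_index set on the current index, which rollover requires.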

AWS Well-Architected calls out ~1.5 vCPUs per active shard as a conservative check. With ~570 shards total on 6× r7g.4xlarge.search (16 vCPU each ⇒ 96 vCPU), you are far below that rule of thumb, which explains the ~97% CPU load. In these situations it is advisable to either add vCPUs (scale out / use a more CPU-dense family) or reduce the number of active shards (larger shards or fewer replicas on hot data). It is not recommended to go over a 40-50 GB shard size, however.
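As a back-of-the-envelope version of that check (treating all 570 shards as active, which overstates the need if older indices are rarely queried):

```python
# Rough shard-to-vCPU check based on the numbers in this thread.
active_shards = 570        # primaries + replicas, all treated as active
vcpus_per_shard = 1.5      # conservative Well-Architected rule of thumb

required_vcpus = active_shards * vcpus_per_shard   # 855
available_vcpus = 6 * 16                           # 6x r7g.4xlarge.search = 96 vCPUs

print(f"needed ~{required_vcpus:.0f} vCPUs, available {available_vcpus}")
print(f"shortfall factor: ~{required_vcpus / available_vcpus:.1f}x")
```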

@Anthony

I increased to 10 data nodes and CPU is now hovering around 50-60%.

I understand that the CPU allocation needs to increase, but 10 data nodes with 160 total vCPUs currently serve us well, so I'm not planning to match the AWS 1.5 vCPU per active shard guideline, because increasing CPU to that level is expensive.

You suggested that I should not shrink the shard size to 15 GB, and that the reason this would increase the CPU load is:

  • Higher CPU load: Each query must hit more shards = more CPU work

  • More coordination overhead: OpenSearch has to merge results from more shards

Is that correct?

My 2nd question, which I would like to understand better: currently 1 index = 1 month of data, which is 500 GB with 20 shards (25 GB per shard).

I am thinking of breaking this into daily/weekly indices rather than monthly indices to keep each index smaller, so that searches hit a smaller index, use less CPU, and respond faster.

What would be the best size per index to aim for?

The use case is that each user will query a smaller index, and the features that query all indices should also not be impacted by having a lot of indices to search.

@kitkit Your understanding is correct: more shards → more per-shard threads + more merging, therefore more CPU for search.
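One way to see that fan-out directly is the _shards section of any search response. A minimal sketch, assuming the requests library; the index pattern, endpoint, and credentials are placeholders:

```python
import requests

OPENSEARCH_URL = "https://your-domain-endpoint"  # placeholder endpoint
AUTH = ("master-user", "master-password")        # placeholder credentials

resp = requests.post(
    f"{OPENSEARCH_URL}/logs-*/_search",           # placeholder index pattern
    json={"size": 0, "query": {"match_all": {}}},
    auth=AUTH,
)
resp.raise_for_status()
shards = resp.json()["_shards"]

# Every shard in "total" runs the query phase on a data node, and the
# coordinating node merges all of those per-shard results.
print(f"query fanned out to {shards['total']} shards "
      f"({shards['successful']} successful, {shards['failed']} failed)")
```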

Regarding switching to daily/weekly indices: this depends on the queries that users will be running. If they span days, then moving to weekly indices should improve performance. In your case you could try 5 primary shards per weekly index, which would be approximately 25 GB per primary.
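If you try that, a minimal sketch of a composable index template for the weekly indices with 5 primaries; the template name, index pattern, and replica count are assumptions, not your actual settings:

```python
import requests

OPENSEARCH_URL = "https://your-domain-endpoint"  # placeholder endpoint
AUTH = ("master-user", "master-password")        # placeholder credentials

template = {
    "index_patterns": ["logs-weekly-*"],          # assumed naming scheme
    "template": {
        "settings": {
            "index.number_of_shards": 5,          # ~25 GB per primary at ~125 GB/week
            "index.number_of_replicas": 1,
        }
    },
    "priority": 200,
}

resp = requests.put(
    f"{OPENSEARCH_URL}/_index_template/logs-weekly",
    json=template,
    auth=AUTH,
)
resp.raise_for_status()
print(resp.json())
```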
