I temporarily increased to 10 data nodes to mitigate the issue and am monitoring it, but my questions are:
What data node sizing should I configure? More data nodes with more CPU would help, but that is costly, and the recommended sizing seems far too large for allocating the right data nodes.
Is a smaller shard size, like 15 GB per shard, better than 25 GB per shard? My assumption is that smaller shards mean smaller per-shard searches and less CPU usage, but requests get distributed across more shards.
Should I make the indices smaller, which would mean smaller shards, e.g. weekly indices rather than monthly indices?
Configuration:
6 data nodes - r7g.4xlarge.search
3 master nodes
the 4 most recent monthly indices: ~500 GB per index with 20 shards to balance them out
older monthly indices covering the last 3 years: ~80 GB per index with 3 shards
@kitkit Since this is a search-heavy use case, I would recommend keeping ~25-30 GiB primaries (don't shrink to 15 GiB), as shrinking would increase the CPU load.
I would also tie the rollover to shard size as opposed to specific periods.
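As a rough illustration, here is a minimal sketch of an ISM policy that rolls over by primary shard size instead of by calendar period. The endpoint URL, credentials, policy name, and `logs-*` index pattern are assumptions to adapt to your domain, and `min_primary_shard_size` requires a recent OpenSearch version (on older clusters, `min_size` is the available alternative):

```python
import json
import requests

# Hypothetical endpoint and policy name -- replace with your domain's values.
OPENSEARCH_URL = "https://my-domain.example.com"
POLICY_ID = "rollover-by-shard-size"

# ISM policy: roll over once any primary shard reaches ~25 GB, so shard
# size stays in the recommended band regardless of how fast data arrives.
policy = {
    "policy": {
        "description": "Roll over when primary shards reach ~25 GB",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [
                    {"rollover": {"min_primary_shard_size": "25gb"}}
                ],
                "transitions": [],
            }
        ],
        "ism_template": [
            # Assumed index naming scheme; adjust to yours.
            {"index_patterns": ["logs-*"], "priority": 100}
        ],
    }
}

resp = requests.put(
    f"{OPENSEARCH_URL}/_plugins/_ism/policies/{POLICY_ID}",
    json=policy,
    auth=("admin", "admin"),  # replace with your own auth/signing
    timeout=30,
)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))
```

Note that rollover also needs a write alias on the managed indices, configured via the `plugins.index_state_management.rollover_alias` index setting, so new indices keep being created under the same alias.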
AWS Well-Architected calls out ~1.5 vCPUs per active shard as a conservative check. With ~570 shards total on 6× r7g.4xlarge.search (16 vCPU each → 96 vCPU), you are far below that rule of thumb, which explains the ~97% CPU load. In these situations it is advisable to either add vCPUs (scale out, or use a more CPU-dense instance family) or reduce the number of active shards (larger shards, or fewer replicas on hot data). However, it is not recommended to go over 40-50 GB per shard.
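To make the rule of thumb concrete, here is the back-of-the-envelope check (the ~570 shards and 6× 16 vCPU figures are from your cluster description above):

```python
# AWS Well-Architected rule of thumb: ~1.5 vCPUs per active shard (conservative).
VCPU_PER_ACTIVE_SHARD = 1.5

active_shards = 570        # approximate total from the cluster above
vcpus_available = 6 * 16   # 6x r7g.4xlarge.search, 16 vCPU each = 96

shard_budget = vcpus_available / VCPU_PER_ACTIVE_SHARD
print(f"Conservative active-shard budget: {shard_budget:.0f} shards")  # ~64
print(f"Actual active shards: {active_shards} "
      f"(~{active_shards / shard_budget:.0f}x over budget)")           # ~9x
```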
I increased to 10 data nodes and CPU is now hovering around 50-60%.
I understand that the CPU allocation needed to increase, but with 10 data nodes and 160 vCPUs total the cluster is currently serving well, so I am not planning to match the AWS 1.5 vCPU per active shard guideline, because the cost of adding that much CPU is too high.
You suggested that I should not shrink the shard size to 15 GB, and the reason this would increase CPU load is:
Higher CPU load: each query must hit more shards = more CPU work
More coordination overhead: OpenSearch has to merge results from more shards
Is that correct?
2nd question that I would like to understand better: currently 1 index = 1 month of data, at 500 GB with 20 shards (25 GB per shard).
I am thinking of breaking this into daily/weekly indices rather than monthly indices to keep the index size smaller, so that searches run against a smaller index, which should reduce CPU usage and give faster responses.
What would be the best size per index to aim for?
The use case is that each user queries a smaller index, while some features that query all indices should also not be impacted by having many indices to search.
@kitkit Your understanding is correct: more shards → more per-shard threads + more merging, therefore more CPU for search.
Regarding switching to daily/weekly indices: this depends on the queries the users will be running. If queries span days, then moving to weekly indices should improve performance. In your case you could try 5 primary shards per weekly index; at ~500 GB per month (~125 GB per week), that works out to approximately 25 GB on each primary.
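If you try weekly indices, a minimal sketch of an index template pinning them to 5 primaries could look like this. The endpoint, credentials, template name, and `logs-weekly-*` pattern are assumptions for illustration; the replica count is shown just for completeness:

```python
import requests

OPENSEARCH_URL = "https://my-domain.example.com"  # hypothetical endpoint

# Composable index template: every index matching the weekly naming
# pattern gets 5 primary shards (~25 GB each at ~125 GB per week).
template = {
    "index_patterns": ["logs-weekly-*"],  # assumed naming scheme
    "template": {
        "settings": {
            "index.number_of_shards": 5,
            "index.number_of_replicas": 1,
        }
    },
    "priority": 100,
}

resp = requests.put(
    f"{OPENSEARCH_URL}/_index_template/logs-weekly",
    json=template,
    auth=("admin", "admin"),  # replace with your own auth
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```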