Shard Configuration for Optimal Search Performance

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
2.13

Describe the issue:
I am aiming to optimize the performance of our OpenSearch cluster, specifically looking to improve search performance during peak load times. Currently, during peak operations, the average CPU usage on our data nodes hits 60%, which suggests that our current shard configuration may need adjustment.
I am considering how best to adjust the number of primary and replica shards to manage search load more effectively and reduce CPU strain. Any insights or recommendations on optimal shard configurations for better performance and load distribution would be greatly appreciated.

Configuration:

  • Node Details: 4 data nodes on AWS r6g.large instances, supported by 3 master nodes
  • Index Details: A single index with 7 million documents
  • Current Shard Setup: 2 primary shards with 1 replica shard for each
  • Workload Characteristics: Mainly search-heavy, with peak search requests of 100 req/s

Relevant Logs or Screenshots:
None

Could you check whether the 4 shards (primary + replica) of your index are balanced across the 4 data nodes? If not, balance them first.


Hi @Taichi,

This might be an interesting read for you:

best,
mj


Thank you for the advice.
The shards are balanced across the 4 data nodes!

(Excerpt from the response of GET /_cat/shards?v)

index_name                      0     r      STARTED 3710610 1.5gb x.x.x.x 8bbb2axxxxxxxxxx
index_name                      0     p      STARTED 3710610 1.4gb x.x.x.x a39fdxxxxxxxxxx
index_name                      1     r      STARTED 3708589 1.4gb x.x.x.x f9567xxxxxxxxxx
index_name                      1     p      STARTED 3708590 1.4gb x.x.x.x a2a42xxxxxxxxxx
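
For anyone who wants to double-check the balance, the shard copies per node can be counted from the `_cat/shards` output. This is only a minimal sketch, assuming the default column order (index, shard, prirep, state, docs, store, ip, node); `shards_per_node` is a hypothetical helper name, and the sample rows are the masked ones from the excerpt above.

```python
from collections import Counter

def shards_per_node(cat_output: str) -> Counter:
    """Count shard copies (primaries and replicas) per node name."""
    counts = Counter()
    for line in cat_output.splitlines():
        fields = line.split()
        # Skip blank lines and unassigned shards, which lack the ip/node columns.
        if len(fields) < 8:
            continue
        # Skip the header row that the ?v flag adds.
        if fields[:2] == ["index", "shard"]:
            continue
        counts[fields[7]] += 1  # assumed: the 8th column is the node name
    return counts

# Masked rows copied from the excerpt; real output would contain full node names.
sample = """\
index_name 0 r STARTED 3710610 1.5gb x.x.x.x 8bbb2axxxxxxxxxx
index_name 0 p STARTED 3710610 1.4gb x.x.x.x a39fdxxxxxxxxxx
index_name 1 r STARTED 3708589 1.4gb x.x.x.x f9567xxxxxxxxxx
index_name 1 p STARTED 3708590 1.4gb x.x.x.x a2a42xxxxxxxxxx
"""

counts = shards_per_node(sample)
for node, n in sorted(counts.items()):
    print(node, n)
```

With these four rows, every node holds exactly one shard copy, which matches the balanced layout described above.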

Thank you for the article, it was helpful. However, I’m not sure our sharding configuration is optimal. Our index holds only about 3 GB of data (roughly 1.4 GB per primary shard), so according to the article’s guideline of 10-30 GiB per shard, perhaps we should have just one primary shard. My concern is that this would concentrate the indexing load on a single data node.
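
The sizing arithmetic behind that conclusion can be sketched as follows. This is only an illustration, assuming the article's 10-30 GiB-per-shard guideline and treating the roughly 3 GB of primary data as 3 GiB; `primary_shard_range` is a hypothetical helper, not an OpenSearch API.

```python
import math

def primary_shard_range(data_gib, min_shard_gib=10.0, max_shard_gib=30.0):
    """Return (fewest, most) primary shards that keep each shard near the size guideline."""
    # Fewest shards: each shard is as large as the guideline allows.
    fewest = max(1, math.ceil(data_gib / max_shard_gib))
    # Most shards: each shard is still at least the guideline's lower bound.
    most = max(fewest, math.floor(data_gib / min_shard_gib))
    return fewest, most

print(primary_shard_range(3))  # → (1, 1)
```

By size alone the guideline points to a single primary shard for this index. It says nothing about how indexing or search load is spread across nodes, so that remains a separate trade-off.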