Shard Configuration for Optimal Search Performance

Taichi · June 29, 2024, 8:59am

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
2.13

Describe the issue:
I am aiming to optimize the performance of our OpenSearch cluster, specifically looking to improve search performance during peak load times. Currently, during peak operations, the average CPU usage on our data nodes hits 60%, which suggests that our current shard configuration may need adjustment.
I am considering how best to adjust the number of primary and replica shards to manage search load more effectively and reduce CPU strain. Any insights or recommendations on optimal shard configurations for better performance and load distribution would be greatly appreciated.

Configuration:

Node Details: 4 data nodes on AWS r6g.large instances, supported by 3 master nodes
Index Details: A single index with 7 million documents
Current Shard Setup: 2 primary shards with 1 replica shard for each
Workload Characteristics: Mainly search-heavy, with peak search requests of 100 req/s

Relevant Logs or Screenshots:
None

gaobinlong · July 1, 2024, 2:57am

Could you check whether the 4 shards (primary + replica) of your index are balanced across the 4 data nodes? If not, balance them first.
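
One way to run that check, as a minimal sketch: it assumes you have already fetched the rows from `GET /_cat/shards?format=json` and parsed them into a list of dicts with the usual `node` and `state` fields. The helper names `shards_per_node` and `is_balanced`, and the node names in the sample, are made up for illustration.

```python
from collections import Counter


def shards_per_node(rows):
    """Count STARTED shard copies per node from _cat/shards JSON rows."""
    counts = Counter()
    for row in rows:
        # Skip unassigned or relocating copies; they have no settled node.
        if row.get("state") == "STARTED" and row.get("node"):
            counts[row["node"]] += 1
    return dict(counts)


def is_balanced(counts, data_nodes):
    """Treat the layout as balanced when node counts differ by at most one."""
    per_node = [counts.get(node, 0) for node in data_nodes]
    return max(per_node) - min(per_node) <= 1


# Hypothetical rows shaped like the JSON form of _cat/shards.
sample_rows = [
    {"index": "index_name", "shard": "0", "prirep": "r", "state": "STARTED", "node": "node-a"},
    {"index": "index_name", "shard": "0", "prirep": "p", "state": "STARTED", "node": "node-b"},
    {"index": "index_name", "shard": "1", "prirep": "r", "state": "STARTED", "node": "node-c"},
    {"index": "index_name", "shard": "1", "prirep": "p", "state": "STARTED", "node": "node-d"},
]
data_nodes = ["node-a", "node-b", "node-c", "node-d"]

counts = shards_per_node(sample_rows)
print(counts, is_balanced(counts, data_nodes))
```

Passing the full list of data nodes matters here: a node that holds no shard at all never appears in the `_cat/shards` rows, so it has to be counted as zero explicitly.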

Mantas · July 1, 2024, 12:44pm

Hi @Taichi,

This might be an interesting read for you:

best,
mj

Taichi · July 2, 2024, 9:16am

Thank you for the advice.
The shards are balanced across the 4 data nodes!

(Excerpt from response of GET /_cat/shards?v)

index_name                      0     r      STARTED 3710610 1.5gb x.x.x.x  8bbb2axxxxxxxxxx
index_name                      0     p      STARTED 3710610 1.4gb x.x.x.x  a39fdxxxxxxxxxx
index_name                      1     r      STARTED 3708589 1.4gb x.x.x.x  f9567xxxxxxxxxx
index_name                      1     p      STARTED 3708590 1.4gb x.x.x.x  a2a42xxxxxxxxxx

Taichi · July 2, 2024, 9:47am

Thank you for the article, it was helpful. However, I’m not sure whether our sharding configuration is optimal. Our index data is actually only 3GB, so according to the article’s guideline of 10-30GiB per shard, perhaps we should have just one primary shard. But I’m concerned that this would concentrate the indexing load on a single data node.
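
To make the arithmetic concrete, here is a rough sketch of how I am comparing layouts. The primary data size (about 2.8GB, i.e. the two 1.4gb primaries above) and the 10-30GiB range come from this thread; the function name `shard_layout` is just something I made up, and I am treating gb and GiB as the same unit for this estimate.

```python
def shard_layout(primary_store_gb, primaries, replicas, data_nodes):
    """Estimate per-shard size and shard copies per node for one index."""
    total_copies = primaries * (1 + replicas)
    return {
        "gb_per_primary": primary_store_gb / primaries,
        "total_copies": total_copies,
        "copies_per_node": total_copies / data_nodes,
    }


GUIDELINE_MIN_GB = 10  # lower bound of the article's 10-30GiB per-shard range

# Current setup: 2 primaries, 1 replica each, 4 data nodes.
current = shard_layout(2.8, primaries=2, replicas=1, data_nodes=4)

# Alternative being considered: a single primary shard, same replica count.
single = shard_layout(2.8, primaries=1, replicas=1, data_nodes=4)

for name, layout in (("current", current), ("single primary", single)):
    below = layout["gb_per_primary"] < GUIDELINE_MIN_GB
    print(name, layout, "below guideline minimum:", below)
```

Both layouts stay well under the 10GiB lower bound, and with one primary and one replica only 2 of the 4 data nodes would hold a copy of the index, which is the concentration I am worried about.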
