Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): 2.17.1
Describe the issue: We currently run a 600 GB OpenSearch index split across 3 shards on 3 pods (each pod has 6 vCPUs, a 14 GB heap (Xmx), and 18 GB of RAM). During the times of day when usage is high, high-intensity searches suffer from performance problems, mainly high latency.
I’m seeking best practices to optimize for low-latency search workloads while ensuring scalability.
We are considering scaling up to pods with 16 vCPUs, a 24 GB heap (Xmx), and 34 GB of RAM in total, and moving to 40 primary shards with 1 replica each (80 shards in total).
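To make the numbers concrete, here is a rough sketch of the arithmetic behind the two layouts. The index size, shard counts, heap, and RAM figures come from this post. The replica count of the current index (0) and the pod count of the planned layout (3) are assumptions for illustration only, and `Layout` with its field names is a hypothetical helper, not part of any OpenSearch API.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    """Hypothetical helper describing one cluster layout."""
    name: str
    index_gb: float   # size of the primary data
    primaries: int    # number of primary shards
    replicas: int     # replica copies per primary
    pods: int         # number of data pods
    ram_gb: float     # total RAM per pod
    heap_gb: float    # JVM heap (Xmx) per pod

    @property
    def total_shards(self) -> int:
        return self.primaries * (1 + self.replicas)

    @property
    def primary_shard_gb(self) -> float:
        return self.index_gb / self.primaries

    @property
    def stored_gb_per_pod(self) -> float:
        # Primary plus replica data, spread evenly over the pods.
        return self.index_gb * (1 + self.replicas) / self.pods

    @property
    def off_heap_gb(self) -> float:
        # RAM left outside the JVM heap, e.g. for the OS file system cache.
        return self.ram_gb - self.heap_gb


# Assumption: the current index has no replicas (not stated in the post).
current = Layout("current", 600, primaries=3, replicas=0, pods=3,
                 ram_gb=18, heap_gb=14)
# Assumption: the planned layout keeps 3 pods (pod count not decided yet).
planned = Layout("planned", 600, primaries=40, replicas=1, pods=3,
                 ram_gb=34, heap_gb=24)

for layout in (current, planned):
    print(
        f"{layout.name}: {layout.total_shards} shards, "
        f"{layout.primary_shard_gb:.0f} GB per primary shard, "
        f"{layout.stored_gb_per_pod:.0f} GB stored per pod, "
        f"{layout.off_heap_gb:.0f} GB RAM outside the heap per pod"
    )
```

Changing `pods` in the planned layout shows how adding pods (horizontal scaling) lowers the data stored per pod, while changing `ram_gb` and `heap_gb` (vertical scaling) changes the memory left outside the heap.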
Key questions:
- Is horizontal scaling (more pods) better than vertical scaling, or are the two complementary?
- How do I work out what is best in my case? My plan is to go with 40 shards, but I do not know what that requires in terms of vertical or horizontal scaling. Are there any benchmarks, formulas, or tools such as OpenSearch Benchmark (the OpenSearch counterpart of Rally) that can help?
Configuration:
600 GB index, 3 shards, 3 pods, 6 vCPUs per pod, 14 GB Xmx (18 GB RAM per pod)
Relevant Logs or Screenshots: