Describe the issue
We implemented a search-query load test for an OpenSearch cluster with the k-NN plugin installed.
Shortly after the load test starts, many requests time out.
The index is pre-loaded with ~22M items (144 GB of storage, 32 shards, 890 segments). The vectors are 512-dimensional, and the Lucene HNSW engine is used.
Load test concurrency: 25
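For scale, here is a rough back-of-the-envelope estimate derived from the figures above. It is only a sketch under assumptions that are not confirmed in this report: vectors stored as float32 (4 bytes per value), no replicas, and data spread evenly across the data nodes. It also ignores the HNSW graph links and all non-vector fields.

```python
# Back-of-the-envelope sizing from the figures in this report.
NUM_VECTORS = 22_000_000   # "~22M items" (approximate)
DIMENSIONS = 512           # vector dimensionality
NUM_SHARDS = 32
NUM_SEGMENTS = 890
DATA_NODES = 16
BYTES_PER_VALUE = 4        # ASSUMPTION: float32 storage


def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_VALUE):
    """Size of the raw vector data only (no graph links, no other fields)."""
    return num_vectors * dimensions * bytes_per_value


total_bytes = raw_vector_bytes(NUM_VECTORS, DIMENSIONS)
# ASSUMPTION: no replicas and an even spread across data nodes.
per_node_bytes = total_bytes / DATA_NODES
segments_per_shard = NUM_SEGMENTS / NUM_SHARDS

print(f"raw vectors, total:    {total_bytes / 1e9:.1f} GB")     # 45.1 GB
print(f"raw vectors, per node: {per_node_bytes / 1e9:.1f} GB")  # 2.8 GB
print(f"segments per shard:    {segments_per_shard:.1f}")       # 27.8
```

The segments-per-shard figure is included because, as far as I understand, Lucene keeps a separate HNSW graph per segment, so each query has to search every segment of every shard.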
What is the expected behavior?
Either latencies stay low, or at least one of the monitored metrics clearly shows the cause of the problem (something to scale or reconfigure).
What is your host/environment?
The cluster runs on 16 m6g.xlarge.search data nodes and 3 r6g.large.search master nodes.
Do you have any additional context?
We’re trying to find the source of the problem by monitoring CPUUtilization, JVMMemoryPressure, and Free Storage.
None of them gets close to its limit during the test.
KNNGraphMemoryUsage is always 0, unlike in the faiss and nmslib HNSW tests.
Could you please advise which metrics or potential problems I should look for?