I have the following OpenSearch 2.5 cluster:
Nodes: 6 (48 vCPUs and 384 GB memory each)
EBS volume: 24 TB gp3 (provisioned IOPS: 50,000 and 1,781 MB/s throughput per node)
~11B documents, ~10 KB each.
Search queries are script score based exact KNN queries.
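For reference, the queries are shaped roughly like this (index name, field name, vector, and space type here are placeholders), using the k-NN plugin's exact scoring script:

```json
POST my-index/_search
{
  "size": 10,
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": "knn_score",
        "lang": "knn",
        "params": {
          "field": "my_vector",
          "query_value": [0.1, 0.2, 0.3],
          "space_type": "cosinesimil"
        }
      }
    }
  }
}
```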
When I benchmark the cluster, varying clients from 1 to 150 and target throughput from 1 to 200, I see CPU utilization under 25%. EBS read IOPS and throughput are also well under the provisioned limits. So I can't conclude whether the search requests are CPU- or IO-bound, but latency increases as I increase clients and throughput.
|num_clients|target-throughput|Latency p99(p99.9)|service time p99(p99.9)|
Can someone help me understand why the CPU utilization could be low?
Also, I can't see why doubling the number of clients generating the same load doubles the service time. From the service's point of view, shouldn't it behave the same irrespective of how many clients generate the same total load?
Please share your thoughts.
Monitoring OpenSearch will tell more, but if your index size is way beyond your OS cache (and it sounds like it is, otherwise I don’t see the point of the huge EBS volume), your bottleneck is likely disk latency.
That’s because most search workloads (especially full-text search) run a ton of IO operations sequentially on a single thread. So even though your EBS supports far more IOPS than you use at peak load, it can’t serve them, because the search threads are busy waiting on each read.
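To illustrate with back-of-the-envelope numbers (all hypothetical): when a query issues its random reads one after another, disk latency multiplies directly into query time, no matter how many IOPS the volume could serve in parallel:

```python
# Sequential IO model (hypothetical numbers): a search thread issues one read
# at a time, so per-query time is reads_per_query * read_latency, regardless
# of how many IOPS the EBS volume is provisioned for.
def query_time_s(reads_per_query: int, read_latency_s: float) -> float:
    return reads_per_query * read_latency_s

# e.g. 200 random reads per query at 1 ms each -> ~0.2 s spent waiting on disk,
# while the CPU sits mostly idle (matching the low CPU utilization you see).
print(query_time_s(200, 0.001))
```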
If service time is doubling, it’s likely because you’re still using the same number of threads in the search thread pool (monitoring the thread pool, or playing with the number of threads, might reveal interesting things). You can also try concurrent segment search, but ultimately I think EBS is your limit. I’d highly recommend ephemeral drives (see the storage-optimized instances backed by NVMe SSDs). You’ll likely squeeze a lot more performance out of the same budget. Sure, if a node goes away you’d also lose its data, but you can have replicas and backups to compensate.
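To make the “same threads, more clients” point concrete, here’s a Little’s-law sketch (the thread count and per-request time are made up):

```python
# Little's law sketch (hypothetical numbers): once in-flight requests exceed
# the search thread pool size, throughput is capped at threads / service_time,
# so extra clients only add queueing, and observed latency scales with clients.
def observed_latency_s(clients: int, threads: int, per_request_s: float) -> float:
    if clients <= threads:
        return per_request_s                      # pool keeps up, no queueing
    max_throughput = threads / per_request_s      # requests/sec at saturation
    return clients / max_throughput               # L = lambda * W  =>  W = L / lambda

# e.g. 50 search threads, 100 ms of work per request:
print(observed_latency_s(50, 50, 0.1))   # pool keeps up
print(observed_latency_s(100, 50, 0.1))  # double the clients -> double the latency
```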
Thanks for the reply.
FYI, I’m using AWS managed OpenSearch Service. I don’t get to play with search threads.
I did look at the ReadLatency metric reported by the EBS volumes. It’s less than 1 ms during the benchmark.
JVM heap utilization hovers around 60%, so I don’t think JVM GC is playing a role in the latency increase.
I’ve spent quite some time on this and I still can’t find the bottleneck for my exact kNN search query.
Just try with an NVMe-backed instance.
A read latency of 1 ms means a single thread gets at most 1K IOPS, which is really bad. I’ve seen many EBS-backed instances suffer from exactly the symptoms you’re describing.
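The arithmetic, for illustration (the NVMe latency figure is a rough assumption):

```python
# One outstanding read at a time => per-thread IOPS ceiling is 1 / latency.
def per_thread_iops(read_latency_s: float) -> float:
    return 1.0 / read_latency_s

# 1 ms EBS read vs ~0.1 ms local NVMe read (assumed latencies):
print(per_thread_iops(0.001))   # ~1,000 IOPS per search thread
print(per_thread_iops(0.0001))  # ~10,000 IOPS per search thread
```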