I have the following OpenSearch 2.5 cluster:
Nodes: 6 (48 vCPUs and 384 GB memory each)
EBS volume: 24 TB gp3 (provisioned IOPS: 50,000 and 1,781 MB/s throughput per node)
~11B documents, ~10 KB each.
Search queries are script score based exact KNN queries.
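For reference, the queries are shaped roughly like this (index name, field name, vector, and space type here are placeholders), using the k-NN plugin's exact scoring script:

```json
POST my-index/_search
{
  "size": 10,
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": "knn_score",
        "lang": "knn",
        "params": {
          "field": "my_vector",
          "query_value": [0.1, 0.2, 0.3],
          "space_type": "cosinesimil"
        }
      }
    }
  }
}
```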
When I benchmark the cluster, varying clients from 1 to 150 and target throughput from 1 to 200, I see CPU utilization under 25%. EBS read IOPS and throughput are also well under the provisioned limits. So I can't conclude whether the search requests are CPU- or IO-bound, but latency increases as I increase clients and throughput.
|num_clients|target-throughput|Latency p99(p99.9)|service time p99(p99.9)|
Can someone help me understand why the CPU utilization could be low?
Also, I can't see why doubling the number of clients generating the same load doubles the service time. From the service's point of view, shouldn't it behave the same irrespective of how many clients generate the same total load?
Please share your thoughts.
Monitoring OpenSearch will tell more, but if your index size is way beyond your OS cache (and it sounds like it is, otherwise I don’t see the point of the huge EBS volume), your bottleneck is likely disk latency.
That’s because most search workloads (especially full-text search) run a ton of IO operations sequentially on a single thread. So even though your EBS supports far more IOPS than you use at peak load, it can’t serve them, because the search threads are busy waiting on each read.
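To illustrate with back-of-the-envelope numbers (all hypothetical): when a query issues its random reads one after another, disk latency multiplies directly into query time, no matter how many IOPS the volume could serve in parallel:

```python
# Sequential IO model (hypothetical numbers): a search thread issues one read
# at a time, so per-query time is reads_per_query * read_latency, regardless
# of how many IOPS the EBS volume is provisioned for.
def query_time_s(reads_per_query: int, read_latency_s: float) -> float:
    return reads_per_query * read_latency_s

# e.g. 200 random reads per query at 1 ms each -> ~0.2 s spent waiting on disk,
# while the CPU sits mostly idle (matching the low CPU utilization you see).
print(query_time_s(200, 0.001))
```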
If service time is doubling, it’s likely because you’re still using the same number of threads in the search thread pool (monitoring the thread pool, or playing with the number of threads, might reveal interesting things). You can also try concurrent segment search, but ultimately I think EBS is your limit. I’d highly recommend ephemeral drives (see the storage-optimized instances backed by NVMe SSDs). You’ll likely squeeze a lot more performance out of the same budget. Sure, if a node goes away you’d also lose its data, but you can have replicas and backups to compensate.
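To make the “same threads, more clients” point concrete, here’s a Little’s-law sketch (the thread count and per-request time are made up):

```python
# Little's law sketch (hypothetical numbers): once in-flight requests exceed
# the search thread pool size, throughput is capped at threads / service_time,
# so extra clients only add queueing, and observed latency scales with clients.
def observed_latency_s(clients: int, threads: int, per_request_s: float) -> float:
    if clients <= threads:
        return per_request_s                      # pool keeps up, no queueing
    max_throughput = threads / per_request_s      # requests/sec at saturation
    return clients / max_throughput               # L = lambda * W  =>  W = L / lambda

# e.g. 50 search threads, 100 ms of work per request:
print(observed_latency_s(50, 50, 0.1))   # pool keeps up
print(observed_latency_s(100, 50, 0.1))  # double the clients -> double the latency
```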
Thanks for the reply.
FYI, I’m using AWS managed OpenSearch Service. I don’t get to play with search threads.
I did look at the ReadLatency metric reported by the EBS volumes. It’s less than 1 ms during the benchmark.
JVM heap utilization hovers around 60%, so I don’t think JVM GC is playing a role in the latency increase.
I’ve spent quite some time on this and I still can’t find the bottleneck for my exact kNN search query.
Just try with an NVMe-backed instance.
A read latency of 1 ms means a single thread gets at most 1K IOPS, which is really bad. I’ve seen many EBS-backed instances suffer from exactly the symptoms you’re describing.
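The arithmetic, for illustration (the NVMe latency figure is a rough assumption):

```python
# One outstanding read at a time => per-thread IOPS ceiling is 1 / latency.
def per_thread_iops(read_latency_s: float) -> float:
    return 1.0 / read_latency_s

# 1 ms EBS read vs ~0.1 ms local NVMe read (assumed latencies):
print(per_thread_iops(0.001))   # ~1,000 IOPS per search thread
print(per_thread_iops(0.0001))  # ~10,000 IOPS per search thread
```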