OpenSearch cluster is running high CPU and response time is also high

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
version" : {
“distribution” : “opensearch”,
“number” : “1.2.2”,
“build_type” : “tar”,
“build_hash” : “123d41ce4fad54529acd7a290efed848e707b624”,
“build_date” : “2021-12-15T18:03:07.761961Z”,
“build_snapshot” : false,
“lucene_version” : “8.10.1”,
“minimum_wire_compatibility_version” : “6.8.0”,
“minimum_index_compatibility_version” : “6.0.0-beta1”
},

Describe the issue:
Hello All,
We recently migrated our search cluster from Elasticsearch

"version" : {
  "number" : "6.8.16",
  "build_flavor" : "default",
  "build_type" : "deb",
  "build_hash" : "1f62092",
  "build_date" : "2021-05-21T19:27:57.985321Z",
  "build_snapshot" : false,
  "lucene_version" : "7.7.3",
  "minimum_wire_compatibility_version" : "5.6.0",
  "minimum_index_compatibility_version" : "5.0.0"
},

to OpenSearch. Hardware-wise both clusters are the same, but we are seeing high CPU spikes on the OpenSearch cluster and are not sure what the next steps are. Any help would be appreciated.

Configuration:
Number of shards: 24 on both OpenSearch and Elasticsearch (replica count is 1 per primary shard).
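
For reference, the per-index shard and replica counts show up as index.number_of_shards and index.number_of_replicas in the index settings (index name below is a placeholder):

  GET /my-index/_settings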

Relevant Logs or Screenshots:

I think the next step is to monitor OpenSearch and see what’s taking CPU. It could be GC, it could be indexing, queries or maybe some other thread pool that’s doing work. We wrote a metrics guide for Elasticsearch a while ago that mostly applies to OpenSearch.
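
For example, the hot threads API plus thread pool and JVM stats give a first cut of where CPU is going (run against any node or the cluster endpoint):

  GET _nodes/hot_threads
  GET _cat/thread_pool?v
  GET _nodes/stats/jvm,thread_pool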


Thanks for the kind reply. After analyzing the profile output, we are seeing most of the calls spending more time in TermQuery. Any suggestions to fix this?
Thanks
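
For reference, that kind of profile output comes from running the search with "profile": true; a minimal sketch, with a placeholder index and field:

  GET /my-index/_search
  {
    "profile": true,
    "query": {
      "term": { "status": "active" }
    }
  }

The response then breaks the time down per query component and per shard, which is where the TermQuery timings come from.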

TermQuery is the most basic query that can run; you can't really optimize that.

Actually you can (with a more aggressive merge policy - the blog post is about Solr, but you have similar options in OpenSearch) but usually the problem is higher up. For example, the number of TermQuery clauses, the layout of your data, number of shards, how well they’re balanced, etc.
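
As a rough sketch of the merge-policy angle (values are illustrative, not recommendations, and a force merge only makes sense on an index that is no longer being written to heavily):

  POST /my-index/_forcemerge?max_num_segments=1

  PUT /my-index/_settings
  {
    "index.merge.policy.segments_per_tier": 5,
    "index.merge.policy.max_merged_segment": "10gb"
  }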


Thanks for the kind response. One thing we are seeing when comparing ES to OS: from the hardware perspective both are the same machines, but the OS profiler data shows the term query section taking ~240 ms, while ES takes ~92 ms on the same data. Hence we are not sure what the problem is. Both have the same number of shards (16) and 8 data nodes.

I don’t know why you’d see that difference besides:

  • the "random" distribution of documents between shards, if you're letting OS/ES choose IDs
  • the “random” nature of merges, when they kick in
  • the Lucene version

And you can’t do much about any of the above. Which is why I’d generally suggest concentrating on optimizing what you have vs comparing to what you had before. Unless you’re still deciding whether to make the upgrade or not.
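
For the shard balance point, a quick check is to compare per-shard document counts and sizes (index name is a placeholder):

  GET _cat/shards/my-index?v&h=index,shard,prirep,docs,store,node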


Thanks for the reply. I have increased the shard count to 32 and am seeing better performance. Any idea how the shard count plays a major role here? Still not able to connect the dots.

With more shards you’re parallelizing queries more. But there’s also more overhead in merging per-shard results.
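
For reference, changing the primary shard count means creating a new index and reindexing into it; a minimal sketch with placeholder names:

  PUT /my-index-v2
  {
    "settings": { "number_of_shards": 32, "number_of_replicas": 1 }
  }

  POST _reindex
  {
    "source": { "index": "my-index" },
    "dest": { "index": "my-index-v2" }
  }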

Maybe concurrent segment search will help you? Introducing concurrent segment search in OpenSearch · OpenSearch
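
Note that concurrent segment search needs a much newer OpenSearch than 1.2.2. On 2.x releases that ship it, it is toggled with a dynamic cluster setting along these lines (the exact setting name and default have changed between releases, so check the docs for your version):

  PUT _cluster/settings
  {
    "persistent": {
      "search.concurrent_segment_search.enabled": true
    }
  }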


An interesting fact: I created a new cluster with the same data (created a new index and back-filled the data), but now I am seeing high response times. Are any specific warm-ups / a warm-up period required? It's really super surprising.

Yeah, if you run a query right after ingesting a lot of data, it might be that the operating system's page cache doesn't yet have everything the query needs. Plus, all OpenSearch-specific query-related caches (query cache, request cache) will be cold.
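
A simple warm-up, as a sketch, is to replay a few representative production queries after the back-fill; for size-0 aggregations the shard request cache can also be used explicitly (index name and query are placeholders):

  GET /my-index/_search?request_cache=true
  {
    "size": 0,
    "aggs": {
      "by_status": { "terms": { "field": "status" } }
    }
  }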


Thanks. Is there any way we can improve things? We didn't use the k-NN option while creating the index, since it's a prod index.
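
For context, the k-NN option has to be enabled at index creation time; a minimal sketch of such a mapping (field name and dimension are placeholders, and the method options depend on the k-NN plugin version):

  PUT /my-knn-index
  {
    "settings": { "index.knn": true },
    "mappings": {
      "properties": {
        "my_vector": { "type": "knn_vector", "dimension": 128 }
      }
    }
  }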

Adding more updates:
Here we increased the query cache to 20%, which helped a little, but the cluster CPU is still not coming down. On the other side, the Elasticsearch cluster is handling the same load very efficiently. Any more recommendations would be great.
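
For reference, that 20% change corresponds to the node-level query cache setting below (set in opensearch.yml and picked up on restart); the hit rate can then be watched from node stats:

  indices.queries.cache.size: 20%

  GET _nodes/stats/indices/query_cache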