High load caused by query

Versions:
OpenSearch 2.17.1
Debian GNU/Linux 12 (bookworm)
Happens with all browsers

Describe the issue:
When executing queries (boolean/must; nothing too complex), the server load sometimes rises sharply (up to 18) and then stays there. We can see lots of child processes that never terminate.

This brings the server down. Only restarting OpenSearch helps.

HINT: This is not during index time, but during query time.

Configuration:
We run a single-server installation (12 cores, 48 GB RAM) of OpenSearch. There are several hundred thousand documents in five different indexes. The largest index by size is about 2 GB (~20k documents). The largest index by document count has ~430k documents.

OpenSearch has been installed via the official package manager.

Relevant Logs or Screenshots:

There are not many logs. The only “unusual” entries seem to be

{"type": "server", "timestamp": "2024-11-07T15:00:20,361+01:00", "level": "ERROR", "component": "o.o.p.c.j.GCMetrics", "cluster.name": "opensearch", "node.name": "nde1", "message": "MX bean missing: G1 Concurrent GC" }

and

{"type": "server", "timestamp": "2024-11-07T15:00:22,967+01:00", "level": "ERROR", "component": "o.o.s.l.BuiltinLogTypeLoader", "cluster.name": "opensearch", "node.name": "nde1", "message": "Failed loading builtin log types from disk!", 

during start up.

This is very interesting. I assume you are using the bundled JDK. Could you please share which JVM parameters you set (heap, garbage collector, etc.)? Also, could you capture hot threads [1] so we can understand what is on the CPU? Thank you.
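
For reference, a minimal sketch of how a hot threads request could be put together. The base URL and node name passed in are assumptions for illustration, and the helper name `hot_threads_url` is hypothetical. The query parameters `threads`, `interval` and `ignore_idle_threads` are the ones the nodes hot threads API accepts.

```python
from urllib.parse import urlencode


def hot_threads_url(base_url, node_id=None, threads=3, interval="500ms",
                    ignore_idle_threads=True):
    """Build a URL for the nodes hot threads API (hypothetical helper).

    If node_id is None, the request covers all nodes.
    """
    node_part = f"/{node_id}" if node_id else ""
    params = urlencode({
        "threads": threads,                # number of busiest threads to report
        "interval": interval,              # sampling interval
        "ignore_idle_threads": str(ignore_idle_threads).lower(),
    })
    return f"{base_url}/_nodes{node_part}/hot_threads?{params}"


if __name__ == "__main__":
    # Assumed host and node name; pass the resulting URL to curl or any HTTP client.
    print(hot_threads_url("http://localhost:9200", node_id="nde1", threads=5))
```

Capturing the output a few times while the load is high usually says more than a single sample.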

Thanks for the quick reply on this. Here are the start parameters:

ps ax | grep -i search
1261285 ?        Ssl    0:56 /usr/share/opensearch/jdk/bin/java
	 -Xshare:auto
	 -Dopensearch.networkaddress.cache.ttl=60
	 -Dopensearch.networkaddress.cache.negative.ttl=10
	 -XX:+AlwaysPreTouch
	 -Xss1m
	 -Djava.awt.headless=true
	 -Dfile.encoding=UTF-8
	 -Djna.nosys=true
	 -XX:-OmitStackTraceInFastThrow
	 -XX:+ShowCodeDetailsInExceptionMessages
	 -Dio.netty.noUnsafe=true
	 -Dio.netty.noKeySetOptimization=true
	 -Dio.netty.recycler.maxCapacityPerThread=0
	 -Dio.netty.allocator.numDirectArenas=0
	 -Dlog4j.shutdownHookEnabled=false
	 -Dlog4j2.disable.jmx=true
	 -Djava.security.manager=allow
	 -Djava.locale.providers=SPI,COMPAT
	 -Xms16g
	 -Xmx16g
	 -XX:+UseG1GC
	 -XX:G1ReservePercent=25
	 -XX:InitiatingHeapOccupancyPercent=30
	 -Djava.io.tmpdir=/tmp/opensearch-8518488793491711705
	 -XX:+HeapDumpOnOutOfMemoryError
	 -XX:HeapDumpPath=/var/lib/opensearch
	 -XX:ErrorFile=/var/log/opensearch/hs_err_pid%p.log
	 -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/opensearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m
	 -Djava.security.manager=allow
	 --add-modules=jdk.incubator.vector
	 -Djava.util.concurrent.ForkJoinPool.common.threadFactory=org.opensearch.secure_sm.SecuredForkJoinWorkerThreadFactory
	 -Dclk.tck=100
	 -Djdk.attach.allowAttachSelf=true
	 -Djava.security.policy=file:///etc/opensearch/opensearch-performance-analyzer/opensearch_security.policy
	 --add-opens=jdk.attach/sun.tools.attach=ALL-UNNAMED
	 -XX:MaxDirectMemorySize=8589934592
	 -Dopensearch.path.home=/usr/share/opensearch
	 -Dopensearch.path.conf=/etc/opensearch
	 -Dopensearch.distribution.type=deb
	 -Dopensearch.bundled_jdk=true
	 -cp /usr/share/opensearch/lib/* org.opensearch.bootstrap.OpenSearch
	 -p /var/run/opensearch/opensearch.pid
	 --quiet

Except for memory, we didn’t make any changes to what the Debian package manager sets up.

This is what the hot_threads API shows.

~$ curl http://localhost:9200/_nodes/nde1/hot_threads
::: {nde1}{qwsxpxLoSHivki3EhtWqvg}{JN3zX6BXSeWIHwJ2PcHkHA}{127.0.0.1}{127.0.0.1:9300}{dimr}{shard_indexing_pressure_enabled=true}
   Hot threads at 2024-11-08T12:15:46.139Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   86.6% (433.1ms out of 500ms) cpu usage by thread 'opensearch[nde1][search][T#18]'
     5/10 snapshots sharing following 16 elements
       app//org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:676)
       app//org.opensearch.search.SearchService$2$$Lambda/0x00007fe109225c80.get(Unknown Source)
       app//org.opensearch.search.SearchService$$Lambda/0x00007fe109225ea0.get(Unknown Source)
       app//org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
       app//org.opensearch.action.ActionRunnable$$Lambda/0x00007fe1090bf800.accept(Unknown Source)
       app//org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
       app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
       java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
   
   72.7% (363.5ms out of 500ms) cpu usage by thread 'opensearch[nde1][search][T#2]'
     3/10 snapshots sharing following 40 elements
       app//org.apache.lucene.search.AbstractMultiTermQueryConstantScoreWrapper$RewritingWeight$1.get(AbstractMultiTermQueryConstantScoreWrapper.java:269)
       app//org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight$1.get(DisjunctionMaxQuery.java:154)
       app//org.apache.lucene.search.Boolean2ScorerSupplier.opt(Boolean2ScorerSupplier.java:252)
       app//org.apache.lucene.search.Boolean2ScorerSupplier.getInternal(Boolean2ScorerSupplier.java:139)
       app//org.apache.lucene.search.Boolean2ScorerSupplier.get(Boolean2ScorerSupplier.java:111)
       app//org.apache.lucene.search.Boolean2ScorerSupplier.req(Boolean2ScorerSupplier.java:184)
       app//org.apache.lucene.search.Boolean2ScorerSupplier.getInternal(Boolean2ScorerSupplier.java:161)
       app//org.apache.lucene.search.Boolean2ScorerSupplier.get(Boolean2ScorerSupplier.java:111)
       app//org.apache.lucene.search.Weight.bulkScorer(Weight.java:173)
       app//org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:449)
       app//org.opensearch.search.internal.ContextIndexSearcher$1.bulkScorer(ContextIndexSearcher.java:394)
       app//org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:335)
       app//org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:289)
       app//org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:560)
       app//org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:355)
       app//org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:462)
       app//org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:450)
       app//org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:432)
       app//org.opensearch.search.query.QueryPhaseSearcherWrapper.searchWith(QueryPhaseSearcherWrapper.java:60)
       org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:61)
       app//org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:282)
       app//org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:155)
       app//org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:643)
       app//org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:707)
       app//org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:676)
       app//org.opensearch.search.SearchService$2$$Lambda/0x00007fe109225c80.get(Unknown Source)
       app//org.opensearch.search.SearchService$$Lambda/0x00007fe109225ea0.get(Unknown Source)
       app//org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
       app//org.opensearch.action.ActionRunnable$$Lambda/0x00007fe1090bf800.accept(Unknown Source)
       app//org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
       app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
       java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
     5/10 snapshots sharing following 11 elements
       java.base@21.0.4/jdk.internal.misc.Unsafe.park(Native Method)
       java.base@21.0.4/java.util.concurrent.locks.LockSupport.park(LockSupport.java:371)
       java.base@21.0.4/java.util.concurrent.LinkedTransferQueue$DualNode.await(LinkedTransferQueue.java:458)
       java.base@21.0.4/java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:613)
       java.base@21.0.4/java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1257)
       app//org.opensearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:178)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1070)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
       java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
     2/10 snapshots sharing following 16 elements
       app//org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:676)
       app//org.opensearch.search.SearchService$2$$Lambda/0x00007fe109225c80.get(Unknown Source)
       app//org.opensearch.search.SearchService$$Lambda/0x00007fe109225ea0.get(Unknown Source)
       app//org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
       app//org.opensearch.action.ActionRunnable$$Lambda/0x00007fe1090bf800.accept(Unknown Source)
       app//org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
       app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
       java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
   
   68.0% (339.9ms out of 500ms) cpu usage by thread 'opensearch[nde1][search][T#3]'
     7/10 snapshots sharing following 33 elements
       app//org.apache.lucene.search.Boolean2ScorerSupplier.get(Boolean2ScorerSupplier.java:111)
       app//org.apache.lucene.search.Weight.bulkScorer(Weight.java:173)
       app//org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:449)
       app//org.opensearch.search.internal.ContextIndexSearcher$1.bulkScorer(ContextIndexSearcher.java:394)
       app//org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:335)
       app//org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:289)
       app//org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:560)
       app//org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:355)
       app//org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:462)
       app//org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:450)
       app//org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:432)
       app//org.opensearch.search.query.QueryPhaseSearcherWrapper.searchWith(QueryPhaseSearcherWrapper.java:60)
       org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:61)
       app//org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:282)
       app//org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:155)
       app//org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:643)
       app//org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:707)
       app//org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:676)
       app//org.opensearch.search.SearchService$2$$Lambda/0x00007fe109225c80.get(Unknown Source)
       app//org.opensearch.search.SearchService$$Lambda/0x00007fe109225ea0.get(Unknown Source)
       app//org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
       app//org.opensearch.action.ActionRunnable$$Lambda/0x00007fe1090bf800.accept(Unknown Source)
       app//org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
       app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
       java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
     3/10 snapshots sharing following 11 elements
       java.base@21.0.4/jdk.internal.misc.Unsafe.park(Native Method)
       java.base@21.0.4/java.util.concurrent.locks.LockSupport.park(LockSupport.java:371)
       java.base@21.0.4/java.util.concurrent.LinkedTransferQueue$DualNode.await(LinkedTransferQueue.java:458)
       java.base@21.0.4/java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:613)
       java.base@21.0.4/java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1257)
       app//org.opensearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:178)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1070)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
       java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)

Ok, it seems like you are using neural-search plugin and hybrid query, right?
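
The plugin shows up in the dump as the one frame (`HybridQueryPhaseSearcher`) that carries neither the `app//` nor the `java.base@` prefix. A rough sketch of how such frames could be picked out of a saved hot threads dump follows. The function name is hypothetical, and treating unprefixed frames as plugin code is only a heuristic assumed here, not a rule OpenSearch documents.

```python
import re

# Matches lines such as:
#   72.7% (363.5ms out of 500ms) cpu usage by thread 'opensearch[nde1][search][T#2]'
THREAD_RE = re.compile(r"^\s*([\d.]+)% \(.*\) cpu usage by thread '(.+)'")


def summarize_hot_threads(text):
    """Return (thread_name, cpu_percent, unprefixed_frames) per reported thread.

    Heuristic (an assumption): stack frames without an app// or java.base@
    prefix are treated as coming from plugin code.
    """
    summary = []
    for line in text.splitlines():
        match = THREAD_RE.match(line)
        if match:
            summary.append([match.group(2), float(match.group(1)), []])
            continue
        frame = line.strip()
        is_frame = "(" in frame and frame.endswith(")")
        if summary and is_frame and not frame.startswith(("app//", "java.base@")):
            if frame not in summary[-1][2]:
                summary[-1][2].append(frame)
    return [tuple(entry) for entry in summary]


SAMPLE = """\
   72.7% (363.5ms out of 500ms) cpu usage by thread 'opensearch[nde1][search][T#2]'
     3/10 snapshots sharing following 40 elements
       app//org.opensearch.search.query.QueryPhaseSearcherWrapper.searchWith(QueryPhaseSearcherWrapper.java:60)
       org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:61)
       java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
"""

if __name__ == "__main__":
    for name, cpu, frames in summarize_hot_threads(SAMPLE):
        print(name, cpu, frames)
```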

That’s an interesting observation. Well, apparently neural-search is loaded/used, but that’s not intentional. We use OpenSearch as a very boring search engine: just storing documents and using full-text search to retrieve them.

The query we use is pretty simple: just a bool.must.query_string.query along with a few bool.should.terms, with no additional filter or similar.
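
To make the shape concrete, here is a rough sketch of a request body with that structure. The field names, values and the helper `build_search_body` are made up for illustration and are not the actual query from this installation.

```python
import json


def build_search_body(user_query, should_terms):
    """Build a bool query: one query_string under must, terms clauses under should.

    should_terms maps a (hypothetical) field name to a list of values.
    """
    return {
        "query": {
            "bool": {
                "must": [{"query_string": {"query": user_query}}],
                "should": [
                    {"terms": {field: list(values)}}
                    for field, values in should_terms.items()
                ],
            }
        }
    }


if __name__ == "__main__":
    # Placeholder search text and field names.
    body = build_search_body("annual report", {"category": ["finance", "legal"]})
    print(json.dumps(body, indent=2))
```

The resulting JSON is what would be sent as the body of a `_search` request.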

I’ll try to disable neural-search, as I don’t think we need it.

Hi,
I’m working with Osnard on the same issue.
I removed neural-search and restarted OpenSearch, but I don’t see any change in the behavior. What should the expected outcome be?
Greetings


It looks like it did not do any good for the load.

This is what the hot_threads API shows:

root@nde1:/usr/share/opensearch# curl http://localhost:9200/_nodes/nde1/hot_threads
::: {nde1}{qwsxpxLoSHivki3EhtWqvg}{gq_66lZ2SXemLS8Q9LygaA}{127.0.0.1}{127.0.0.1:9300}{dimr}{shard_indexing_pressure_enabled=true}
   Hot threads at 2024-11-11T11:05:07.004Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   75.1% (375.4ms out of 500ms) cpu usage by thread 'opensearch[nde1][search][T#9]'
     2/10 snapshots sharing following 40 elements
       app//org.apache.lucene.util.Sorter.binarySort(Sorter.java:212)
       app//org.apache.lucene.util.InPlaceMergeSorter.mergeSort(InPlaceMergeSorter.java:42)
       app//org.apache.lucene.util.InPlaceMergeSorter.sort(InPlaceMergeSorter.java:37)
       app//org.apache.lucene.util.automaton.Automaton.finishCurrentState(Automaton.java:239)
       app//org.apache.lucene.util.automaton.Automaton.addTransition(Automaton.java:158)
       app//org.apache.lucene.util.automaton.Operations.removeDeadStates(Operations.java:1031)
       app//org.apache.lucene.util.automaton.LevenshteinAutomata.toAutomaton(LevenshteinAutomata.java:219)
       app//org.apache.lucene.search.FuzzyAutomatonBuilder.buildAutomatonSet(FuzzyAutomatonBuilder.java:63)
       app//org.apache.lucene.search.FuzzyTermsEnum$AutomatonAttributeImpl.init(FuzzyTermsEnum.java:391)
       app//org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:149)
       app//org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:126)
       app//org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:208)
       app//org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:68)
       app//org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:57)
       app//org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
       app//org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:326)
       app//org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:326)
       app//org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:326)
       app//org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:777)
       app//org.opensearch.search.internal.ContextIndexSearcher.rewrite(ContextIndexSearcher.java:197)
       app//org.opensearch.search.DefaultSearchContext.preProcess(DefaultSearchContext.java:388)
       app//org.opensearch.search.query.QueryPhase.preProcess(QueryPhase.java:127)
       app//org.opensearch.search.SearchService.createContext(SearchService.java:1117)
       app//org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:703)
       app//org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:676)
       app//org.opensearch.search.SearchService$2$$Lambda/0x00007f12951df4d0.get(Unknown Source)
       app//org.opensearch.search.SearchService$$Lambda/0x00007f12951df6f0.get(Unknown Source)
       app//org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
       app//org.opensearch.action.ActionRunnable$$Lambda/0x00007f129508c660.accept(Unknown Source)
       app//org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
       app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
       java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
     8/10 snapshots sharing following 10 elements
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
       app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
       java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
   
   73.6% (368ms out of 500ms) cpu usage by thread 'opensearch[nde1][search][T#17]'
     2/10 snapshots sharing following 36 elements
       app//org.apache.lucene.util.automaton.Operations.getLiveStates(Operations.java:928)
       app//org.apache.lucene.util.automaton.Operations.removeDeadStates(Operations.java:1009)
       app//org.apache.lucene.util.automaton.LevenshteinAutomata.toAutomaton(LevenshteinAutomata.java:219)
       app//org.apache.lucene.search.FuzzyAutomatonBuilder.buildAutomatonSet(FuzzyAutomatonBuilder.java:63)
       app//org.apache.lucene.search.FuzzyTermsEnum$AutomatonAttributeImpl.init(FuzzyTermsEnum.java:391)
       app//org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:149)
       app//org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:126)
       app//org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:208)
       app//org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:68)
       app//org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:57)
       app//org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
       app//org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:326)
       app//org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:326)
       app//org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:326)
       app//org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:777)
       app//org.opensearch.search.internal.ContextIndexSearcher.rewrite(ContextIndexSearcher.java:197)
       app//org.opensearch.search.DefaultSearchContext.preProcess(DefaultSearchContext.java:388)
       app//org.opensearch.search.query.QueryPhase.preProcess(QueryPhase.java:127)
       app//org.opensearch.search.SearchService.createContext(SearchService.java:1117)
       app//org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:703)
       app//org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:676)
       app//org.opensearch.search.SearchService$2$$Lambda/0x00007f12951df4d0.get(Unknown Source)
       app//org.opensearch.search.SearchService$$Lambda/0x00007f12951df6f0.get(Unknown Source)
       app//org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
       app//org.opensearch.action.ActionRunnable$$Lambda/0x00007f129508c660.accept(Unknown Source)
       app//org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
       app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
       java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
     5/10 snapshots sharing following 22 elements
       app//org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:326)
       app//org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:777)
       app//org.opensearch.search.internal.ContextIndexSearcher.rewrite(ContextIndexSearcher.java:197)
       app//org.opensearch.search.DefaultSearchContext.preProcess(DefaultSearchContext.java:388)
       app//org.opensearch.search.query.QueryPhase.preProcess(QueryPhase.java:127)
       app//org.opensearch.search.SearchService.createContext(SearchService.java:1117)
       app//org.opensearch.search.SearchService.lambda$executeFetchPhase$4(SearchService.java:876)
       app//org.opensearch.search.SearchService$$Lambda/0x00007f12952129c0.get(Unknown Source)
       app//org.opensearch.search.SearchService$$Lambda/0x00007f12951df6f0.get(Unknown Source)
       app//org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
       app//org.opensearch.action.ActionRunnable$$Lambda/0x00007f129508c660.accept(Unknown Source)
       app//org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
       app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
       java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
     2/10 snapshots sharing following 16 elements
       app//org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:676)
       app//org.opensearch.search.SearchService$2$$Lambda/0x00007f12951df4d0.get(Unknown Source)
       app//org.opensearch.search.SearchService$$Lambda/0x00007f12951df6f0.get(Unknown Source)
       app//org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
       app//org.opensearch.action.ActionRunnable$$Lambda/0x00007f129508c660.accept(Unknown Source)
       app//org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
       app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
       java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
     unique snapshot
       app//org.apache.lucene.index.LeafReader.<init>(LeafReader.java:49)
       app//org.apache.lucene.index.FilterLeafReader.<init>(FilterLeafReader.java:325)
       app//org.opensearch.common.lucene.index.SequentialStoredFieldsLeafReader.<init>(SequentialStoredFieldsLeafReader.java:59)
       app//org.opensearch.search.internal.ExitableDirectoryReader$ExitableLeafReader.<init>(ExitableDirectoryReader.java:105)
       app//org.opensearch.search.internal.ExitableDirectoryReader$1.wrap(ExitableDirectoryReader.java:82)
       app//org.apache.lucene.index.FilterDirectoryReader$SubReaderWrapper.wrap(FilterDirectoryReader.java:61)
       app//org.apache.lucene.index.FilterDirectoryReader.<init>(FilterDirectoryReader.java:91)
       app//org.opensearch.search.internal.ExitableDirectoryReader.<init>(ExitableDirectoryReader.java:79)
       app//org.opensearch.search.internal.ContextIndexSearcher.<init>(ContextIndexSearcher.java:145)
       app//org.opensearch.search.internal.ContextIndexSearcher.<init>(ContextIndexSearcher.java:123)
       app//org.opensearch.search.DefaultSearchContext.<init>(DefaultSearchContext.java:251)
       app//org.opensearch.search.SearchService.createSearchContext(SearchService.java:1167)
       app//org.opensearch.search.SearchService.createContext(SearchService.java:1100)
       app//org.opensearch.search.SearchService.lambda$executeFetchPhase$4(SearchService.java:876)
       app//org.opensearch.search.SearchService$$Lambda/0x00007f12952129c0.get(Unknown Source)
       app//org.opensearch.search.SearchService$$Lambda/0x00007f12951df6f0.get(Unknown Source)
       app//org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
       app//org.opensearch.action.ActionRunnable$$Lambda/0x00007f129508c660.accept(Unknown Source)
       app//org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
       app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
       java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
   
   67.1% (335.5ms out of 500ms) cpu usage by thread 'opensearch[nde1][search][T#7]'
     2/10 snapshots sharing following 34 elements
       app//org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:236)
       app//org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:135)
       app//org.apache.lucene.search.FuzzyAutomatonBuilder.buildAutomatonSet(FuzzyAutomatonBuilder.java:63)
       app//org.apache.lucene.search.FuzzyTermsEnum$AutomatonAttributeImpl.init(FuzzyTermsEnum.java:391)
       app//org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:149)
       app//org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:126)
       app//org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:208)
       app//org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:68)
       app//org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:57)
       app//org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
       app//org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:326)
       app//org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:326)
       app//org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:326)
       app//org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:777)
       app//org.opensearch.search.internal.ContextIndexSearcher.rewrite(ContextIndexSearcher.java:197)
       app//org.opensearch.search.DefaultSearchContext.preProcess(DefaultSearchContext.java:388)
       app//org.opensearch.search.query.QueryPhase.preProcess(QueryPhase.java:127)
       app//org.opensearch.search.SearchService.createContext(SearchService.java:1117)
       app//org.opensearch.search.SearchService.lambda$executeFetchPhase$4(SearchService.java:876)
       app//org.opensearch.search.SearchService$$Lambda/0x00007f12952129c0.get(Unknown Source)
       app//org.opensearch.search.SearchService$$Lambda/0x00007f12951df6f0.get(Unknown Source)
       app//org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
       app//org.opensearch.action.ActionRunnable$$Lambda/0x00007f129508c660.accept(Unknown Source)
       app//org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
       app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
       java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
     3/10 snapshots sharing following 30 elements
       app//org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:126)
       app//org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:208)
       app//org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:68)
       app//org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:57)
       app//org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
       app//org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:326)
       app//org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:326)
       app//org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:326)
       app//org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:777)
       app//org.opensearch.search.internal.ContextIndexSearcher.rewrite(ContextIndexSearcher.java:197)
       app//org.opensearch.search.DefaultSearchContext.preProcess(DefaultSearchContext.java:388)
       app//org.opensearch.search.query.QueryPhase.preProcess(QueryPhase.java:127)
       app//org.opensearch.search.SearchService.createContext(SearchService.java:1117)
       app//org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:703)
       app//org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:676)
       app//org.opensearch.search.SearchService$2$$Lambda/0x00007f12951df4d0.get(Unknown Source)
       app//org.opensearch.search.SearchService$$Lambda/0x00007f12951df6f0.get(Unknown Source)
       app//org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
       app//org.opensearch.action.ActionRunnable$$Lambda/0x00007f129508c660.accept(Unknown Source)
       app//org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
       app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
       java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
     2/10 snapshots sharing following 29 elements
       app//org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum.next(IntersectTermsEnum.java:377)
       app//org.opensearch.search.internal.ExitableDirectoryReader$ExitableTermsEnum.next(ExitableDirectoryReader.java:197)
       app//org.apache.lucene.search.FuzzyTermsEnum.next(FuzzyTermsEnum.java:230)
       app//org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:65)
       app//org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
       app//org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:326)
       app//org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:326)
       app//org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:326)
       app//org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:777)
       app//org.opensearch.search.internal.ContextIndexSearcher.rewrite(ContextIndexSearcher.java:197)
       app//org.opensearch.search.DefaultSearchContext.preProcess(DefaultSearchContext.java:388)
       app//org.opensearch.search.query.QueryPhase.preProcess(QueryPhase.java:127)
       app//org.opensearch.search.SearchService.createContext(SearchService.java:1117)
       app//org.opensearch.search.SearchService.lambda$executeFetchPhase$4(SearchService.java:876)
       app//org.opensearch.search.SearchService$$Lambda/0x00007f12952129c0.get(Unknown Source)
       app//org.opensearch.search.SearchService$$Lambda/0x00007f12951df6f0.get(Unknown Source)
       app//org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
       app//org.opensearch.action.ActionRunnable$$Lambda/0x00007f129508c660.accept(Unknown Source)
       app//org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
       app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
       java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
     3/10 snapshots sharing following 10 elements
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       app//org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
       app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005)
       app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
       java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)


The only conclusion I could come up with (based on the hot threads) is that fuzzy queries are running continuously on the node. This kind of query is considered expensive [1], which may explain the high CPU usage.

[1] Query DSL | Elasticsearch Guide [7.10] | Elastic
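
For anyone who wants to check that reading of the dump themselves, here is a rough sketch that tallies how many sampled snapshots have a shared stack mentioning a given class-name fragment. The function name `snapshot_share` and the default marker "Fuzzy" are my own choices for illustration; the parsing only assumes the "N/10 snapshots sharing following ..." and "unique snapshot" header lines that the hot threads output prints.

```python
import re

# Header printed before each group of identical stack snapshots,
# e.g. "2/10 snapshots sharing following 34 elements".
SNAPSHOT_HEADER = re.compile(r"^\s*(\d+)/\d+ snapshots sharing following \d+ elements")


def snapshot_share(hot_threads_text, marker="Fuzzy"):
    """Return (snapshots whose frames mention `marker`, total snapshots seen)."""
    matched = 0
    total = 0
    group_size = 0      # number of snapshots in the group being read
    group_hit = False   # whether any frame of that group contains the marker

    for line in hot_threads_text.splitlines():
        header = SNAPSHOT_HEADER.match(line)
        if header or line.strip() == "unique snapshot":
            # A new group starts: account for the previous one first.
            if group_hit:
                matched += group_size
            group_size = int(header.group(1)) if header else 1
            total += group_size
            group_hit = False
        elif "cpu usage by thread" in line:
            # A new thread section starts: close the open group.
            if group_hit:
                matched += group_size
            group_size = 0
            group_hit = False
        elif marker in line:
            group_hit = True

    if group_hit:
        matched += group_size
    return matched, total
```

Applied by hand to thread 'opensearch[nde1][search][T#7]' in the dump above, the groups of 2, 3 and 2 snapshots all contain Fuzzy* frames and the last group of 3 does not, so 7 of the 10 samples for that thread were inside fuzzy query rewriting.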

Yes, that's right. We are running fuzzy queries to catch typos (for “did you mean:”).
But it is a text search, so we cannot just deactivate that without giving up a major feature.

I think you may consider scaling up (add more servers or CPUs).
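
Apart from adding hardware, it might also be worth bounding how much work each fuzzy clause can do. The sketch below builds a `match` query body with `prefix_length` and `max_expansions` set explicitly; both are standard fuzzy options (defaults 0 and 50). The helper name, the field name `title` and the chosen values are assumptions for illustration only, since the actual queries were not posted, and whether the typo matching stays good enough would need testing against real data.

```python
import json


def bounded_fuzzy_match(field, text, prefix_length=2, max_expansions=10):
    """Build a search body for a fuzzy match query with explicit cost limits.

    prefix_length:  leading characters that must match exactly, which
                    narrows the part of the term dictionary that is scanned.
    max_expansions: upper bound on the number of terms the fuzzy clause
                    is rewritten into.
    """
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": "AUTO",
                    "prefix_length": prefix_length,
                    "max_expansions": max_expansions,
                }
            }
        }
    }


if __name__ == "__main__":
    # "title" is a hypothetical field name; replace it with a real one.
    print(json.dumps(bounded_fuzzy_match("title", "opensearhc"), indent=2))
```

The trade-off is that a typo in the first `prefix_length` characters will no longer be corrected, so a value of 1 or 2 is usually the most one would try for a "did you mean" feature.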