Cluster query response time is getting high

We have a single-node Open Distro cluster with around 34 indices. Each index has 2 shards and 0 replicas. The index rolls over every week, at which point it is around 20 GB.
The latest index had grown to around 24 GB when we started seeing the issue: the cluster kept getting slower. Below are the hot threads. Can someone help us understand what the hot threads are doing?
(We were able to resolve the issue by creating a new index and using it as the hot index.)
::: {VIG8-ELA1}{qoqhGIOjSyuGU4OyzJ9Amw}{E2HwwjGtT9uAkzgDul8k7Q}{...80}{...80:9300}{dimr}
Hot threads at 2023-11-27T21:24:52.797Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
18.2% (91.1ms out of 500ms) cpu usage by thread 'elasticsearch[VIG8-ELA1][search][T#11]'
7/10 snapshots sharing following 10 elements
org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:58)
org.elasticsearch.action.ActionRunnable$$Lambda$2929/952421299.accept(Unknown Source)
org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:743)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
3/10 snapshots sharing following 10 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:737)
java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:647)
java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1269)
org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:165)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
17.0% (85.1ms out of 500ms) cpu usage by thread 'elasticsearch[VIG8-ELA1][warmer][T#4]'
2/10 snapshots sharing following 27 elements
org.apache.lucene.util.packed.Packed64SingleBlock.<init>(Packed64SingleBlock.java:52)
org.apache.lucene.util.packed.Packed64SingleBlock$Packed64SingleBlock1.<init>(Packed64SingleBlock.java:255)
org.apache.lucene.util.packed.Packed64SingleBlock.create(Packed64SingleBlock.java:220)
org.apache.lucene.util.packed.PackedInts.getMutable(PackedInts.java:952)
org.apache.lucene.util.packed.PackedInts.getMutable(PackedInts.java:941)
org.apache.lucene.util.packed.PackedLongValues$Builder.pack(PackedLongValues.java:264)
org.apache.lucene.util.packed.PackedLongValues$Builder.pack(PackedLongValues.java:242)
org.apache.lucene.util.packed.PackedLongValues$Builder.add(PackedLongValues.java:225)
org.apache.lucene.index.OrdinalMap.<init>(OrdinalMap.java:269)
org.apache.lucene.index.OrdinalMap.build(OrdinalMap.java:168)
org.apache.lucene.index.OrdinalMap.build(OrdinalMap.java:147)
org.elasticsearch.index.fielddata.ordinals.GlobalOrdinalsBuilder.build(GlobalOrdinalsBuilder.java:64)
org.elasticsearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.loadGlobalDirect(AbstractIndexOrdinalsFieldData.java:151)
org.elasticsearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.loadGlobalDirect(AbstractIndexOrdinalsFieldData.java:44)
org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.lambda$load$1(IndicesFieldDataCache.java:174)
org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache$$Lambda$3543/1800704155.load(Unknown Source)
org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:433)
org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.load(IndicesFieldDataCache.java:171)
org.elasticsearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.loadGlobalInternal(AbstractIndexOrdinalsFieldData.java:139)
org.elasticsearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.loadGlobal(AbstractIndexOrdinalsFieldData.java:105)
org.elasticsearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.loadGlobal(AbstractIndexOrdinalsFieldData.java:44)
org.elasticsearch.index.IndexWarmer$FieldDataWarmer.lambda$warmReader$2(IndexWarmer.java:139)
org.elasticsearch.index.IndexWarmer$FieldDataWarmer$$Lambda$3531/1904313045.run(Unknown Source)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
2/10 snapshots sharing following 24 elements
org.apache.lucene.util.packed.PackedLongValues$Builder.pack(PackedLongValues.java:261)
org.apache.lucene.util.packed.DeltaPackedLongValues$Builder.pack(DeltaPackedLongValues.java:88)
org.apache.lucene.util.packed.MonotonicLongValues$Builder.pack(MonotonicLongValues.java:89)
org.apache.lucene.util.packed.PackedLongValues$Builder.pack(PackedLongValues.java:242)
org.apache.lucene.util.packed.PackedLongValues$Builder.add(PackedLongValues.java:225)
org.apache.lucene.index.OrdinalMap.<init>(OrdinalMap.java:270)
org.apache.lucene.index.OrdinalMap.build(OrdinalMap.java:168)
org.apache.lucene.index.OrdinalMap.build(OrdinalMap.java:147)
org.elasticsearch.index.fielddata.ordinals.GlobalOrdinalsBuilder.build(GlobalOrdinalsBuilder.java:64)
org.elasticsearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.loadGlobalDirect(AbstractIndexOrdinalsFieldData.java:151)
org.elasticsearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.loadGlobalDirect(AbstractIndexOrdinalsFieldData.java:44)
org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.lambda$load$1(IndicesFieldDataCache.java:174)
org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache$$Lambda$3543/1800704155.load(Unknown Source)
org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:433)
org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.load(IndicesFieldDataCache.java:171)
org.elasticsearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.loadGlobalInternal(AbstractIndexOrdinalsFieldData.java:139)
org.elasticsearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.loadGlobal(AbstractIndexOrdinalsFieldData.java:105)
org.elasticsearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.loadGlobal(AbstractIndexOrdinalsFieldData.java:44)
org.elasticsearch.index.IndexWarmer$FieldDataWarmer.lambda$warmReader$2(IndexWarmer.java:139)
org.elasticsearch.index.IndexWarmer$FieldDataWarmer$$Lambda$3531/1904313045.run(Unknown Source)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
6/10 snapshots sharing following 18 elements
org.apache.lucene.index.OrdinalMap.build(OrdinalMap.java:168)
org.apache.lucene.index.OrdinalMap.build(OrdinalMap.java:147)
org.elasticsearch.index.fielddata.ordinals.GlobalOrdinalsBuilder.build(GlobalOrdinalsBuilder.java:64)
org.elasticsearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.loadGlobalDirect(AbstractIndexOrdinalsFieldData.java:151)
org.elasticsearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.loadGlobalDirect(AbstractIndexOrdinalsFieldData.java:44)
org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.lambda$load$1(IndicesFieldDataCache.java:174)
org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache$$Lambda$3543/1800704155.load(Unknown Source)
org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:433)
org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.load(IndicesFieldDataCache.java:171)
org.elasticsearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.loadGlobalInternal(AbstractIndexOrdinalsFieldData.java:139)
org.elasticsearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.loadGlobal(AbstractIndexOrdinalsFieldData.java:105)
org.elasticsearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.loadGlobal(AbstractIndexOrdinalsFieldData.java:44)
org.elasticsearch.index.IndexWarmer$FieldDataWarmer.lambda$warmReader$2(IndexWarmer.java:139)
org.elasticsearch.index.IndexWarmer$FieldDataWarmer$$Lambda$3531/1904313045.run(Unknown Source)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
15.3% (76.2ms out of 500ms) cpu usage by thread 'elasticsearch[VIG8-ELA1][search][T#20]'
4/10 snapshots sharing following 26 elements
org.apache.lucene.search.FieldValueHitQueue.<init>(FieldValueHitQueue.java:32)
org.apache.lucene.search.FieldValueHitQueue$MultiComparatorsFieldValueHitQueue.<init>(FieldValueHitQueue.java:99)
org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:174)
org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:503)
org.apache.lucene.search.TopFieldCollector$1.newCollector(TopFieldCollector.java:534)
org.apache.lucene.search.TopFieldCollector$1.newCollector(TopFieldCollector.java:527)
org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:180)
org.elasticsearch.search.query.QueryPhase.searchWithCollectorManager(QueryPhase.java:397)
org.elasticsearch.search.query.QueryPhase.executeInternal(QueryPhase.java:294)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:148)
org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:372)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:431)
org.elasticsearch.search.SearchService.access$500(SearchService.java:141)
org.elasticsearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:401)
org.elasticsearch.search.SearchService$2$$Lambda$3510/199368869.get(Unknown Source)
org.elasticsearch.search.SearchService$$Lambda$3511/1075842239.get(Unknown Source)
org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:58)
org.elasticsearch.action.ActionRunnable$$Lambda$2929/952421299.accept(Unknown Source)
org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:743)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
2/10 snapshots sharing following 30 elements
java.util.Collections$SynchronizedCollection.add(Collections.java:2037)
org.apache.lucene.index.IndexReader.registerParentReader(IndexReader.java:140)
org.apache.lucene.index.FilterLeafReader.<init>(FilterLeafReader.java:314)
org.elasticsearch.common.lucene.index.SequentialStoredFieldsLeafReader.<init>(SequentialStoredFieldsLeafReader.java:44)
org.elasticsearch.search.internal.ExitableDirectoryReader$ExitableLeafReader.<init>(ExitableDirectoryReader.java:89)
org.elasticsearch.search.internal.ExitableDirectoryReader$ExitableLeafReader.<init>(ExitableDirectoryReader.java:84)
org.elasticsearch.search.internal.ExitableDirectoryReader$1.wrap(ExitableDirectoryReader.java:67)
org.apache.lucene.index.FilterDirectoryReader$SubReaderWrapper.wrap(FilterDirectoryReader.java:62)
org.apache.lucene.index.FilterDirectoryReader.<init>(FilterDirectoryReader.java:91)
org.elasticsearch.search.internal.ExitableDirectoryReader.<init>(ExitableDirectoryReader.java:64)
org.elasticsearch.search.internal.ContextIndexSearcher.<init>(ContextIndexSearcher.java:95)
org.elasticsearch.search.internal.ContextIndexSearcher.<init>(ContextIndexSearcher.java:88)
org.elasticsearch.search.DefaultSearchContext.<init>(DefaultSearchContext.java:182)
org.elasticsearch.search.SearchService.createSearchContext(SearchService.java:768)
org.elasticsearch.search.SearchService.createContext(SearchService.java:721)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:428)
org.elasticsearch.search.SearchService.access$500(SearchService.java:141)
org.elasticsearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:401)
org.elasticsearch.search.SearchService$2$$Lambda$3510/199368869.get(Unknown Source)
org.elasticsearch.search.SearchService$$Lambda$3511/1075842239.get(Unknown Source)
org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:58)
org.elasticsearch.action.ActionRunnable$$Lambda$2929/952421299.accept(Unknown Source)
org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:743)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
2/10 snapshots sharing following 9 elements
org.elasticsearch.action.search.FetchSearchPhase.access$000(FetchSearchPhase.java:47)
org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:95)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:743)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
2/10 snapshots sharing following 10 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:737)
java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:647)
java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1269)
org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:165)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

It seems that the hot threads are rebuilding global ordinals, which are used to accelerate aggregations. Do you perform aggregations on a text field or a keyword field? As the index grows, rebuilding global ordinals gets slower, and every refresh operation on the index can invalidate the existing global ordinals and trigger a rebuild.
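
To make the cost concrete, here is a deliberately simplified Python model of what a global ordinals build does. It is not Lucene's OrdinalMap implementation, and the segment contents are made up for illustration: each segment has its own sorted term list, and the build produces a table from segment-local ordinals to index-wide ordinals.

```python
from typing import Dict, List


def build_global_ordinals(segments: List[List[str]]) -> Dict[str, object]:
    """Toy model: map each segment-local ordinal to a shard-wide ordinal.

    Each segment holds its own sorted, de-duplicated term list; a term's
    position in that list is its segment ordinal.
    """
    # Union of all terms across segments, sorted: position = global ordinal.
    global_terms = sorted({term for seg in segments for term in seg})
    global_ord = {term: i for i, term in enumerate(global_terms)}
    # One lookup table per segment: segment ordinal -> global ordinal.
    seg_to_global = [[global_ord[t] for t in sorted(set(seg))] for seg in segments]
    return {"terms": global_terms, "segment_to_global": seg_to_global}


# Two existing segments of a hypothetical keyword field.
before = build_global_ordinals([["error", "info"], ["debug", "info", "warn"]])

# A refresh that exposes a new segment changes the segment set, so the whole
# mapping is computed again over every segment, old and new.
after = build_global_ordinals(
    [["error", "info"], ["debug", "info", "warn"], ["fatal", "info"]]
)
print(before["segment_to_global"])
print(after["segment_to_global"])
```

In this model the work is proportional to the total number of distinct terms across all segments, which is why a larger shard with more segments and more unique values takes longer to rebuild after each refresh.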

It might be better if you roll over by size instead of by week. This way, you should have more consistent indexing and search performance. Also, building global ordinals on smaller indices will be cheaper.
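
As a rough sketch of what size-based rollover could look like, the Python helper below only builds the request for the rollover API (`POST /<alias>/_rollover` with a `conditions` body). The alias name `logs-write` and the `20gb` and `7d` thresholds are assumptions for illustration, not values from your cluster, and sending the request is left to whatever HTTP client you use.

```python
import json
from typing import Optional


def rollover_request(alias: str, max_size: str, max_age: Optional[str] = None) -> dict:
    """Build a rollover request that triggers on index size, optionally also on age."""
    conditions = {"max_size": max_size}
    if max_age is not None:
        # Optional safety net: the index rolls over if ANY condition is met.
        conditions["max_age"] = max_age
    return {
        "method": "POST",
        "path": f"/{alias}/_rollover",
        "body": {"conditions": conditions},
    }


# Hypothetical write alias and thresholds.
req = rollover_request("logs-write", "20gb", max_age="7d")
print(req["method"], req["path"])
print(json.dumps(req["body"], indent=2))
```

Note that the conditions are only evaluated when the request is made, so this call has to run periodically (for example from a scheduled job or an index management policy) for the size limit to take effect.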

Also, is it possible to increase the refresh interval? That should help. And since I see warmer threads, I assume you have eager building of global ordinals enabled? Can you experiment with and without it and see what works best? Eager building slows down refreshes but makes the first query faster, while without it you should not spend that much time warming, but the first query after a refresh (with writes) will be slower.
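
For the experiment, the two knobs are the `index.refresh_interval` index setting and the `eager_global_ordinals` mapping parameter on the field. The Python sketch below only builds the two request bodies; the index name `logs-000035`, the field name `status`, and the `30s` interval are placeholders I made up, so substitute your own.

```python
import json


def refresh_interval_request(index: str, interval: str) -> dict:
    """Build a settings update that changes how often the index is refreshed."""
    return {
        "method": "PUT",
        "path": f"/{index}/_settings",
        "body": {"index": {"refresh_interval": interval}},
    }


def eager_global_ordinals_request(index: str, field: str, eager: bool) -> dict:
    """Build a mapping update that toggles eager global ordinals on a keyword field."""
    return {
        "method": "PUT",
        "path": f"/{index}/_mapping",
        "body": {
            "properties": {
                field: {"type": "keyword", "eager_global_ordinals": eager}
            }
        },
    }


# Hypothetical index and field names; 30s is an arbitrary starting point.
for req in (
    refresh_interval_request("logs-000035", "30s"),
    eager_global_ordinals_request("logs-000035", "status", False),
):
    print(req["method"], req["path"])
    print(json.dumps(req["body"], indent=2))
```

Changing one knob at a time and comparing query latency and the hot threads output before and after should show which of the two matters more for your workload.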
