Avoid re-sorting when initializing TermInSetQuery

david1 · March 20, 2025, 9:25pm

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

OpenSearch version 2.16

Describe the issue:

In our OpenSearch cluster, we’ve noticed that a significant portion of CPU time is spent sorting terms when initializing TermInSetQuery objects (specifically this sort call in Lucene’s TermInSetQuery.packTerms() function). However, we always presort the terms before constructing our retrieval query, so the second sort is redundant work.

Looking through the code a bit more, I see that Lucene skips sorting if the terms are passed as a SortedSet object (see code here), but it doesn’t look like OpenSearch has any option to do this. I see we always pass a BytesRef here.
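To illustrate the check described above, here is a minimal sketch in plain Java. It is not Lucene's code: `SortSkipSketch`, `toSortedArray`, and `sortCalls` are hypothetical names, and `String` stands in for `BytesRef` (whose natural order is byte-wise, not `String.compareTo`). It only shows the general pattern, where a collection that is a `SortedSet` with no custom comparator is trusted to be in natural order and anything else is sorted again.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for the sort-skip check; not the actual Lucene implementation.
final class SortSkipSketch {
    // Counts how often the fallback sort runs, so the skip is observable.
    static int sortCalls = 0;

    /** Returns the terms in natural order, sorting only when the input type does not guarantee it. */
    static String[] toSortedArray(Collection<String> terms) {
        String[] out = terms.toArray(new String[0]);
        // A SortedSet whose comparator() is null already iterates in natural order.
        boolean sorted = terms instanceof SortedSet
                && ((SortedSet<?>) terms).comparator() == null;
        if (!sorted) {
            sortCalls++;
            Arrays.sort(out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> presorted = Arrays.asList("apple", "banana", "cherry");
        toSortedArray(presorted);                 // a List: sorted again even though it is in order
        toSortedArray(new TreeSet<>(presorted));  // a natural-order SortedSet: sort skipped
        System.out.println("sort calls: " + sortCalls); // prints "sort calls: 1"
    }
}
```

Note that the check looks only at the collection's type, not its contents, which is why a presorted list is still sorted again. Copying a presorted list into a `TreeSet` also costs O(n log n) comparisons, so that workaround moves the cost rather than removing it.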

I wanted to confirm that my understanding here is correct. Is there any way to skip re-sorting terms if we’ve presorted them in the retrieval query, or would it require a code change to add this behavior?

Configuration:

Relevant Logs or Screenshots:

mkhl · March 27, 2025, 11:22am

Hi,
I’m afraid it requires a code change. It’s also worth checking which Lucene version OpenSearch 2.16 ships with. I suppose it might be a valuable improvement for OpenSearch.

mkhl · March 27, 2025, 6:51pm

Hold my beer: "pass in order terms as sorted to TermInSetQuery()" by mkhludnev · Pull Request #17714 · opensearch-project/OpenSearch · GitHub

mkhl · April 2, 2025, 8:04pm

One more optimization idea for a certain edge case: "Reuse packedTerms between two TermInSetQuery what combined with IndexOrDocValuesQuery" · Issue #14425 · apache/lucene · GitHub
