Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch version 2.16
Describe the issue:
In our OpenSearch cluster, we’ve noticed a significant portion of CPU time spent on sorting terms when initializing TermInSetQuery objects (specifically, this sort call in Lucene’s TermInSetQuery.packTerms() method). However, we already presort the terms before constructing our retrieval query, so this cost is unexpected.
Looking through the code a bit more, I see that Lucene skips sorting if the terms are passed as a SortedSet object (see code here), but OpenSearch doesn’t appear to offer any option to do this. As far as I can tell, we always pass a BytesRef here.
I wanted to confirm that my understanding here is correct. Is there any way to skip re-sorting terms if we’ve presorted them in the retrieval query, or would it require a code change to add this behavior?
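To make the behavior I'm describing concrete, here is a minimal, self-contained sketch of the pattern as I understand it. It is not Lucene's actual code: the class name PresortedTermsSketch, the sortCalls counter, and the use of String in place of BytesRef are all stand-ins for illustration, and the "natural ordering" check is my reading of the linked code rather than something I have verified against every Lucene version.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class PresortedTermsSketch {
    // Hypothetical counter, only here to show when the fallback sort runs.
    static int sortCalls = 0;

    // Simplified stand-in for the packTerms() pattern: skip the sort only when
    // the collection type itself guarantees natural ordering.
    static String[] packTerms(Collection<String> terms) {
        boolean alreadySorted = terms instanceof SortedSet
                && ((SortedSet<?>) terms).comparator() == null;
        String[] out = terms.toArray(new String[0]);
        if (!alreadySorted) {
            // A presorted List still lands here, because its type carries
            // no ordering guarantee.
            Arrays.sort(out);
            sortCalls++;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> presortedList = List.of("a", "b", "c");
        packTerms(presortedList);                // sorted again
        packTerms(new TreeSet<>(presortedList)); // sort skipped
        System.out.println(sortCalls);           // prints 1
    }
}
```

One caveat with this sketch: building a TreeSet from an unsorted collection is itself an O(n log n) operation, so the saving would only be real if the terms could reach the query constructor already in a SortedSet, or if some other "trust me, these are sorted" path existed.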
Configuration:
Relevant Logs or Screenshots: