Versions (relevant - OpenSearch): 2.11
Describe the issue:
Background
We have an index of ~50K documents, each representing a patient case. Each document has a nested field called data containing ~150 items on average, each with a key (keyword) and a current_value object (val_num, val_dt, val_str, val_bool).
Users define dynamic query schemas at runtime — each schema has a composite filter (cross-item nested conditions like “key X has value x1 AND key Y has value y1”) and a designated sort key (a specific data.key whose current_value to sort by). Multiple schemas are combined into a single query, and the result set is a union of all schema matches, sorted by each schema’s own sort key with schema priority as the primary tier.
The core problem
We need to, in a single query pass:
-
Match documents against N dynamic schemas (each with cross-item nested filter conditions)
-
Determine which schema each matched document belongs to
-
Use that schema membership to select the correct sort key for that document
-
Sort the full result set by those per-document sort values
The fundamental difficulty is that schema membership is a query-time concept that doesn’t survive into sort-time. matched_queries — which would give us schema membership per document — is assembled after the sort phase and is not available to any sort mechanism, native or scripted.
What we’ve tried
Painless _script sort reading _source — our original approach. Works logically but has severe performance implications: the script deserializes the full _source for every candidate document during the sort phase, iterating all ~150 nested items per doc across the entire filtered result set. On our hardware (2-node r6g.large, 4 shards) this causes JVM heap pressure and degrades performance significantly beyond a result window of ~700 docs.
Native nested sort with per-schema filter clauses — replacing Painless with native nested sort clauses, one per schema, each scoped to that schema’s sort key via a term filter. This is fast and avoids _source traversal entirely, but the sort clauses have no awareness of score bands — a document that has another schema’s sort key in its data array produces a real value in the wrong clause instead of _last, corrupting sort order across schemas.
_score as band separator + native nested sort — using constant_score / function_score with distinct boost values (100, 200, 300…) to create non-overlapping score bands per schema, then using _score desc as the primary sort key with per-schema nested sort clauses as tiebreakers. This works correctly when schemas are mutually exclusive (a document matches at most one schema) and sort keys don’t overlap across schemas. It breaks down when documents in a lower priority band happen to carry a higher priority band’s sort key, causing the wrong nested clause to fire as a tiebreaker.
The question
Is there any native OpenSearch mechanism to conditionally apply a nested sort clause based on which named query or score band a document matched? Any suggestions are welcome.
We’re running OpenSearch on AWS managed (2-node r6g.large, 4 shards, ~50K docs, nested data averaging 150 items per document with ~13K distinct keys across the indexed documents).