Why MMR is not supported in hybrid queries

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): 3.3

Describe the issue: Why MMR is not supported in hybrid queries

It is due to limitations in the hybrid search executor and significant performance risks.

  1. The current hybrid query executor runs independently and:
    • Merges lexical and vector signals early in execution.
    • Leaves no clear stage to apply MMR on a clean, semantic-only candidate set, even though MMR must be applied to semantic search results before any score fusion.

  2. Although hybrid search preserves per-channel scores, it:
    • Does not retain the underlying vector embeddings required for MMR.
  3. The hybrid search executor:
    • Does not provide a subquery-level reranking API.
    • Cannot rerank semantic results independently.

  4. Adding MMR support inside hybrid queries would:
    • Introduce significant performance risks.
    • Increase memory usage and the risk of OOM errors, due to additional pairwise vector computations on expanded candidate sets.

Can you confirm that the above are the reasons for not supporting MMR in hybrid queries?

Performance and memory usage are not the primary blockers. Supporting MMR rerank for hybrid queries will inevitably introduce some additional latency and memory overhead, but this should be manageable in practice as long as the candidate set size is reasonably bounded. It only becomes a serious concern if users configure very large oversampling factors.
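To give a rough sense of how that overhead scales with the candidate set, here is a back-of-the-envelope sketch. It assumes a greedy MMR that, after each pick, computes one similarity between the new pick and every remaining candidate, and that vectors are held as 4-byte floats. The function name `mmr_cost` and the sample sizes are made up for illustration and are not taken from the plugin.

```python
def mmr_cost(candidates: int, k: int, dims: int, bytes_per_value: int = 4):
    """Rough per-request overhead of a greedy MMR rerank (assumed cost model).

    candidates: number of oversampled hits whose vectors are fetched
    k:          number of hits returned after reranking
    dims:       vector dimension
    """
    k = min(k, candidates)
    # After the i-th pick, (candidates - i) hits remain, and each one needs
    # a single similarity computation against the hit that was just picked.
    pairwise_sims = sum(candidates - i for i in range(1, k + 1))
    # Memory needed just to hold the candidate vectors during reranking.
    vector_bytes = candidates * dims * bytes_per_value
    return pairwise_sims, vector_bytes


if __name__ == "__main__":
    for n in (30, 300, 3000):
        sims, mem = mmr_cost(candidates=n, k=10, dims=768)
        print(f"candidates={n:5d}  pairwise_sims={sims:6d}  vectors={mem / 1024:.0f} KiB")
```

Under this model both numbers grow linearly with the candidate count for a fixed k, which is why the overhead only becomes a concern at very large oversampling factors.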

The main challenge is architectural rather than performance-related.

Today the flows are:

knn / neural query
→ query phase (oversampled candidates)
→ fetch phase (fetch vectors)
→ MMR rerank
→ response

hybrid query
→ query phase (lexical + vector candidates)
→ normalization & score combination
→ fetch phase
→ response

If we want to apply MMR reranking before normalization and score combination, the new flow would look like this:

hybrid query
→ query phase (lexical + vector candidates)
→ [new] fetch vectors for semantic candidates
→ [new] MMR rerank (semantic sub-query only)
→ normalization & score combination
→ fetch phase
→ response

Theoretically, this is doable, but it would require introducing new components. Applying MMR before score normalization would also require fetching vectors earlier than today’s fetch phase, which would add latency. Because this execution model is significantly different from how we currently support kNN/neural queries, we have not supported MMR for hybrid queries so far.

In parallel, we are also exploring an alternative approach that preserves the existing execution model and reuses current components. Today, MMR already allows users to configure which vector field is used for reranking. We could potentially extend this to hybrid queries by oversampling candidates as usual and fetching the configured vector field during the fetch phase. MMR would then be applied after normalization and score combination. While this differs from the ideal semantic-only reranking stage, it can still help improve result diversity and is likely easier to support since it leverages existing functionality.

hybrid query
→ query phase (lexical + vector candidates)
→ normalization & score combination
→ fetch phase
→ MMR rerank
→ response
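To make the difference between the two orderings concrete, here is a toy sketch, not the plugin's implementation. It assumes min-max normalization with an arithmetic-mean combination, cosine similarity between document vectors, and an MMR whose relevance term is the hit's current score. All function names and data are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def min_max(scores):
    lo, hi = min(scores.values()), max(scores.values())
    return {d: (s - lo) / (hi - lo) if hi > lo else 1.0 for d, s in scores.items()}


def combine(lexical, semantic):
    # Assumed fusion: min-max normalize each sub-query, then take the mean.
    ln, sn = min_max(lexical), min_max(semantic)
    return {d: (ln.get(d, 0.0) + sn.get(d, 0.0)) / 2 for d in set(ln) | set(sn)}


def mmr(scores, vectors, k, diversity=0.5):
    # Greedy MMR: trade each hit's score against its similarity to earlier picks.
    remaining, selected = set(scores), []
    while remaining and len(selected) < k:
        def gain(d):
            penalty = max((cosine(vectors[d], vectors[s]) for s in selected), default=0.0)
            return (1 - diversity) * scores[d] - diversity * penalty
        best = max(sorted(remaining), key=gain)  # sorted() keeps ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


def mmr_before_fusion(lexical, semantic, vectors, k):
    # Ordering 1: rerank the semantic sub-query only, then normalize and combine.
    kept = mmr(semantic, vectors, k)
    fused = combine(lexical, {d: semantic[d] for d in kept})
    return sorted(fused, key=lambda d: (-fused[d], d))[:k]


def mmr_after_fusion(lexical, semantic, vectors, k):
    # Ordering 2: normalize and combine first, then rerank the fused hits.
    return mmr(combine(lexical, semantic), vectors, k)


# Toy data: "a" and "b" are near-duplicates in vector space.
vectors = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0], "d": [0.7, 0.7]}
semantic = {"a": 0.95, "b": 0.94, "c": 0.60}
lexical = {"b": 10.0, "a": 6.0, "d": 2.0}

if __name__ == "__main__":
    print("MMR before fusion:", mmr_before_fusion(lexical, semantic, vectors, k=2))
    print("MMR after fusion: ", mmr_after_fusion(lexical, semantic, vectors, k=2))
```

In the first ordering MMR only needs vectors for the semantic candidates, but the lexical sub-query can still bring a near-duplicate back in during combination. In the second, every fused hit needs the configured vector field, and diversity is enforced on the final list.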

That said, we have considered supporting MMR for hybrid queries. Please feel free to open a feature request in the Neural plugin repository to help us prioritize this work. We’d also appreciate your feedback on the two potential approaches for supporting MMR reranking in hybrid queries.

Thanks,

Bo

Hi @bozhang, here is the RFC: Native MMR Support for Sub-queries in Hybrid Search (Issue #3106, opensearch-project/k-NN on GitHub)