Navneet
November 10, 2025, 11:32pm
1
Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
3.4 and above
Describe the issue:
OpenSearch recently added support for rescoring with late interaction models using Painless script scoring. Meanwhile, Lucene 10.3 includes native support for rescoring results using late interaction models. Lucene's implementation exposes a new `LateInteractionField` that can index multi-vectors and rescore results for any query using maxSim similarity against a provided target multi-vector.
There are advantages to leveraging the Lucene-based support in OpenSearch: 1) it works for users who cannot run Painless scripts, and 2) any optimizations or vectorization improvements made in Lucene feed directly into OpenSearch. This issue proposes to add the required support in OpenSearch.
We see two main buckets of work:
1. Add support to index multi-vectors into Lucene's `LateInteractionField`.
2. Add support to rescore query results using a configured similarity (maxSim by default) between a provided target query multi-vector and the indexed multi-vectors.
More details can be found here:
opened 06:16PM - 09 Oct 25 UTC · Features
---
### 1) Indexing Multi-Vectors
I believe this requires us to add a new `ParametrizedFieldMapper` that can accept multi-vector floats and index them into Lucene's `LateInteractionField`.
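For illustration, a mapping and indexing request for such a field could look like the sketch below. The field type name `late_interaction`, its parameters, and the field names are hypothetical placeholders rather than part of this proposal:

```ruby
PUT /my-vector-index
{
  "mappings": {
    "properties": {
      "title_embedding": {
        "type": "knn_vector",         # existing single-vector field used for the initial knn query
        "dimension": 4
      },
      "colbert_embedding": {
        "type": "late_interaction",   # hypothetical type name for the new multi-vector field mapper
        "dimension": 3                # per-token vector dimension
      }
    }
  }
}

PUT /my-vector-index/_doc/1
{
  "title_embedding": [2, 3, 5, 6],
  "colbert_embedding": [[1, 2, 3], [3, 4, 5], [1, 3, 5]]   # one vector per document token
}
```

The mapper would presumably validate the per-token dimension and pass the multi-vector floats to Lucene's `LateInteractionField` at index time.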
### 2) Rescoring
From what I can make out, OpenSearch today only supports rescoring using full precision vector similarity on the **same** knn field that was used for the knn query. With the late interaction field, we want to rescore using similarity scores between query and document multi-vectors that are different from the initial single vector knn query.
There is value in making this support more generic beyond the late interaction field. With a properly designed API, users can rescore knn results from one vector field using similarity scores from a different vector field (and target vector). This has utility in search workflows that use embeddings trained on different targets, e.g. rescoring with vectors trained on personalization data. Of course, rescoring on a different field requires a different target vector for similarity, which the API should support.
**Proposed API:**
```ruby
GET /my-vector-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "{target-field}": {
        "vector": [2, 3, 5, 6],
        "k": 2,
        "rescore": {
          "oversample_factor": 3,
          "{rescore-field}": {                  # late interaction document field
            "vector": [[1,2,3], [3,4,5], [1,3,5],...],   # late interaction query target
            "similarity": "maxSimDotProduct",   # [optional] can specify different similarity fns. should we call it "space"?
            ...
          }
        }
      }
    }
  }
}
```
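For reference, assuming `maxSimDotProduct` follows the standard ColBERT-style MaxSim formulation, the rescore score between a query multi-vector $Q = \{q_1, \dots, q_n\}$ and an indexed document multi-vector $D = \{d_1, \dots, d_m\}$ would be

$$
\mathrm{maxSim}(Q, D) = \sum_{i=1}^{n} \max_{1 \le j \le m} q_i \cdot d_j
$$

i.e. each query token vector is matched to its best document token vector by dot product, and the per-token maxima are summed into the document's rescore score.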
---
### Rescoring vs. Reranking
OpenSearch also supports rerank processors using the ML Commons project. My mental model for these different rescoring and reranking options is a spectrum that goes from no interaction to full interaction:
1. **Rescoring with full-precision on the same field** – The primary utility is to save memory by using quantized vectors (disk mode) during the approximate vector search phase, while using full-precision vectors to rescore and produce an optimal final ordering (see the example after this list).
2. **Rescoring with full-precision on a different single-vector field** – This is still the standard bi-encoder setup where documents and queries are encoded independently. It lets you rescore the initial match set using embeddings trained on a different target function, e.g. use a standard set of embeddings to produce a good initial result set, and a second set of embeddings to reorder it based on business-specific targets.
3. **Rescoring with late interaction vectors** – This lets you capture more nuanced relationships between queries and documents using a multi-vector representation from late interaction models such as ColBERT and ColPali. While slightly more compute-intensive than single-vector representations, they still allow offline generation of document multi-vectors and are less compute-intensive than full-interaction cross-encoders. Late interaction multi-vectors are indexed in a separate field and will use the new API options added by this effort.
4. **Reranking processors** – My understanding is that this is the other end of the spectrum, where we invoke an ML model with the query and document; the model runs inference (e.g. cross-attention) over both query and document tokens and outputs a relevance score that we use to rerank results. This is the most computationally expensive option, but it captures the power of cross-attention heads (possibly multiple heads, depending on your model) across query and document tokens. It should provide the highest accuracy, at the highest cost.
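For contrast with the proposed API above, option 1 corresponds roughly to the knn query's existing same-field rescore, where over-sampled candidates retrieved with quantized vectors are re-scored with full-precision vectors from the same field. A sketch (the field name is a placeholder; exact parameters depend on the k-NN plugin version):

```ruby
GET /my-vector-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "title_embedding": {
        "vector": [2, 3, 5, 6],
        "k": 2,
        "rescore": {
          "oversample_factor": 3.0   # over-fetch candidates approximately, then re-rank them with full-precision vectors of the same field
        }
      }
    }
  }
}
```

The proposal extends this shape by letting `rescore` additionally reference a different (e.g. late interaction) field and a matching target vector, as shown in the Proposed API section above.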
Would love inputs from the community on proposed functionality, API changes, and areas that will require changes.
Author: vigyasharma (Vigya Sharma) · GitHub
system
Closed
January 9, 2026, 11:33pm
2
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.