Sparse model prompt

For the sparse search models (e.g. amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill), is it possible to improve the embeddings by including a bit of context with the sequence?

Say I know all my documents are about computer programming. I’m experimenting with things like:

`mytext = 'In the following sequence focus on computer programming topics [SEP] ' + myactualtext`
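For reference, here's roughly what I'm doing. This is a minimal sketch assuming the Hugging Face checkpoint opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill and the usual SPLADE-style log(1 + ReLU) max pooling for these doc encoders; the document text is just a placeholder:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

context = "In the following sequence focus on computer programming topics"
myactualtext = "How do I profile a slow Python function?"  # placeholder document
mytext = context + " [SEP] " + myactualtext

features = tokenizer(mytext, return_tensors="pt")
with torch.no_grad():
    logits = model(**features).logits  # [1, seq_len, vocab_size]

# SPLADE-style pooling: log(1 + ReLU) token weights, masked, max-pooled over the sequence
weights = torch.log1p(torch.relu(logits)) * features["attention_mask"].unsqueeze(-1)
sparse_vec = weights.max(dim=1).values.squeeze(0)  # [vocab_size]

# Inspect the top non-zero terms in the resulting rank vector
nonzero = sparse_vec.nonzero().squeeze(-1).tolist()
terms = {tokenizer.convert_ids_to_tokens(i): sparse_vec[i].item() for i in nonzero}
print(sorted(terms.items(), key=lambda kv: -kv[1])[:20])
```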

This impacts the rank vector primarily in 2 ways:

  1. It finds some new non-zero terms
  2. It includes the terms from my prepended context and associated expanded terms

For #2, can I delete these from the rank vector somehow, or prevent them from getting ranked in the first place?
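To make #2 concrete, here's the kind of post-processing I had in mind (continuing the sketch above). It only strips the literal prefix tokens, not the expansion terms the context triggered:

```python
# Zero out the vocab entries corresponding to the prepended context's own tokens
context_ids = set(tokenizer(context, add_special_tokens=False)["input_ids"])
for token_id in context_ids:
    sparse_vec[token_id] = 0.0

# Remaining non-zero terms after removing the literal prefix tokens
remaining = {
    tokenizer.convert_ids_to_tokens(i): w
    for i, w in enumerate(sparse_vec.tolist())
    if w > 0
}
```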

Or am I just boiling the ocean, and should I accept the out-of-the-box performance without any such tinkering, since it's already very good?

Thank you!

Hi @nattaylor, I don't think this will improve performance. During training of these sparse models, we didn't add any context-related prompts, so we'd usually suggest keeping the test-time setup identical to the training-time setup. The performance with prompts hasn't been evaluated.


BTW, we recently released the v3-gte model on Hugging Face: https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte. We'd strongly suggest giving it a try :slight_smile:

Thank you for the insight! I hadn't considered train-time/test-time alignment, but that makes sense.

I will give the new model a try too.

Still, I want to emphasize how delighted I am with the excellent out-of-the-box performance!
