For the sparse search models (e.g. amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill), is it possible to improve the embeddings by including a bit of context with the sequence?
Say I know all my documents are about computer programming. I'm experimenting with things like:

`mytext = 'In the following sequence focus on computer programming topics [SEP] ' + myactualtext`
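For concreteness, here is roughly how I'm generating the rank vector to inspect it. This is just my sketch: the Hugging Face model ID and the log(1 + ReLU) max-pooling aggregation are my reading of the model card, so treat them as assumptions rather than the official API.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumed Hugging Face mirror of the OpenSearch pretrained model
MODEL_ID = "opensearch-project/opensearch-neural-sparse-encoding-v2-distill"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)

def encode(text: str) -> dict[str, float]:
    """SPLADE-style encoding: one weight per activated vocabulary term."""
    features = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**features).logits            # (1, seq_len, vocab)
    # Max over positions of log(1 + relu(logit)), masked by attention.
    # (Special-token weights are usually zeroed out too; omitted here for brevity.)
    scores = torch.log1p(torch.relu(logits))
    scores = scores * features["attention_mask"].unsqueeze(-1)
    weights = scores.max(dim=1).values.squeeze(0)    # (vocab,)
    ids = weights.nonzero().squeeze(-1).tolist()
    return {tokenizer.decode([i]): weights[i].item() for i in ids}

context = "In the following sequence focus on computer programming topics"
myactualtext = "..."  # my actual document text
rank_vector = encode(context + " [SEP] " + myactualtext)
```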
This impacts the rank vector primarily in 2 ways:
- It finds some new non-zero terms
- It includes the terms from my prepended context and associated expanded terms
For #2, can I somehow delete these terms from the rank vector, or prevent them from being included in the first place?
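One post-processing idea I've considered (just a sketch, not anything built into the model or OpenSearch, and it reuses the `encode()` helper and `myactualtext` from my sketch above): encode the context prompt on its own and drop every term it activates from the document's rank vector before indexing.

```python
# Encode the context hint by itself, then remove any term it activates
# from the document's rank vector.
context = "In the following sequence focus on computer programming topics"

doc_vector = encode(context + " [SEP] " + myactualtext)
context_vector = encode(context)

# Keep only terms the full input activates that the context alone does not.
filtered_vector = {
    term: weight
    for term, weight in doc_vector.items()
    if term not in context_vector
}
```

The obvious downside is that this would also drop genuinely relevant terms that the context and the document happen to share (e.g. "programming" itself), which is part of why I'm unsure it's worth doing.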
Or am I just boiling the ocean, and should I accept the out-of-the-box performance, which is already very good, without any such tinkering?
Thank you!