Need to provide very specific results and incorporate user feedback, but reranking is slow

Hello OpenSearch Forum Community,

I’m currently working with a local cluster where I’ve installed all-MiniLM, but I’m facing performance issues: it runs slower than expected. I have around 1 million documents indexed (using OpenAI embeddings), and I need to incorporate user feedback into the ranking as quickly as possible. My use case involves retrieving approximately 1,000 results per query and then adjusting their ranking based on that feedback efficiently.
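For context, my first-stage retrieval looks roughly like this (a minimal sketch with opensearch-py; the host, the index name `docs`, the vector field `embedding`, and the stored fields are placeholders for my actual setup):

```python
# Minimal sketch of the first-stage k-NN retrieval, assuming an index
# "docs" with a knn_vector field "embedding" holding the OpenAI
# embeddings. Names and host are illustrative placeholders.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

def retrieve(query_vector, k=1000):
    """Fetch the top-k nearest neighbours via the k-NN plugin."""
    body = {
        "size": k,
        "query": {
            "knn": {
                "embedding": {          # assumed vector field name
                    "vector": query_vector,
                    "k": k,
                }
            }
        },
        "_source": ["title", "text"],   # assumed stored fields
    }
    return client.search(index="docs", body=body)["hits"]["hits"]
```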

Here are the options I’ve considered:

  1. Using a fine-tuned custom reranker seems too slow and costly, particularly when retrieving this many records. There have been discussions about implementing top-k reranking, but it doesn’t appear to be available yet (Improving Search relevancy through Generic Second stage reranker · Issue #248 · opensearch-project/neural-search · GitHub); a client-side workaround is sketched after this list.
  2. Another option I considered was reranking with a model fine-tuned for my domain rather than a generic cross-encoder, but reranking based on a plain embedding model (a bi-encoder) does not seem to be supported.
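To make option 1 concrete, the workaround I’ve been experimenting with is reranking only the head of the result list outside the cluster (a minimal sketch, assuming sentence-transformers and the `hits` returned by the retrieval sketch above; the model name is just a common public cross-encoder):

```python
# Minimal sketch of client-side top-k reranking with a cross-encoder.
# Only the first top_k hits are rescored; the tail keeps its original
# order, which bounds the reranking latency.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank_top_k(query, hits, top_k=100):
    head, tail = hits[:top_k], hits[top_k:]
    pairs = [(query, h["_source"]["text"]) for h in head]
    scores = reranker.predict(pairs)
    head = [h for _, h in sorted(zip(scores, head),
                                 key=lambda x: x[0], reverse=True)]
    return head + tail
```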

To avoid frequent reindexing, I also thought about training a transformation (an adapter) on top of the query embedding. However, I’m concerned about the resulting relevance, as this approach doesn’t seem to be optimal (Finetuning an Adapter on Top of any Black-Box Embedding Model - LlamaIndex).
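The appeal is that only the query vector is transformed at search time, so the document vectors and the index stay untouched. A minimal PyTorch sketch of what I have in mind, assuming I can collect (query embedding, clicked-document embedding) pairs from user feedback (the dimensionality and training loop are illustrative):

```python
# Minimal sketch of a linear query-embedding adapter trained on user
# feedback pairs. Documents are never re-embedded, so no reindexing is
# needed; only the query vector passes through the adapter at search
# time. All names and hyperparameters are illustrative.
import torch
import torch.nn as nn

dim = 1536  # e.g. OpenAI embedding dimensionality (assumption)
adapter = nn.Linear(dim, dim, bias=False)
nn.init.eye_(adapter.weight)  # start from the identity transform

optimizer = torch.optim.Adam(adapter.parameters(), lr=1e-4)
loss_fn = nn.CosineEmbeddingLoss()

def train_step(query_embs, clicked_doc_embs):
    """One step: pull transformed queries toward clicked documents."""
    optimizer.zero_grad()
    transformed = adapter(query_embs)
    target = torch.ones(len(query_embs))  # 1 = "should be similar"
    loss = loss_fn(transformed, clicked_doc_embs, target)
    loss.backward()
    optimizer.step()
    return loss.item()
```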

Do you have any suggestions or alternative approaches that might help improve performance and efficiency in this context? Any insights would be greatly appreciated.

Thank you!

Have you tried using something like BAAI/bge-reranker-v2-m3 · Hugging Face? Do you have access to a GPU?
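If you have a GPU, usage is roughly as follows (a minimal sketch with the FlagEmbedding package, following the Hugging Face model card; `use_fp16` assumes GPU inference):

```python
# Minimal sketch of scoring query/passage pairs with bge-reranker-v2-m3
# via FlagEmbedding, per the model card; use_fp16=True assumes a GPU.
from FlagEmbedding import FlagReranker

reranker = FlagReranker("BAAI/bge-reranker-v2-m3", use_fp16=True)

scores = reranker.compute_score([
    ["what is a panda?", "The giant panda is a bear native to China."],
    ["what is a panda?", "OpenSearch is a search and analytics suite."],
])
# Higher score = more relevant; sort your top-k hits by these scores.
```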