Hello OpenSearch Forum Community,
I’m currently working with a local cluster where I’ve installed all-MiniLM, but it’s running slower than expected. I have around 1 million documents indexed (using OpenAI embeddings), and I need to incorporate user feedback into the system as quickly as possible. My use case involves retrieving approximately 1,000 results per query and efficiently adjusting their ranking based on user feedback. For concreteness, the retrieval step looks roughly like the sketch below.
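Here is a minimal sketch of that step. The index name `docs`, the vector field `embedding`, and the specific OpenAI model are placeholders, not necessarily what I actually use:

```python
from openai import OpenAI
from opensearchpy import OpenSearch

oai = OpenAI()  # reads OPENAI_API_KEY from the environment
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

def embed(text: str) -> list[float]:
    # Must be the same model used at indexing time so vectors share a space.
    resp = oai.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

query_vector = embed("example user query")
response = client.search(
    index="docs",
    body={
        "size": 1000,  # I need ~1,000 hits per request
        "query": {
            "knn": {
                "embedding": {"vector": query_vector, "k": 1000}
            }
        },
    },
)
hits = response["hits"]["hits"]
```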
Here are the options I’ve considered:
- Using a fine-tuned custom reranker seems too slow and costly, particularly when retrieving this many records (a rough cost sketch follows this list). I noticed there were discussions about implementing top-k reranking, but it doesn’t appear to be available yet (Improving Search relevancy through Generic Second stage reranker · Issue #248 · opensearch-project/neural-search · GitHub).
- Another option I considered was reranking with a fine-tuned model specific to my domain rather than a cross-encoder, but reranking based on a simple embedding (bi-encoder) model does not seem to be supported (the second sketch after this list shows what I have in mind).
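To illustrate the cost problem with the first option: a client-side cross-encoder rerank over my ~1,000 hits would look roughly like this (the model and the `text` field are just examples), with one forward pass per (query, document) pair on every search:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example model

query = "example user query"
# `hits` as returned by the retrieval sketch above.
pairs = [(query, hit["_source"]["text"]) for hit in hits]

# ~1,000 cross-encoder inferences per search -- this is where the latency goes.
scores = reranker.predict(pairs, batch_size=64)
reranked = [hit for _, hit in
            sorted(zip(scores, hits), key=lambda p: p[0], reverse=True)]
```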
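And this is the kind of embedding-model rerank I was hoping for with the second option: score each hit with a domain-tuned bi-encoder and re-sort, which costs one dot product per document instead of a full cross-encoder pass (again, the model and field names are stand-ins for my own):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Stand-in for my fine-tuned domain model.
domain_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

query_emb = domain_model.encode("example user query", normalize_embeddings=True)
doc_embs = domain_model.encode(
    [hit["_source"]["text"] for hit in hits], normalize_embeddings=True
)

# Cosine similarity equals the dot product on normalized vectors; one score per hit.
scores = doc_embs @ query_emb
reranked = [hits[i] for i in np.argsort(-scores)]
```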
To avoid frequent reindexing, I also thought about training a transformation of the query embedding, roughly as sketched below. However, I’m concerned about performance, as this approach doesn’t seem to be optimal (Finetuning an Adapter on Top of any Black-Box Embedding Model - LlamaIndex).
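This is the shape of what I mean: a small linear transformation applied only to query embeddings, trained on (query, clicked-document) pairs gathered from user feedback, so the 1M stored document vectors never need reindexing. A minimal PyTorch sketch; the dimension and loss are illustrative assumptions, not a settled design:

```python
import torch
import torch.nn as nn

DIM = 1536  # OpenAI embedding dimension in my index

# Linear adapter initialized to the identity, so before any training it
# reproduces the original ranking exactly.
adapter = nn.Linear(DIM, DIM, bias=False)
with torch.no_grad():
    adapter.weight.copy_(torch.eye(DIM))

opt = torch.optim.Adam(adapter.parameters(), lr=1e-4)
cos = nn.CosineSimilarity(dim=-1)

def train_step(query_embs: torch.Tensor, clicked_doc_embs: torch.Tensor) -> float:
    # Pull transformed query vectors toward documents users preferred.
    loss = (1.0 - cos(adapter(query_embs), clicked_doc_embs)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# At query time only the query vector changes; the index stays untouched:
# search_vector = adapter(torch.tensor(embed(query))).detach().tolist()
```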
Do you have any suggestions or alternative approaches that might help improve performance and efficiency in this context? Any insights would be greatly appreciated.
Thank you!