Hi all,
Our system has been using hybrid search (BM25 + dense semantic) for a while now. We recently became aware of sparse vectors in OpenSearch, but it's not clear to us how they might fit into a hybrid search and reranking setup.
- In terms of relevancy, how exactly are sparse vectors different from dense embeddings? From the documentation, the only difference I can see is some reduced memory usage. Some of the vector-related material is unclear to me, as I am not an ML engineer. Perhaps someone can explain it more simply?
- I’ve read from some sources that sparse vectors excel at keyword-based search, but we are already using BM25 for keyword search. How do sparse vectors compare to BM25 or lexical search?
- Finally, what is the recommended way to do hybrid search: lexical+dense, sparse+dense, or lexical+sparse+dense?
Thanks all!