Normalisation in Hybrid Search

What is the best way to normalise the scores of BM25 and ANN results? I am trying to build a hybrid search system where the results are ranked by linearly combining the scores of keyword search and neural search.

To achieve this combination, the scores need to be normalised to the same scale. I tried min-max normalisation of the BM25 scores, but that requires an additional query to find the max score first, followed by a second query that uses script scoring to normalise the BM25 scores (score/max-score) and sum them with the respective KNN cosine similarity scores. Is there a better way to achieve this?
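For illustration, here is a minimal sketch of the kind of linear combination described above, done client-side on two result lists rather than with script scoring. It assumes the BM25 and KNN queries are run separately and that each returns a mapping of document ID to raw score. The function names, the `bm25_weight` parameter, and the sample scores are hypothetical, not part of any OpenSearch API. Note that it normalises only over the retrieved hits, not over the whole index, and that it uses (score - min) / (max - min) rather than score/max-score.

```python
from typing import Dict, List, Tuple


def min_max_normalise(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] over the retrieved hits only."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All hits share one score; treat them as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def combine(
    bm25: Dict[str, float],
    knn: Dict[str, float],
    bm25_weight: float = 0.5,  # hypothetical tuning knob
) -> List[Tuple[str, float]]:
    """Weighted sum of the normalised scores, highest first."""
    bm25_norm = min_max_normalise(bm25)
    knn_norm = min_max_normalise(knn)
    combined = {
        # A document missing from one result list contributes 0 from that side.
        doc_id: bm25_weight * bm25_norm.get(doc_id, 0.0)
        + (1.0 - bm25_weight) * knn_norm.get(doc_id, 0.0)
        for doc_id in set(bm25_norm) | set(knn_norm)
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores for illustration only.
bm25_hits = {"a": 10.0, "b": 5.0, "c": 0.0}
knn_hits = {"a": 0.2, "b": 0.8, "d": 0.5}
ranked = combine(bm25_hits, knn_hits)
```

This avoids the extra max-score query, at the cost of making the normalised values depend on which hits each query happened to return.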

Hi Praveen. We are actively working on this problem and recently put out an RFC: "[RFC] High Level Approach and Design For Normalization and Score Combination" (Issue #126, opensearch-project/neural-search, GitHub).

Your input will be helpful. Please feel free to provide feedback on the above RFC.
