Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.17 on AWS
Describe the issue:
Could someone explain in detail what the combination step of the normalization processor does?
In our case, we have an index with keyword fields and a 256-dimensional vector field. We want to retrieve only the top 100 vector results plus all N keyword results from the index, and compute the count and aggregations over those 100 + N results. I don't think hybrid search with the normalization-processor can do this, so we created a processor similar to the normalization-processor that chunks texts, normalizes scores with the L2 norm, and reranks with an LLM cross-encoder. Apart from that, I don't understand the combination step of the normalization-processor.
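To make the question concrete, here is a minimal sketch of what "normalize, then combine" means for two sub-query result lists: each list is L2-normalized on its own, then the two normalized scores are merged per document with a weighted arithmetic mean over the union of document IDs. The document IDs and scores are made up, and treating a document that is missing from one sub-query as scoring 0 there is an assumption, not confirmed plugin behavior; it should be checked against the neural-search plugin source for 2.17.

```python
import math

# Hypothetical sub-query results: doc id -> raw score.
vector_hits = {"d1": 0.82, "d2": 0.75, "d3": 0.60}
bm25_hits = {"d2": 12.4, "d3": 9.1, "d4": 7.7, "d5": 3.2}


def l2_normalize(hits):
    """Divide each score by the L2 norm of all scores in the same result list."""
    norm = math.sqrt(sum(s * s for s in hits.values()))
    if norm == 0.0:
        return {doc: 0.0 for doc in hits}
    return {doc: s / norm for doc, s in hits.items()}


def arithmetic_mean_combine(normalized_lists, weights):
    """Weighted arithmetic mean over the union of doc ids.

    ASSUMPTION: a document missing from a sub-query's results gets a
    normalized score of 0 for that sub-query, and its weight still counts.
    """
    all_docs = set().union(*normalized_lists)
    total_weight = sum(weights)
    combined = {}
    for doc in all_docs:
        weighted = sum(w * hits.get(doc, 0.0) for hits, w in zip(normalized_lists, weights))
        combined[doc] = weighted / total_weight
    # Highest combined score first.
    return dict(sorted(combined.items(), key=lambda kv: kv[1], reverse=True))


v_norm = l2_normalize(vector_hits)
b_norm = l2_normalize(bm25_hits)
scores = arithmetic_mean_combine([v_norm, b_norm], weights=[0.5, 0.5])

# Shared documents appear once: 3 + 4 - 2 overlapping = 5 documents.
print(len(scores))
for doc, s in scores.items():
    print(doc, round(s, 4))
```

Note that the overlap matters for the "100 + N" count: documents returned by both sub-queries are counted once after combination.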
If I retrieve the top 100 documents for the neural query and 10,000 for BM25, how is the combined score $s_i$ calculated for $i > 100$? Is it $s_i = \tilde{b}_i$, where $\tilde{b}_i$ is the normalized BM25 score? Some documents appear in both result sets and others do not. After L2 normalization, the BM25 scores lie in (0, 1) and the vector scores in (0.1, 0.5).
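Assuming the combination is a weighted arithmetic mean and that a document absent from the neural results contributes a normalized vector score of 0 (both assumptions; $\tilde{v}_i$, $w_v$ and $w_b$ are notation introduced here for the normalized vector score and the two weights), the combined score would be:

```latex
s_i = \frac{w_v \tilde{v}_i + w_b \tilde{b}_i}{w_v + w_b},
\qquad
\tilde{v}_i = 0 \;\Rightarrow\; s_i = \frac{w_b}{w_v + w_b}\,\tilde{b}_i = w_b\,\tilde{b}_i
\quad \text{when } w_v + w_b = 1.
```

Under these assumptions $s_i = \tilde{b}_i$ holds only when $w_b = 1$; with equal weights a BM25-only document would get $s_i = 0.5\,\tilde{b}_i$, which ranks it below documents found by both sub-queries with similar scores.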
How should the weights be configured? I don't know how many queries we will run.
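As far as I understand the documented search pipeline format, the weights belong to the normalization-processor and map one-to-one, in order, to the sub-queries inside a single hybrid query's `queries` array, so their count is the number of sub-queries, not the number of searches you run. The sketch below only builds and prints the request body; the pipeline name and the `build_pipeline` helper are hypothetical, and the sum-to-1 check mirrors the documented requirement rather than calling any OpenSearch client.

```python
import json

# Hypothetical pipeline name.
PIPELINE_NAME = "hybrid-search-pipeline"


def build_pipeline(weights, normalization="l2", combination="arithmetic_mean"):
    """Return a search pipeline body whose weights map 1:1 to hybrid sub-queries.

    weights[k] applies to the k-th entry of the hybrid query's "queries" array.
    """
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("weights should sum to 1.0")
    return {
        "description": "Post processor for hybrid search",
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": normalization},
                    "combination": {
                        "technique": combination,
                        "parameters": {"weights": weights},
                    },
                }
            }
        ],
    }


# Two sub-queries (for example [neural, match]) -> two weights.
body = build_pipeline([0.3, 0.7])
print(f"PUT /_search/pipeline/{PIPELINE_NAME}")
print(json.dumps(body, indent=2))
```

Changing how many searches you send does not change the weights; only adding or removing a sub-query inside the hybrid query does.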
Is the combination step required for hybrid search?
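For context on that last question: as far as I know, the normalization-processor always runs a combination step (the documented default technique is `arithmetic_mean`), because that step is what merges the per-sub-query scores into the single score used for ranking. The sketch below shows textbook weighted forms of the three documented techniques (`arithmetic_mean`, `geometric_mean`, `harmonic_mean`). Skipping zero scores in the geometric and harmonic versions is an assumption made so the logarithm and division stay defined; the plugin's exact handling may differ.

```python
import math


def weighted_arithmetic(scores, weights):
    """Weighted arithmetic mean; zero scores still pull the result down."""
    return sum(w * s for s, w in zip(scores, weights)) / sum(weights)


def weighted_geometric(scores, weights):
    """Weighted geometric mean; ASSUMPTION: non-positive scores are skipped."""
    pairs = [(s, w) for s, w in zip(scores, weights) if s > 0]
    if not pairs:
        return 0.0
    total_w = sum(w for _, w in pairs)
    return math.exp(sum(w * math.log(s) for s, w in pairs) / total_w)


def weighted_harmonic(scores, weights):
    """Weighted harmonic mean; ASSUMPTION: non-positive scores are skipped."""
    pairs = [(s, w) for s, w in zip(scores, weights) if s > 0]
    if not pairs:
        return 0.0
    return sum(w for _, w in pairs) / sum(w / s for s, w in pairs)


# Normalized [vector, bm25] scores for one document, equal weights.
both = [0.4, 0.8]
bm25_only = [0.0, 0.8]
for name, fn in [("arithmetic", weighted_arithmetic),
                 ("geometric", weighted_geometric),
                 ("harmonic", weighted_harmonic)]:
    print(name, round(fn(both, [0.5, 0.5]), 4), round(fn(bm25_only, [0.5, 0.5]), 4))
```

The choice of technique mostly changes how strongly a document that scores well in only one sub-query is rewarded or penalized.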
Configuration:
Relevant Logs or Screenshots: