What's the combination in normalization processor?

Garance · November 19, 2024, 5:19pm

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.17 on AWS

Describe the issue:
Could someone explain in detail what the combination step in the normalization processor does?

In our case, we have an index with keyword fields and a 256-dimensional vector field. We want to retrieve only the top 100 vector results plus all N keyword results from the index, with counts and aggregations computed over the 100 + N results. I don't think hybrid search with the normalization-processor can do this, so we created a processor similar to the normalization-processor, with text chunking, L2-norm normalization, and ranking by an LLM cross-encoder. Beyond that, I don't understand how the combination in the normalization-processor works.
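For reference, here is a rough sketch of what the stock setup discussed above could look like, written as Python dicts so the request bodies are easy to inspect. The field names `category` and `embedding`, the term value, the placeholder vector, and the 0.5/0.5 weights are assumptions made up for illustration; only the overall shape (a `normalization-processor` inside `phase_results_processors`, and a `hybrid` query with one keyword and one k-NN sub-query) follows the documented request format.

```python
import json

# Body for PUT /_search/pipeline/<pipeline-name>
pipeline_body = {
    "description": "Normalize and combine hybrid sub-query scores",
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {"technique": "l2"},
                "combination": {
                    "technique": "arithmetic_mean",
                    # One weight per sub-query, in the order the sub-queries
                    # are listed in the hybrid query (values are placeholders).
                    "parameters": {"weights": [0.5, 0.5]},
                },
            }
        }
    ],
}

# Body for GET /<index>/_search?search_pipeline=<pipeline-name>
search_body = {
    "query": {
        "hybrid": {
            "queries": [
                # Hypothetical keyword field and value.
                {"term": {"category": "news"}},
                # Hypothetical vector field; a zero vector stands in for a
                # real 256-dimensional query embedding, k=100 as in the post.
                {"knn": {"embedding": {"vector": [0.0] * 256, "k": 100}}},
            ]
        }
    }
}

print(json.dumps(pipeline_body, indent=2))
```

Nothing here is sent to a cluster; the two dicts would be passed as JSON bodies to the endpoints named in the comments.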

If I retrieve the top 100 documents for the neural query and 10,000 for BM25, how is the combined score si calculated for i > 100? Is si = ~bi? Some documents appear in both result sets, others in only one. After L2 normalization, the BM25 scores are in (0, 1) and the vector scores are in (0.1, 0.5).
How should the weights be configured? I don't know the number of queries we will run.
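To make the question concrete, here is a minimal sketch of L2 normalization followed by a weighted arithmetic mean over the union of two result lists. It is not the plugin's code: the function names, the toy scores, and in particular the rule that a document missing from one list contributes 0 for that list are assumptions, and that last assumption is exactly the behavior being asked about.

```python
import math


def l2_normalize(scores):
    """Divide each score by the L2 norm of its own result list."""
    norm = math.sqrt(sum(s * s for s in scores.values()))
    if norm == 0:
        return {doc: 0.0 for doc in scores}
    return {doc: s / norm for doc, s in scores.items()}


def combine(bm25, vector, weights=(0.5, 0.5), missing=0.0):
    """Weighted arithmetic mean over the union of both result lists.

    ASSUMPTION: a document absent from a list gets `missing` (0.0 by
    default) as its normalized score for that list.
    """
    w_b, w_v = weights
    nb, nv = l2_normalize(bm25), l2_normalize(vector)
    combined = {}
    for doc in set(nb) | set(nv):
        weighted = w_b * nb.get(doc, missing) + w_v * nv.get(doc, missing)
        combined[doc] = weighted / (w_b + w_v)
    return combined


# Toy scores (made up): "a" is in both lists, "b" only in BM25, "c" only in vector.
bm25 = {"a": 3.0, "b": 4.0}
vector = {"a": 0.6, "c": 0.8}

scores = combine(bm25, vector)
for doc, score in sorted(scores.items()):
    print(doc, round(score, 3))  # a ≈ 0.6, b ≈ 0.4, c ≈ 0.4
```

Under these assumptions, a document found only by BM25 (i > 100 on the vector side) ends up with its normalized BM25 score scaled by the BM25 weight, so it is penalized relative to documents found by both queries.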

Is the combination step mandatory for hybrid search?

Configuration:

Relevant Logs or Screenshots:

system · January 18, 2025, 5:20pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
