Versions:
OpenSearch 3.4.0 (Docker)
OpenSearch Dashboards 3.4.0 (Docker)
Describe the issue:
When I run a hybrid query composed of a query_string and a Lucene knn query (cosine similarity) with explain enabled, I get the following structure:
{
  "_explanation": {
    "value": 1,
    "description": "arithmetic_mean, weights [0.3, 0.7] combination of:",
    "details": [
      {
        "value": 1,
        "description": "min_max normalization of:",
        "details": [
          {
            "value": 3.8993959426879883,
            "description": "combined score of:",
            "details": [
              {
                "value": 3.899396,
                "description": "weight(name:wind in 10513) [PerFieldSimilarity], result of:",
                "details": []
              },
              {
                "value": 0.78645265,
                "description": "within top 10 docs",
                "details": []
              }
            ]
          }
        ]
      }
    ]
  }
}
Note that there is a single “min_max normalization of” node, but I’d expect two: one for each subquery score, which are then combined. This differs from the example in the docs:
{
  "_explanation": {
    "value": 0.9251075,
    "description": "arithmetic_mean combination of:",
    "details": [
      {
        "value": 1.0,
        "description": "min_max normalization of:",
        "details": []
      },
      {
        "value": 0.8503647,
        "description": "min_max normalization of:",
        "details": [
          {
            "value": 0.015177966,
            "description": "within top 5",
            "details": []
          }
        ]
      }
    ]
  }
}
The docs use a different hybrid query, composed of a match query and a neural query (the embeddings are calculated by OpenSearch). In my case, the embeddings are calculated externally and are passed in with the query.
Does this mean that scores for hybrid queries with a knn part are not calculated in the usual way? By the usual way I mean (see also this post):
- min-max normalization of each hybrid subquery’s score (into the range 0–1)
- combination of these normalized scores into the overall score (weighted arithmetic mean: multiply each subquery’s normalized score by its weight and sum the results) – see the sketch below
If the score is calculated differently for knn query parts (i.e. both raw scores are summed first and the sum is then normalized), wouldn’t that underrepresent the knn score compared to the text query part’s score, since cosine similarity in OpenSearch already lies between 0 and 1 even before normalization (see docs)? If so, is there a way to enforce a different behavior?
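For reference, this is the per-document computation I would expect from my pipeline (a minimal sketch in Python, not taken from the OpenSearch source: the per-subquery min/max values are made up for illustration, only the two raw scores and the weights [0.3, 0.7] come from the explain output and pipeline configuration shown here):

def min_max_normalize(score, min_score, max_score):
    # min_max technique: map a subquery's raw score into [0, 1]
    if max_score == min_score:
        return 1.0  # assumption for the degenerate case; actual behavior may differ
    return (score - min_score) / (max_score - min_score)

# Raw scores of one document, taken from the explain output above
text_score = 3.899396    # query_string branch (BM25)
knn_score = 0.78645265   # knn branch (cosine similarity, already in [0, 1])

# Hypothetical per-subquery min/max over the retrieved result set
text_norm = min_max_normalize(text_score, min_score=0.5, max_score=3.899396)  # -> 1.0
knn_norm = min_max_normalize(knn_score, min_score=0.61, max_score=0.83)       # -> ~0.80

# arithmetic_mean combination with the weights [0.3, 0.7] from the pipeline
combined = 0.3 * text_norm + 0.7 * knn_norm
print(text_norm, knn_norm, combined)

In this computation each branch is first mapped to [0, 1] independently, so the cosine similarity is not dwarfed by the BM25 score; that is the behavior I expected to see reflected in the explanation.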
Configuration:
Mapping (knn part):
"embedding": {
"type": "knn_vector",
"dimension": 256,
"method": {
"engine": "lucene",
"space_type": "cosinesimil",
"name": "hnsw",
"parameters": {
"ef_construction": 128,
"m": 16,
"encoder": {
"name": "sq",
"parameters": {
"confidence_interval": 0.9
}
}
}
}
}
Normalization pipeline:
PUT /_search/pipeline/nlp-search-pipeline
{
  "description": "Post processor for hybrid search with custom weights",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [0.3, 0.7]
          }
        }
      }
    }
  ],
  "response_processors": [
    {
      "hybrid_score_explanation": {}
    }
  ]
}
Query:
GET myindices/_search?search_pipeline=nlp-search-pipeline&explain=true
{
  "fields": ["name"],
  "_source": {
    "excludes": ["*"]
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "query_string": {
            "query": "wind",
            "default_field": "name",
            "_name": "text_branch"
          }
        },
        {
          "knn": {
            "embedding": {
              "vector": [...],
              "k": 10,
              "_name": "vector_branch"
            }
          }
        }
      ]
    }
  },
  "size": 10,
  "from": 0
}
Relevant Logs or Screenshots: