Hybrid Score Explain Output's Structure Diverges from Docs

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

OpenSearch 3.0.0 (Docker)

Describe the issue:

I used the explain feature for my hybrid query to understand how the subqueries’ scores are normalized and combined. However, the structure of the response JSON differs from what the docs show.

In the docs, I find this structure:

{
  "_explanation": {
    "value": 0.9251075,
    "description": "arithmetic_mean combination of:",
    "details": [
      {
        "value": 1.0,
        "description": "min_max normalization of:",
        "details": [
          {
            "value": 1.2336599,
            "description": "weight(text:horse in 0) [PerFieldSimilarity], result of:",
            "details": []
          }
        ]
      },
      {
        "value": 0.8503647,
        "description": "min_max normalization of:",
        "details": [
          {
            "value": 0.015177966,
            "description": "within top 5",
            "details": []
          }
        ]
      }
    ]
  }
}

This makes sense to me: first each subquery’s original score is min-max-normalized (1.2336599 → 1.0 and 0.015177966 → 0.8503647), and then the mean of these normalized values is calculated, resulting in 0.9251075. This is also consistent with this great article.
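The two-step process can be sketched like this (a minimal sketch with assumed helper names; in the real processor, the min/max for each subquery come from the scores of the whole result set, which is why only one of the two normalized values can be reproduced here):

```python
# Sketch of the two-step hybrid scoring: per-subquery min_max
# normalization, then arithmetic_mean combination.

def min_max(score, min_score, max_score):
    """min_max normalization: maps a subquery score into [0, 1]."""
    return (score - min_score) / (max_score - min_score)

def arithmetic_mean(scores, weights=None):
    """arithmetic_mean combination: weighted mean of normalized scores."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# The lexical score 1.2336599 is the maximum of its subquery, so it
# normalizes to 1.0; the knn subquery yields 0.8503647 for this doc.
combined = arithmetic_mean([1.0, 0.8503647])
# combined is close to the documented 0.9251075 (the explain output
# reports rounded floats, so the last digits differ slightly).
```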

On my local OS instance, I get:

 {     
   "_explanation": {
          "value": 0.7651082,
          "description": "arithmetic_mean, weights [0.7, 0.3] combination of:",
          "details": [
            {
              "value": 0.75093794,
              "description": "min_max normalization of:",
              "details": [
                {
                  "value": 4.207993030548096,
                  "description": "combined score of:",
                  "details": [
                    {
                      "value": 0.7004002,
                      "description": "within top 10 docs",
                      "details": []
                    },
                    {
                      "value": 4.207993,
                      "description": "weight(name:wind in 234522) [PerFieldSimilarity], result of:",
                      "details": [
                        {
                          "value": 4.207993,
                          "description": "score(freq=2.0), computed as boost * idf * tf from:",
                          "details": [
                            {
                              "value": 6.7454805,
                              "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                              "details": [
                                {
                                  "value": 202,
                                  "description": "n, number of documents containing term",
                                  "details": []
                                },
                                {
                                  "value": 172166,
                                  "description": "N, total number of documents with field",
                                  "details": []
                                }
                              ]
                            },
                            {
                              "value": 0.623824,
                              "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                              "details": [
                                {
                                  "value": 2,
                                  "description": "freq, occurrences of term within document",
                                  "details": []
                                },
                                {
                                  "value": 1.2,
                                  "description": "k1, term saturation parameter",
                                  "details": []
                                },
                                {
                                  "value": 0.75,
                                  "description": "b, length normalization parameter",
                                  "details": []
                                },
                                {
                                  "value": 12,
                                  "description": "dl, length of field",
                                  "details": []
                                },
                                {
                                  "value": 11.920106,
                                  "description": "avgdl, average length of field",
                                  "details": []
                                }
                              ]
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
}

In my query’s response above, I only see a single entry for min_max normalization, while there are two in the example from the docs. I would have expected two entries, since each of the two subqueries (lexical and knn) produces its own score. Each score should be normalized first, then multiplied by its weight, and then the mean should be calculated from those values.

Configuration:

This is my query:

GET my_index/_search?explain=true


{
  "fields": [
    "name"
  ],
  "_source": {
    "excludes": [
      "*"
    ]
  },
  "search_pipeline": {
    "phase_results_processors": [
      {
        "normalization-processor": {
          "normalization": {
            "technique": "min_max"
          },
          "combination": {
            "technique": "arithmetic_mean",
            "parameters": {
              "weights": [
                0.7,
                0.30000000000000004
              ]
            }
          }
        }
      }
    ],
    "response_processors": [
      {
        "hybrid_score_explanation": {}
      }
    ]
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "knn": {
            "embedding": {
              "vector": [...],
              "k": 10
            }
          }
        },
        {
          "match": {
            "name": {
              "query": "wind"
            }
          }
        }
      ]
    }
  },
  "size": 10,
  "from": 0,
  "track_total_hits": true
}

Relevant Logs or Screenshots:

I set up an OS 3.0.0 Docker instance on another system with the same data, and now I get the structure I actually expected. I have no idea why …

"_explanation": {
          "value": 0.73567706,
          "description": "arithmetic_mean, weights [0.7, 0.3] combination of:",
          "details": [
            {
              "value": 0.81662357,
              "description": "min_max normalization of:",
              "details": [
                {
                  "value": 0.69145346,
                  "description": "within top 100 docs",
                  "details": []
                }
              ]
            },
            {
              "value": 0.54727983,
              "description": "min_max normalization of:",
              "details": [
                {
                  "value": 3.0499675,
                  "description": "weight(name:wind in 1221) [PerFieldSimilarity], result of:",
                  "details": [
                    {
                      "value": 3.0499675,
                      "description": "score(freq=2.0), computed as boost * idf * tf from:",
                      "details": [
                        {
                          "value": 6.5695195,
                          "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                          "details": [
                            {
                              "value": 18,
                              "description": "n, number of documents containing term",
                              "details": []
                            },
                            {
                              "value": 13190,
                              "description": "N, total number of documents with field",
                              "details": []
                            }
                          ]
                        },
                        {
                          "value": 0.46426034,
                          "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                          "details": [
                            {
                              "value": 2,
                              "description": "freq, occurrences of term within document",
                              "details": []
                            },
                            {
                              "value": 1.2,
                              "description": "k1, term saturation parameter",
                              "details": []
                            },
                            {
                              "value": 0.75,
                              "description": "b, length normalization parameter",
                              "details": []
                            },
                            {
                              "value": 15,
                              "description": "dl, length of field",
                              "details": []
                            },
                            {
                              "value": 6.723351,
                              "description": "avgdl, average length of field",
                              "details": []
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }

Combined score: 0.81662357 * 0.7 + 0.54727983 * 0.3
However, shouldn’t the combined score then be divided by the number of queries, i.e. be half of 0.73567706?

Hello @tobe,
The numbers you’re getting are correct. My understanding is that you expected the score to be divided by the number of queries (2), but the actual implementation (correctly matching the documentation) divides by the sum of the weights.

The OpenSearch documentation for the normalization processor with arithmetic_mean combination technique correctly describes the formula as:

score = (weight1*score1 + weight2*score2 +...+ weightN*scoreN)/(weight1 + weight2 + ... + weightN)

Using the values from the user’s example:

  • 0.81662357 * 0.7 + 0.54727983 * 0.3 = 0.735820448
  • Since the weights sum to 1.0 (as required by the implementation), dividing by the sum of the weights leaves the result unchanged: 0.735820448

The calculated value matches the reported value of 0.73567706 (small difference due to floating-point precision).

This is a weighted arithmetic mean formula that divides by the sum of weights, not by the number of queries.
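The documented formula can be checked directly (a minimal sketch; the function name is made up for illustration):

```python
# Weighted arithmetic mean as documented: divide by the sum of the
# weights, not by the number of subqueries.

def weighted_arithmetic_mean(scores, weights):
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

combined = weighted_arithmetic_mean([0.81662357, 0.54727983], [0.7, 0.3])
# Since the weights sum to 1.0, dividing by sum(weights) changes
# nothing, and the result matches the reported 0.73567706 up to
# floating-point rounding.
```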

Since the weights must sum to 1.0 (enforced by validation code), the division doesn’t change the final value in this case, which may have contributed to the confusion.
The hybrid score explanation is functioning correctly according to both the documented and implemented formula. The observed behavior is as designed.
