Hybrid query to be combined with function score

I used the hybrid query as given in the documentation and it works fine when used independently.
I would like to combine the hybrid query with a function_score query, as below:

{
  "function_score": {
    "functions": [],
    "query": {
      "hybrid": {
        "queries": [
          {
            "match": {
              "title": "test article"
            }
          },
          {
            "knn": {
              "text_vector": {
                "vector": [3,5,6],
                "k": 10
              }
            }
          }
        ]
      }
    }
  }
}

Currently this does not return normalised scores. I expect the normalised scores returned by the hybrid query to then be adjusted by the functions defined in the functions array.
How can the hybrid query be combined with function_score here?

This won’t be possible, because of how the hybrid query works.

The hybrid query clause should not be wrapped inside other query clauses. Its main purpose is to run each sub-query and calculate its scores independently, and then normalize and combine those scores at the coordinator.

Is there any other way to achieve this? Boosting is a requirement for us, and currently no boosts can be applied when using the hybrid query.
Can this be considered as a feature enhancement?

@Ankita did you try wrapping the function_score queries inside the hybrid query clause?

The array of queries taken by the hybrid query clause accepts any valid query clause.

The way to think about the ‘hybrid’ query clause is that it takes a list of queries that are executed separately on the shards, and their scores are then normalized and combined at the coordinator.

So it’s more like you ran X separate queries against OpenSearch and then normalized and combined the scores in your application.
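To make the "run the queries separately, then normalize and combine" model concrete, here is a minimal Python sketch of what that would look like if done in an application. It assumes min-max normalization and an arithmetic mean for the combination step, which this thread does not specify; the function names and the scores are hypothetical.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one sub-query's raw scores to the [0, 1] range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as a top hit (a choice made for this sketch).
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def combine(results: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the normalized scores; a doc missing from a sub-query contributes 0."""
    normalized = [min_max_normalize(r) for r in results]
    doc_ids = set().union(*normalized)
    return {
        doc_id: sum(n.get(doc_id, 0.0) for n in normalized) / len(normalized)
        for doc_id in doc_ids
    }


# Hypothetical raw scores from two sub-queries that score on different scales.
bm25_scores = {"doc1": 12.0, "doc2": 6.0, "doc3": 3.0}
knn_scores = {"doc2": 0.9, "doc3": 0.5, "doc4": 0.1}

combined = combine([bm25_scores, knn_scores])
ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Doubling every BM25 score leaves the min-max normalized scores unchanged.
boosted = {doc_id: 2 * s for doc_id, s in bm25_scores.items()}
same_after_boost = all(
    abs(min_max_normalize(boosted)[d] - min_max_normalize(bm25_scores)[d]) < 1e-9
    for d in bm25_scores
)
```

One consequence of this model, at least under min-max normalization as sketched here: a boost applied inside a sub-query changes that sub-query's raw scores before normalization, so a uniform multiplier on every hit is cancelled out, while a boost that affects only some documents changes their relative position within that sub-query.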

For your use case, the function_score queries should ideally be wrapped inside the hybrid query clause.

Can you share an example?
Also, if we wrap function_score queries inside the hybrid query, will the functions be applied on top of the normalised scores of the BM25 + kNN queries?

It depends on how you build the function_score query. Here is a simple example based on the query you provided above. You can now boost the text match query however you want. In the end, the text and vector search queries are executed separately, and their scores are normalized and combined at the coordinator node.

{
  "query": {
    "hybrid": {
      "queries": [
        {
          "function_score": {
            "functions": [],
            "query": {
              "match": {
                "title": "test article"
              }
            }
          }
        },
        {
          "knn": {
            "text_vector": {
              "vector": [3,5,6],
              "k": 10
            }
          }
        }
      ]
    }
  }
}

Thanks @Navneet. It worked.
Scores are being normalised for function_score and kNN query.

But I have another question about this.

Running these as regular queries (without hybrid) returns the following numbers of results:
BM25: 143
kNN: 246
BM25 + kNN: 263

But with the hybrid query, the number of results returned is never more than 50.
BM25: 50
kNN: 50
BM25 + kNN: 50

Is there a limitation here?

There is no limitation that we are aware of. We have tested with size: 100.

Can you paste your query here? Also, try increasing the values of the k and size parameters to the number of results you need, and see if the issue persists.

Changing the size and k does impact the result count.

I want to paginate the results, for which I have been using the from and size parameters.
Setting from: 0 and size: 10 returns 10 results and a total value of 50.

{
  "hits": {
    "total": {
      "value": 50,
      "relation": "eq"
    },
    "max_score": 0.9458592,
    "hits": [<10 results>]
  }
}

Changing from to 10 changes the total value as well:

{
  "hits": {
    "total": {
      "value": 100,
      "relation": "eq"
    },
    "max_score": 0.95436335,
    "hits": [<10 results>]
  }
}

Isn’t this the total number of docs matching the query, which should remain the same irrespective of the from and size params?

Also, I’m unable to use aggregations, filtering, or sorting along with this.
Is support for these in the pipeline?

@Ankita
A few things to note here:
Pagination, aggregations, and sorting are not supported with this query clause.

Filtering is supported, but you have to put the filters with both of the queries inside the queries array of the hybrid query clause.
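As an illustration of putting the same filter on both sub-queries, here is a minimal Python sketch that builds such a request body. It assumes that wrapping each sub-query in a bool query with a filter clause is acceptable for your use case; the helper with_filter and the category field are hypothetical and not from this thread.

```python
import json
from typing import Any, Dict, List


def with_filter(query: Dict[str, Any], filters: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap a query clause in a bool query that applies the given filters."""
    return {"bool": {"must": [query], "filter": filters}}


# Hypothetical filter; "category" is not a field from the thread.
filters = [{"term": {"category": "news"}}]

# The two sub-queries from the original question.
sub_queries = [
    {"match": {"title": "test article"}},
    {"knn": {"text_vector": {"vector": [3, 5, 6], "k": 10}}},
]

# Each sub-query gets its own copy of the filter inside the hybrid queries array.
body = {
    "query": {
        "hybrid": {
            "queries": [with_filter(q, filters) for q in sub_queries]
        }
    }
}

print(json.dumps(body, indent=2))
```

Note that a bool filter around a knn clause is applied to the neighbours the k-NN search has already found, so that sub-query may return fewer than k hits; you may need to raise k to compensate.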

@Navneet Are these features on the roadmap for the hybrid query?
If yes, is there any ETA you could share?

For pagination there is already an issue created. For the others, no.

Issue: [FEATURE] Implement pagination for Hybrid Search · Issue #280 · opensearch-project/neural-search · GitHub

Feel free to create feature requests for the others.

@Ankita

Thanks @Navneet. Added for the others:
[FEATURE] Aggregations to be supported with Hybrid Search
[FEATURE] post_filter to be supported with Hybrid Search
[FEATURE] Results returned from Hybrid Search should respect sorting applied
