Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
2.19
Describe the issue:
Please advise on the optimal k and pagination_depth settings for a hybrid search that blends a lexical match query with KNN.
Configuration:
Relevant Logs or Screenshots:
Reply from @varun4996 -
Pagination in a hybrid query works differently from a regular query.
Let's say there are 10 shards and the user provides a match query and a knn query in the hybrid query clause. When you set pagination_depth = 100, each shard can send up to 100 results per subquery, so up to 10 x 100 = 1000 results reach the coordinator node for each query: 1000 results for match and 1000 results for knn.
Among those 2000 total results, suppose 500 appear in both subquery results, so there are 1500 unique results. With pagination_depth = 100, the coordinator node therefore has 1500 results.
By adjusting the from and size values you can page through them: with size = 100 that gives 1500/100 = 15 pages.
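The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the counting, not OpenSearch code; the function name and its parameters are made up for this example, and the overlap between subqueries is something you would not know in advance.

```python
def coordinator_pages(shards, pagination_depth, num_subqueries, overlap, size):
    """Best-case result counts at the coordinator node.

    Assumes every shard returns a full pagination_depth results
    for every subquery (the same assumption as in the reply).
    """
    per_subquery = shards * pagination_depth   # 10 x 100 = 1000
    total = per_subquery * num_subqueries      # match + knn = 2000
    unique = total - overlap                   # documents found by both count once
    pages = -(-unique // size)                 # ceiling division
    return per_subquery, total, unique, pages


print(coordinator_pages(shards=10, pagination_depth=100,
                        num_subqueries=2, overlap=500, size=100))
# (1000, 2000, 1500, 15)
```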
One important thing to note here: we have assumed that each shard returns at least 100 results for each query, since we set pagination_depth = 100.
Suppose instead that a shard returns only 20 results for the match query while the knn query returns 100. That shard will then send only 100 + 20 = 120 results to the coordinator. Therefore, if with pagination_depth = 100, from = 900 and size = 100 you cannot see page 10 of the search results, you need to increase pagination_depth.
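A rough way to reason about this shortfall, again as an illustrative sketch rather than anything OpenSearch exposes: each subquery contributes at most pagination_depth results per shard, and a page exists only if from is smaller than the number of unique results that reached the coordinator. Both helper names are hypothetical, and the 800 in the example is an invented figure for a cluster whose shards came up short.

```python
def shard_contribution(hits_per_subquery, pagination_depth):
    """Results one shard sends to the coordinator.

    Each subquery is capped at pagination_depth, but a subquery
    that matches fewer documents simply sends fewer.
    """
    return sum(min(hits, pagination_depth) for hits in hits_per_subquery)


def page_has_results(from_, unique_results):
    """True if at least one result exists at offset from_."""
    return from_ < unique_results


# match finds 20 documents on this shard, knn finds 100
print(shard_contribution([20, 100], pagination_depth=100))   # 120
# more matches than pagination_depth are capped at 100 each
print(shard_contribution([150, 500], pagination_depth=100))  # 200

# hypothetical: only 800 unique results reached the coordinator
print(page_has_results(from_=900, unique_results=800))       # False
print(page_has_results(from_=900, unique_results=1500))      # True
```

If the page you ask for comes back empty, that is the signal to raise pagination_depth.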
Now for the k value. If you set size = 100, then k should be at least 100 so that each shard returns at least that many documents.
But if you set k = 500, the shard still sends only 100 results to the coordinator node, because pagination_depth decides how many results are needed for hybridization.
Pro tip: focus on pagination_depth and keep k and size the same.
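Putting the tip together, a request body might be built like this. It is a minimal sketch: the helper name, the zero-based page argument, and the field names title and embedding are assumptions for illustration, so substitute the fields from your own index mapping.

```python
import json


def build_hybrid_body(text, vector, page, size, pagination_depth):
    """Build a hybrid search body with k kept equal to size.

    page is zero-based; "title" and "embedding" are placeholder
    field names, not taken from the original question.
    """
    return {
        "from": page * size,
        "size": size,
        "query": {
            "hybrid": {
                # controls how many results each shard sends per subquery
                "pagination_depth": pagination_depth,
                "queries": [
                    {"match": {"title": {"query": text}}},
                    {"knn": {"embedding": {"vector": vector, "k": size}}},
                ],
            }
        },
    }


# page index 9 is the 10th page: from = 900, size = 100
body = build_hybrid_body("wireless headphones", [0.1, 0.2, 0.3],
                         page=9, size=100, pagination_depth=100)
print(json.dumps(body, indent=2))
```

When paging further, keep pagination_depth fixed across requests for the same search and change only from, so the pages are drawn from the same pool of results.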