Is it possible to use a pre-filter (on a keyword field for example) and then do approximate k-NN on the remaining documents so that I always get back exactly k documents? The documentation mentions a pre-filter together with custom scoring, but not with approximate k-NN. I guess I could try it, but I wanted to ask first, and if it’s possible then the documentation should mention that.
Thanks for Open Distro by the way!
@jojo apologies for the delay in responding. Unfortunately, it is not possible to run approximate k-NN on filtered documents. Generally, k-NN is evaluated first, and the remaining filter clauses are then executed on top of the k results it returns. If k is very small, it is possible to get fewer than k results (sometimes none) after applying all filters. Hence, we recommend using custom scoring for exactly this scenario. We can share more insights on possible approaches if you could give us details about your dataset, your k value, and the size of the filtered document set (min/max/average) before the k-NN algorithm is applied. You can also check here for performance considerations.
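For anyone landing here, a rough sketch of what the custom scoring route could look like as a request body, built in Python. The index fields `category` and `my_vector` and the helper name are made up for illustration. The script parameters (`lang: knn`, `source: knn_score`, `field`, `query_value`, `space_type`) are as I recall them from the k-NN plugin docs; some older releases may name the vector parameter differently, so treat them as an assumption and check the docs for your version.

```python
import json


def build_prefiltered_knn_query(filter_field, filter_value, vector_field,
                                query_vector, k, space_type="l2"):
    """Build a search body that filters first, then scores the survivors exactly.

    Hypothetical helper: field names are placeholders, and the script
    parameter names are assumed from the k-NN plugin docs (verify per version).
    """
    return {
        "size": k,
        "query": {
            "script_score": {
                # Pre-filter: only documents matching this clause are scored.
                "query": {
                    "bool": {"filter": {"term": {filter_field: filter_value}}}
                },
                # Exact (brute-force) k-NN scoring over the filtered documents.
                "script": {
                    "lang": "knn",
                    "source": "knn_score",
                    "params": {
                        "field": vector_field,
                        "query_value": query_vector,
                        "space_type": space_type,
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    body = build_prefiltered_knn_query(
        "category", "shoes", "my_vector", [0.1, 0.2, 0.3], k=5
    )
    print(json.dumps(body, indent=2))
```

Because the scoring is exact over whatever the filter lets through, you get k results whenever at least k documents match the filter. The cost grows with the size of the filtered set, which is why the filtered document counts matter.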
Hello @Vijay thanks for your reply. Yeah, looking at ANN implementations it’s clear to me why pre-filtering isn’t implemented this way. My problem is that depending on the pre-filter I might only search 50% of my indexed documents (or fewer), so the chances of getting back an empty set of results are high.
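To make the empty-result problem concrete, here is a toy simulation in plain Python. It uses exact nearest-neighbour search as a stand-in for the approximate index (an assumption for illustration only; it does not touch the plugin), and the documents and categories are invented.

```python
import math

# Invented toy data: the three documents closest to the query are all in
# category "b", while the filter asks for category "a".
DOCS = [
    {"id": 0, "vector": [0.1, 0.0], "category": "b"},
    {"id": 1, "vector": [0.2, 0.1], "category": "b"},
    {"id": 2, "vector": [0.3, 0.3], "category": "a"},
    {"id": 3, "vector": [1.0, 1.0], "category": "a"},
    {"id": 4, "vector": [2.0, 2.0], "category": "a"},
    {"id": 5, "vector": [0.0, 0.2], "category": "b"},
]


def top_k(docs, query, k):
    """Return the k documents closest to query by Euclidean distance."""
    return sorted(docs, key=lambda d: math.dist(d["vector"], query))[:k]


def post_filter_search(docs, query, k, category):
    """k-NN first, then filter the k hits (the behaviour described above)."""
    hits = top_k(docs, query, k)
    return [d["id"] for d in hits if d["category"] == category]


def pre_filter_search(docs, query, k, category):
    """Filter first, then k-NN over the remaining documents."""
    candidates = [d for d in docs if d["category"] == category]
    return [d["id"] for d in top_k(candidates, query, k)]


if __name__ == "__main__":
    query = [0.0, 0.0]
    print("post-filter:", post_filter_search(DOCS, query, 3, "a"))
    print("pre-filter: ", pre_filter_search(DOCS, query, 3, "a"))
```

With k=3 the post-filter variant returns nothing, because all three nearest neighbours are filtered out afterwards, while the pre-filter variant still returns three documents.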
What I’m doing now is binarising the vector and then indexing it as regular terms and using sampled subsets of bits for approximating NN with floats. So this doesn’t utilise the kNN plugin at all, just thought I’d mention it as an alternative for people who might land here.
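In case it helps, a minimal sketch of that idea in Python. The details are my own guesses for illustration, not a description of an exact setup: binarising by sign (threshold 0), encoding each bit as a `position_value` term, and sampling a fixed number of bit positions at query time. The field names `vector_bits` and `category` are hypothetical.

```python
import random


def binarise(vector, threshold=0.0):
    """Turn a float vector into bits: 1 if the component exceeds threshold."""
    return [1 if x > threshold else 0 for x in vector]


def bits_to_terms(bits):
    """Encode each bit as a 'position_value' term, e.g. '3_1'."""
    return [f"{i}_{b}" for i, b in enumerate(bits)]


def sample_terms(terms, n, seed=0):
    """Pick n of the terms at random (seeded, so the choice is repeatable)."""
    rng = random.Random(seed)
    return rng.sample(terms, n)


def build_query(sampled_terms, terms_field, filter_field, filter_value, k):
    """Filter on a keyword field, then rank by how many sampled bits match."""
    return {
        "size": k,
        "query": {
            "bool": {
                "filter": [{"term": {filter_field: filter_value}}],
                "should": [{"term": {terms_field: t}} for t in sampled_terms],
            }
        },
    }


if __name__ == "__main__":
    terms = bits_to_terms(binarise([0.5, -1.0, 0.0, 2.0]))
    query = build_query(sample_terms(terms, 2), "vector_bits", "category", "shoes", 10)
    print(terms)
    print(query)
```

At index time each document would store the output of `bits_to_terms` in the terms field. Since the filter is an ordinary `bool` filter, the keyword restriction is applied before ranking, and the `should` clauses only affect ordering, roughly by how many sampled bits agree.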
Hey @Vijay, is there an open ticket for supporting pre-filtering with ANN? If not, how do we open one?