Approximate k-NN with pre-filter

Hello Team,

Is it possible to use a pre-filter (on a keyword field for example) and then do approximate k-NN on the remaining documents so that I always get back exactly k documents? The documentation mentions a pre-filter together with custom scoring, but not with approximate k-NN. I guess I could try it, but I wanted to ask first, and if it’s possible then the documentation should mention that.

Thanks for Open Distro by the way!

@jojo apologies for the delay in responding. Unfortunately, it is not possible to do approximate k-NN on filtered documents. Generally, k-NN is evaluated first, and the remaining filter queries are then executed on the returned k results. If k is very small, it is possible to get fewer than k results (sometimes none) after applying all filters. Hence, we recommend using custom scoring for exactly this scenario. We can share more insights on possible approaches if you can give us details about your dataset, your k value, and the size of the filtered document set (min/max/average) before the k-NN algorithm is applied. You can also check here for performance considerations.
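To illustrate the behaviour described above, here is a minimal sketch in plain Python. It does not use the k-NN plugin: the toy corpus, the helper names, and the brute-force `nearest` function (standing in for the approximate search) are all assumptions made for illustration.

```python
import math

# Hypothetical toy corpus: (doc_id, keyword, vector).
DOCS = [
    (1, "red", (0.0, 0.0)),
    (2, "red", (0.1, 0.0)),
    (3, "red", (0.0, 0.1)),
    (4, "blue", (5.0, 5.0)),
    (5, "blue", (6.0, 5.0)),
    (6, "blue", (5.0, 6.0)),
]


def nearest(docs, query, k):
    """Brute-force k nearest neighbours; a stand-in for the ANN search."""
    return sorted(docs, key=lambda d: math.dist(d[2], query))[:k]


def post_filter(docs, query, k, keyword):
    """k-NN first, then the filter on the k hits: may return fewer than k."""
    top_k = nearest(docs, query, k)
    return [d[0] for d in top_k if d[1] == keyword]


def pre_filter_exact(docs, query, k, keyword):
    """Filter first, then exact k-NN on the survivors (custom-scoring style)."""
    candidates = [d for d in docs if d[1] == keyword]
    return [d[0] for d in nearest(candidates, query, k)]


query = (0.0, 0.0)
post_ids = post_filter(DOCS, query, 2, "blue")
pre_ids = pre_filter_exact(DOCS, query, 2, "blue")
print("post-filter:", post_ids)
print("pre-filter: ", pre_ids)
```

In this toy data the two nearest neighbours of the query are both "red", so filtering afterwards for "blue" leaves nothing, while filtering first still yields two documents.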

Hello @Vijay thanks for your reply. Yeah, looking at ANN implementations it’s clear to me why pre-filtering isn’t implemented this way. My problem is that depending on the pre-filter I might only search 50% of my indexed documents (or fewer), so the chances of getting back an empty set of results are high.

What I’m doing now is binarising the vector, indexing the bits as regular terms, and querying with sampled subsets of those bits to approximate the nearest-neighbour search over the float vectors. This doesn’t utilise the k-NN plugin at all; I just thought I’d mention it as an alternative for people who might land here.
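The post gives no implementation details, so the following is only one possible reading of that approach, sketched in plain Python. Thresholding at zero, the `position_value` term encoding, and all function names are assumptions, not the poster's actual code.

```python
import random


def binarise(vector):
    """Sign-binarise: one bit per dimension (1 if the value is positive)."""
    return [1 if x > 0 else 0 for x in vector]


def to_terms(bits, positions=None):
    """Encode bits as 'position_value' strings that can be indexed as terms."""
    if positions is None:
        positions = range(len(bits))
    return [f"{i}_{bits[i]}" for i in positions]


def sample_positions(dim, n, seed):
    """Pick a reproducible random subset of bit positions for the query."""
    return sorted(random.Random(seed).sample(range(dim), n))


# Index time: every bit of the document vector becomes a term.
doc_vector = [0.4, -1.2, 0.0, 2.5, -0.3, 0.9]
doc_terms = to_terms(binarise(doc_vector))

# Query time: only a sampled subset of the query bits is used.
query_vector = [0.5, -0.8, 0.2, 1.9, 0.1, 1.1]
positions = sample_positions(len(query_vector), 3, seed=0)
query_terms = to_terms(binarise(query_vector), positions)

# The number of shared terms acts as a rough similarity score.
overlap = len(set(query_terms) & set(doc_terms))
print(doc_terms, query_terms, overlap)
```

Because the bits are ordinary terms, such a query can be combined with any regular filter (for example on a keyword field), which is what sidesteps the post-filtering problem discussed above.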

Hey @Vijay, is there a ticket open for supporting pre-filtering with ANN? If not, how do we open one?

Hi @varuns, did you check the latest releases of OpenSearch? Pre-filtering is supported in ANN with the Lucene engine.
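For readers who want to try this, here is a minimal sketch of what such a request body might look like, built as a Python dictionary. It assumes the filter is placed inside the `knn` clause, as in the OpenSearch k-NN filtering documentation. The field names `my_vector` and `category` and the helper function are hypothetical, and the exact requirements (engine, version) should be checked against the documentation for your release.

```python
import json


def knn_query_with_filter(field, vector, k, filter_clause):
    """Build a k-NN query body with the filter nested inside the knn clause."""
    return {
        "size": k,
        "query": {
            "knn": {
                field: {
                    "vector": vector,
                    "k": k,
                    "filter": filter_clause,
                }
            }
        },
    }


# Hypothetical field names; the vector field is assumed to use the Lucene engine.
body = knn_query_with_filter(
    field="my_vector",
    vector=[0.1, 0.2, 0.3],
    k=5,
    filter_clause={"term": {"category": "shoes"}},
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request against the index.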

Hi, can you please explain why pre-filtering will not work? Could you please point to the documentation or code you mentioned? I'm trying my luck here, as I know this is quite an old post. I am facing a similar issue.

@Navneet The shared link talks about efficient k-NN filtering, and it clearly states:

When you specify a Lucene filter for a k-NN search, the Lucene algorithm decides whether to perform an exact k-NN search with pre-filtering or an approximate search with modified post-filtering.

Could you share how this indicates that Lucene supports pre-filtering on ANN search?