Approximate k-NN with pre-filter

jojo · January 1, 2021, 8:00pm

Hello Team,

Is it possible to use a pre-filter (on a keyword field for example) and then do approximate k-NN on the remaining documents so that I always get back exactly k documents? The documentation mentions a pre-filter together with custom scoring, but not with approximate k-NN. I guess I could try it, but I wanted to ask first, and if it’s possible then the documentation should mention that.

Thanks for Open Distro by the way!

Vijay · January 19, 2021, 10:05pm

@jojo apologies for the delay in responding. Unfortunately, it is not possible to run approximate k-NN on filtered documents. Generally, the k-NN search is evaluated first, and the remaining filter clauses are then applied to the k results it returns. If k is very small, you can get fewer than k results (sometimes none) after all filters are applied. Hence, we recommend using custom scoring for exactly this scenario. We can share more insight into possible approaches if you give us details about your dataset, your k value, and the size of the filtered document set (min/max/average) before the k-NN algorithm is applied. You can also check here for performance considerations.
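For readers landing here, a minimal sketch of what the custom scoring request with a pre-filter can look like, written as a Python dict so it can be sent with any HTTP client. The field names `category` and `my_vector`, the filter value, and the helper name are hypothetical; the body shape follows the `knn_score` script from the k-NN plugin documentation, so please check it against the docs for your version.

```python
import json

def build_script_score_query(query_vector, k, filter_field, filter_value,
                             vector_field="my_vector", space_type="l2"):
    """Build an exact k-NN (script score) request body with a pre-filter.

    All field names are placeholders for illustration only.
    """
    return {
        # "size" caps the number of hits; the filter runs before scoring,
        # so up to k matching documents come back if that many exist.
        "size": k,
        "query": {
            "script_score": {
                # Pre-filter: only documents matching this query are scored.
                "query": {
                    "bool": {
                        "filter": {"term": {filter_field: filter_value}}
                    }
                },
                # Exact distance computation over the filtered documents.
                "script": {
                    "lang": "knn",
                    "source": "knn_score",
                    "params": {
                        "field": vector_field,
                        "query_value": query_vector,
                        "space_type": space_type,
                    },
                },
            }
        },
    }

body = build_script_score_query([0.1, 0.2, 0.3], k=3,
                                filter_field="category", filter_value="shoes")
print(json.dumps(body, indent=2))
```

Because every filtered document is scored exactly, the cost grows with the size of the filtered set rather than with k.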

jojo · January 31, 2021, 6:46pm

Hello @Vijay thanks for your reply. Yeah, looking at ANN implementations it’s clear to me why pre-filtering isn’t implemented this way. My problem is that depending on the pre-filter I might only search 50% of my indexed documents (or fewer), so the chances of getting back an empty set of results are high.
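To make the problem concrete, here is a small self-contained sketch of the two orderings. The data, the `tag` field, and the helper names are made up, and brute-force search stands in for the ANN index, so this only illustrates why filtering after k-NN can return fewer than k hits; it is not how the plugin is implemented.

```python
import random

def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn(docs, query, k):
    """Brute-force k nearest documents (stand-in for an ANN index)."""
    return sorted(docs, key=lambda d: l2_squared(d["vector"], query))[:k]

def post_filter_search(docs, query, k, tag):
    """k-NN first, then filter the k hits: may return fewer than k."""
    return [d for d in knn(docs, query, k) if d["tag"] == tag]

def pre_filter_search(docs, query, k, tag):
    """Filter first, then k-NN on what remains: returns k if enough match."""
    candidates = [d for d in docs if d["tag"] == tag]
    return knn(candidates, query, k)

rng = random.Random(0)
# Half of the documents carry tag "a", half carry tag "b".
docs = [
    {"id": i,
     "tag": "a" if i % 2 == 0 else "b",
     "vector": [rng.random() for _ in range(8)]}
    for i in range(1000)
]
query = [rng.random() for _ in range(8)]
k = 5

post_hits = post_filter_search(docs, query, k, "a")
pre_hits = pre_filter_search(docs, query, k, "a")
print("post-filter hits:", len(post_hits))
print("pre-filter hits: ", len(pre_hits))
```

With a filter that matches about half of the index, each of the k neighbours survives the post-filter with probability of roughly one half, so small values of k will often come back short or empty.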

What I’m doing now is binarising the vector and then indexing it as regular terms and using sampled subsets of bits for approximating NN with floats. So this doesn’t utilise the kNN plugin at all, just thought I’d mention it as an alternative for people who might land here.
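A rough sketch of how such a scheme might look, for anyone curious. The post gives no details, so everything here is an assumption: thresholding at zero, encoding each bit as a `position_bit` term, and scoring by the number of matching terms on a random subset of bit positions.

```python
import random

def binarise(vector, threshold=0.0):
    """Turn a float vector into bits (assumed rule: 1 if above threshold)."""
    return [1 if x > threshold else 0 for x in vector]

def to_terms(bits, positions=None):
    """Encode bits as terms like '3_1' (position 3, bit value 1)."""
    if positions is None:
        positions = range(len(bits))
    return [f"{i}_{bits[i]}" for i in positions]

def sample_query_terms(vector, n_bits, seed=0):
    """Binarise the query and keep only a sampled subset of bit positions."""
    bits = binarise(vector)
    positions = sorted(random.Random(seed).sample(range(len(bits)), n_bits))
    return to_terms(bits, positions)

def overlap(query_terms, doc_terms):
    """Number of shared terms, i.e. agreeing bits among the sampled ones."""
    return len(set(query_terms) & set(doc_terms))

# Index time: every document stores all of its bit terms in a keyword field.
doc_vec = [0.3, -1.2, 0.8, -0.1, 2.0, -0.7]
doc_terms = to_terms(binarise(doc_vec))
print(doc_terms)  # ['0_1', '1_0', '2_1', '3_0', '4_1', '5_0']

# Query time: match on a sample of the query's bit terms.
query_terms = sample_query_terms([0.5, -0.9, -0.4, -0.2, 1.1, 0.6], n_bits=4)
print(overlap(query_terms, doc_terms))
```

Since the bits are ordinary terms, a regular filter can be combined with the term matching in one query, which is what makes this usable when the k-NN plugin cannot pre-filter.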

varuns · December 7, 2021, 8:31pm

Hey @Vijay, is there a ticket open for supporting pre-filter with ANN? If not, how do we open one?

Navneet · April 11, 2023, 2:14am

Hi @varuns, did you check the latest releases of OpenSearch? Pre-filtering is supported in ANN with the Lucene engine.
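For reference, a minimal sketch of a k-NN query with a filter placed inside the `knn` clause, again as a Python dict. The field names `my_vector` and `category` and the helper name are hypothetical, and the index is assumed to use a field mapped with the Lucene engine; check the filtering documentation for your OpenSearch version before relying on this shape.

```python
import json

def build_filtered_knn_query(query_vector, k, filter_query,
                             vector_field="my_vector"):
    """Build an approximate k-NN request body with a filter inside `knn`.

    Field names are placeholders for illustration only.
    """
    return {
        "size": k,
        "query": {
            "knn": {
                vector_field: {
                    "vector": query_vector,
                    "k": k,
                    # The filter sits next to "vector" and "k", not in an
                    # outer bool query, so the engine can apply it during
                    # the search instead of on the final k hits.
                    "filter": filter_query,
                }
            }
        },
    }

body = build_filtered_knn_query(
    [0.1, 0.2, 0.3],
    k=10,
    filter_query={"term": {"category": "shoes"}},
)
print(json.dumps(body, indent=2))
```

The difference from the older pattern is only where the filter lives: wrapping a `knn` clause in a `bool` query with a separate `filter` gives the post-filter behaviour described earlier in this thread.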

gprabhashmal · August 23, 2023, 7:09pm

Hi, can you please explain why pre-filtering will not work? Could you please point to the documentation or code you mentioned? Trying my luck; I know it’s quite an old post. I am facing a similar issue.

prakharc · January 30, 2025, 5:32am

@Navneet The shared link covers efficient k-NN filtering, and it clearly states:

When you specify a Lucene filter for a k-NN search, the Lucene algorithm decides whether to perform an exact k-NN search with pre-filtering or an approximate search with modified post-filtering.

Could you share how this indicates that Lucene supports pre-filtering on ANN search?
