Exact KNN / Approx KNN

Can anyone please help me understand how exact KNN and approx KNN (using HNSW) are implemented and how they differ? Can we use function score with both KNNs? The OpenSearch documentation is not clear.
I want to use approx KNN, but I want to pass a filter, such as document IDs, to select a few IDs from millions of documents and then do the approx KNN search only on the filtered documents.

OpenSearch version 1.3
Thanks!

@gprabhashmal

I want to use approx KNN, but I want to pass a filter, such as document IDs, to select a few IDs from millions of documents and then do the approx KNN search only on the filtered documents.
OpenSearch version 1.3

Filtering support with Approximate KNN is not present with OpenSearch version 1.3. Check this latest documentation: k-NN search with filters - OpenSearch documentation

Can anyone please help me understand how exact KNN and approx KNN (using HNSW) are implemented and how they differ?

You can use script_score to run exact KNN (doc). Approximate k-NN can be run simply like this:

GET my-knn-index-1/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector2": {
        "vector": [2, 3, 5, 6],
        "k": 2
      }
    }
  }
}

More details present here: Approximate k-NN search - OpenSearch documentation

Can we use function score with both KNN’s? OpenSearch documentation is not clear.

Yes, you can definitely use the function_score query with approximate search. Exact search is performed via the script_score query, and I have never tried combining it with function_score, so I am not sure about that. Maybe you can give it a shot; I don't see a reason why it wouldn't be possible.

Thanks Navneet for your reply, much appreciated! I have a few follow-up questions, if you don't mind. Let me share some code snippets as well.

  1. How are approx KNN and exact KNN differentiated in terms of implementation? Based on my understanding and the documentation, one difference is how the field is defined in the mapping. For example, for approx KNN, below is the mapping I have, with index.knn set to true in the settings:

"embedding": {
  "type": "knn_vector",
  "dimension": 384,
  "method": {
    "name": "hnsw",
    "space_type": "cosinesimil",
    "engine": "nmslib",
    "parameters": {
      "ef_construction": 512,
      "m": 16
    }
  }
}

For exact KNN, below is the mapping I have, also with index.knn set to true in the settings:

"embedding": {
  "type": "knn_vector",
  "dimension": 384
}

The second difference is how the search is called. For approx KNN, my search template is like below:

{
  "size": "{{SIZE}}{{^SIZE}}10{{/SIZE}}",
  "from": "{{FROM}}{{^FROM}}0{{/FROM}}",
  "_source": {
    "exclude": [
      "embedding"
    ]
  },
  "query": {
    "bool": {
      "filter": "FILTER_QUERY",
      "must": [
        {
          "knn": {
            "embedding": {
              "vector": "QUERY_VECTOR",
              "k": "{{K}}{{^K}}10{{/K}}"
            }
          }
        }
      ]
    }
  }
}

For exact KNN, my search template is like below, using a function_score script:

{
  "size": "{{SIZE}}{{^SIZE}}10{{/SIZE}}",
  "from": "{{FROM}}{{^FROM}}0{{/FROM}}",
  "_source": {
    "exclude": [
      "embedding"
    ]
  },
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "filter": "FILTER_QUERY",
          "must":
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "source": "knn_score",
              "lang": "knn",
              "params": {
                "field": "embedding",
                "query_value": "QUERY_VECTOR",
                "space_type": "cosinesimil"
              }
            }
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "replace"
    }
  }
}

Is this the right way to differentiate the two and the correct way to implement them, or did I miss anything here?

Question 2) I am using the search template below for approx KNN with a boolean filter. The boolean filter is where I pass document IDs so that the approx KNN search runs only on those documents. Let's say I have 1000 document IDs and they are not unique. I then search on the passed document IDs, say 1, 2 and 3, and there can be 100 records related to document IDs 1, 2 and 3. I am getting empty results for some search queries, but for others I am getting results.
Can you please share whether that's the correct way to implement approx KNN, or is there another way? I am using OpenSearch version 1.3. I know the latest version has filters for ANN, but would that fit my use case?

{
  "size": "{{SIZE}}{{^SIZE}}10{{/SIZE}}",
  "from": "{{FROM}}{{^FROM}}0{{/FROM}}",
  "_source": {
    "exclude": [
      "embedding"
    ]
  },
  "query": {
    "bool": {
      "filter": "FILTER_QUERY",
      "must": [
        {
          "knn": {
            "embedding": {
              "vector": "QUERY_VECTOR",
              "k": "{{K}}{{^K}}10{{/K}}"
            }
          }
        }
      ]
    }
  }
}

Question 3) Based on my understanding, the approx KNN HNSW graph is built at indexing time, and that's the reason pre-filtering doesn't work. Is this the right understanding? Can you please elaborate on this? Also, exact KNN is a search-time operation and no data structure is built at indexing time for exact KNN. Is that understanding correct?

Apologies for the long message, I would really appreciate your help on this. Thanks!

Let me try to put some details here on what happens when you set or unset the index.knn setting.

  1. index.knn = true: This setting tells the OpenSearch k-NN plugin that this index will be used for ANN search. It ensures that the right codec is used, which in turn builds the required data structures (HNSW graph files, in the case of HNSW) per segment. This is what makes ANN search possible.
  2. index.knn = false: In this case, we use the default codec provided by OpenSearch and no graph files are generated, so only exact search can work here.

FYI: You can do both exact search and ANN search on an index with index.knn = true. But if you are going to use the index only for exact search, index.knn = true is not recommended, as graph files will be generated but never used, which wastes space.

Bool filters are not pre-filters. They act as a post-filter for k-NN (refer to the documentation: k-NN search with filters - OpenSearch documentation). You are getting no results because the ANN search may not return any documents with IDs 1, 2 and 3 among its top k. When the ANN results are then combined with the bool filter, nothing is left.
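
To make the post-filter behaviour concrete, here is a small pure-Python sketch. It is only a toy simulation, not OpenSearch code: a brute-force nearest-neighbour lookup stands in for the ANN step, and all names (top_k_nearest, knn_then_filter, filter_then_knn, the sample vectors) are made up for illustration.

```python
import math

def top_k_nearest(vectors, query, k):
    """Return the ids of the k vectors closest to query (stand-in for the ANN step)."""
    ranked = sorted(vectors, key=lambda doc_id: math.dist(vectors[doc_id], query))
    return ranked[:k]

def knn_then_filter(vectors, query, k, allowed_ids):
    """Post-filtering: pick the top k first, then drop ids that are not allowed."""
    return [d for d in top_k_nearest(vectors, query, k) if d in allowed_ids]

def filter_then_knn(vectors, query, k, allowed_ids):
    """Pre-filtering: restrict to allowed ids first, then pick the top k."""
    subset = {d: v for d, v in vectors.items() if d in allowed_ids}
    return top_k_nearest(subset, query, k)

# Hypothetical data: "a" and "b" are near the query, "c" and "d" are far away.
vectors = {"a": [0.0, 0.0], "b": [0.1, 0.0], "c": [5.0, 5.0], "d": [6.0, 6.0]}
query = [0.0, 0.1]
allowed = {"c", "d"}

post = knn_then_filter(vectors, query, 2, allowed)  # top 2 are a, b -> nothing survives
pre = filter_then_knn(vectors, query, 2, allowed)   # search runs only over c, d
print(post, pre)
```

With k = 2 the two nearest documents are not in the allowed set, so the post-filtered result is empty even though matching documents exist. That is the same effect as the intermittent empty results described in Question 2.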

If you want your ANN search to run on the document IDs you pass in the query, try using Efficient Filters (launched in 2.9; before 2.9 these were called Lucene filters, as they were applicable only with the Lucene engine).

Efficient Filter Documentation: k-NN search with filters - OpenSearch documentation

Your understanding in Question 3 is not 100% correct. Bool filters are not working with your query because of the way queries are executed. When doing ANN search with bool filters, the ANN search has no knowledge of which documents were filtered. To make the ANN search aware that it needs to run on the filtered docs, please use Efficient Filters. I already added the docs link above.
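
As a rough sketch of what such a request body could look like, the helper below builds a knn query with the filter placed inside the knn clause, next to "vector" and "k", instead of in a surrounding bool. The helper name and the "doc_id" field are assumptions for illustration; substitute the field that actually holds your document IDs, and check the filter documentation for the versions and engines that support this.

```python
import json

def knn_with_efficient_filter(field, query_vector, k, doc_ids, id_field="doc_id"):
    """Build a k-NN search body whose filter lives inside the knn clause.

    id_field is a hypothetical keyword field holding the document ids.
    """
    return {
        "size": k,
        "query": {
            "knn": {
                field: {
                    "vector": query_vector,
                    "k": k,
                    # The filter is part of the knn clause, so the vector
                    # search itself is restricted to the matching documents.
                    "filter": {"terms": {id_field: doc_ids}},
                }
            }
        },
    }

body = knn_with_efficient_filter("embedding", [0.1, 0.2, 0.3], 10, ["1", "2", "3"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent to the index's _search endpoint. The short three-value vector is only a placeholder; a real query vector must match the field's dimension (384 in the mapping above).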

The way exact search works is that at indexing time we store the vectors as binary doc values and use those to do the exact search. It only runs on the documents that the query's filter selects. You can see an example and explanation here: k-NN search with filters - OpenSearch documentation.
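
Since exact search scores only the documents that the inner query matches, one option that should also work on 1.3 is to put the ID filter into the script_score query directly. The sketch below builds such a body; the helper name and the "doc_id" field are assumptions for illustration, while the script parameters (knn_score, lang knn, field, query_value, space_type) are the ones already used in the template earlier in this thread.

```python
import json

def exact_knn_on_ids(field, query_vector, doc_ids, size=10,
                     id_field="doc_id", space_type="cosinesimil"):
    """Build an exact k-NN body: filter first, then score only the matches.

    id_field is a hypothetical keyword field holding the document ids.
    """
    return {
        "size": size,
        "query": {
            "script_score": {
                # Only documents matching this filter are scored by the script.
                "query": {"bool": {"filter": {"terms": {id_field: doc_ids}}}},
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": field,
                        "query_value": query_vector,
                        "space_type": space_type,
                    },
                },
            }
        },
    }

exact_body = exact_knn_on_ids("embedding", [0.1, 0.2, 0.3], ["1", "2", "3"])
print(json.dumps(exact_body, indent=2))
```

Because every filtered document is scored by brute force, this is most practical when the filter narrows millions of documents down to a small set, as in the use case above.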

I hope this clarifies.

Thanks Navneet, it's super helpful. One question: can I use the Lucene filter available in 2.7, because AWS hasn't released OpenSearch 2.9 yet? Thanks again, and I appreciate you sharing the details.

Yes, you can use Lucene filters. But make sure that you use the Lucene engine when creating the index.
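
For reference, a minimal sketch of an index body that uses the Lucene engine could look like the following. It reuses the field name, dimension and HNSW parameters from the mapping posted earlier and only swaps "nmslib" for "lucene"; the helper name is made up, and the parameter values are simply carried over, not tuned recommendations.

```python
import json

def lucene_knn_index_body(field, dimension):
    """Build a create-index body with an HNSW knn_vector field on the Lucene engine."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "space_type": "cosinesimil",
                        "engine": "lucene",  # was "nmslib" in the original mapping
                        "parameters": {"ef_construction": 512, "m": 16},
                    },
                }
            }
        },
    }

index_body = lucene_knn_index_body("embedding", 384)
print(json.dumps(index_body, indent=2))
```

The engine is part of the field's method definition, so an existing nmslib-based index would need to be recreated with this mapping and its documents reindexed.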

