Thanks, Navneet, for your reply, much appreciated! I have a few follow-up questions, if you don't mind. Let me share some code snippets as well.
Question 1) How are approximate k-NN and exact k-NN differentiated in terms of implementation? Based on my understanding and the documentation, one difference is how the field is defined in the mapping. For approximate k-NN, below is the mapping I have, with index.knn set to true in the settings:
"embedding": {
  "type": "knn_vector",
  "dimension": 384,
  "method": {
    "name": "hnsw",
    "space_type": "cosinesimil",
    "engine": "nmslib",
    "parameters": {
      "ef_construction": 512,
      "m": 16
    }
  }
}
For exact k-NN, below is the mapping I have, also with index.knn set to true in the settings:
"embedding": {
  "type": "knn_vector",
  "dimension": 384
}
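To make the mapping difference concrete, here is a minimal Python sketch that builds both index bodies side by side. The helper name `build_index_body` is hypothetical, and the surrounding `settings` / `mappings` / `properties` wrapper is my assumption about how these fragments sit in the full index definition. The field values are copied from my two mappings above.

```python
import json

def build_index_body(dimension, approximate):
    """Build an index body for a knn_vector field named 'embedding' (hypothetical helper)."""
    field = {"type": "knn_vector", "dimension": dimension}
    if approximate:
        # Only the approximate variant declares an HNSW method block.
        field["method"] = {
            "name": "hnsw",
            "space_type": "cosinesimil",
            "engine": "nmslib",
            "parameters": {"ef_construction": 512, "m": 16},
        }
    return {
        # index.knn is true in both of my setups.
        "settings": {"index": {"knn": True}},
        "mappings": {"properties": {"embedding": field}},
    }

approx_body = build_index_body(384, approximate=True)
exact_body = build_index_body(384, approximate=False)

# The only difference between the two bodies is the "method" block.
print(json.dumps(approx_body["mappings"]["properties"]["embedding"], indent=2))
print(json.dumps(exact_body["mappings"]["properties"]["embedding"], indent=2))
```

So, as far as the mapping goes, the two setups differ only in whether `method` is present.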
The second difference is how the search is called. For approximate k-NN, my search template looks like this:
{
  "size": "{{SIZE}}{{^SIZE}}10{{/SIZE}}",
  "from": "{{FROM}}{{^FROM}}0{{/FROM}}",
  "_source": {
    "exclude": [
      "embedding"
    ]
  },
  "query": {
    "bool": {
      "filter": "FILTER_QUERY",
      "must": [
        {
          "knn": {
            "embedding": {
              "vector": "QUERY_VECTOR",
              "k": "{{K}}{{^K}}10{{/K}}"
            }
          }
        }
      ]
    }
  }
}
For exact k-NN, my search template uses a function score with the k-NN scoring script:
{
  "size": "{{SIZE}}{{^SIZE}}10{{/SIZE}}",
  "from": "{{FROM}}{{^FROM}}0{{/FROM}}",
  "_source": {
    "exclude": [
      "embedding"
    ]
  },
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "filter": "FILTER_QUERY",
          "must":
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "source": "knn_score",
              "lang": "knn",
              "params": {
                "field": "embedding",
                "query_value": "QUERY_VECTOR",
                "space_type": "cosinesimil"
              }
            }
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "replace"
    }
  }
}
Is this the right way to differentiate the two, and the correct way to implement them, or did I miss anything here?
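To check my understanding of the exact path, here is a minimal Python sketch of what I believe the script-score template does: apply the filter first, then score every remaining record against the query vector. The function names and the toy 2-dimensional vectors are hypothetical, and the `1 + cosine similarity` score is my reading of the k-NN scoring script documentation for `cosinesimil`, so please treat it as an assumption rather than the plugin's actual code.

```python
import math

def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_knn(docs, query_vector, allowed_ids, size):
    """Brute-force search: filter first, then score every surviving record."""
    # Step 1: the bool filter narrows the candidate set (pre-filtering).
    candidates = [d for d in docs if d["document_id"] in allowed_ids]
    # Step 2: score each candidate; assumed score is 1 + cosine similarity.
    scored = [(1.0 + cosine_similarity(d["embedding"], query_vector), d)
              for d in candidates]
    # Step 3: best score first, then cut to the requested size.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:size]

# Toy data: document_id is not unique, as in my index.
docs = [
    {"document_id": 1, "embedding": [1.0, 0.0]},
    {"document_id": 1, "embedding": [0.0, 1.0]},
    {"document_id": 2, "embedding": [1.0, 1.0]},
    {"document_id": 9, "embedding": [1.0, 0.1]},
]

hits = exact_knn(docs, query_vector=[1.0, 0.0], allowed_ids={1, 2}, size=2)
for score, doc in hits:
    print(doc["document_id"], round(score, 3))
```

In this sketch the record with document ID 9 is never scored, because it is removed before any distance is computed.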
Question 2) I am using the search template below for approximate k-NN with a Boolean filter. The Boolean filter is where I pass document IDs so that the approximate k-NN search runs only on those documents. Let's say I have 1000 document IDs, and they are not unique: several records can share the same document ID. Now I search on the passed document IDs, say 1, 2 and 3, and there may be 100 records related to those IDs. I get empty results for some search queries, but results for others.
Can you please tell me whether this is the correct way to implement approximate k-NN, or is there another way? I am using OpenSearch 1.3. I know the latest versions have filters for ANN, but would that fit my use case?
{
  "size": "{{SIZE}}{{^SIZE}}10{{/SIZE}}",
  "from": "{{FROM}}{{^FROM}}0{{/FROM}}",
  "_source": {
    "exclude": [
      "embedding"
    ]
  },
  "query": {
    "bool": {
      "filter": "FILTER_QUERY",
      "must": [
        {
          "knn": {
            "embedding": {
              "vector": "QUERY_VECTOR",
              "k": "{{K}}{{^K}}10{{/K}}"
            }
          }
        }
      ]
    }
  }
}
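To make the empty-result symptom concrete, here is a minimal Python sketch of what I suspect is happening. It assumes (my reading of the docs, not something I have confirmed) that a knn clause inside a bool query is post-filtered: the k nearest records are collected first, and the filter is applied to those k afterwards. The function names and toy vectors are hypothetical, and an exact ranking stands in for the HNSW traversal.

```python
import math

def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def post_filtered_knn(docs, query_vector, allowed_ids, k):
    """Take the k nearest records over ALL docs, then apply the ID filter."""
    # Stand-in for the graph search: rank every record, nearest first.
    ranked = sorted(docs,
                    key=lambda d: cosine_similarity(d["embedding"], query_vector),
                    reverse=True)
    top_k = ranked[:k]
    # The filter only sees the k records that survived the nearest-neighbor step.
    return [d for d in top_k if d["document_id"] in allowed_ids]

docs = [
    {"document_id": 7, "embedding": [1.0, 0.0]},
    {"document_id": 8, "embedding": [0.9, 0.1]},
    {"document_id": 2, "embedding": [0.5, 0.5]},
    {"document_id": 1, "embedding": [0.2, 0.8]},
    {"document_id": 1, "embedding": [0.0, 1.0]},
]
query = [1.0, 0.0]

# With a small k, the nearest records belong to IDs 7 and 8, so filtering on ID 1 leaves nothing.
narrow = post_filtered_knn(docs, query, allowed_ids={1}, k=2)
# With k large enough to reach the ID-1 records, the same filter returns them.
wide = post_filtered_knn(docs, query, allowed_ids={1}, k=5)
print(len(narrow), len(wide))
```

If this model is right, whether a query returns anything depends on whether any of the k nearest records happens to carry one of the passed document IDs, which would match the inconsistent results I am seeing.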
Question 3) Based on my understanding, the HNSW graph for approximate k-NN is built at indexing time, and that is the reason pre-filtering doesn't work. Is this understanding right? Can you please elaborate on this? Also, exact k-NN is a search-time operation, and no data structure is built at indexing time for it. Is that understanding correct?
Apologies for the long message. I would really appreciate your help on this. Thanks!