Hello All,
I am using Open Distro for Elasticsearch for k-NN-based vector search. I index vectors with the settings and mappings below:
settings = {
    "index": {
        "knn": True,
        "knn.algo_param.ef_search": 1024,
        "knn.algo_param.ef_construction": 1024,
        "knn.algo_param.m": 32,
        "max_inner_result_window": 6,
    }
}
mappings = {
    "properties": {
        "doc_id": {"type": "keyword"},
        "doc_text": {"type": "text"},
        "feature_vector": {"type": "knn_vector", "dimension": 512},
    }
}
To search my documents, I am using approximate k-NN. I have observed that every time I reindex my data (upsert the data into the index), some documents in the result list get a different score.
My search query:
request = {
    "query": {
        "bool": {
            "must": [
                {
                    "knn": {
                        "feature_vector": {
                            "vector": query_vector.tolist(),
                            "k": n_results,
                        }
                    }
                }
            ],
            "should": [
                {
                    "match": {
                        "doc_text": {
                            "query": "entity1",
                            "operator": "and",
                        }
                    }
                },
                {
                    "match": {
                        "doc_text": {
                            "query": "entity2",
                            "operator": "and",
                        }
                    }
                },
            ],
        }
    },
}
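Since a bool query's score is the sum of the scores of its matching must and should clauses, it may help to check whether the k-NN part or the text-match part is the one that moves. A minimal sketch of a knn-only variant of the request above; the helper name build_knn_only_request and the "size" value are assumptions for illustration, not part of my actual code:

```python
def build_knn_only_request(query_vector, k):
    """Build a request with the same knn clause as above, without the `should` clauses.

    Hypothetical helper: if scores from this request are stable across reindexing,
    the variation would have to come from the text-match clauses instead.
    """
    return {
        "size": k,  # assumption: return as many hits as neighbours requested
        "query": {
            "knn": {
                "feature_vector": {
                    "vector": list(query_vector),
                    "k": k,
                }
            }
        },
    }


# Illustrative query vector with the same dimension as the mapping (512).
request = build_knn_only_request([0.1] * 512, k=5)
```

Running this request before and after a reindex and comparing the scores should show whether the knn clause alone already gives different values.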
For example,
Scenario 1:
- I index the data into a new index.
- I run the k-NN vector search.
- In the result list, doc1 gets a score of, say, 0.59.
Scenario 2:
- I reindex the data.
- I run the same k-NN vector search.
- In the result list, doc1 gets a different score (0.58).
Between these two scenarios, only the reindexing step differs; everything else is the same. As a consequence, I get different precision/recall values.
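To make the comparison between the two scenarios concrete, here is a rough sketch that diffs the scores of two search responses. It assumes the standard Elasticsearch response shape (hits.hits with _id and _score) and that _id stays the same across reindexing; the function names and the sample responses are made up for illustration, reusing the 0.59 / 0.58 values from above:

```python
def scores_by_id(response):
    """Map each hit's _id to its _score."""
    return {hit["_id"]: hit["_score"] for hit in response["hits"]["hits"]}


def diff_scores(response_a, response_b, tol=1e-6):
    """Return (changed scores, ids only in A, ids only in B)."""
    a, b = scores_by_id(response_a), scores_by_id(response_b)
    changed = {
        doc_id: (a[doc_id], b[doc_id])
        for doc_id in a.keys() & b.keys()
        if abs(a[doc_id] - b[doc_id]) > tol
    }
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    return changed, only_a, only_b


# Illustrative responses, not real output from my cluster.
before = {"hits": {"hits": [{"_id": "doc1", "_score": 0.59},
                            {"_id": "doc2", "_score": 0.40}]}}
after = {"hits": {"hits": [{"_id": "doc1", "_score": 0.58},
                           {"_id": "doc2", "_score": 0.40},
                           {"_id": "doc3", "_score": 0.35}]}}

changed, only_before, only_after = diff_scores(before, after)
print(changed)      # documents whose score moved between the two runs
print(only_after)   # documents that appear only after reindexing
```

If _id is not stable in a given setup, the doc_id field from _source could be used as the key instead.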
Can anyone explain why the scores change between the two runs?
I have also tried brute-force k-NN search. In that case, the non-deterministic results occur less frequently.
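For reference, this is roughly how exact results could be computed outside the cluster and used as ground truth for recall. It is only a sketch in plain Python, assuming L2 distance as the space type; the names exact_top_k and recall_at_k and the toy vectors are hypothetical:

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(query_vector, vectors, k):
    """Return the ids of the k nearest vectors; `vectors` maps doc id -> vector.

    Ties are broken by doc id so the result is deterministic.
    """
    ranked = sorted(
        vectors,
        key=lambda doc_id: (l2_distance(query_vector, vectors[doc_id]), doc_id),
    )
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbours that the approximate search also returned."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact) if exact else 0.0


# Toy 2-dimensional data for illustration only (my real vectors have 512 dimensions).
vectors = {"doc1": [0.0, 0.0], "doc2": [1.0, 0.0], "doc3": [5.0, 5.0]}
exact_ids = exact_top_k([0.1, 0.0], vectors, k=2)
recall = recall_at_k(["doc1", "doc3"], exact_ids)
```

Because the exact ranking depends only on the vectors, it should not change across reindexing, so any change in recall against it would point at the approximate search.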
Docker image used: amazon/opendistro-for-elasticsearch:1.13.1
Language: Python
Library used: elasticsearch = "^7.8.1"
If you need any details, please let me know.
Thanks in advance,
Abhash Sinha