Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): 2.13 or above
Describe the issue:
PUT _bulk?refresh=true
{ "index": { "_index": "my-knn-index-1", "_id": "1" } }
{ "nested_field": [{ "my_vector": [1,1,1], "color": "blue" }, { "my_vector": [20,20,20], "color": "yellow" }, { "my_vector": [30,30,30], "color": "white" }] }
{ "index": { "_index": "my-knn-index-1", "_id": "2" } }
{ "nested_field": [{ "my_vector": [10,10,10], "color": "red" }, { "my_vector": [2,2,2], "color": "green" }, { "my_vector": [3,3,3], "color": "black" }] }
Right now, if I run a nested-field k-NN search for [1,1,1], I get "_id": "1" first with a score of 1, returned together with its other nested vectors, and then "_id": "2" with all of its nested vectors.
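For reference, the search I mean looks roughly like the sketch below, written as a Python dict so it can be serialized and sent as a request body. The index and field names come from the bulk request; the value k = 2 is only an assumption for illustration.

```python
import json

# Nested k-NN query against the documents indexed above.
# k=2 is an arbitrary illustrative value, not a recommendation.
query_vector = [1, 1, 1]
search_body = {
    "query": {
        "nested": {
            "path": "nested_field",
            "query": {
                "knn": {
                    "nested_field.my_vector": {
                        "vector": query_vector,
                        "k": 2,
                    }
                }
            },
        }
    }
}

# Intended as the body of: GET my-knn-index-1/_search
print(json.dumps(search_body, indent=2))
```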
I want weighted scoring instead. Document 2 should rank above document 1 because, even though doc 1 contains an exact match, doc 2 has more vectors that are close to my query.
Further, the results should ideally score each nested vector individually. For example:
Query
[1,1,1], threshold score = 0.5
Response
"_id": "2"
"my_vector": [2,2,2] - Score 0.7
"my_vector": [3,3,3] - Score 0.6
"_id": "1"
"my_vector": [1,1,1] - Score 1
Other vectors, such as "my_vector": [10,10,10], are skipped because their scores are below the threshold.
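To make the desired behavior concrete, here is a minimal Python sketch of the ranking I have in mind, done client-side on per-vector scores. It is not an OpenSearch feature or API. The function name rank_documents and the choice to sum the surviving scores per document are my assumptions. The scores 1, 0.7 and 0.6 are the illustrative values from the example above; the remaining scores are made-up placeholders that only need to be below the threshold.

```python
from typing import Dict, List, Tuple

Vector = Tuple[int, ...]
ScoredVector = Tuple[Vector, float]


def rank_documents(
    per_doc_scores: Dict[str, List[ScoredVector]], threshold: float
) -> List[Tuple[str, float, List[ScoredVector]]]:
    """Drop nested vectors scoring below `threshold`, then order documents
    by the sum of their remaining scores (highest first).

    Summing is one possible weighting, chosen here as an assumption.
    """
    ranked = []
    for doc_id, scored in per_doc_scores.items():
        kept = sorted(
            (sv for sv in scored if sv[1] >= threshold),
            key=lambda sv: sv[1],
            reverse=True,
        )
        if kept:  # documents with no surviving vector are omitted
            ranked.append((doc_id, sum(score for _, score in kept), kept))
    ranked.sort(key=lambda entry: entry[1], reverse=True)
    return ranked


# Illustrative scores from the example; values below 0.5 are placeholders.
scores = {
    "1": [((1, 1, 1), 1.0), ((20, 20, 20), 0.1), ((30, 30, 30), 0.1)],
    "2": [((10, 10, 10), 0.2), ((2, 2, 2), 0.7), ((3, 3, 3), 0.6)],
}

for doc_id, total, kept in rank_documents(scores, threshold=0.5):
    print(doc_id, round(total, 2), kept)
```

With these inputs, doc 2 comes out ahead of doc 1 because its two surviving vectors together outweigh the single exact match, which is the ordering I am trying to get from the search itself.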
I think the first part of this approach (ranking doc 2 above doc 1) was the behavior in OpenSearch 2.11 or 2.10.
I am looking for information and resources on how to implement the above in OpenSearch.
Configuration:
Relevant Logs or Screenshots: