knn_score script gives unexpected document scores on nested fields

Good day all,

I have been trying different queries with the k-NN plugin. I noticed that it gives unexpected document scores when using the knn_score script on nested fields, as shown below. The inner hit score is 1.7148095, while the document score is lower (1.4354433). As a result, relevant documents end up further down the result list. Note also that approximate k-NN (the knn query) works fine and gives the expected result.

Query:
GET got-knn-v64c300/_search
{
  "_source": {
    "includes": ["title"]
  },
  "size": 1,
  "query": {
    "nested": {
      "path": "embeddings",
      "query": {
        "script_score": {
          "query": {
            "match_all": {}
          },
          "script": {
            "source": "knn_score",
            "lang": "knn",
            "params": {
              "field": "embeddings.vector",
              "query_value": [
                0.0749, -0.0284, -0.0334, 0.0742, -0.1930, 0.0919, 0.0302, 0.0585,
                -0.0030, -0.0676, 0.0867, -0.0101, 0.0473, 0.0287, 0.0030, -0.1234,
                -0.0840, 0.0298, -0.1001, -0.0815, 0.0542, -0.0110, -0.0674, -0.0742,
                0.1060, 0.0701, -0.0031, -0.0129, 0.0322, -0.0073, 0.0654, -0.0259,
                -0.0467, 0.0150, -0.1977, -0.1922, -0.0051, -0.0181, 0.0132, -0.0731,
                -0.0283, 0.0203, 0.0092, 0.0778, 0.0541, -0.0025, 0.0916, 0.0038,
                0.0190, -0.0840, 0.0208, -0.0481, -0.1400, -0.1270, -0.0078, 0.0475,
                0.1077, -0.0318, 0.0420, -0.0100, 0.0003, 0.0179, -0.0715, -0.0926
              ],
              "space_type": "cosinesimil"
            }
          }
        }
      },
      "inner_hits": {
        "size": 1,
        "_source": {
          "includes": ["embeddings.sentence"]
        }
      }
    }
  }
}

Result:
{
  "took" : 34,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 130,
      "relation" : "eq"
    },
    "max_score" : 1.4354433,
    "hits" : [
      {
        "_index" : "got-knn-v64c300",
        "_type" : "_doc",
        "_id" : "82",
        "_score" : 1.4354433,
        "_source" : {
          "title" : "37_Joffrey_Baratheon.txt"
        },
        "inner_hits" : {
          "embeddings" : {
            "hits" : {
              "total" : {
                "value" : 74,
                "relation" : "eq"
              },
              "max_score" : 1.7148095,
              "hits" : [
                {
                  "_index" : "got-knn-v64c300",
                  "_type" : "_doc",
                  "_id" : "82",
                  "_nested" : {
                    "field" : "embeddings",
                    "offset" : 8
                  },
                  "_score" : 1.7148095,
                  "_source" : {
                    "sentence" : "In reality, his biological father is his mother’s twin brother, Jaime Lannister."
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

Schema:
{
  "got-knn-v64c300" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "embeddings" : {
          "type" : "nested",
          "properties" : {
            "sentence" : {
              "type" : "text"
            },
            "vector" : {
              "type" : "knn_vector",
              "dimension" : 64,
              "method" : {
                "engine" : "nmslib",
                "space_type" : "l2",
                "name" : "hnsw",
                "parameters" : {
                  "ef_construction" : 128,
                  "m" : 24
                }
              }
            }
          }
        },
        "text" : {
          "type" : "text"
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "knn.algo_param" : {
          "ef_search" : "100"
        },
        "provided_name" : "got-knn-v64c300",
        "knn" : "true",
        "creation_date" : "1636898624187",
        "number_of_replicas" : "1",
        "uuid" : "YUZpIo-WTEqbXNBLyamR8A",
        "version" : {
          "created" : "135227827"
        }
      }
    }
  }
}

@jmazane - thoughts?

I believe this is related to the GitHub issue "[BUG] Sporadic empty inner hits on nested kNN search" (Issue #466 in opensearch-project/k-NN). Apologies for the delay.
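
One other thing that may be worth ruling out, offered as a possibility rather than a confirmed diagnosis: the nested query combines the scores of all matching nested objects according to its score_mode parameter, which defaults to avg. Because match_all matches every nested object (74 in the result above), the document score could simply be the average of all inner scores, which would sit below the best inner hit. The sketch below only illustrates that arithmetic. The helper name combine_nested_scores and the sample scores are made up for illustration and are not part of the k-NN plugin or taken from this index.

```python
def combine_nested_scores(child_scores, score_mode="avg"):
    """Model how a nested query folds child scores into one parent score.

    Hypothetical helper for illustration only; it mirrors the documented
    score_mode options (avg, max, min, sum, none), not plugin internals.
    """
    if not child_scores:
        raise ValueError("at least one matching nested object is required")
    if score_mode == "avg":
        return sum(child_scores) / len(child_scores)
    if score_mode == "max":
        return max(child_scores)
    if score_mode == "min":
        return min(child_scores)
    if score_mode == "sum":
        return sum(child_scores)
    if score_mode == "none":
        # Child scores are ignored and the parent gets a score of 0.
        return 0.0
    raise ValueError(f"unknown score_mode: {score_mode!r}")


# Made-up inner scores for three nested objects of one document.
scores = [1.75, 1.5, 1.25]

avg_score = combine_nested_scores(scores)         # default mode: avg
max_score = combine_nested_scores(scores, "max")  # best inner hit wins

print(avg_score, max_score)
```

If this is what is happening, adding "score_mode": "max" next to "path" in the nested query should make the document score match the best inner hit. I have not verified that against this index.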
