Versions:
AWS Managed OpenSearch upgraded to 2.19.
Working in Dashboards v2.19.0 Dev Console.
Describe the issue:
I am trying to use the Hybrid Score Explanation Processor and replicate the example in the docs here. My goal is to get the pre-normalized scores from the k-NN phase. Please correct me if I am wrong, but as I understand it, this processor should let me get those.
I am working in the Dashboards Dev Console. I created a sample index, populated it with sample data, created the same processor as in the docs, and ran a similar query (knn instead of neural).
Unfortunately, the response does not include an explanation for the k-NN search. There is a full explanation for the match phase, but for knn I get this:
{
  "value": 1,
  "description": "min_max normalization of:",
  "details": [
    {
      "value": 1,
      "description": "No Explanation",
      "details": []
    }
  ]
}
According to the docs, it should look something like this:
{
  "value": 0.8503647,
  "description": "min_max normalization of:",
  "details": [
    {
      "value": 0.015177966,
      "description": "within top 5",
      "details": []
    }
  ]
}
The query and document vectors are different, so the pre-normalized score cannot be 1. Also, instead of "description": "within top 5", it says "description": "No Explanation".
Please confirm whether it is possible to get the pre-normalized scores for the neural or k-NN phase with a hybrid query and this explanation processor and, if so, help me configure it correctly.
Also, do the vector engine and method affect this issue? Here I used faiss, but in our working environment we use nmslib because of the requirement to use the cosinesimil space.
… and please excuse any rookie mistakes or misunderstandings. I am relatively new to OpenSearch…
Configuration:
These are the steps I performed.
Test index creation:
PUT test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 8,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "faiss"
        }
      },
      "my_text1": {
        "type": "text",
        "analyzer": "standard"
      }
    }
  }
}
Populating the index with sample data:
POST test-index/_bulk
{ "index": { "_id": "1" } }
{ "my_text1": "The quick brown fox jumps over the lazy dog", "my_vector1": [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45] }
{ "index": { "_id": "2" } }
{ "my_text1": "OpenSearch makes vector search easy", "my_vector1": [0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15] }
{ "index": { "_id": "3" } }
{ "my_text1": "Sample document for KNN vector testing", "my_vector1": [0.12, 0.22, 0.32, 0.42, 0.52, 0.62, 0.72, 0.82] }
Search pipeline with the hybrid score explanation processor:
PUT /_search/pipeline/test-hse-pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean"
        }
      }
    }
  ],
  "response_processors": [
    {
      "hybrid_score_explanation": {}
    }
  ]
}
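This pipeline rescales each sub-query's scores with min_max and then averages them with arithmetic_mean. A simplified Python sketch of that arithmetic follows. It assumes equal sub-query weights and that every document has a score from every sub-query; the function names and sample numbers are hypothetical, and the plugin's handling of edge cases (identical min and max, documents missing from a sub-query) may differ.

```python
from typing import List


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale one sub-query's raw scores to [0, 1] as (s - min) / (max - min)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption for this sketch: a zero-width range maps every score to 1.0.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def arithmetic_mean(values: List[float]) -> float:
    """Equal-weight mean of one document's normalized sub-query scores."""
    return sum(values) / len(values)


def hybrid_scores(per_subquery: List[List[float]]) -> List[float]:
    """Normalize each sub-query's score list, then combine per document.

    per_subquery[i][j] is the raw score of document j for sub-query i; every
    document is assumed to have a score for every sub-query.
    """
    normalized = [min_max_normalize(scores) for scores in per_subquery]
    # zip(*normalized) regroups the values so each tuple holds one document's scores.
    return [arithmetic_mean(list(doc_scores)) for doc_scores in zip(*normalized)]


if __name__ == "__main__":
    # Hypothetical raw scores for three documents (not taken from the response).
    match_scores = [2.0, 1.0, 0.0]
    knn_scores = [0.25, 0.75, 0.5]
    print(hybrid_scores([match_scores, knn_scores]))  # → [0.5, 0.75, 0.25]
```

In this sketch, the document with the highest raw score in a sub-query always gets a normalized value of 1, however large or small that raw score is, so a normalized 1 on its own says nothing about the underlying k-NN score.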
Test search query:
GET test-index/_search?search_pipeline=test-hse-pipeline&explain=true
{
  "size": 1,
  "_source": false,
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "my_text1": {
              "query": "quick fox vector testing"
            }
          }
        },
        {
          "knn": {
            "my_vector1": {
              "vector": [0.0, 0.15, 0.23, 0.25, 0.30, 0.35, 0.40, 0.45],
              "min_score": 0.2
            }
          }
        }
      ]
    }
  }
}
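To check the point that the pre-normalized k-NN score cannot be 1, the raw scores for this query can be recomputed by hand. The Python sketch below assumes the cosinesimil score translation described in the k-NN documentation (d = 1 - cos(theta), score = (2 - d) / 2) and treats min_score as a plain "score >= min_score" filter. Whether faiss in 2.19 returns exactly these values is an assumption, and the helper names are hypothetical.

```python
import math
from typing import Dict, List


def cosinesimil_score(a: List[float], b: List[float]) -> float:
    """Assumed cosinesimil translation: d = 1 - cos(theta), score = (2 - d) / 2."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_theta = dot / (norm_a * norm_b)
    d = 1.0 - cos_theta
    return (2.0 - d) / 2.0


def radial_knn(query: List[float], docs: Dict[str, List[float]], min_score: float) -> Dict[str, float]:
    """Exact (brute-force) scores, keeping only documents with score >= min_score."""
    scores = {doc_id: cosinesimil_score(query, vec) for doc_id, vec in docs.items()}
    return {doc_id: s for doc_id, s in scores.items() if s >= min_score}


# Vectors copied from the bulk request and the knn query.
QUERY = [0.0, 0.15, 0.23, 0.25, 0.30, 0.35, 0.40, 0.45]
DOCS = {
    "1": [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45],
    "2": [0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15],
    "3": [0.12, 0.22, 0.32, 0.42, 0.52, 0.62, 0.72, 0.82],
}

if __name__ == "__main__":
    hits = radial_knn(QUERY, DOCS, min_score=0.2)
    # Highest score first, like the k-NN result order.
    for doc_id, score in sorted(hits.items(), key=lambda kv: kv[1], reverse=True):
        print(f"doc {doc_id}: raw score {score:.4f}")
```

With these vectors, all three documents pass min_score 0.2 and none reaches 1, since a score of 1 would require identical directions. Doc 3 comes out highest, which would fit its normalized k-NN value of 1 in the response.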
Relevant Logs or Screenshots:
Response to the test search query with the explanation processor:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_shard": "[test-index][0]",
        "_node": "x6xPcNjgTFiCJNXv0-yTjw",
        "_index": "test-index",
        "_id": "3",
        "_score": 1,
        "_explanation": {
          "value": 1,
          "description": "arithmetic_mean combination of:",
          "details": [
            {
              "value": 1,
              "description": "min_max normalization of:",
              "details": [
                {
                  "value": 0.5753642,
                  "description": "sum of:",
                  "details": [
                    {
                      "value": 0.2876821,
                      "description": "weight(my_text1:vector in 0) [PerFieldSimilarity], result of:",
                      "details": [
                        {
                          "value": 0.2876821,
                          "description": "score(freq=1.0), computed as boost * idf * tf from:",
                          "details": [
                            {
                              "value": 2.2,
                              "description": "boost",
                              "details": []
                            },
                            {
                              "value": 0.2876821,
                              "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                              "details": [
                                {
                                  "value": 1,
                                  "description": "n, number of documents containing term",
                                  "details": []
                                },
                                {
                                  "value": 1,
                                  "description": "N, total number of documents with field",
                                  "details": []
                                }
                              ]
                            },
                            {
                              "value": 0.45454544,
                              "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                              "details": [
                                {
                                  "value": 1,
                                  "description": "freq, occurrences of term within document",
                                  "details": []
                                },
                                {
                                  "value": 1.2,
                                  "description": "k1, term saturation parameter",
                                  "details": []
                                },
                                {
                                  "value": 0.75,
                                  "description": "b, length normalization parameter",
                                  "details": []
                                },
                                {
                                  "value": 6,
                                  "description": "dl, length of field",
                                  "details": []
                                },
                                {
                                  "value": 6,
                                  "description": "avgdl, average length of field",
                                  "details": []
                                }
                              ]
                            }
                          ]
                        }
                      ]
                    },
                    {
                      "value": 0.2876821,
                      "description": "weight(my_text1:testing in 0) [PerFieldSimilarity], result of:",
                      "details": [
                        {
                          "value": 0.2876821,
                          "description": "score(freq=1.0), computed as boost * idf * tf from:",
                          "details": [
                            {
                              "value": 2.2,
                              "description": "boost",
                              "details": []
                            },
                            {
                              "value": 0.2876821,
                              "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                              "details": [
                                {
                                  "value": 1,
                                  "description": "n, number of documents containing term",
                                  "details": []
                                },
                                {
                                  "value": 1,
                                  "description": "N, total number of documents with field",
                                  "details": []
                                }
                              ]
                            },
                            {
                              "value": 0.45454544,
                              "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                              "details": [
                                {
                                  "value": 1,
                                  "description": "freq, occurrences of term within document",
                                  "details": []
                                },
                                {
                                  "value": 1.2,
                                  "description": "k1, term saturation parameter",
                                  "details": []
                                },
                                {
                                  "value": 0.75,
                                  "description": "b, length normalization parameter",
                                  "details": []
                                },
                                {
                                  "value": 6,
                                  "description": "dl, length of field",
                                  "details": []
                                },
                                {
                                  "value": 6,
                                  "description": "avgdl, average length of field",
                                  "details": []
                                }
                              ]
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value": 1,
              "description": "min_max normalization of:",
              "details": [
                {
                  "value": 1,
                  "description": "No Explanation",
                  "details": []
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
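Because this explanation tree is deeply nested, a small script can pull out the normalized and raw value for each sub-query and flag the "No Explanation" entries. This Python sketch assumes the layout seen in the response above: combination at the top, one normalization node per sub-query, and each normalization node wrapping a single raw explanation. The function name is hypothetical, and the sample hit is a trimmed copy of that response.

```python
import json
from typing import Any, Dict, List, Tuple


def subquery_breakdown(hit: Dict[str, Any]) -> List[Tuple[Any, Any, str]]:
    """Return (normalized value, raw value, raw description) per hybrid sub-query.

    Assumes the layout seen in the response: the top-level explanation is the
    combination step, each of its details is one sub-query's normalization
    step, and each normalization step wraps a single raw explanation.
    """
    rows = []
    for norm_step in hit["_explanation"]["details"]:
        raw = norm_step["details"][0]
        rows.append((norm_step["value"], raw["value"], raw["description"]))
    return rows


# Trimmed copy of the hit above; the BM25 sub-tree under "sum of:" is emptied.
SAMPLE_HIT = json.loads("""
{
  "_id": "3",
  "_explanation": {
    "value": 1,
    "description": "arithmetic_mean combination of:",
    "details": [
      {"value": 1, "description": "min_max normalization of:",
       "details": [{"value": 0.5753642, "description": "sum of:", "details": []}]},
      {"value": 1, "description": "min_max normalization of:",
       "details": [{"value": 1, "description": "No Explanation", "details": []}]}
    ]
  }
}
""")

if __name__ == "__main__":
    for i, (norm_value, raw_value, desc) in enumerate(subquery_breakdown(SAMPLE_HIT)):
        flag = "  <- raw score not explained" if desc == "No Explanation" else ""
        print(f"sub-query {i}: normalized={norm_value} raw={raw_value} ({desc}){flag}")
```

Run against each hit in `hits.hits`, this makes it quick to see which sub-query lacks a raw explanation. In the trimmed sample, that is the second (knn) sub-query.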