Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.5
Describe the issue:
Hi, I am getting incorrect results when applying Lucene HNSW filters to nested knn vectors. I followed the instructions in "Search with k-NN filters" in the OpenSearch documentation and put the filter criteria in the filter subsection of the knn_vector field in the query plan; however, the results returned are not actually filtered by those criteria.
Configuration:
index schema:
{
  "settings": {
    "index": {
      "refresh_interval": "60s",
      "number_of_shards": "72",
      "number_of_replicas": "0",
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "documentId": {
        "type": "keyword"
      },
      "embedding": {
        "type": "nested",
        "properties": {
          "vector": {
            "type": "knn_vector",
            "dimension": 768,
            "method": {
              "name": "hnsw",
              "space_type": "l2",
              "engine": "lucene",
              "parameters": {
                "ef_construction": 100,
                "m": 16
              }
            }
          }
        }
      },
      "cleanExplicitVariations": {
        "type": "nested",
        "properties": {
          "cleanExplicit": {
            "type": "keyword"
          },
          "region": {
            "type": "keyword"
          },
          "regionalOverrides": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
There are 3 million documents in the index. Each document has either “NOT_EXPLICIT” or “EXPLICIT” as the value of the cleanExplicitVariations.cleanExplicit field.
query plan with filter on “NOT_EXPLICIT”:
{
  "size": 10,
  "query": {
    "nested": {
      "path": "embedding",
      "query": {
        "knn": {
          "embedding.vector": {
            "vector": [...], // 768-dimensional vector
            "k": 10,
            "filter": {
              "bool": {
                "must": [{
                  "nested": {
                    "path": "cleanExplicitVariations",
                    "query": {
                      "bool": {
                        "must": {
                          "term": {
                            "cleanExplicitVariations.cleanExplicit": "NOT_EXPLICIT"
                          }
                        }
                      }
                    }
                  }
                }]
              }
            }
          }
        }
      }
    }
  }
}
However, the returned results contain documents with both “NOT_EXPLICIT” and “EXPLICIT” values.
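One way to confirm the mismatch is to scan the returned hits for documents that have no nested entry with the expected value. The sketch below is only illustrative: it assumes the standard search response shape (`hits.hits[]._source`), the helper name `hits_violating_filter` is hypothetical, and the sample response is made up for demonstration. Note that a nested term filter matches a document when at least one nested entry carries the value, so a document that has both values is a legitimate match and is not flagged here.

```python
from typing import Any, Dict, List


def hits_violating_filter(response: Dict[str, Any], expected: str) -> List[str]:
    """Return the _id of every hit with no nested entry whose cleanExplicit equals `expected`."""
    bad_ids = []
    for hit in response.get("hits", {}).get("hits", []):
        variations = hit.get("_source", {}).get("cleanExplicitVariations", [])
        # A nested field may hold a single object or a list of objects.
        if isinstance(variations, dict):
            variations = [variations]
        values = set()
        for variation in variations:
            value = variation.get("cleanExplicit")
            # A keyword field may hold one value or an array of values.
            values.update(value if isinstance(value, list) else [value])
        # The nested term filter matches when at least one entry has the value.
        if expected not in values:
            bad_ids.append(hit.get("_id"))
    return bad_ids


# Hypothetical response, trimmed to the fields the check needs.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "a", "_source": {"cleanExplicitVariations": [{"cleanExplicit": "NOT_EXPLICIT"}]}},
            {"_id": "b", "_source": {"cleanExplicitVariations": [{"cleanExplicit": "EXPLICIT"}]}},
        ]
    }
}
print(hits_violating_filter(sample_response, "NOT_EXPLICIT"))  # prints ['b']
```

Any non-empty result for a filtered query means the filter was not applied to that hit.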
My question:
Can we use Lucene HNSW filter with nested vectors? Is there something wrong with the query plan?
If I rewrite the filter as a regular query DSL filter, the results are correct, but I need the additional functionality that the Lucene HNSW filter provides (the algorithm decides whether to use exact k-NN or ANN).
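For comparison, the two query shapes can be sketched in Python as plain dictionaries. The first mirrors the query plan above. The second is only an assumption about what "reconstructing the filter with DSL" looks like (a bool query whose filter clause sits next to the nested knn clause); the original DSL query is not shown in this post. The function names are hypothetical.

```python
from typing import Any, Dict, List


def explicit_filter(value: str) -> Dict[str, Any]:
    """Nested filter on cleanExplicitVariations.cleanExplicit, as in the query plan above."""
    return {
        "nested": {
            "path": "cleanExplicitVariations",
            "query": {
                "bool": {
                    "must": {"term": {"cleanExplicitVariations.cleanExplicit": value}}
                }
            },
        }
    }


def knn_with_efficient_filter(vector: List[float], k: int, value: str) -> Dict[str, Any]:
    """Filter placed inside the knn clause (the Lucene HNSW filter from the post)."""
    knn_clause = {
        "vector": vector,
        "k": k,
        "filter": {"bool": {"must": [explicit_filter(value)]}},
    }
    return {
        "size": k,
        "query": {
            "nested": {"path": "embedding", "query": {"knn": {"embedding.vector": knn_clause}}}
        },
    }


def knn_with_bool_filter(vector: List[float], k: int, value: str) -> Dict[str, Any]:
    """Assumed DSL alternative: the filter sits outside the knn clause in a bool query."""
    knn_query = {
        "nested": {
            "path": "embedding",
            "query": {"knn": {"embedding.vector": {"vector": vector, "k": k}}},
        }
    }
    return {
        "size": k,
        "query": {"bool": {"filter": [explicit_filter(value)], "must": [knn_query]}},
    }


# Placeholder vector; the real index expects 768 dimensions.
query_vector = [0.0] * 768
efficient = knn_with_efficient_filter(query_vector, 10, "NOT_EXPLICIT")
boolean = knn_with_bool_filter(query_vector, 10, "NOT_EXPLICIT")
```

In the bool form the filter is applied to the nearest neighbors that the knn clause has already selected, so it can return fewer than k hits, which is why the filter inside the knn clause is the preferred option here.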