Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): OpenSearch 2.5
Describe the issue:
Similar to issue #13191, but in my case I need to filter nested vectors by fields that live outside the nested object. Imagine each document is a movie, each movie has many frame images, and each frame image has an associated embedding. When searching frame images by vector similarity, I need to filter by attributes of the parent movie.
Indexing schema:
{
  "mappings": {
    "properties": {
      "movie_id": {
        "type": "keyword"
      },
      "frame_images": {
        "type": "nested",
        "properties": {
          "embedding": {
            "type": "knn_vector",
            "dimension": 512,
            "method": {
              "name": "hnsw",
              "space_type": "cosinesimil",
              "engine": "lucene",
              "parameters": {
                "ef_construction": 512,
                "m": 16
              }
            }
          }
        }
      },
      "movie_type": {
        "type": "keyword"
      }
    }
  }
}
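For illustration, a document under this mapping might look like the sketch below. The helper name `example_movie_doc`, the ID `movie-001`, and the constant placeholder embedding values are made up for this example and do not come from my real data.

```python
import json


def example_movie_doc(num_frames=2, dim=512):
    """Build a hypothetical movie document matching the mapping above.

    Movie-level attributes (movie_id, movie_type) sit at the top level,
    while each frame image is a nested object carrying its own embedding.
    """
    return {
        "movie_id": "movie-001",   # made-up ID
        "movie_type": "drama",     # movie-level field used for filtering
        "frame_images": [
            # Placeholder vector; a real embedding would come from a model.
            {"embedding": [0.1] * dim}
            for _ in range(num_frames)
        ],
    }


doc = example_movie_doc()
# Print only the movie-level fields to keep the output short.
print(json.dumps({k: v for k, v in doc.items() if k != "frame_images"}))
```

The point is that `movie_type` belongs to the parent document, not to the nested `frame_images` objects that hold the vectors.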
I wanted to use efficient filtering as described in the OpenSearch documentation ("k-NN search with filters"). Query (searching for frame images, filtered by movie type):
GET my_movies_with_frame_images/_search
{
  "query": {
    "bool": {
      "must": [{
        "nested": {
          "path": "frame_images",
          "query": {
            "knn": {
              "frame_images.embedding": {
                "k": 40,
                "vector": ...,
                "filter": {"terms": {"movie_type": ["drama", "thriller"]}}
              }
            }
          },
          "score_mode": "max",
          "inner_hits": {}
        }
      }]
    }
  },
  "sort": [{"_score": {"order": "desc"}}]
}
The query above returns empty results.
I had to work around that by using post-filtering instead (also described in "k-NN search with filters" in the OpenSearch documentation), with the following query:
GET my_movies_with_frame_images/_search
{
  "query": {
    "bool": {
      "must": [{
        "nested": {
          "path": "frame_images",
          "query": {
            "knn": {
              "frame_images.embedding": {
                "k": 40,
                "vector": ...
              }
            }
          },
          "score_mode": "max",
          "inner_hits": {}
        }
      }],
      "filter": {"terms": {"movie_type": ["drama", "thriller"]}}
    }
  },
  "sort": [{"_score": {"order": "desc"}}]
}
This works. However, post-filtering performs worse than efficient filtering and uses more resources. I hope efficient filtering can also be made to work with nested knn fields when the filter is on a field outside the nested object.
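To make the difference between the two requests explicit, here is a minimal Python sketch that builds both bodies as plain dicts. The function names (`efficient_filter_query`, `post_filter_query`, `_nested_knn`) are hypothetical and only for illustration, the vector is a placeholder, and nothing is sent to a cluster.

```python
import json

# Movie-level filter on a field that lives outside the nested object.
MOVIE_FILTER = {"terms": {"movie_type": ["drama", "thriller"]}}
SORT = [{"_score": {"order": "desc"}}]


def _nested_knn(vector, k, knn_filter=None):
    """Nested k-NN clause on frame_images.embedding, optionally with a filter inside knn."""
    knn_body = {"k": k, "vector": vector}
    if knn_filter is not None:
        knn_body["filter"] = knn_filter
    return {
        "nested": {
            "path": "frame_images",
            "query": {"knn": {"frame_images.embedding": knn_body}},
            "score_mode": "max",
            "inner_hits": {},
        }
    }


def efficient_filter_query(vector, k=40):
    """Filter placed inside the knn clause (returns empty results for me)."""
    return {
        "query": {"bool": {"must": [_nested_knn(vector, k, MOVIE_FILTER)]}},
        "sort": SORT,
    }


def post_filter_query(vector, k=40):
    """Filter placed in the outer bool clause (the workaround that works)."""
    return {
        "query": {"bool": {"must": [_nested_knn(vector, k)], "filter": MOVIE_FILTER}},
        "sort": SORT,
    }


placeholder_vector = [0.1] * 512  # stand-in for a real 512-dimensional embedding
body = post_filter_query(placeholder_vector)
print(json.dumps(body["query"]["bool"]["filter"]))
```

The only structural difference between the two bodies is where the `terms` filter sits: inside `knn` for efficient filtering, or in the outer `bool` for post-filtering.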