Lucene HNSW nested knn with efficient filtering does not work for non-nested fields

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): OpenSearch 2.5

Describe the issue:

Similar to issue #13191, but in my situation I need to filter nested vectors by fields outside of the nested object. Imagine each doc is a movie, and each movie has many frame images, and I have an embedding associated with each frame image. When searching frame images by vector similarity, I need to filter by attributes of the movie.

Indexing schema:

{
  "mappings": {
    "properties": {
      "movie_id": {
        "type": "keyword"
      },
      "frame_images": {
        "type": "nested",
        "properties": {
          "embedding": {
            "type": "knn_vector",
            "dimension": 512,
            "method": {
              "name": "hnsw",
              "space_type": "cosinesimil",
              "engine": "lucene",
              "parameters": {
                "ef_construction": 512,
                "m": 16
              }
            }
          }
        }
      },
      "movie_type": {
        "type": "keyword"
      }
    }
  }
}

I wanted to leverage efficient filtering as described in the OpenSearch documentation ("k-NN search with filters"). Query (searching for frame images and filtering by movie type):

GET my_movies_with_frame_images/_search
{
  "query": {
    "bool": {
      "must": [{
        "nested": {
          "path": "frame_images",
          "query": {
            "knn": {
              "frame_images.embedding": {
                "k": 40,
                "vector": ...,
                "filter": {"terms": {"movie_type": ["drama", "thriller"]}}
              }
            }
          },
          "score_mode": "max",
          "inner_hits": {}
        }
      }]
    }
  },
  "sort": [{"_score": {"order": "desc"}}]
}

The query above returns empty results.
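One plausible explanation, which is an assumption and not something confirmed in this thread, is that nested objects are indexed as separate hidden documents that do not carry the parent's fields. If the knn filter is evaluated against the documents that hold the vectors, a filter on movie_type can only match parent documents and never overlaps with the vector candidates. The toy model below only illustrates that set logic; the document layout and variable names are hypothetical and this is not OpenSearch or Lucene code.

```python
# Toy model of a nested index: each frame (child) and each movie (parent)
# is a separate entry, and parent-level fields live only on the parent.
# This layout is an assumption made for illustration.
docs = [
    {"id": 0, "kind": "child", "parent": 2, "embedding": [1.0, 0.0]},
    {"id": 1, "kind": "child", "parent": 2, "embedding": [0.0, 1.0]},
    {"id": 2, "kind": "parent", "movie_type": "drama"},
    {"id": 3, "kind": "child", "parent": 4, "embedding": [0.9, 0.1]},
    {"id": 4, "kind": "parent", "movie_type": "comedy"},
]

allowed_types = {"drama", "thriller"}

# Entries that can be vector candidates: only children carry an embedding.
vector_ids = {d["id"] for d in docs if "embedding" in d}

# Entries matched by the movie_type filter: only parents carry that field.
filter_ids = {d["id"] for d in docs if d.get("movie_type") in allowed_types}

# Filter evaluated against the vector candidates themselves: no overlap.
naive_candidates = vector_ids & filter_ids

# Filter translated from matching parents to their children first.
child_candidates = {d["id"] for d in docs if d.get("parent") in filter_ids}

print(sorted(naive_candidates))  # []
print(sorted(child_candidates))  # [0, 1]
```

Under this assumption, the filter would need to be mapped from matching movies to their frames before the vector search for the first query to return hits.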

I had to work around that by using post-filtering (also described in "k-NN search with filters" in the OpenSearch documentation) instead, with the following query:

GET my_movies_with_frame_images/_search
{
  "query": {
    "bool": {
      "must": [{
        "nested": {
          "path": "frame_images",
          "query": {
            "knn": {
              "frame_images.embedding": {
                "k": 40,
                "vector": ...
              }
            }
          },
          "score_mode": "max",
          "inner_hits": {}
        }
      }],
      "filter": {"terms": {"movie_type": ["drama", "thriller"]}}
    }
  },
  "sort": [{"_score": {"order": "desc"}}]
}

This works. However, post-filtering performs worse than efficient filtering and uses more resources. I hope that efficient filtering can also work with nested knn fields when the filter is on a field outside the nested object.
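The two requests differ only in where the terms filter sits: inside the knn clause (efficient filtering) or in the outer bool query (post-filtering). A minimal Python sketch that builds both bodies makes the difference explicit. The helper name build_frame_query and the all-zero placeholder vector are hypothetical and used only for illustration; the field names, k, and filter values come from the queries above.

```python
import json


def build_frame_query(vector, movie_types, k=40, efficient=True):
    """Build the search body with the filter inside knn (efficient=True)
    or in the outer bool.filter clause (efficient=False, post-filtering)."""
    type_filter = {"terms": {"movie_type": list(movie_types)}}
    knn_params = {"k": k, "vector": list(vector)}
    if efficient:
        # Efficient filtering: the filter is a parameter of the knn clause.
        knn_params["filter"] = type_filter

    bool_query = {
        "must": [{
            "nested": {
                "path": "frame_images",
                "query": {"knn": {"frame_images.embedding": knn_params}},
                "score_mode": "max",
                "inner_hits": {},
            }
        }]
    }
    if not efficient:
        # Post-filtering: the filter is applied next to the nested query.
        bool_query["filter"] = type_filter

    return {
        "query": {"bool": bool_query},
        "sort": [{"_score": {"order": "desc"}}],
    }


# Placeholder vector; a real query would use an actual 512-dimensional embedding.
query_vector = [0.0] * 512
efficient_body = build_frame_query(query_vector, ["drama", "thriller"], efficient=True)
post_filter_body = build_frame_query(query_vector, ["drama", "thriller"], efficient=False)

# Either body serializes to valid JSON for the _search request.
serialized = json.dumps(post_filter_body)
```

Building the body in code also avoids the trailing-comma mistakes that are easy to make when editing the JSON by hand.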
