Lucene HNSW filter with nested knn vectors not working

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.5

Describe the issue:
Hi, I'm getting unexpected results when applying Lucene HNSW filters to nested knn vectors. Following the "Search with k-NN filters" page of the OpenSearch documentation, I put the filter criteria in the filter subsection of the knn_vector field in the query plan, but the returned results are not filtered by those criteria.

Configuration:
index schema:

{
  "settings": {
    "index": {
      "refresh_interval": "60s",
      "number_of_shards": "72",
      "number_of_replicas": "0",
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "documentId": {
        "type": "keyword"
      },
      "embedding": {
        "type": "nested",
        "properties": {
          "vector": {
            "type": "knn_vector",
            "dimension": 768,
            "method": {
              "name": "hnsw",
              "space_type": "l2",
              "engine": "lucene",
              "parameters": {
                "ef_construction": 100,
                "m": 16
              }
            }
          }
        }
      },
      "cleanExplicitVariations": {
        "type": "nested",
        "properties": {
          "cleanExplicit": {
            "type": "keyword"
          },
          "region": {
            "type": "keyword"
          },
          "regionalOverrides": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

There are 3 million documents in the index. Each document has either “NOT_EXPLICIT” or “EXPLICIT” as the value of the cleanExplicitVariations.cleanExplicit field.

query plan with filter on “NOT_EXPLICIT”:

{
	"size": 10,
	"query": {
		"nested": {
			"path": "embedding",
			"query": {
				"knn": {
					"embedding.vector": {
						"vector": [...],//768 dimension vectors
						"k": 10,
						"filter": {
							"bool": {
								"must": [{
									"nested": {
										"path": "cleanExplicitVariations",
										"query": {
											"bool": {
												"must": {
													"term": {
														"cleanExplicitVariations.cleanExplicit": "NOT_EXPLICIT"
													}
												}
											}
										}
									}
								}]
							}
						}
					}
				}
			}
		}
	}
}

However, the returned results contain both “NOT_EXPLICIT” and “EXPLICIT” documents.

My question:
Can the Lucene HNSW filter be used with nested vectors, or is there something wrong with my query plan?
If I express the filter with the regular query DSL instead, the results are correct, but I need the additional behavior the Lucene HNSW filter provides (the algorithm choosing between exact k-NN and approximate search).
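The DSL version of the filter isn't included in the question, so the following is only a minimal Python sketch, assuming it was a top-level bool query that combines the nested k-NN clause with a nested term filter on the sibling cleanExplicitVariations path. build_post_filtered_query is a hypothetical helper that only builds the request body. In this shape the filter is applied after the k-NN search has picked its nearest neighbors (post-filtering), so a query can return fewer than k hits when many of the nearest vectors belong to EXPLICIT documents.

```python
import json


def build_post_filtered_query(vector, k, explicit_value, size=10):
    """Hypothetical helper: nested k-NN clause plus a sibling nested filter
    placed at the top level of a bool query (post-filtering)."""
    return {
        "size": size,
        "query": {
            "bool": {
                # k-NN runs first on the "embedding" nested documents.
                "must": [
                    {
                        "nested": {
                            "path": "embedding",
                            "query": {
                                "knn": {
                                    "embedding.vector": {"vector": vector, "k": k}
                                }
                            },
                        }
                    }
                ],
                # The term filter only narrows the hits the k-NN clause returned.
                "filter": [
                    {
                        "nested": {
                            "path": "cleanExplicitVariations",
                            "query": {
                                "term": {
                                    "cleanExplicitVariations.cleanExplicit": explicit_value
                                }
                            },
                        }
                    }
                ],
            }
        },
    }


if __name__ == "__main__":
    body = build_post_filtered_query([3.2, 3.1], 10, "NOT_EXPLICIT")
    print(json.dumps(body, indent=2))
```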

@jmazane @vamshin @nknize - would you have any insight to share with @wizhang on this question? Thank you.

@martin.g is already looking into it.

@kris @wizhang


Hello,

This question needs more time to investigate properly. My initial understanding is that it should work as you've described, with the filter applied before the k-NN search (pre-filtering). However, in my testing the system returned empty results, which is not what I expected. I'll dig deeper to figure out what, if anything, is wrong with the mapping and the query.

Hi,

After checking all the options, I found that to make this work you need to place the nested type used in the filter under the main nested type used for the knn query. This is related to how the context of nested types is passed internally: the path of the main query is used as the basis for the filter sub-query.
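The rule can be captured in a small Python sketch that builds the request body and rejects a filter path that is not nested under the k-NN path. build_nested_knn_with_filter and its parameters are hypothetical, and the path check is only an illustration of the explanation above, not a validation that OpenSearch itself performs.

```python
def build_nested_knn_with_filter(knn_path, vector_field, vector, k,
                                 filter_path, filter_field, value, size=10):
    """Hypothetical helper: nested k-NN query whose Lucene filter uses a
    nested path located under the k-NN query's own nested path."""
    # Illustrative guard: the filter's nested path must be a descendant of
    # the k-NN nested path (e.g. "embedding.cleanExplicitVariations"),
    # because the filter is evaluated in the context of that path.
    if not filter_path.startswith(knn_path + "."):
        raise ValueError(
            f"filter path {filter_path!r} is not under k-NN path {knn_path!r}"
        )
    return {
        "size": size,
        "query": {
            "nested": {
                "path": knn_path,
                "query": {
                    "knn": {
                        f"{knn_path}.{vector_field}": {
                            "vector": vector,
                            "k": k,
                            # Filter handed to the Lucene HNSW search.
                            "filter": {
                                "bool": {
                                    "must": [
                                        {
                                            "nested": {
                                                "path": filter_path,
                                                "query": {
                                                    "term": {
                                                        f"{filter_path}.{filter_field}": value
                                                    }
                                                },
                                            }
                                        }
                                    ]
                                }
                            },
                        }
                    }
                },
            }
        },
    }
```

Called with knn_path="embedding" and filter_path="embedding.cleanExplicitVariations", it produces the same body as the search query in this reply; passing the original sibling path "cleanExplicitVariations" raises a ValueError.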

I suggest using a structure like the one below:

mapping

{
    "settings": {
        "index": {
            "knn": true
        }
    },
    "mappings": {
        "properties": {
            "embedding": {
                "type": "nested",
                "properties": {
                    "vector": {
                        "type": "knn_vector",
                        "dimension": 2,
                        "method": {
                            "name": "hnsw",
                            "space_type": "l2",
                            "engine": "lucene",
                            "parameters": {
                                "ef_construction": 100,
                                "m": 16
                            }
                        }
                    },
                    "cleanExplicitVariations": {
                        "type": "nested",
                        "properties": {
                            "cleanExplicit": {
                                "type": "keyword"
                            }
                        }
                    }
                }
            }
        }
    }
}

data upload

{
    "embedding": {
        "vector": [
            3.1,
            2.9
        ],
        "cleanExplicitVariations": {
            "cleanExplicit": "NOT_EXPLICIT"
        }
    }
}
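Because the original index keeps cleanExplicitVariations as a sibling of embedding, existing documents would need reshaping before being reindexed into a mapping like the one above. A minimal Python sketch, assuming documents shaped like the original schema and that every embedding entry should carry a copy of the document's variations; move_variations_under_embedding is a hypothetical helper, not part of any OpenSearch client.

```python
import copy


def move_variations_under_embedding(doc):
    """Hypothetical migration helper: copy the top-level cleanExplicitVariations
    into every embedding entry so filters can use the nested path
    embedding.cleanExplicitVariations. The input document is not modified."""
    new_doc = copy.deepcopy(doc)
    variations = new_doc.pop("cleanExplicitVariations", None)
    embeddings = new_doc.get("embedding")
    if embeddings is None or variations is None:
        return new_doc
    # A nested field may hold a single object or a list of objects.
    if isinstance(embeddings, dict):
        embeddings = [embeddings]
    for entry in embeddings:
        # Each nested embedding gets its own copy of the variations.
        entry["cleanExplicitVariations"] = copy.deepcopy(variations)
    new_doc["embedding"] = embeddings
    return new_doc
```

The trade-off of this layout is duplication: the variations are stored once per embedding entry instead of once per document, which grows the index when documents have many embeddings.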

search query

{
    "size": 10,
    "query": {
        "nested": {
            "path": "embedding",
            "query": {
                "knn": {
                    "embedding.vector": {
                        "vector": [
                            3.2,
                            3.1
                        ],
                        "k": 2,
                        "filter": {
                            "bool": {
                                "must": [
                                    {
                                        "nested": {
                                            "path": "embedding.cleanExplicitVariations",
                                            "query": {
                                                "term": {
                                                    "embedding.cleanExplicitVariations.cleanExplicit": "NOT_EXPLICIT"
                                                }
                                            }
                                        }
                                    }
                                ]
                            }
                        }
                    }
                }
            }
        }
    }
}
