kNN (nmslib) returns fewer results than expected

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.13

Describe the issue:
Approximate kNN returns fewer hits than expected. I’m wondering what’s wrong with my configuration or my understanding.

The query below fetches all matching documents without a kNN clause.

{
  query: {
    bool: {
      must: [],
      filter: [
        {
          term: {
            some_field: {
              value: some_field_value,
            },
          },
        },
        {
          term: {
            another_field: {
              value: another_field_value,
            },
          },
        },
      ],
    },
  },
  size: 100,
}

returns

{
  took: 37,
  timed_out: false,
  _shards: { total: 5, successful: 5, skipped: 0, failed: 0 },
  hits: {
    total: { value: 66, relation: 'eq' },

results, which is as expected. However, the kNN query returns far fewer hits, even though k and size should both be large enough.

{
  query: {
    bool: {
      must: [
        {
          knn: {
            embedding: {
              vector: query_embedding,
              k: 100,
            },
          },
        },
      ],
      filter: [
        {
          term: {
            some_field: {
              value: some_field_value,
            },
          },
        },
        {
          term: {
            another_field: {
              value: another_field_value,
            },
          },
        },
      ],
    },
  },
  size: 100,
}

returns only

{
  took: 12254,
  timed_out: false,
  _shards: { total: 5, successful: 5, skipped: 0, failed: 0 },
  hits: {
    total: { value: 9, relation: 'eq' },

I was expecting it to return all 66 results. I am trying to understand and fix this. My kNN recall seems low and I have no idea why. I checked the missing documents and confirmed that each of them has an embedding. Thank you for your great work!
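One possible cause, offered as an assumption and not a confirmed diagnosis, is that the term filters run after the nearest-neighbor search instead of before it. The toy sketch below uses exact inner-product search over a made-up corpus (the `docs`, `tag`, and `isWanted` names are hypothetical) to show how "search, then filter" can return fewer hits than "filter, then search" for the same k.

```javascript
// Toy model: exact inner-product search over a small in-memory corpus,
// comparing post-filtering with pre-filtering. Not OpenSearch code.
function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Return the k documents with the highest inner product to the query.
function topK(docs, query, k) {
  return [...docs]
    .sort((a, b) => dot(b.vector, query) - dot(a.vector, query))
    .slice(0, k);
}

// Post-filtering: pick k neighbors first, then drop those failing the filter.
function postFilter(docs, query, k, predicate) {
  return topK(docs, query, k).filter(predicate);
}

// Pre-filtering: restrict to matching documents, then pick k neighbors.
function preFilter(docs, query, k, predicate) {
  return topK(docs.filter(predicate), query, k);
}

// Hypothetical corpus: 1000 documents, every 10th one matches the filter.
const docs = Array.from({ length: 1000 }, (_, i) => ({
  id: i,
  tag: i % 10 === 0 ? "wanted" : "other",
  vector: [Math.cos(i), Math.sin(i)],
}));
const query = [1, 0];
const isWanted = (d) => d.tag === "wanted";

const post = postFilter(docs, query, 100, isWanted);
const pre = preFilter(docs, query, 100, isWanted);
console.log("post-filter hits:", post.length, "pre-filter hits:", pre.length);
```

With pre-filtering all 100 matching documents come back, while post-filtering keeps only the matching documents that happened to land in the unfiltered top 100.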

Configuration:
Index name: my-rag-chunks
Health: Green
Status: Open
Creation date: 6/23/2024, 7:51:10 PM
Total size: 33.2gb
Size of primaries: 16.5gb
Total documents: 635298
Deleted documents: 57353
Primaries: 5
Replicas: 1

The mapping for the embedding field is:

    "embedding": {
      "dimension": 1536,
      "method": {
        "engine": "nmslib",
        "space_type": "innerproduct",
        "name": "hnsw",
        "parameters": {}
      },
      "type": "knn_vector"
    },

Relevant Logs or Screenshots:

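I believe this is because approximate kNN does not combine well with filtering. As far as I understand, with the nmslib engine the term filters in the bool query are applied after the approximate search has already selected its k nearest neighbors, so neighbors that fail the filter are dropped and not replaced. I’m now using the k-NN scoring script with a filter, which computes exact kNN over the filtered documents.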
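For anyone landing here later, a minimal sketch of what such a scoring-script request body might look like, reusing the field names from the question. The `buildExactKnnQuery` helper and its arguments are hypothetical; the `knn_score` script with `lang: "knn"` is the k-NN plugin's exact scoring script as I understand it, so please check the parameters against the documentation for your version.

```javascript
// Build an exact k-NN request body using the k-NN plugin's scoring script.
// The helper name and its arguments are illustrative placeholders.
function buildExactKnnQuery(queryEmbedding, someFieldValue, anotherFieldValue, size = 100) {
  return {
    size,
    query: {
      script_score: {
        // Only documents matching these term filters are scored.
        query: {
          bool: {
            filter: [
              { term: { some_field: { value: someFieldValue } } },
              { term: { another_field: { value: anotherFieldValue } } },
            ],
          },
        },
        script: {
          lang: "knn",
          source: "knn_score",
          params: {
            field: "embedding",
            // Must have the same dimension as the mapping (1536 in this index).
            query_value: queryEmbedding,
            // Matches the space_type declared in the mapping.
            space_type: "innerproduct",
          },
        },
      },
    },
  };
}

// Short dummy vector for illustration only; a real query needs 1536 values.
const body = buildExactKnnQuery([0.1, 0.2, 0.3], "some_field_value", "another_field_value");
console.log(JSON.stringify(body, null, 2));
```

Because the script scores every document that passes the filter, it is exact but can be slow when the filter matches many documents. With only 66 matching documents, as in the question, that cost should be small.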
