Sporadic empty inner hits on nested kNN search

Hi there,

I am having an issue with nested kNN search: occasionally, documents are returned without any inner hits. How is this possible?

Search result:

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},
 'hits': {'hits': [{'_id': 'bRkrI4IBuKQL3UqO8DkV',
                    '_index': 'synthetic_data_index',
                    '_score': 2.0406117,
                    '_source': {'nested_object': {'cool_vector_field': [0.2234513587608724, 0.8878394741163076, 0.3087446303001422, 0.5401258921662346, -0.9228053400350715],
                                                  'some_text_field': 'This is nested text for doc number 383'},
                                'non_nested_text': 'This doc is number 383'},
                    'inner_hits': {'nested_object': {'hits': {'hits': [{'_id': 'bRkrI4IBuKQL3UqO8DkV',
                                                                        '_index': 'synthetic_data_index',
                                                                        '_nested': {'field': 'nested_object', 'offset': 0},
                                                                        '_score': 2.0406117,
                                                                        '_source': {'cool_vector_field': [0.2234513587608724,
                                                                                                          0.8878394741163076,
                                                                                                          0.3087446303001422,
                                                                                                          0.5401258921662346,
                                                                                                          -0.9228053400350715],
                                                                                    'some_text_field': 'This is nested text for doc number 383'}}],
                                                              'max_score': 2.0406117,
                                                              'total': {'relation': 'eq', 'value': 1}}}}},
                   {'_id': 'bhkrI4IBuKQL3UqO8DkV',
                    '_index': 'synthetic_data_index',
                    '_score': 2.0406117,
                    '_source': {'nested_object': {'cool_vector_field': [-0.3667193179233421, 0.04664013242577236, -0.4679759075333949, 0.9335512141017783, 0.9847209912260526],
                                                  'some_text_field': 'This is nested text for doc number 384'},
                                'non_nested_text': 'This doc is number 384'},
                    'inner_hits': {'nested_object': {'hits': {'hits': [], 'max_score': None, 'total': {'relation': 'eq', 'value': 0}}}}},
                   {'_id': 'bxkrI4IBuKQL3UqO8DkV',
                    '_index': 'synthetic_data_index',
                    '_score': 2.0406117,
                    '_source': {'nested_object': {'cool_vector_field': [-0.9203098606975535, -0.8629298912981729, -0.4274567965220182, 0.5190442025173878, -0.32420767814040885],
                                                  'some_text_field': 'This is nested text for doc number 385'},
                                'non_nested_text': 'This doc is number 385'},
                    'inner_hits': {'nested_object': {'hits': {'hits': [], 'max_score': None, 'total': {'relation': 'eq', 'value': 0}}}}}],
          'max_score': 2.0406117,
          'total': {'relation': 'gte', 'value': 10000}},
 'status': 200,
 'timed_out': False,
 'took': 3}

Notice that the last two hits have empty inner hits.
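
For anyone trying to reproduce this, here is a minimal sketch of how such hits can be picked out of a response programmatically. It assumes the response is a plain Python dict shaped like the one above; the function name `empty_inner_hit_ids` is made up for illustration and is not part of any client library.

```python
def empty_inner_hit_ids(response, path="nested_object"):
    """Return the _id of every top-level hit whose inner hits for `path` are empty.

    Assumes `response` has the same layout as the search result shown above.
    """
    empty = []
    for hit in response["hits"]["hits"]:
        # Walk down inner_hits -> <path> -> hits -> hits, tolerating missing keys.
        inner = (
            hit.get("inner_hits", {})
            .get(path, {})
            .get("hits", {})
            .get("hits", [])
        )
        if not inner:
            empty.append(hit["_id"])
    return empty
```

Called on the result above, this should return the IDs of the second and third hits (docs 384 and 385).
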
This is the query:

{'query': {'nested': {'inner_hits': {},
                      'path': 'nested_object',
                      'query': {'knn': {'nested_object.cool_vector_field': {'k': 3,
                                                                            'vector': [-0.53387915, -0.14078664, -0.41952186,  0.11891716, -0.30830444]}}},
                      'score_mode': 'max'}},
 'size': 3}

Index mappings:

{'mappings': {'properties': {'nested_object': {'properties': {'cool_vector_field': {'dimension': 5,
                                                                                    'method': {'engine': 'nmslib',
                                                                                               'name': 'hnsw',
                                                                                               'parameters': {'ef_construction': 128,
                                                                                                              'm': 24},
                                                                                               'space_type': 'innerproduct'},
                                                                                    'type': 'knn_vector'},
                                                              'some_text_field': {'type': 'text'}},
                                               'type': 'nested'}}},
 'settings': {'index': {'knn': True,
                        'knn.algo_param.ef_search': 100,
                        'refresh_interval': '30s'},
              'number_of_shards': 1}}

Note that this doesn’t happen for every search. It is more common with datasets larger than 10000 docs, and it seems to become more frequent:

  • The more docs there are
  • The more shards that are used
  • When size > k
  • When size and k are large
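
To check these observations more systematically, something like the following sketch could sweep over combinations of k and size and count how many hits come back with empty inner hits. The `search` argument is a hypothetical callable that takes a request body and returns the response as a dict (for example a thin wrapper around your client's search call); `build_query` and `sweep` are names I made up, and the query body simply mirrors the one posted above.

```python
from itertools import product


def build_query(vector, k, size, path="nested_object", field="cool_vector_field"):
    """Build the nested kNN query from this post for a given k and size."""
    return {
        "size": size,
        "query": {
            "nested": {
                "path": path,
                "score_mode": "max",
                "inner_hits": {},
                "query": {"knn": {f"{path}.{field}": {"vector": vector, "k": k}}},
            }
        },
    }


def sweep(search, vector, ks, sizes, path="nested_object"):
    """Run the query for every (k, size) pair.

    `search` is an assumed callable: request body (dict) -> response (dict).
    Returns {(k, size): (hits_with_empty_inner_hits, total_hits_returned)}.
    """
    results = {}
    for k, size in product(ks, sizes):
        response = search(build_query(vector, k, size, path=path))
        hits = response["hits"]["hits"]
        empty = sum(
            1
            for hit in hits
            if not hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits")
        )
        results[(k, size)] = (empty, len(hits))
    return results
```

Running this against indices with different doc and shard counts would show whether the empty inner hits really track the conditions listed above.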

Perhaps this is a bug with the kNN plugin? Any help would be greatly appreciated!

After playing around with this problem more, it seems that, for a single query, the documents returned without inner hits form groups of consecutively indexed documents. No idea why. For example:

['3395', '3396', '3397', '3398', '3399', '4250', '4251', '4252', '4253', '4254']

(I set each document's ID to correspond to the order in which it was indexed.)
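
A small sketch of how the grouping can be made explicit, assuming (as in my setup) that the IDs are numeric strings reflecting indexing order. The helper name `consecutive_runs` is just for illustration.

```python
def consecutive_runs(ids):
    """Collapse numeric string IDs into (first, last) runs of consecutive values."""
    numbers = sorted(int(i) for i in ids)
    runs = []
    for n in numbers:
        if runs and n == runs[-1][1] + 1:
            # Extends the current run by one.
            runs[-1][1] = n
        else:
            # Gap (or first element): start a new run.
            runs.append([n, n])
    return [tuple(run) for run in runs]


ids = ['3395', '3396', '3397', '3398', '3399', '4250', '4251', '4252', '4253', '4254']
print(consecutive_runs(ids))  # [(3395, 3399), (4250, 4254)]
```

For the example above this gives two runs of five consecutive documents each.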

Hi,
We are looking into this issue on GitHub: [BUG] Sporadic empty inner hits on nested kNN search (Issue #466, opensearch-project/k-NN)