Hi there,
I am having an issue with a nested kNN search: occasionally, documents are returned without any inner hits. How is this possible?
Search result:
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},
'hits': {'hits': [{'_id': 'bRkrI4IBuKQL3UqO8DkV',
'_index': 'synthetic_data_index',
'_score': 2.0406117,
'_source': {'nested_object': {'cool_vector_field': [0.2234513587608724, 0.8878394741163076, 0.3087446303001422, 0.5401258921662346, -0.9228053400350715],
'some_text_field': 'This is nested text for doc number 383'},
'non_nested_text': 'This doc is number 383'},
'inner_hits': {'nested_object': {'hits': {'hits': [{'_id': 'bRkrI4IBuKQL3UqO8DkV',
'_index': 'synthetic_data_index',
'_nested': {'field': 'nested_object', 'offset': 0},
'_score': 2.0406117,
'_source': {'cool_vector_field': [0.2234513587608724,
0.8878394741163076,
0.3087446303001422,
0.5401258921662346,
-0.9228053400350715],
'some_text_field': 'This is nested text for doc number 383'}}],
'max_score': 2.0406117,
'total': {'relation': 'eq', 'value': 1}}}}},
{'_id': 'bhkrI4IBuKQL3UqO8DkV',
'_index': 'synthetic_data_index',
'_score': 2.0406117,
'_source': {'nested_object': {'cool_vector_field': [-0.3667193179233421, 0.04664013242577236, -0.4679759075333949, 0.9335512141017783, 0.9847209912260526],
'some_text_field': 'This is nested text for doc number 384'},
'non_nested_text': 'This doc is number 384'},
'inner_hits': {'nested_object': {'hits': {'hits': [], 'max_score': None, 'total': {'relation': 'eq', 'value': 0}}}}},
{'_id': 'bxkrI4IBuKQL3UqO8DkV',
'_index': 'synthetic_data_index',
'_score': 2.0406117,
'_source': {'nested_object': {'cool_vector_field': [-0.9203098606975535, -0.8629298912981729, -0.4274567965220182, 0.5190442025173878, -0.32420767814040885],
'some_text_field': 'This is nested text for doc number 385'},
'non_nested_text': 'This doc is number 385'},
'inner_hits': {'nested_object': {'hits': {'hits': [], 'max_score': None, 'total': {'relation': 'eq', 'value': 0}}}}}],
'max_score': 2.0406117,
'total': {'relation': 'gte', 'value': 10000}},
'status': 200,
'timed_out': False,
'took': 3}
Notice that the last two hits have empty inner hits.
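To make the pattern easier to spot in larger responses, a small helper along these lines can list the affected hits. This is only a sketch: `ids_with_empty_inner_hits` is a name made up for illustration, and it assumes the response is a plain dict shaped like the one above, with inner hits keyed by the nested path.

```python
def ids_with_empty_inner_hits(response, path="nested_object"):
    """Return the _id of every top-level hit whose inner hits for `path` are empty."""
    empty = []
    for hit in response["hits"]["hits"]:
        # Missing keys are treated the same as an empty inner hit list.
        inner = (
            hit.get("inner_hits", {})
            .get(path, {})
            .get("hits", {})
            .get("hits", [])
        )
        if not inner:
            empty.append(hit["_id"])
    return empty
```

Run against the search result above, this should report the IDs of the last two hits and skip the first one.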
This is the query:
{'query': {'nested': {'inner_hits': {},
'path': 'nested_object',
'query': {'knn': {'nested_object.cool_vector_field': {'k': 3,
'vector': [-0.53387915, -0.14078664, -0.41952186, 0.11891716, -0.30830444]}}},
'score_mode': 'max'}},
'size': 3}
Index mappings:
{'mappings': {'properties': {'nested_object': {'properties': {'cool_vector_field': {'dimension': 5,
'method': {'engine': 'nmslib',
'name': 'hnsw',
'parameters': {'ef_construction': 128,
'm': 24},
'space_type': 'innerproduct'},
'type': 'knn_vector'},
'some_text_field': {'type': 'text'}},
'type': 'nested'}}},
'settings': {'index': {'knn': True,
'knn.algo_param.ef_search': 100,
'refresh_interval': '30s'},
'number_of_shards': 1}}
Note that this doesn’t happen for every search, and it mostly shows up with datasets larger than 10000 docs. The issue seems to become more prevalent:
- The more docs there are
- The more shards that are used
- When size > k
- When size and k are large
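To check these observations more systematically, something like the following sweep could be used. It is a rough sketch, not anything provided by the plugin: `build_query`, `sweep`, and the `search_fn` parameter are hypothetical names, and `search_fn` is assumed to be any callable that takes a query body and returns the search response as a dict.

```python
from itertools import product


def build_query(vector, k, size, path="nested_object", field="cool_vector_field"):
    """Build the nested kNN query from this post for a given k and size."""
    return {
        "size": size,
        "query": {
            "nested": {
                "path": path,
                "score_mode": "max",
                "inner_hits": {},
                "query": {"knn": {f"{path}.{field}": {"vector": vector, "k": k}}},
            }
        },
    }


def sweep(search_fn, vector, ks, sizes, path="nested_object"):
    """Return {(k, size): (hits_with_empty_inner_hits, total_hits_returned)}."""
    results = {}
    for k, size in product(ks, sizes):
        hits = search_fn(build_query(vector, k, size, path))["hits"]["hits"]
        # Count top-level hits whose inner hit list is missing or empty.
        empty = sum(
            1
            for h in hits
            if not h.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits")
        )
        results[(k, size)] = (empty, len(hits))
    return results
```

With the opensearch-py client, `search_fn` could be something like `lambda body: client.search(index="synthetic_data_index", body=body)`, assuming `client` is already configured. Comparing the counts for `size <= k` against `size > k` would show whether the empty inner hits track that condition.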
Perhaps this is a bug with the kNN plugin? Any help would be greatly appreciated!