Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.13
Describe the issue:
Approximate kNN returns far fewer hits than expected. I'm wondering what is wrong with my configuration or my understanding.
The query below retrieves all matching documents without a kNN clause:
{
  query: {
    bool: {
      must: [],
      filter: [
        {
          term: {
            some_field: {
              value: some_field_value
            },
          },
        },
        {
          term: {
            another_field: {
              value: another_field_value,
            },
          },
        },
      ]
    }
  },
  size: 100
}
It returns
{
  took: 37,
  timed_out: false,
  _shards: { total: 5, successful: 5, skipped: 0, failed: 0 },
  hits: {
    total: { value: 66, relation: 'eq' },
That is 66 results, which is as expected. However, the kNN query returns far fewer hits, even though k and size should be large enough:
{
  query: {
    bool: {
      must: [
        {
          knn: {
            embedding: {
              vector: query_embedding,
              k: 100,
            }
          }
        },
      ],
      filter: [
        {
          term: {
            some_field: {
              value: some_field_value
            },
          },
        },
        {
          term: {
            another_field: {
              value: another_field_value,
            },
          },
        },
      ]
    }
  },
  size: 100
}
It returns only
{
  took: 12254,
  timed_out: false,
  _shards: { total: 5, successful: 5, skipped: 0, failed: 0 },
  hits: {
    total: { value: 9, relation: 'eq' },
That is only 9 hits. I was expecting it to return all results. I am trying to understand and fix this situation. Currently, my kNN recall seems low and I have no idea why. I checked the missing documents and confirmed that each of them has an embedding.
Thank you for your great work!
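One possible explanation, stated here as an assumption rather than a confirmed diagnosis: when the knn clause sits in must and the term clauses sit in filter, the k nearest neighbours may be selected first and the filter applied afterwards (post-filtering), so only the neighbours that happen to match the filter survive. The toy JavaScript sketch below illustrates that effect with made-up data. The corpus, field values, and function names are all hypothetical, and the code does not call OpenSearch.

```javascript
// Toy model only: every document, vector, and field value here is made up.
function innerProduct(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Rank all documents by inner product with the query, highest first.
function rank(docs, queryVector) {
  return docs
    .map((doc) => ({ doc, score: innerProduct(doc.embedding, queryVector) }))
    .sort((x, y) => y.score - x.score);
}

// Post-filtering: take the k nearest neighbours first, then drop non-matching ones.
function postFilteredKnn(docs, queryVector, k, predicate) {
  return rank(docs, queryVector)
    .slice(0, k)
    .filter(({ doc }) => predicate(doc));
}

// Pre-filtering: restrict to matching documents first, then take the k nearest.
function preFilteredKnn(docs, queryVector, k, predicate) {
  return rank(docs.filter(predicate), queryVector).slice(0, k);
}

// Hypothetical corpus: 1000 documents, every 10th one matches the filter.
const docs = Array.from({ length: 1000 }, (_, i) => ({
  id: i,
  some_field: i % 10 === 0 ? "wanted" : "other",
  embedding: [i / 1000, 1 - i / 1000],
}));
const queryVector = [1, 0];
const matchesFilter = (doc) => doc.some_field === "wanted";

const post = postFilteredKnn(docs, queryVector, 100, matchesFilter);
const pre = preFilteredKnn(docs, queryVector, 100, matchesFilter);
console.log(post.length, pre.length); // prints "10 100"
```

In this toy setup the same k and the same filter yield 10 hits when the filter runs after the neighbour search and 100 hits when it runs before, which is the same shape as the 9 versus 66 gap above.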
Configuration:
Index name: my-rag-chunks
Health: Green
Status: Open
Creation date: 6/23/2024, 7:51:10 PM
Total size: 33.2gb
Size of primaries: 16.5gb
Total documents: 635298
Deleted documents: 57353
Primaries: 5
Replicas: 1
Mapping of the embedding field:
"embedding": {
  "dimension": 1536,
  "method": {
    "engine": "nmslib",
    "space_type": "innerproduct",
    "name": "hnsw",
    "parameters": {}
  },
  "type": "knn_vector"
},
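If post-filtering turns out to be the cause, one option that should work with the existing nmslib mapping is exact k-NN through the k-NN plugin's scoring script, which applies the filter first and scores only the documents that pass it. Below is a minimal sketch of such a request body, assuming the `knn_score` script with `lang: "knn"` behaves as described in the k-NN plugin documentation. The helper name `buildExactKnnQuery` and the placeholder vector and field values are hypothetical.

```javascript
// Sketch of an exact k-NN request body (an assumption, not a verified fix):
// the bool filter narrows the candidate set, then the scoring script ranks it.
function buildExactKnnQuery(queryEmbedding, filters, size) {
  return {
    size,
    query: {
      script_score: {
        // Only documents passing these filters are scored.
        query: { bool: { filter: filters } },
        script: {
          source: "knn_score",
          lang: "knn",
          params: {
            field: "embedding",           // knn_vector field from the mapping above
            query_value: queryEmbedding,  // same vector as in the knn clause
            space_type: "innerproduct",   // matches the mapping's space_type
          },
        },
      },
    },
  };
}

// Placeholder inputs; a real query vector would have 1536 dimensions.
const body = buildExactKnnQuery(
  [0.1, 0.2, 0.3],
  [
    { term: { some_field: { value: "some_field_value" } } },
    { term: { another_field: { value: "another_field_value" } } },
  ],
  100
);
console.log(JSON.stringify(body, null, 2));
```

With only 66 documents passing the filter, brute-force scoring over that subset should be cheap. Placing the filter inside the knn clause itself is another option, but as far as I know that requires the lucene or faiss engine rather than nmslib.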
Relevant Logs or Screenshots: