k-NN multiple field search in OpenSearch

Hi.

Assume that we have this index in OpenSearch:

{
    "settings": {
        "index.knn": True,
        "number_of_replicas": 0,
        "number_of_shards": 1,
    },
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "tag": {"type": "text"},
            "e1": {
                "type": "knn_vector",
                "dimension": 512,
                "method": {
                    "name": "hnsw",
                    "space_type": "cosinesimil",
                    "engine": "nmslib",
                    "parameters": {"ef_construction": 512, "m": 24},
                },
            },
            "e2": {
                "type": "knn_vector",
                "dimension": 512,
                "method": {
                    "name": "hnsw",
                    "space_type": "cosinesimil",
                    "engine": "nmslib",
                    "parameters": {"ef_construction": 512, "m": 24},
                },
            },
            "e3": {
                "type": "knn_vector",
                "dimension": 512,
                "method": {
                    "name": "hnsw",
                    "space_type": "cosinesimil",
                    "engine": "nmslib",
                    "parameters": {"ef_construction": 512, "m": 24},
                },
            },
        }
    },
}
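
Since the three vector fields share an identical definition, the index body above can also be built programmatically. This is only a minimal sketch in plain Python; the helper names `knn_field` and `build_index_body` are hypothetical and not part of any OpenSearch client.

```python
def knn_field(dimension=512, ef_construction=512, m=24):
    """Return one knn_vector field definition matching the mapping above."""
    return {
        "type": "knn_vector",
        "dimension": dimension,
        "method": {
            "name": "hnsw",
            "space_type": "cosinesimil",
            "engine": "nmslib",
            "parameters": {"ef_construction": ef_construction, "m": m},
        },
    }


def build_index_body(vector_fields=("e1", "e2", "e3"), text_fields=("title", "tag")):
    """Assemble settings and mappings for the text and vector fields (hypothetical helper)."""
    properties = {name: {"type": "text"} for name in text_fields}
    properties.update({name: knn_field() for name in vector_fields})
    return {
        "settings": {
            "index.knn": True,
            "number_of_replicas": 0,
            "number_of_shards": 1,
        },
        "mappings": {"properties": properties},
    }


index_body = build_index_body()
```

The resulting dict has the same content as the hand-written body and could be passed as the request body when creating the index.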

And we want to perform a search over all the fields (approximate knn for the vector fields). What would be the correct way to do this in OpenSearch?

In other words, I want to know how this or this, both of which are for Elasticsearch, can be done in OpenSearch.

I have this query that works but I’m not sure if it is the correct way of doing this and if it uses approximate knn:

{
    "size": 10,
    "query": {
        "bool": {
            "should": [
                {
                    "function_score": {
                        "query": {
                            "knn": {
                                "e1": {
                                    "vector": [0, 1, 2, 3],
                                    "k": 10,
                                },
                            }
                        },
                        "weight": 1,
                    }
                },
                {
                    "function_score": {
                        "query": {
                            "knn": {
                                "e2": {
                                    "vector": [0, 1, 2, 3],
                                    "k": 10,
                                },
                            }
                        },
                        "weight": 1,
                    }
                },
                {
                    "function_score": {
                        "query": {
                            "knn": {
                                "e3": {
                                    "vector": [0, 1, 2, 3],
                                    "k": 10,
                                },
                            }
                        },
                        "weight": 1,
                    }
                },
                {
                    "function_score": {
                        "query": {
                            "match": {"title": "title"}
                        },
                        "weight": 0.1,
                    }
                },
                {
                    "function_score": {
                        "query": {"match": {"tag": "tag"}},
                        "weight": 0.1,
                    }
                },
            ]
        }
    },
    "_source": False,
}
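
The five `should` clauses above all follow the same pattern, so the request body can be generated rather than written out by hand. A minimal sketch, assuming plain Python dicts; `weighted` and `build_hybrid_query` are hypothetical helper names, not client functions.

```python
def weighted(query, weight):
    """Wrap a sub-query in function_score so its score is multiplied by `weight`."""
    return {"function_score": {"query": query, "weight": weight}}


def build_hybrid_query(vectors, texts, k=10, vector_weight=1, text_weight=0.1, size=10):
    """Build the bool/should body from {field: query_vector} and {field: query_text}."""
    should = [
        weighted({"knn": {field: {"vector": vec, "k": k}}}, vector_weight)
        for field, vec in vectors.items()
    ]
    should += [
        weighted({"match": {field: text}}, text_weight)
        for field, text in texts.items()
    ]
    return {"size": size, "query": {"bool": {"should": should}}, "_source": False}


# Placeholder vector, as in the query above; a real query vector needs 512 values
# to match the mapping's dimension.
query_vector = [0, 1, 2, 3]
search_body = build_hybrid_query(
    vectors={"e1": query_vector, "e2": query_vector, "e3": query_vector},
    texts={"title": "title", "tag": "tag"},
)
```

This produces the same structure as the query above, with the weights and `k` exposed as parameters.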

I did not find documentation on function_score for OpenSearch so I’m confused about what the above query does. Can someone explain the query?

@jmazane @vamshin - could you assist @Alireza on this question? Thank you

I asked the question on Stack Overflow too. I think the query does combine approximate k-NN with other features, like the upcoming feature in Elasticsearch. I would still appreciate it if you could confirm whether I’m right and whether this is the recommended way to do it in OpenSearch.

Though, there is the problem of duplicate/missing results when paginating such a query. I have opened an issue on GitHub describing the issue in detail. I would also appreciate it if you could share any insight regarding that issue.

Thank you.

Hi @Alireza, you seem to be on the right path regarding combining the k-NN query with other fields. Let us know if you see any issues.

On the pagination issue, we will get back to you ASAP.

@Alireza function_score is supported in OpenSearch. Let me take a stab at explaining how the query you have written is executed. It contains 3 k-NN sub-queries and 2 text-based sub-queries. A simple way to understand it: at the data node level, every sub-query wrapped in a function_score is executed, and the score of each document it returns is multiplied by the weight you provided. Then, for every document ID, the weighted scores from all sub-queries are added together. If a document is returned by one sub-query but not by another, its score for the sub-query where it is missing is treated as 0.
For k-NN, every sub-query returns at most 10 results, since you have set k=10. The text match queries can return more or fewer, depending on how many documents contain the text.
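
To make the arithmetic concrete, the weighted-sum behavior described above can be imitated in a few lines of Python. This is only a toy model of the explanation, not how OpenSearch is implemented; the function name `combine_scores` and all scores are made up for illustration.

```python
from collections import defaultdict


def combine_scores(subquery_hits, weights):
    """Sum weight * score per document; a document missing from a sub-query adds 0."""
    totals = defaultdict(float)
    for name, hits in subquery_hits.items():
        for doc_id, score in hits.items():
            totals[doc_id] += weights[name] * score
    # Highest combined score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up per-sub-query scores, keyed by the field each sub-query targets.
hits = {
    "e1": {"doc1": 0.75, "doc2": 0.5},   # k-NN sub-query on e1
    "e2": {"doc1": 0.5},                 # k-NN sub-query on e2 (doc2 not in its top k)
    "title": {"doc2": 2.0, "doc3": 4.0}, # text match sub-query on title
}
weights = {"e1": 1, "e2": 1, "title": 0.1}

ranking = combine_scores(hits, weights)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Here doc1 is scored only by the two k-NN sub-queries, doc3 only by the text match, and doc2 by a mix of both, which is the "missing means 0" rule in action.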

One thing to note:
The explanation above is very high level, but good for understanding. Please don’t assume that the query will be executed exactly like this; there are many optimizations that can happen.

Difference Between ES and OpenSearch k-NN Query Execution:
k-NN query execution in Elasticsearch is very different from how it is performed in OpenSearch. ES treats a k-NN query specially: if a k-NN clause is present in the _search API call, the search type becomes DFS_QUERY_THEN_FETCH. ES first retrieves the k-NN results from all shards, builds the global top K from them, and sends those top K document IDs to each shard. These top K are then combined with the text match queries. Reference.

This is different in OpenSearch, which uses the default search type, QUERY_THEN_FETCH. OpenSearch sends both the k-NN and the text match queries to every shard, and each shard combines the scores of the documents it gets from the k-NN and text match queries.

You can also see this in how the _search API payload is structured in each system. In ES, the k-NN clause sits outside the “query” object, whereas in OpenSearch it sits inside the “query” object.
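
The structural difference can be shown side by side as Python dicts. This is a rough sketch: the ES body follows the top-level `knn` option of the Elasticsearch 8.x _search API, and the `num_candidates` value, the placeholder vector, and the single `match` clause are assumptions chosen only for illustration.

```python
query_vector = [0, 1, 2, 3]  # placeholder vector, as in the query above

# Elasticsearch 8.x style: "knn" is a top-level key, a sibling of "query".
es_body = {
    "knn": {
        "field": "e1",
        "query_vector": query_vector,
        "k": 10,
        "num_candidates": 100,  # assumed value, for illustration only
    },
    "query": {"match": {"title": "title"}},
}

# OpenSearch style: the knn clause is just another query clause inside "query".
opensearch_body = {
    "query": {
        "bool": {
            "should": [
                {"knn": {"e1": {"vector": query_vector, "k": 10}}},
                {"match": {"title": "title"}},
            ]
        }
    }
}
```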

Please let us know if this helps.

Thanks for the explanation, @Navneet.
