Is it possible to use kNN-Search during aggregation?

Christopher2002 · April 13, 2022, 11:34am

Hey guys,
for a learning project I have set up an OpenSearch database and query it with kNN searches to find similar documents. Technically this all works well, but it is a bit slow because the vectors (length 1000) have to travel over the network twice. The program's workflow looks like this:

1. Select a document by its _id and load its vector into the client program
2. Run a kNN similarity search using the vector of the previously selected document

This currently takes two requests, and transporting the vectors over the network takes a lot of time. I therefore wonder whether both requests could be combined with an aggregation so that the vectors do not have to be sent at all.

Is such an aggregation possible?
Would something like this be more efficient? I think so, but I am not an expert in this field.

I am grateful for any help! Have a nice day.

Vijay · April 18, 2022, 6:09pm

Hi @Christopher2002
Performing two requests, the first to identify the document and the second to use that document in a search, is very common, but in general the two requests go to different systems or different indices. I understand that in your case both requests use the same index, so you want to minimize the number of calls. Have you considered source filtering to reduce the payload? In the meantime, can you share both of your requests? That will help us provide the solution that best fits your scenario.

Christopher2002 · June 20, 2022, 2:11pm

Hey @Vijay
My two requests look like this:

Select documents by ids

{
  "query": {
    "ids": {
      "values": ["id0", "id1", ..., "idn"]
    }
  },
  "_source": {
    "includes": ["vector"]
  }
}

Find similar docs with different filters (like time range, author…)

{
  "size": 5,
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "must": [
            {
              "range": {
                "timestamp": { "gte": "now-7d" }
              }
            }
          ]
        }
      },
      "script": {
        "source": "knn_score",
        "lang": "knn",
        "params": {
          "field": "vector",
          "query_value": [vector-of-firstly-selected-doc],
          "space_type": "cosinesimil"
        }
      }
    }
  },
  "_source": {
    "includes": ["author", "permlink", ...]
  }
}

I have already restricted both requests so that they return only the relevant fields.
Thank you for your message!
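
For anyone following along, the two requests above can be chained on the client side roughly as in the sketch below. This is only an illustration of the current two-request workflow, not a way to avoid it: the vector still goes over the network twice. The helper names (`build_ids_query`, `build_knn_script_query`, `find_similar`) and the index name used in the usage note are made up for this example, and `client` is assumed to be any object with an opensearch-py style `search(index=..., body=...)` method that returns the usual `hits.hits` response dictionary.

```python
from typing import Any, Dict, List, Optional


def build_ids_query(doc_ids: List[str]) -> Dict[str, Any]:
    """Request 1: fetch only the 'vector' field of the given documents."""
    return {
        "query": {"ids": {"values": doc_ids}},
        "_source": {"includes": ["vector"]},
    }


def build_knn_script_query(
    vector: List[float],
    since: str = "now-7d",
    size: int = 5,
    source_fields: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Request 2: exact kNN via script_score, pre-filtered by a time range."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {
                    "bool": {
                        "must": [{"range": {"timestamp": {"gte": since}}}]
                    }
                },
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": "vector",
                        "query_value": vector,
                        "space_type": "cosinesimil",
                    },
                },
            }
        },
        # Field names taken from the post; adjust to your own mapping.
        "_source": {"includes": source_fields or ["author", "permlink"]},
    }


def find_similar(client: Any, index: str, doc_id: str) -> List[Dict[str, Any]]:
    """Run both requests in sequence and return the hits of the second one."""
    first = client.search(index=index, body=build_ids_query([doc_id]))
    hits = first["hits"]["hits"]
    if not hits:
        return []  # unknown _id, nothing to compare against
    vector = hits[0]["_source"]["vector"]
    second = client.search(index=index, body=build_knn_script_query(vector))
    return second["hits"]["hits"]
```

With a real client this would be called as `find_similar(client, "my-index", "id0")`, where `"my-index"` is a placeholder for your own index name.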
