Is it possible to use kNN search within an aggregation?

Hey guys,
For a learning project, I have set up an OpenSearch database and query it with kNN searches to find similar documents. Technically, this all works well, but it is a bit slow because the vectors (1000 dimensions each) have to be transported over the network twice. The programme workflow looks like this:

  1. Select a document based on its _id and load the vectors into the client’s programme
  2. Run a kNN similarity search using the vector of the previously selected document

These steps currently run as two separate requests, and transporting the vectors over the network takes a lot of time. I therefore wonder whether both requests could be combined, for example with an aggregation, so that the vectors never have to be sent to the client.
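To put a rough number on that cost, here is a small back-of-envelope sketch in Python. It assumes the vector travels as a JSON array of floats (as in the request bodies later in this thread) and uses a hypothetical constant vector; the function names are illustrative only. The vector crosses the network twice: once in the response to step 1 and once in the body of step 2.

```python
import json


def vector_payload_bytes(vector):
    """Size in bytes of the vector when serialized as a JSON array.

    Note: Python's json.dumps puts ", " between elements by default, so a
    compact serializer would produce a slightly smaller payload.
    """
    return len(json.dumps(vector).encode("utf-8"))


def round_trip_bytes(vector):
    """Bytes spent on one vector across the two requests:
    once in the response of step 1 and once in the request body of step 2."""
    return 2 * vector_payload_bytes(vector)


# Hypothetical 1000-dimensional vector; every value serializes to 11 characters.
vector = [0.123456789] * 1000
print(vector_payload_bytes(vector))  # 13000
print(round_trip_bytes(vector))      # 26000
```

Depending on how many digits each float serializes to, a single 1000-dimensional vector is roughly 10 to 25 KB of JSON, so the overhead grows with the number of documents selected in step 1.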

  • Is such an aggregation possible?
  • Would something like this be more efficient? I think so, but I am not an expert in this field.

I am grateful for any help! Have a nice day.

Hi @Christopher2002
Performing two requests, the first to identify the document and the second to use it in a search, is very common, but in general the two requests go to different systems or different indices. I understand that in your case both requests use the same index, so you want to minimize the number of calls. Have you considered source filtering to reduce the payload? In the meantime, could you share both of your requests? That will help us provide a solution that best fits your scenario.

Hey @Vijay
Both of my requests look like this:

  1. Select documents by ids
{
  "query": {
    "ids": {
      "values": ["id0", "id1", ..., "idn"]
    }
  },
  "_source": {
    "includes": ["vector"]
  }
}
  2. Find similar documents using various filters (such as time range, author, …)
{
  "size": 5,
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "must": [
            {
              "range": {
                "timestamp": { "gte": "now-7d" }
              }
            }
          ]
        }
      },
      "script": {
        "source": "knn_score",
        "lang": "knn",
        "params": {
          "field": "vector",
          "query_value": [vector-of-firstly-selected-doc],
          "space_type": "cosinesimil"
        }
      }
    }
  },
  "_source": {
    "includes": ["author", "permlink", ...]
  }
}
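A minimal Python sketch of the two-request workflow above, kept independent of any particular client. It assumes `search` is any function that sends a query body to the index and returns the parsed JSON response; with the opensearch-py client this could wrap `client.search(index=..., body=...)`. The function names and the index name in the comment are hypothetical. The bodies mirror the two requests above, with one small difference: request 1 sets `size` explicitly, because a search returns at most 10 hits by default, so a list of more than 10 ids would otherwise be truncated. The sketch does not remove the round trip the question is about; it only makes the data flow explicit: each vector is read from the response of request 1 and sent back as `query_value` in request 2.

```python
from typing import Any, Callable, Dict, List

# Any function that sends a search body to the index and returns the parsed
# JSON response. With opensearch-py this might be, for example:
#   lambda body: client.search(index="my-index", body=body)  # index name is hypothetical
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def build_vector_request(ids: List[str]) -> Dict[str, Any]:
    """Request 1: fetch only the `vector` field of the given documents."""
    return {
        # Without an explicit size, a search returns at most 10 hits.
        "size": len(ids),
        "query": {"ids": {"values": ids}},
        "_source": {"includes": ["vector"]},
    }


def build_similarity_request(
    vector: List[float], fields: List[str], k: int = 5
) -> Dict[str, Any]:
    """Request 2: exact k-NN scoring (knn_score script) over recent documents."""
    return {
        "size": k,
        "query": {
            "script_score": {
                # Pre-filter: only documents from the last 7 days are scored.
                "query": {
                    "bool": {"must": [{"range": {"timestamp": {"gte": "now-7d"}}}]}
                },
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": "vector",
                        "query_value": vector,
                        "space_type": "cosinesimil",
                    },
                },
            }
        },
        # Return only the display fields, never the vectors of the hits.
        "_source": {"includes": fields},
    }


def find_similar(
    search: SearchFn, ids: List[str], fields: List[str]
) -> Dict[str, List[Dict[str, Any]]]:
    """Run request 1 once, then request 2 once per returned vector."""
    hits = search(build_vector_request(ids))["hits"]["hits"]
    results: Dict[str, List[Dict[str, Any]]] = {}
    for hit in hits:
        body = build_similarity_request(hit["_source"]["vector"], fields)
        results[hit["_id"]] = search(body)["hits"]["hits"]
    return results
```

Because `find_similar` depends only on the `search` callable, it can be exercised without a cluster by passing a stub that returns canned responses.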

I have already used source filtering so that only the relevant fields are returned.
Thank you for your message!