Need help understanding ANN implementation vs exact KNN

We have around 3.7M documents and we want to implement KNN search.

  1. We tried ANN with the nmslib engine and it works great performance-wise, but its behavior is non-deterministic. Since we have 2 embeddings per document, we are using the following boolean query:
"query": {
    "bool": {
      "should": [
        {
          "knn": {
              "description_vector": {
                "vector": <vector>,
                "k": 10000
              }
          }
        },
       {
          "knn": {
              "tag_vector": {
                "vector": <vector>,
                "k": 10000
              }
          }
        }
      ]
    }
  }

How does this query work? Based on our experiments, this is what we noticed:

  1. Fetch the top K documents from each segment based only on description_vector similarity score
  2. Fetch the top K documents from each segment based only on tag_vector similarity score
  3. Sum the scores for each document to generate the final score
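The steps above can be sketched as follows. This is a toy simulation of our *hypothesis* (unverified) for how the bool/should kNN query combines scores: each knn clause contributes its own top-k list, and a document's final score is the sum of the clause scores in which it appears. Document names and similarity values are made up for illustration.

```python
# Hypothesized top-k result of each knn clause (doc -> similarity score).
desc_topk = {"docA": 0.9, "docB": 0.8, "docX": 0.7}   # top-k by description_vector
tag_topk  = {"docA": 0.6, "docC": 0.5}                # top-k by tag_vector; docX not returned

# Final score = sum of the scores from every clause a document appears in.
final = {}
for clause_results in (desc_topk, tag_topk):
    for doc, score in clause_results.items():
        final[doc] = final.get(doc, 0.0) + score

# docA gets contributions from both clauses; docX only from the first,
# so it can be outranked even if its true combined similarity is higher.
ranked = sorted(final.items(), key=lambda kv: kv[1], reverse=True)
```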

Is the above logic correct? If not, how does it work?

If the above logic is true, then a document X that was included in step 1 might sometimes not be included in step 2. Document X gets missed because it is on the lower end of the similarity scores, and we did observe such behavior in our experiments. To avoid this problem, we set K to the maximum allowed value for each vector, but we have more than 10K documents per segment, so this is not enough to solve our issue. How do we ensure that we always check all documents?
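One way to guarantee that every document is scored is the k-NN plugin's exact search via a scoring script, which scores all documents matched by the inner query instead of traversing an ANN index. Below is a minimal sketch of such a query body, written as a Python dict; the field name mirrors the question, and the `cosinesimil` space type and the placeholder query vector are assumptions.

```python
# Sketch of an exact (brute-force) k-NN query using the k-NN scoring
# script. The match_all inner query means every document is scored, so
# nothing can be "missed" the way an ANN top-k list can miss documents.
exact_query = {
    "size": 10,
    "query": {
        "script_score": {
            "query": {"match_all": {}},          # score all documents
            "script": {
                "source": "knn_score",
                "lang": "knn",
                "params": {
                    "field": "description_vector",
                    "query_value": [0.1] * 384,  # placeholder 384-dim query vector
                    "space_type": "cosinesimil", # assumed space type
                },
            },
        }
    },
}
```

The trade-off is that exact scoring touches every matching document, so latency grows with index size; restricting the inner query (instead of `match_all`) is the usual way to keep it manageable.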

  1. We plan on using exact KNN, but given the size of our data (3.7M documents with 2 embeddings per document, 384 dimensions each), we are not sure this would scale. Does exact KNN also use native memory to load all vectors?
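For scale, a back-of-envelope estimate of the raw vector data alone (assuming 4-byte float32 components, and excluding any graph or index overhead an ANN engine would add on top):

```python
# Rough memory footprint of the raw vectors described in the question.
docs = 3_700_000
vectors_per_doc = 2
dims = 384
bytes_per_float = 4  # assuming float32 storage

total_bytes = docs * vectors_per_doc * dims * bytes_per_float
total_gib = total_bytes / 2**30  # roughly 10.6 GiB of raw vector data
```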