Knn plugin output different from nmslib's output

vchauras · September 12, 2022, 2:54am

Hi folks, we are using ES 7.10.2 with the k-NN plugin version “opendistro-knn 1.13.0.0”. We query ES as described in the Approximate Search page of the Open Distro documentation.

We have 2 setups:

1. an ES + k-NN plugin setup
2. a standalone application with nmslib (Java + JNI) that returns the ANN results

In both setups we use the same HNSW parameters:

         "knn.algo_param.ef_construction": 128,
         "knn.algo_param.ef_search": 512,
         "knn.algo_param.m": 32,
         "knn.space_type": "cosinesimil",

Setup 2 is already in production, where we fetch 1K approximate nearest neighbors. We are now comparing it with the output of setup 1 (i.e. ES), where we also fetch 1K. But the candidates from ES barely overlap with those from setup 2, which is very strange: out of 1K, sometimes fewer than 10 candidates overlap.
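To make the overlap figure concrete, here is a minimal sketch of how it can be measured, assuming each setup returns its candidates as a list of document IDs. The function name `overlap_at_k` and the sample IDs are hypothetical and only for illustration.

```python
from typing import Hashable, Sequence


def overlap_at_k(ids_a: Sequence[Hashable], ids_b: Sequence[Hashable], k: int) -> float:
    """Fraction of the top-k IDs that appear in both result lists.

    Order inside the top-k is ignored; only membership counts.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_a = set(ids_a[:k])
    top_b = set(ids_b[:k])
    return len(top_a & top_b) / k


# Hypothetical IDs standing in for the ES hits and the nmslib hits.
es_ids = ["doc1", "doc2", "doc3", "doc4"]
nmslib_ids = ["doc3", "doc4", "doc5", "doc6"]
print(overlap_at_k(es_ids, nmslib_ids, k=4))
```

With k = 1000 and fewer than 10 shared candidates, this value would be below 0.01.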

We query ES as follows:

GET qa_all/_search
{
  "size": 1000,
  "query": {
    "knn": {
      "emb": {
        "vector": [-0.04278,-0.06262,0.00382,0.01775,-0.03303,-0.09063,-0.03692,-0.08344,0.12808,0.08798,-0.14282,-0.03064,-0.08949,-0.1082,0.03501,0.02708,-0.08741,-0.09282,0.0877,-0.01165,-0.10274,0.01546,-0.04741,-0.04248,0.11622,-0.10024,0.01163,-0.07968,-0.06767,-0.13108,-0.08212,-0.06128,0.15309,0.40923,-0.02716,-0.12542,-0.13863,-0.08499,-0.05578,0.0042,-0.03063,0.00528,-0.01689,-0.00227,0.14426,-0.05696,0.03857,-0.0407,-0.04939,0.01294,-0.09725,-0.05225,0.04993,0.19513,-0.01119,-0.12615,-0.04428,0.0737,-0.06909,-0.07091,0.07381,0.03361,0.13018,-0.03799,-0.07998,0.13459,0.03418,0.02439,-0.08377,-0.07341,-0.10374,-0.09418,0.02707,-0.08679,0.08897,-0.02819,-0.04146,-0.10599,-0.06915,-0.05306,-0.06337,-0.0146,-0.07916,-0.07958,-0.03024,0.13883,0.16783,0.04984,-0.04686,0.01605,0.02205,0.04318,-0.00442,-0.06357,-0.05019,-0.09602,-0.01005,-0.10499,0.00293,-0.08889,0.00955,-0.09967,-0.09681,-0.08478,0.07651,-0.08328,-0.10132,0.11603,-0.16051,-0.04568,-0.02952,-0.03266,-0.10144,-0.11449,-0.02146,-0.02963,0.02939,-0.08336,0.17852,0.0613,0.15482,0.0029,0.00157,-0.10472,0.01327,-0.04802,-0.15749,-0.04988],
        "k": 1000
      }
    }
  }
}
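The same request body can also be built programmatically, which makes it easier to keep `size` and `k` in sync when comparing against the other setup. This is only a sketch: the helper name `build_knn_query` is hypothetical, and it just produces the JSON shown above without sending anything to ES.

```python
import json
from typing import Any, Dict, List


def build_knn_query(field: str, vector: List[float], k: int) -> Dict[str, Any]:
    """Build an approximate k-NN search body with `size` equal to `k`."""
    return {
        "size": k,
        "query": {
            "knn": {
                field: {
                    "vector": vector,
                    "k": k,
                }
            }
        },
    }


# Short placeholder vector; the real query uses the full embedding.
body = build_knn_query("emb", [0.1, -0.2, 0.3], k=1000)
print(json.dumps(body, indent=2))
```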

Our ES has 1 index with 1 shard holding ~14M documents. Setup 2 holds the same 14M documents.

Can someone tell us why ES has so little overlap? Does ES internally do some more magic rather than just calling nmslib’s JNI method for the kNN query? Setup 2 is in production and already showing good results, while ES is what we would prefer to use; but now we are confused.
