Getting different results in vector scores?

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
using latest docker image: opensearchproject/opensearch:latest

Describe the issue:
We’re testing vector search and so far we’re not happy with the results, and we’re wondering what we have done wrong.

I have a test document with the following content:

“Ebola Virus Disease (EVD) and encourage U.S. hospitals to prepare for managing patients with\r\nEbola and other infectious diseases. Every hospital should ensure that it can detect a patient with\r\nEbola, protect healthcare workers so they can safely care for the patient, and respond in a coordinated fashion. Many of the signs and symptoms of Ebola are non-specific and similar to those of many common”

and use the OpenAI embedding model “text-embedding-3-large” to generate embedding vectors.

The problem is that if I search for an unrelated phrase, I get a score very similar to the score for a word that actually appears in the document:

“dog Rex”: score 0.5197706
“virus”: score 0.5711079

When I compute cosine similarity manually in C# code, I get quite different values:

“dog Rex”: score 0.07607428956253202
“virus”: score 0.24901766327076158

While not perfect, that’s roughly a 6x difference, compared to a ~10% difference in OpenSearch.

My manual cosine similarity function looks like this:

double CalculateCosineSimilarity(float[] vector1, float[] vector2)
{
    if (vector1.Length != vector2.Length)
    {
        throw new ArgumentException("Vectors must be of equal length.");
    }

    // cos(theta) = (v1 . v2) / (|v1| * |v2|)
    double dotProduct = vector1.Zip(vector2, (a, b) => a * b).Sum();
    double magnitude1 = Math.Sqrt(vector1.Sum(a => a * a));
    double magnitude2 = Math.Sqrt(vector2.Sum(b => b * b));

    return dotProduct / (magnitude1 * magnitude2);
}

Configuration:

I’ve created the index as follows (initially I tried with ef_construction 128, later increased it to 500):

PUT /my-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text"
      },
      "content_vector": {
        "type": "knn_vector",
        "dimension": 3072,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "nmslib",
          "parameters": {
            "ef_construction": 500,
            "m": 16
          }
        }
      }
    }
  }
}

post document:

POST /my-index/_doc/1
{
  "content": "Ebola Virus Disease (EVD) and encourage U.S. hospitals to prepare for managing patients with\r\nEbola and other infectious diseases. Every hospital should ensure that it can detect a patient with\r\nEbola, protect healthcare workers so they can safely care for the patient, and respond in a coordinated fashion. Many of the signs and symptoms of Ebola are non-specific and similar to those of many common",
  "content_vector": [  ...omitted for brevity...  ]
}

search:

POST /my-index/_search
{
  "size": 10,
  "query": {
    "knn": {
      "content_vector": {
        "vector": [  ...omitted for brevity...  ],
        "k": 10
      }
    }
  }
}


Seems that there’s not much traffic, or the question is too complex 🙂

I simplified the problem to the minimum: I index the two simplest possible vectors, [0, 1] and [0, -1]. If cosine similarity is the cosine of the angle between the vectors, the angle is 180 degrees and the cosine should be -1, but I’m getting 0.33:

POST /my-index2/_search
{
  "size": 10,
  "query": {
    "knn": {
      "content_vector": {
        "vector": [0, 1],
        "k": 10
      }
    }
  }
}

result:

 "hits": [
      {
        "_index": "my-index2",
        "_id": "1",
        "_score": 0.9999999,
        "_source": {
          "content": "up",
          "content_vector": [
            0,
            1
          ]
        }
      },
      {
        "_index": "my-index2",
        "_id": "2",
        "_score": 0.33333334,
        "_source": {
          "content": "down",
          "content_vector": [
            0,
            -1
          ]
        }
      }
    ]

so either the score is not a cosine similarity, or I’m missing something very obvious.

@dziedrius, let me try to answer this.

In OpenSearch, the cosinesimil space actually computes a cosine distance: distance = 1 - cos(θ). You can read about this here: Approximate k-NN search - OpenSearch Documentation

Once we have the cosine distance, we convert it to an OpenSearch score; the same documentation explains how the scores are generated. In your case, the score of 0.33333 is calculated as follows:

Query vector: [0, 1]
Document vector: [0, -1]

cos(θ) = -1 / (1 × 1) = -1

cosine distance = 1 - (-1) = 2

OpenSearch score = 1 / (1 + cosine distance) = 1/3 = 0.33333
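The same arithmetic can be checked with a short Python sketch (the function name is illustrative, not an OpenSearch API). It also explains the original embedding scores: plugging the raw C# cosine values into this formula reproduces the scores OpenSearch returned.

```python
# Sketch: reproduce OpenSearch's nmslib cosinesimil scoring,
# assuming score = 1 / (1 + (1 - cos(theta))) as described above.

def opensearch_cosine_score(cos_sim: float) -> float:
    """Convert a raw cosine similarity into an nmslib cosinesimil score."""
    distance = 1.0 - cos_sim       # nmslib cosine *distance*, in [0, 2]
    return 1.0 / (1.0 + distance)  # OpenSearch score normalization

# Toy example from the thread: [0, 1] vs [0, -1] -> cos = -1
print(opensearch_cosine_score(-1.0))        # ≈ 0.33333

# Original embedding scores: the raw C# cosine values map onto
# the scores OpenSearch returned.
print(opensearch_cosine_score(0.24901766))  # ≈ 0.5711 ("virus")
print(opensearch_cosine_score(0.07607429))  # ≈ 0.5198 ("dog Rex")
```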

I hope this clarifies your doubt.

@Navneet - thanks, that explains the resulting scores.

Where I still struggle is the last step, the normalization. It seems that Lucene’s approach ((2 - d)/2) has a better dynamic range, [0, 1], while nmslib’s is [0.333, 1], so I’m wondering why they chose their approach.

Why dynamic range could be important: at least some embedding models produce similarities in a limited range (discussion about it here: Some questions about text-embedding-ada-002’s embedding - API - OpenAI Developer Forum).
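The difference in dynamic range can be sketched numerically, assuming d = 1 - cos(θ) as in the previous post (the function names are illustrative, not library APIs):

```python
# Compare the two score normalizations over the full range of cos(theta),
# where d = 1 - cos(theta) is the cosine distance in [0, 2].

def lucene_score(cos_sim: float) -> float:
    d = 1.0 - cos_sim
    return (2.0 - d) / 2.0   # equivalently (1 + cos) / 2, range [0, 1]

def nmslib_score(cos_sim: float) -> float:
    d = 1.0 - cos_sim
    return 1.0 / (1.0 + d)   # range [1/3, 1]

for cos in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"cos={cos:+.1f}  lucene={lucene_score(cos):.4f}  nmslib={nmslib_score(cos):.4f}")
```

Both are monotone in cos(θ), so they produce the same ranking; the difference is only how much of the [0, 1] score interval they use.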

This is a really good question, but I don’t have a good answer for you on this. @jmazane, what are your thoughts on the last question?

Hi @dziedrius and @Navneet,

That’s an interesting point and an interesting thread. We added this long ago, and I think it simply followed what l2 was doing (see the cosine PR and the spot where the conversion happens). But the (1 + cos) normalization does seem like it would preserve more precision than 1/(1 + cos). I’m not sure why Lucene divides by 2 after that; for ordering’s sake, (1 + cos) alone would work just as well, it just gives a range of [0, 2]. Oddly enough, we do exactly that for script scoring. See here.

That being said, I am not sure what the practical consequence of this would be, i.e. when it would become noticeable or how close two values would need to be. The other thing is that for production systems with cosine vectors, we recommend normalizing the vectors as a pre-processing step and then using the innerproduct space, which gives the same result as cosine but is much faster at query time.
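The pre-processing trick above can be checked with a minimal pure-Python sketch (the helper names are illustrative, not part of any OpenSearch API): L2-normalize each vector once, and the inner product of the normalized vectors equals the cosine similarity of the originals.

```python
# Sketch: inner product of L2-normalized vectors == cosine similarity
# of the original vectors.
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

doc = [3.0, 4.0]
query = [1.0, 2.0]

# Normalize at index time; dot product at query time is then cosine.
ip = dot(l2_normalize(doc), l2_normalize(query))
print(ip, cosine(doc, query))  # the two values are equal
```

This is why the normalize-then-innerproduct setup returns the same ranking as cosinesimil while skipping the per-comparison magnitude computation.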

Jack

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.