Where can I find total vector count per index?

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): 2.13 AWS OpenSearch

Describe the issue:

I’d like to crunch some numbers to estimate memory usage for knn vectors (k-NN index - OpenSearch Documentation)

However, I can’t find stats for total vectors stored in an index anywhere. Index API and KNN Plugin API don’t seem to expose this metric.

The Elasticsearch index stats API, for example, does have a “dense_vector” metric:
dense_vector: Total number of dense vectors indexed.

I’m looking for something similar in OpenSearch.

This might help

This gives the total document count for the index. If you have, say, one vector per document, you can infer the vector count from it.

@spork Elasticsearch exposes this via the <index-name>/_stats API, but OpenSearch has no equivalent capability. Feel free to create a GitHub issue in the k-NN repo here: Issues · opensearch-project/k-NN · GitHub

If you know that every document in your index has the vector field, you can use the suggestion from @shatejas to count the docs and then apply the formulas here: k-NN index - OpenSearch Documentation.
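
As a rough illustration of that calculation, here is a minimal sketch of the HNSW estimate given in the k-NN documentation, 1.1 * (4 * dimension + 8 * M) bytes per vector. The function names are made up for this example, M = 16 is assumed as the default, and the document count is a hypothetical number, so treat the output as a ballpark figure only.

```python
# Sketch of the HNSW estimate from the k-NN documentation:
#   1.1 * (4 * dimension + 8 * M) bytes per vector (float vectors).
# Assumption: M = 16 unless the index mapping sets it explicitly.


def hnsw_bytes_per_vector(dimension: int, m: int = 16) -> float:
    """Estimated native memory per vector, in bytes."""
    return 1.1 * (4 * dimension + 8 * m)


def estimate_hnsw_memory_gib(vector_count: int, dimension: int, m: int = 16) -> float:
    """Estimated graph memory for vector_count vectors, in GiB."""
    return vector_count * hnsw_bytes_per_vector(dimension, m) / 1024 ** 3


if __name__ == "__main__":
    # Hypothetical numbers: 5 million docs, one 384-dimension vector each.
    print(f"{hnsw_bytes_per_vector(384):.1f} bytes per vector")
    print(f"{estimate_hnsw_memory_gib(5_000_000, 384):.2f} GiB")
```

The estimate only holds if the document count really equals the vector count, and it does not account for replicas.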

Another, indirect way to find the number of docs is to hit the GET /_plugins/_knn/stats API, check how much memory the index takes under the indices_in_cache key, and then work backwards from that number to the number of docs using the same formulas you linked in your question.
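
A minimal sketch of that reverse calculation, under a few assumptions: the stats response is already parsed into a dict, each node lists the index under indices_in_cache with a graph_memory_usage value in kilobytes, the index uses HNSW with float vectors, and M = 16. The function name and the sample response below are hypothetical.

```python
# Sketch: infer a vector count from k-NN stats by inverting the HNSW estimate
#   1.1 * (4 * dimension + 8 * M) bytes per vector.
# Assumptions: graph_memory_usage is reported in KB, and the whole index
# is loaded into the cache (otherwise the count is an underestimate).


def vectors_from_stats(stats: dict, index: str, dimension: int, m: int = 16) -> int:
    bytes_per_vector = 1.1 * (4 * dimension + 8 * m)
    total_kb = 0
    # Sum usage across nodes. If replica shards are cached too, this
    # counts each vector once per cached copy.
    for node in stats.get("nodes", {}).values():
        entry = node.get("indices_in_cache", {}).get(index)
        if entry:
            total_kb += entry.get("graph_memory_usage", 0)
    return round(total_kb * 1024 / bytes_per_vector)


if __name__ == "__main__":
    # Hypothetical, trimmed-down response with a single node.
    sample = {
        "nodes": {
            "node-1": {
                "indices_in_cache": {
                    "my-index": {"graph_memory_usage": 17875, "graph_count": 4}
                }
            }
        }
    }
    print(vectors_from_stats(sample, "my-index", dimension=384))
```

Since the formula itself is an estimate with a 1.1 overhead factor, the result is approximate in both directions.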

  1. Unfortunately, our documents can have a varying number of fields encoded per document; it’s not a static ratio.

  2. For your second suggestion - I’d need to get the index loaded into the graph cache FIRST (GET /_plugins/_knn/warmup/INDEX_NAME) to do the reverse count. The trouble is that we don’t have enough memory to load some of the indices. Hence the need to calculate without loading data into the cache to begin with. Otherwise, I’d just load everything in and have my memory estimation that way 🙂
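
To make point 1 concrete: when the vectors-per-document ratio varies, the estimate becomes a sum over vector fields rather than a single multiplication. The sketch below assumes per-field vector counts are available from somewhere (for example a per-field count query, which is an assumption here and not something confirmed in this thread), that every field uses HNSW with float vectors, and that M = 16. The field names and counts are hypothetical.

```python
# Sketch: sum the HNSW estimate over several vector fields without
# loading anything into the graph cache.
# Assumption: per-field vector counts are known by some other means.

from typing import Dict, Tuple


def hnsw_bytes_per_vector(dimension: int, m: int = 16) -> float:
    """1.1 * (4 * dimension + 8 * M), the documented HNSW estimate."""
    return 1.1 * (4 * dimension + 8 * m)


def estimate_total_bytes(fields: Dict[str, Tuple[int, int]], m: int = 16) -> float:
    """fields maps a field name to (vector_count, dimension)."""
    return sum(
        count * hnsw_bytes_per_vector(dimension, m)
        for count, dimension in fields.values()
    )


if __name__ == "__main__":
    # Hypothetical fields, both embedded with a 384-dimension model.
    fields = {
        "title_vector": (1_000_000, 384),
        "summary_vector": (250_000, 384),
    }
    print(f"{estimate_total_bytes(fields) / 1024 ** 3:.2f} GiB")
```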

I’d be happy to create a gh issue. Thanks for your help!

Hi @spork
For #2, a response has been added on this question: Question about ANN graph memory size - #3 by viktari. I hope it clarifies how to load all the graphs and then use the k-NN stats API to work out the total number of docs using the vector field.

@spork,
which algorithm (exact k-NN or ANN) and which library (faiss/nmslib/lucene engine, if ANN is used) did you use to index documents as the dense_vector type?

@yeonghyeonKo KNN algo, with faiss similarity lib, in hnsw. Using “minilm-l6-v2” model (384 dimensions).

Also, how would those formulas apply to “float”, “byte”, and “binary” encodings for vectors? And to disk-based k-NN?