@spork Elasticsearch exposes this through the <index-name>/_stats API, but OpenSearch has no equivalent capability. Feel free to create a GitHub issue in the k-NN repo here: Issues · opensearch-project/k-NN · GitHub
If you know that every document in your index has the vector field, you can use the suggestion provided by @shatejas to count the number of docs and then apply the formulas given here: k-NN index - OpenSearch Documentation.
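As a rough sketch of that forward calculation, assuming the HNSW estimate from the linked documentation (1.1 * (4 * dimension + 8 * m) bytes per vector) and one vector per counted document. The function name and the example numbers are illustrative, not part of any OpenSearch client:

```python
def estimate_hnsw_bytes(num_vectors: int, dimension: int, m: int = 16) -> float:
    """Approximate native memory for an HNSW graph, in bytes.

    Uses the estimate from the k-NN index documentation:
    1.1 * (4 * dimension + 8 * m) bytes per vector.
    """
    bytes_per_vector = 1.1 * (4 * dimension + 8 * m)
    return bytes_per_vector * num_vectors


# Hypothetical index: 1,000,000 docs, each with one 256-dimension vector, m = 16.
estimate = estimate_hnsw_bytes(1_000_000, 256, 16)
print(f"{estimate / 1024**3:.2f} GiB")  # prints "1.18 GiB"
```

If an index has several vector fields, the same function would need to be applied once per field with that field's own vector count and dimension, and the results summed.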
Another indirect way to get the number of docs is to call the GET _plugins/_knn/stats API, check how much memory the index takes under the indices_in_cache key, and then work backwards from that number to the number of docs using the same formulas you linked in your question.
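A minimal sketch of that reverse calculation, under a few assumptions: the stats response is shaped like nodes -> <node id> -> indices_in_cache -> <index name> -> graph_memory_usage, that value is reported in kilobytes, and the index uses HNSW with a single vector field. The helper names and the sample response are made up for illustration, so check them against what your cluster actually returns:

```python
def index_graph_memory_kb(stats: dict, index_name: str) -> int:
    """Sum graph_memory_usage (assumed to be in KB) for one index across all nodes."""
    total_kb = 0
    for node in stats.get("nodes", {}).values():
        entry = node.get("indices_in_cache", {}).get(index_name)
        if entry:
            total_kb += entry.get("graph_memory_usage", 0)
    return total_kb


def vectors_from_graph_memory(graph_memory_kb: float, dimension: int, m: int = 16) -> int:
    """Invert the HNSW estimate 1.1 * (4 * dimension + 8 * m) bytes per vector."""
    bytes_per_vector = 1.1 * (4 * dimension + 8 * m)
    return round(graph_memory_kb * 1024 / bytes_per_vector)


# Made-up response fragment with the assumed shape; not real cluster output.
sample_stats = {
    "nodes": {
        "node-a": {"indices_in_cache": {"my-index": {"graph_memory_usage": 1238}}},
        "node-b": {"indices_in_cache": {"my-index": {"graph_memory_usage": 1237}}},
    }
}

total_kb = index_graph_memory_kb(sample_stats, "my-index")
print(vectors_from_graph_memory(total_kb, dimension=256, m=16))
```

The result is only approximate, since the formula itself is an estimate, and any replica shards loaded in the cache would be counted as well.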
Unfortunately, our documents can have a varying number of vector fields encoded per document; it’s not a static ratio.
For your second suggestion, I’d need to get the index loaded into the graph cache FIRST (GET /_plugins/_knn/warmup/INDEX_NAME) to do the reverse count. The trouble is that we don’t have enough memory to load some of the indices, hence the need to calculate without loading data into the cache to begin with. Otherwise, I’d just load everything in and get my memory estimate that way.
I’d be happy to create a gh issue. Thanks for your help!
Hi @spork
For #2, a response has been added to the question: Question about ANN graph memory size - #3 by viktari. I hope that clarifies how to load all the graphs and then use the k-NN stats API to view the total number of docs using the vector field.