Need help in assessing memory requirements to serve 50 million documents

chaitanya_basava · November 11, 2021, 5:33am

Hi

Our team is using Open Distro to power our search engine: we store embedding vectors and use the k-NN plugin to fetch the top similar results. We are having trouble calculating accurately how much RAM is needed to hold 50 million documents, each represented by a 384-dimensional vector.
We are using a machine with 58 GB of usable RAM to host Elasticsearch, with one primary and two replica nodes to store the documents.

Can someone help us in figuring out the calculations?

Thanks in advance.

jmazane · November 22, 2021, 7:13pm

Hi @chaitanya_basava,

Sorry for the delay.

This section of our docs explains how to estimate memory. The native memory needed to load the graphs for search is approximately 1.1 * (4*d + 8*M) * num_vectors bytes, where d is the vector dimension. Assuming you have an M of 32, about 98.5 GB should be set aside for graphs. However, since there are two replicas in addition to the primary, the total memory would be ~295.5 GB.
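To make the arithmetic easy to re-run with other values, here is a minimal Python sketch of the formula above. The function name and the choice to report decimal gigabytes (1 GB = 1e9 bytes) are assumptions made for illustration; they are not part of the plugin.

```python
def knn_graph_memory_bytes(dimension: int, m: int, num_vectors: int) -> float:
    """Estimate native memory for the graphs: 1.1 * (4*d + 8*M) * num_vectors bytes."""
    return 1.1 * (4 * dimension + 8 * m) * num_vectors


# Values from this thread: 384-dimensional vectors, M = 32, 50 million documents.
per_copy_bytes = knn_graph_memory_bytes(dimension=384, m=32, num_vectors=50_000_000)

# One primary plus two replicas means three full copies of the graphs.
copies = 1 + 2
total_bytes = per_copy_bytes * copies

print(f"per copy: {per_copy_bytes / 1e9:.1f} GB")
print(f"all copies: {total_bytes / 1e9:.1f} GB")
```

With these inputs the estimate comes out at roughly 98.6 GB per copy and just under 296 GB across all three copies, in line with the figures quoted above.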

Typically, we recommend that half of the RAM on a machine (at most 32 GB) be used for the JVM heap of the OpenSearch/Elasticsearch process. Of the memory remaining after subtracting the JVM heap, between 50% and 75% can be used for the graphs (~295.5 GB in total here).
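As a rough illustration of that guideline, the sketch below works out how much of a 58 GB machine would be left for graphs and how many such machines the ~295.5 GB total would need. The node count is an extrapolation, not something stated in the docs: it assumes the graphs spread evenly across identical nodes and ignores any other memory overhead. The function name is hypothetical.

```python
import math


def graph_budget_gb(ram_gb: float, fraction: float) -> float:
    """Memory available for graphs on one node under the guideline above."""
    heap_gb = min(ram_gb / 2, 32.0)      # half of RAM for the JVM heap, capped at 32 GB
    return (ram_gb - heap_gb) * fraction  # share of the remainder used for graphs


ram_gb = 58.0            # usable RAM per machine, from the question
total_graph_gb = 295.5   # estimate for primary plus two replicas, from the answer

low_budget = graph_budget_gb(ram_gb, 0.50)   # conservative: 50% of the remainder
high_budget = graph_budget_gb(ram_gb, 0.75)  # aggressive: 75% of the remainder

# Assumption: graphs are spread evenly across identical nodes.
nodes_at_75 = math.ceil(total_graph_gb / high_budget)
nodes_at_50 = math.ceil(total_graph_gb / low_budget)

print(f"graph budget per node: {low_budget} to {high_budget} GB")
print(f"nodes needed: {nodes_at_75} to {nodes_at_50}")
```

Under these assumptions a single 58 GB machine leaves about 14.5 to 21.75 GB for graphs, so the full set of graphs would not fit in memory on one node.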
