Question about ANN graph memory size

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
2.15 (AWS OpenSearch)

Describe the issue:
I’d like to double-check how ANN graph memory size works. I’ve found a few different equations for how much space a graph takes in memory.

I have 8 data nodes, each with 64 GB RAM. My understanding of OpenSearch on AWS is that half the RAM goes to the OpenSearch JVM heap (32 GB), and half of what remains (32 GB) goes to graph memory caching (16 GB).

Meanwhile, my index has a knn_vector field with 384 dimensions and an m of 16 (the default). There are 37 million documents and 1 replica.

Here’s the readout for one data node from GET /_plugins/_knn/*/stats. It seems to show that the node is maxed out on graph memory, with roughly 15 GB allocated (graph_memory_usage is reported in KB), and unable to fit the whole graph in memory, judging from cache_capacity_reached.

"qj0SM3L1RQmwF8yvift2ug": {
      "max_distance_query_with_filter_requests": 0,
      "graph_memory_usage_percentage": 99.36626,
      "graph_query_requests": 149283,
      "graph_memory_usage": 15283920,
      "cache_capacity_reached": true,
      "load_success_count": 39663,
      "training_memory_usage": 0,
      "indices_in_cache": {
        "candidates-896bdaf8-1b82-4379-baa9-f47eb3b6f7d8": {
          "graph_memory_usage": 15283920,
          "graph_memory_usage_percentage": 99.36626,
          "graph_count": 210
        }
      },
      "script_query_errors": 0,
      "hit_count": 109609,
      "knn_query_requests": 0,
      "total_load_time": 2357202456074,
      "miss_count": 39674,
      "min_score_query_requests": 80,
      "knn_query_with_filter_requests": 0,
      "training_memory_usage_percentage": 0,
      "max_distance_query_requests": 0,
      "lucene_initialized": false,
      "graph_index_requests": 0,
      "faiss_initialized": true,
      "load_exception_count": 0,
      "training_errors": 0,
      "min_score_query_with_filter_requests": 80,
      "eviction_count": 39453,
      "nmslib_initialized": false,
      "script_compilations": 0,
      "script_query_requests": 0,
      "graph_stats": {
        "refresh": {
          "total_time_in_millis": 0,
          "total": 0
        },
        "merge": {
          "current": 0,
          "total": 0,
          "total_time_in_millis": 0,
          "current_docs": 0,
          "total_docs": 0,
          "total_size_in_bytes": 0,
          "current_size_in_bytes": 0
        }
      },
      "graph_query_errors": 0,
      "indexing_from_model_degraded": false,
      "graph_index_errors": 0,
      "training_requests": 0,
      "script_compilation_errors": 0
    },

My understanding from this readout is that the graph doesn’t fit into memory meaning that ANN will be slower.

And here’s the index config for the vector field:


"my_vector": {
    "type": "knn_vector",
    "dimension": 384,
    "method": {
        "name": "hnsw",
        "space_type": "innerproduct",
        "engine": "faiss"
    },
    "doc_values": false
}

The closest equation I’ve found for that graph memory size, in GB per node, is 1.1 * (dimension * 4 + 8 * m) * (num_documents * 2) / 1024 / 1024 / 1024 / num_nodes, where the * 2 accounts for the 1 replica. That works out to 15.7 GB, assuming it’s right.

My core question, though, is: how do I predict the size of the graph memory? Right now I know it doesn’t fit into the ~15.2 GB of cache. Am I really only 0.5 GB off from what I need (based on the 15.7 GB number)? Is 15.7 GB of RAM per data node what’s required for a vector field of this type?

Configuration:

Relevant Logs or Screenshots:

From my understanding, graph memory resides in Java's heap memory (the "OpenSearch heap (32 GB)") and not in the OS independently, with one partial exception: Expanding k-NN with Lucene approximate nearest neighbor search · OpenSearch

I also have a similar question to yours and haven’t heard an answer yet :frowning: Where can I find the total vector count per index?

Hi Ken,

The formula for calculating the graph size looks correct:

Total Graph Size (GB) = 1.1 * (dimension * 4 + 8 * M) * (num_documents * (1 + Number of Replicas)) / 1024 / 1024 / 1024
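Plugging in the numbers from this thread (384 dimensions, m = 16, 37M documents, 1 replica, 8 data nodes), a quick sanity check of that formula:

```python
# Estimated HNSW graph memory for the setup described in this thread.
dimension = 384              # vector dimension
m = 16                       # HNSW m parameter
num_documents = 37_000_000
replicas = 1
num_nodes = 8                # data nodes sharing the graphs

# 1.1 * (4 bytes per float * dimension + 8 bytes per neighbor link * m)
bytes_per_vector = 1.1 * (dimension * 4 + 8 * m)

total_gb = bytes_per_vector * num_documents * (1 + replicas) / 1024**3
per_node_gb = total_gb / num_nodes

print(round(total_gb, 1))     # ≈ 126.1 GB across the cluster
print(round(per_node_gb, 2))  # ≈ 15.77 GB per node, matching the ~15.7 GB estimate
```

Assuming primaries and replicas spread evenly across the 8 nodes, each node needs roughly 15.77 GB of graph cache, just above the ~15.2 GB currently available, which is consistent with the cache_capacity_reached flag.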

Regarding the cache capacity being reached and the eviction count of 39,453, here’s what’s happening:

By default, the k-NN circuit breaker limit is set to 50% of the memory left over after the JVM heap (as outlined in this documentation). Since 32 GB is allocated to the JVM, 16 GB of the remaining 32 GB is available for graph caching. As memory usage approaches that threshold, the circuit breaker triggers frequent evictions, resulting in constant loading and unloading of graphs — which is what the eviction count of 39,453 reflects.

To mitigate this, one approach is to increase the circuit breaker limit to 60%. That would allow approximately 19.2 GB for caching, so a 15.7 GB graph can be loaded into memory without triggering the circuit breaker.
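For reference, the limit can be changed dynamically through the cluster settings API. A sketch of the request (assuming you want the change to persist across restarts):

```json
PUT /_cluster/settings
{
  "persistent": {
    "knn.memory.circuit_breaker.limit": "60%"
  }
}
```

Note that on AWS-managed OpenSearch this setting may not be directly modifiable; check what your service tier exposes before relying on it.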

Please let me know if you have any further questions.

Thanks
