Hi @dnock
1. How much should we scale our cluster to accommodate efficient queries?
In order to achieve efficient queries, all of the graphs will need to fit into the available memory.
Available memory = (RAM - Elasticsearch max heap size) * circuit breaker limit (e.g. 0.5 for 50%)
To estimate the amount of memory your graphs will take up, we use the following formula:
Total graph memory (bytes) = 1.1 * (8*M + 4*dimension) * number of vectors (including replicas), where M is the HNSW m parameter
For efficient queries:
Total graph memory < Available memory
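The two formulas and the fit check can be put together as a small calculator. This is a minimal sketch, not an official tool. Its assumptions are not stated above: RAM and max heap are per data node, graphs are spread evenly across data nodes (so per-node available memory is multiplied by the node count), and every number in the example (3 nodes, 64 GiB RAM, 32 GiB heap, 1M vectors of dimension 256) is hypothetical.

```python
# Minimal sizing sketch. Assumptions (not from the reply above): sizes are in
# bytes, RAM and max heap are per data node, graphs are spread evenly across
# data nodes, and cb_limit is a fraction such as 0.5.

GIB = 1024 ** 3


def available_memory(ram_bytes: float, max_heap_bytes: float,
                     cb_limit: float, data_nodes: int = 1) -> float:
    """(RAM - max heap) * circuit breaker limit, summed over data nodes."""
    return (ram_bytes - max_heap_bytes) * cb_limit * data_nodes


def total_graph_memory(m: int, dimension: int, num_vectors: int) -> float:
    """1.1 * (8*M + 4*dimension) bytes per vector; num_vectors includes replicas."""
    return 1.1 * (8 * m + 4 * dimension) * num_vectors


def graphs_fit(graph_bytes: float, available_bytes: float) -> bool:
    """Queries stay efficient only if all graphs fit in available memory."""
    return graph_bytes < available_bytes


# Hypothetical cluster: 3 data nodes, 64 GiB RAM, 32 GiB heap, 50% limit.
avail = available_memory(64 * GIB, 32 * GIB, 0.5, data_nodes=3)
# Hypothetical index: 1M vectors, dimension 256, M=16, no replicas.
graphs = total_graph_memory(16, 256, 1_000_000)
print(f"{graphs / GIB:.2f} GiB of graphs vs {avail / GIB:.2f} GiB available")
print("fits" if graphs_fit(graphs, avail) else "scale out or add memory")
```

If the check fails, either add data nodes (more total available memory) or reduce the vector count per copy, for example by lowering the replica count.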
2. What SLAs can we provide with regards to the number of documents we can support while still surfacing performant queries?
It depends on the index configuration, such as the number of shards. Here are some numbers from one of our experiments:
Data set: 150M vectors with 128 dimensions, spread across multiple indices
Algorithm parameters: m=16, efSearch=1024, efConstruction=1024
Data nodes: 6 x m5.12xlarge
Master nodes: 3 x m5.xlarge
Latencies:
tp50: 22ms
tp90: 40ms
tp99: 90ms
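To connect this benchmark to the sizing formula from question 1, here is a rough worked estimate for the same data set. The replica count for the experiment is not stated, so it is left as a parameter, and the per-node figure assumes graphs are spread evenly over the 6 data nodes; both are assumptions of this sketch.

```python
# Worked estimate for the benchmark data set (150M vectors, dimension 128, m=16).
# The replica count is NOT given above; it is a parameter here (assumption).

def graph_memory_bytes(num_vectors: int, dimension: int, m: int,
                       replicas: int = 0) -> float:
    """Apply 1.1 * (8*M + 4*dimension) bytes per vector to primaries plus replicas."""
    total_vectors = num_vectors * (1 + replicas)
    return 1.1 * (8 * m + 4 * dimension) * total_vectors


for replicas in (0, 1):
    total = graph_memory_bytes(150_000_000, 128, 16, replicas)
    per_node = total / 6  # assumes an even spread over the 6 data nodes
    print(f"replicas={replicas}: {total / 1e9:.1f} GB total, "
          f"{per_node / 1e9:.1f} GB per data node")
```

Each vector costs 1.1 * (8*16 + 4*128) = 704 bytes, so the primaries alone need about 105.6 GB of graph memory, and each replica adds the same amount again.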
We have run performance analyses across different vector dimensions and collections, but the results still need to be formalized into a consumable form. We are prioritizing adding them to the performance tuning documentation.
3. Are the underlying query graphs just stored on data nodes? And is the graphMemoryUsage statistic the best metric for exploring memory consumption increases? When we run GET _cat/indices and look at tm for a given index the measurement reads 1064kb while the node graph_memory_usage for GET _opendistro/_knn/stats range from 0 - 3171kb depending on the node. Which value is best for measuring memory consumption?
Yes, the underlying graphs are just stored on data nodes. No graphs will be stored on dedicated masters.
Yes, the graphMemoryUsage statistic is the best metric for tracking memory consumption increases. Also keep an eye on the cache capacity reached and circuit breaker triggered metrics: they indicate that the cache has filled up, and higher search latencies will follow.
_cat/indices does not track the memory the graphs use, so GET _opendistro/_knn/stats should be preferred for measuring memory consumption.
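Because graph memory is reported per node, a cluster-wide figure means summing across nodes. Below is a minimal sketch that summarizes an already-parsed stats response (fetched with curl or any HTTP client). The response shape it expects (a "nodes" object keyed by node id with "graph_memory_usage" in KB and "cache_capacity_reached", plus a top-level "circuit_breaker_triggered") is an assumption here and should be checked against your plugin version. The sample body is hypothetical, with values borrowed from the 0 to 3171kb range in the question.

```python
# Minimal sketch: summarize a parsed GET _opendistro/_knn/stats response.
# Assumed shape (verify against your version): top-level
# "circuit_breaker_triggered", and "nodes" keyed by node id, each with
# "graph_memory_usage" (KB) and "cache_capacity_reached".
import json


def summarize_knn_stats(stats: dict) -> dict:
    nodes = stats.get("nodes", {})
    usage = {node_id: s.get("graph_memory_usage", 0) for node_id, s in nodes.items()}
    return {
        "total_graph_memory_kb": sum(usage.values()),
        "per_node_kb": usage,
        "nodes_at_cache_capacity": [
            node_id for node_id, s in nodes.items() if s.get("cache_capacity_reached")
        ],
        "circuit_breaker_triggered": bool(stats.get("circuit_breaker_triggered", False)),
    }


# Hypothetical response body, trimmed to the fields used above.
sample = json.loads("""
{
  "circuit_breaker_triggered": false,
  "nodes": {
    "node-a": {"graph_memory_usage": 3171, "cache_capacity_reached": false},
    "node-b": {"graph_memory_usage": 0, "cache_capacity_reached": false}
  }
}
""")
print(summarize_knn_stats(sample))
```

Tracking the total over time shows memory growth, while a non-empty nodes_at_cache_capacity list or a triggered circuit breaker is the early warning for higher search latency.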
4. If a cluster has multiple indices, each with their own k-NN vectors, how is the knn.memory.circuit_breaker.limit value measured? Is it on a per-graph basis?
knn.memory.circuit_breaker.limit applies to the total memory of all of the k-NN indices, not per index. For example, if you have 10 indices, each with 5 GB of graphs and the circuit breaker limit is set to a value that allows 40 GB of available memory, only 8 of those indices would fit in the cache.
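The arithmetic in this example can be sketched as one shared budget consumed by every index in turn. This is only an illustration of "total, not per index": the function name is hypothetical, and it does not model how the plugin evicts graphs once the limit is reached.

```python
# Sketch of the example above: one memory budget shared by all k-NN indices,
# not a separate allowance per index. Sizes are in GB.

def indices_that_fit(graph_sizes_gb: list[float], budget_gb: float) -> int:
    """Count indices, in load order, whose graphs fit in one shared budget."""
    used = 0.0
    count = 0
    for size in graph_sizes_gb:
        if used + size > budget_gb:
            break  # the next index's graphs would exceed the shared limit
        used += size
        count += 1
    return count


print(indices_that_fit([5.0] * 10, 40.0))  # 8
```

With a per-index limit all 10 indices would fit; with the shared 40 GB budget only 8 * 5 GB = 40 GB of graphs can be resident at once.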
Jack