Does adding 2 ML nodes to a cluster increase indexing performance?

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

2.18

Describe the issue:
Can adding 2 ML nodes increase the performance of indexing vector embeddings into the cluster?

Configuration:

Relevant Logs or Screenshots:

@tejashu
Typically, yes.
If you watch the node metrics with the GET _cat/nodes?v&s=name API, you'll see CPU usage go extremely high when the CPU assigned to the ML nodes is not enough.
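For example, this variant shows per-node CPU next to each node's role, so you can spot a hot ML node (the h= columns are standard _cat/nodes fields):

```
GET _cat/nodes?v&s=name&h=name,node.role,cpu,load_1m,load_5m,heap.percent
```

A node carrying the ml role that sits near 100% cpu during bulk indexing is the bottleneck to look at.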

Unlike memory pressure, high CPU usage (close to 100%) won't take your cluster down, but it will slow it down, and the same goes for the opensearch_ml_predict thread pool that generates the embedding vectors. (git)
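You can watch that pool directly (pool name as mentioned above; the h= columns are standard _cat/thread_pool fields):

```
GET _cat/thread_pool/opensearch_ml_predict?v&h=node_name,name,active,queue,rejected,size
```

A steadily growing queue or a non-zero rejected count on the ML nodes is the signal that embedding generation, not the data nodes, is capping your indexing throughput.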

If you want to speed up indexing of knn_vector fields, I'd recommend adding CPU cores to the existing ML nodes. Of course, you also need to assign enough CPU/memory to the data nodes so they can build/flush/merge the HNSW graphs on disk.
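For reference, a mapping along these lines is what produces the faiss graph files in the listing below (the field name and dimension are inferred from the file names; the index name and method parameters are illustrative, not your actual settings):

```
PUT /my-knn-index
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "CM_TEXT_VECTOR_768": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": { "m": 16, "ef_construction": 128 }
        }
      }
    }
  }
}
```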

[opensearch@test-opensearch-cluster-data-2 index]$ ls -lahSS
total 350M
-rw-rw-r-- 1 opensearch opensearch 150M Nov 27 03:35 _10.cfs
-rw-rw-r-- 1 opensearch opensearch 143M Nov 27 03:35 _15.cfs
-rw-rw-r-- 1 opensearch opensearch 7.6M Nov 27 03:35 _12.cfs
-rw-rw-r-- 1 opensearch opensearch 3.7M Nov 27 03:11 _w.fdt
-rw-rw-r-- 1 opensearch opensearch 3.4M Nov 27 03:12 _y.cfs
-rw-rw-r-- 1 opensearch opensearch 2.5M Nov 27 03:35 _10_165_CM_TEXT_VECTOR_768.knn.faissc
-rw-rw-r-- 1 opensearch opensearch 2.4M Nov 27 03:35 _15_165_CM_TEXT_VECTOR_768.knn.faissc
-rw-rw-r-- 1 opensearch opensearch 1.7M Nov 27 03:11 _o.fdt
-rw-rw-r-- 1 opensearch opensearch 1.7M Nov 27 03:35 _13.cfs
-rw-rw-r-- 1 opensearch opensearch 1.6M Nov 27 03:12 _z.cfs
-rw-rw-r-- 1 opensearch opensearch 1.1M Nov 27 03:11 _w_NativeEngines990KnnVectorsFormat_0.vec
-rw-rw-r-- 1 opensearch opensearch 655K Nov 27 03:11 _v.cfs
-rw-rw-r-- 1 opensearch opensearch 586K Nov 27 03:11 _w_Lucene90_0.dvd
-rw-rw-r-- 1 opensearch opensearch 514K Nov 27 03:11 _o_NativeEngines990KnnVectorsFormat_0.vec
...

[2024-12-05T06:24:10,216][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-1] Graph build took 0 ms for flush
[2024-12-05T06:24:10,223][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-1] Graph build took 0 ms for flush
[2024-12-05T06:24:53,026][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-2] Graph build took 0 ms for flush
[2024-12-05T06:24:53,033][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-2] Graph build took 0 ms for flush
[2024-12-05T06:24:53,258][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-2] Graph build took 0 ms for merge
[2024-12-05T06:24:53,270][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-2] Graph build took 0 ms for merge
[2024-12-05T06:25:30,410][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-0] Graph build took 0 ms for flush
  • p.s.) GPU acceleration would also help if the cluster is deployed on VMs rather than containers. There have been issues filed by users asking for a CUDA image for the OpenSearch ML node so it can be deployed easily via the k8s operator.

How can I increase the concurrency (the number of threads in the queue), and what is the maximum queue size that a single ML node can handle?

@tejashu How do you run your OpenSearch cluster: Docker, Helm, systemctl, or the k8s operator?
The size of the opensearch_ml_predict thread pool is twice the number of cores allocated via limits.cpu.
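If it's the k8s operator or Helm, here's a sketch of where that CPU allocation lives (the resource values are placeholders, not recommendations):

```
# ML node pool spec (illustrative values)
resources:
  requests:
    cpu: "4"
    memory: 8Gi
  limits:
    cpu: "4"        # per the note above: ml_predict pool size = 2 x this, i.e. 8 threads
    memory: 8Gi
```

You can verify what OpenSearch actually sees with GET _nodes/os (check allocated_processors per node) and cross-check the resulting pool behavior with the _cat/thread_pool call shown earlier in the thread.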

FYI)