Does adding 2 ML nodes to a cluster increase indexing performance?

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

2.18

Describe the issue:
Can adding 2 ML nodes increase the performance of indexing vector embeddings into the cluster?

Configuration:

Relevant Logs or Screenshots:

@tejashu
Typically, yes.
If you watch the node metrics with the GET _cat/nodes?v&s=name API, you'll see CPU usage go extremely high when the CPU assigned to the ML nodes is not enough.
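For example, this variant shows per-node CPU next to each node's role, so you can spot a hot ML node (the h= columns are standard _cat/nodes fields):

```
GET _cat/nodes?v&s=name&h=name,node.role,cpu,load_1m,load_5m,heap.percent
```

A node carrying the ml role that sits near 100% cpu during bulk indexing is the bottleneck to look at.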

Unlike memory pressure, high CPU usage (close to 100%) won't take your cluster down, but it will slow it down, and the same goes for the opensearch_ml_predict thread pool that generates the embedding vectors. (git)
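You can watch that pool directly (pool name as mentioned above; the h= columns are standard _cat/thread_pool fields):

```
GET _cat/thread_pool/opensearch_ml_predict?v&h=node_name,name,active,queue,rejected,size
```

A steadily growing queue or a non-zero rejected count on the ML nodes is the signal that embedding generation, not the data nodes, is capping your indexing throughput.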

If you want to speed up indexing of knn_vector fields, I'd recommend adding CPU cores to the existing ML nodes. Of course, you also need to assign enough CPU/memory to the data nodes so they can build/flush/merge the HNSW graphs on disk.
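For reference, a mapping along these lines is what produces the faiss graph files in the listing below (the field name and dimension are inferred from the file names; the index name and method parameters are illustrative, not your actual settings):

```
PUT /my-knn-index
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "CM_TEXT_VECTOR_768": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": { "m": 16, "ef_construction": 128 }
        }
      }
    }
  }
}
```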

[opensearch@test-opensearch-cluster-data-2 index]$ ls -lahSS
total 350M
-rw-rw-r-- 1 opensearch opensearch 150M Nov 27 03:35 _10.cfs
-rw-rw-r-- 1 opensearch opensearch 143M Nov 27 03:35 _15.cfs
-rw-rw-r-- 1 opensearch opensearch 7.6M Nov 27 03:35 _12.cfs
-rw-rw-r-- 1 opensearch opensearch 3.7M Nov 27 03:11 _w.fdt
-rw-rw-r-- 1 opensearch opensearch 3.4M Nov 27 03:12 _y.cfs
-rw-rw-r-- 1 opensearch opensearch 2.5M Nov 27 03:35 _10_165_CM_TEXT_VECTOR_768.knn.faissc
-rw-rw-r-- 1 opensearch opensearch 2.4M Nov 27 03:35 _15_165_CM_TEXT_VECTOR_768.knn.faissc
-rw-rw-r-- 1 opensearch opensearch 1.7M Nov 27 03:11 _o.fdt
-rw-rw-r-- 1 opensearch opensearch 1.7M Nov 27 03:35 _13.cfs
-rw-rw-r-- 1 opensearch opensearch 1.6M Nov 27 03:12 _z.cfs
-rw-rw-r-- 1 opensearch opensearch 1.1M Nov 27 03:11 _w_NativeEngines990KnnVectorsFormat_0.vec
-rw-rw-r-- 1 opensearch opensearch 655K Nov 27 03:11 _v.cfs
-rw-rw-r-- 1 opensearch opensearch 586K Nov 27 03:11 _w_Lucene90_0.dvd
-rw-rw-r-- 1 opensearch opensearch 514K Nov 27 03:11 _o_NativeEngines990KnnVectorsFormat_0.vec
...

[2024-12-05T06:24:10,216][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-1] Graph build took 0 ms for flush
[2024-12-05T06:24:10,223][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-1] Graph build took 0 ms for flush
[2024-12-05T06:24:53,026][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-2] Graph build took 0 ms for flush
[2024-12-05T06:24:53,033][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-2] Graph build took 0 ms for flush
[2024-12-05T06:24:53,258][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-2] Graph build took 0 ms for merge
[2024-12-05T06:24:53,270][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-2] Graph build took 0 ms for merge
[2024-12-05T06:25:30,410][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-0] Graph build took 0 ms for flush
  • p.s.) GPU acceleration would also help if the cluster is deployed on VMs rather than containers. There have been issues filed by users asking for a CUDA image for the OpenSearch ML node so it can be deployed easily via the k8s operator.

How can I increase the concurrency (the number of threads in the queue), and what is the maximum queue size that a single ML node can handle?

@tejashu How do you run your OpenSearch cluster: Docker, Helm, systemctl, or the k8s operator?
The size of the opensearch_ml_predict thread pool is twice the number of cores allocated via limits.cpu.
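If it's the k8s operator or Helm, here's a sketch of where that CPU allocation lives (the resource values are placeholders, not recommendations):

```
# ML node pool spec (illustrative values)
resources:
  requests:
    cpu: "4"
    memory: 8Gi
  limits:
    cpu: "4"        # per the note above: ml_predict pool size = 2 x this, i.e. 8 threads
    memory: 8Gi
```

You can verify what OpenSearch actually sees with GET _nodes/os (check allocated_processors per node) and cross-check the resulting pool behavior with the _cat/thread_pool call shown earlier in the thread.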

FYI)