Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
2.18
Describe the issue:
Can adding 2 ML nodes increase the performance of indexing vector embedding to the cluster
Configuration:
Relevant Logs or Screenshots:
@tejashu
Typically, yes.
If you watch the node metrics with the GET _cat/nodes?v&s=name API, you will see CPU usage become extremely high when the CPU assigned to the ML node is not enough.
Unlike running out of memory, high CPU usage (close to 100%) doesn't bring your cluster down, it just slows it down, and the same goes for the opensearch_ml_predict threads that generate the embedding vectors. (git)
If you want to speed up indexing of knn_vector fields, I'd recommend adding CPU cores to your existing ML nodes. Of course, you also need to give the data nodes enough CPU/memory so they can build/flush/merge the HNSW graphs on disk.
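For example, you can watch both the per-node CPU and the ML predict thread pool from Dev Tools. The thread pool name below matches the opensearch_ml_predict thread mentioned above; please verify the exact name on your cluster version:

```
GET _cat/nodes?v&s=name

# Show active threads, queued tasks, and rejections for the ML predict pool per node
GET _cat/thread_pool/opensearch_ml_predict?v&h=node_name,name,active,queue,rejected
```

If the `queue` column keeps growing or `rejected` is non-zero while CPU is pinned near 100%, the ML nodes are the bottleneck.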
[opensearch@test-opensearch-cluster-data-2 index]$ ls -lahSS
total 350M
-rw-rw-r-- 1 opensearch opensearch 150M Nov 27 03:35 _10.cfs
-rw-rw-r-- 1 opensearch opensearch 143M Nov 27 03:35 _15.cfs
-rw-rw-r-- 1 opensearch opensearch 7.6M Nov 27 03:35 _12.cfs
-rw-rw-r-- 1 opensearch opensearch 3.7M Nov 27 03:11 _w.fdt
-rw-rw-r-- 1 opensearch opensearch 3.4M Nov 27 03:12 _y.cfs
-rw-rw-r-- 1 opensearch opensearch 2.5M Nov 27 03:35 _10_165_CM_TEXT_VECTOR_768.knn.faissc
-rw-rw-r-- 1 opensearch opensearch 2.4M Nov 27 03:35 _15_165_CM_TEXT_VECTOR_768.knn.faissc
-rw-rw-r-- 1 opensearch opensearch 1.7M Nov 27 03:11 _o.fdt
-rw-rw-r-- 1 opensearch opensearch 1.7M Nov 27 03:35 _13.cfs
-rw-rw-r-- 1 opensearch opensearch 1.6M Nov 27 03:12 _z.cfs
-rw-rw-r-- 1 opensearch opensearch 1.1M Nov 27 03:11 _w_NativeEngines990KnnVectorsFormat_0.vec
-rw-rw-r-- 1 opensearch opensearch 655K Nov 27 03:11 _v.cfs
-rw-rw-r-- 1 opensearch opensearch 586K Nov 27 03:11 _w_Lucene90_0.dvd
-rw-rw-r-- 1 opensearch opensearch 514K Nov 27 03:11 _o_NativeEngines990KnnVectorsFormat_0.vec
...
[2024-12-05T06:24:10,216][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-1] Graph build took 0 ms for flush
[2024-12-05T06:24:10,223][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-1] Graph build took 0 ms for flush
[2024-12-05T06:24:53,026][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-2] Graph build took 0 ms for flush
[2024-12-05T06:24:53,033][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-2] Graph build took 0 ms for flush
[2024-12-05T06:24:53,258][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-2] Graph build took 0 ms for merge
[2024-12-05T06:24:53,270][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-2] Graph build took 0 ms for merge
[2024-12-05T06:25:30,410][WARN ][o.o.k.i.c.K.NativeEngines990KnnVectorsWriter] [test-opensearch-cluster-data-0] Graph build took 0 ms for flush
How can I increase the concurrency (the number of threads serving the queue), and what is the maximum queue size a single ML node can handle?
@tejashu How are you running your OpenSearch cluster: Docker, Helm, systemctl, or the k8s operator?
FYI: the size of the ml_predict thread pool is twice the number of CPU cores allocated via limits.cpu.
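So if the cluster runs on Kubernetes, the way to get more ml_predict threads is to raise limits.cpu on the ML node pods. A minimal sketch of the container resources fragment, assuming standard Kubernetes pod-spec fields (the exact location depends on your operator or Helm chart values):

```yaml
# Container resources for an ML node pod.
# With limits.cpu: "4", the ml_predict thread pool would be sized 2 x 4 = 8 threads.
resources:
  requests:
    cpu: "4"
    memory: 8Gi
  limits:
    cpu: "4"
    memory: 8Gi
```

Keep requests and limits aligned for ML nodes so the scheduler actually reserves the cores the thread pool sizing is based on.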