Performance and scaling of ML models and dense vector data

I’m wondering if anyone has experience, docs, or benchmarks related to how OpenSearch generally performs when it comes to ML-based data such as models and dense vector data. I'm specifically looking for info on how the data is handled with regard to sharding and replication (if applicable), as well as memory, storage, and other things to be aware of. I know some of it will depend on the data itself, but I'm looking for general guidelines to consider. Apologies if this is too vague; I’ve poked around a bit in the docs and searched generally, but haven't found a whole lot just yet.

I can help you get some answers to your question.

Adding some details below; please let me know if you need anything more specific and I can share that too.

In OpenSearch, the recommendation for ML workloads is to have dedicated ML nodes in the cluster. Whenever there is an ML request, it is routed to the ML nodes, so indexing and other non-ML workloads are not impacted when you run ML work in the cluster.
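As a sketch (assuming the standard `node.roles` setting; double-check against the docs for your OpenSearch version), a dedicated ML node can be configured in that node's `opensearch.yml` like this:

```yml
# opensearch.yml on the dedicated ML node:
# give the node only the "ml" role so it does not
# also take on data or cluster-manager duties
node.roles: [ ml ]
```

Nodes without the `ml` role then stay focused on indexing and search traffic.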

Also, ML models currently run on CPU machines, but the team is adding the capability to run ML workloads on GPU-based machines as well. This is an upcoming feature; you can always track progress here: Issues · opensearch-project/ml-commons · GitHub

More details about ML capabilities can be found here: About ML Commons - OpenSearch documentation

One ML use case you might be interested in is semantic search via the Neural Search plugin: Neural Search plugin - OpenSearch documentation
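For reference, a neural query looks roughly like this (the index name, vector field, and query text here are placeholders I made up; `model_id` refers to a model you have already uploaded and loaded, and `k` is the number of nearest neighbors to retrieve):

```json
GET /my-nlp-index/_search
{
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "performance of dense vector search",
        "model_id": "<model_id>",
        "k": 10
      }
    }
  }
}
```

The plugin uses the loaded model to embed `query_text` at search time, then runs a k-NN search against the vector field.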

Thanks, that is helpful. I’m assuming, based loosely on my reading of the docs, that models get replicated across all the ML nodes?

I’m loosely familiar with the new Neural Search plugin, but not so sure how to make it perform or scale.

Yes, the model is replicated across all ML nodes. Once a model is uploaded into the OpenSearch cluster, you need to load it before you can use it, and loading puts the model into the RAM of every ML node. So yes, it gets replicated and loaded on all ML nodes.
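As a sketch of that flow (assuming the ml-commons load and task APIs; verify the exact paths against your version's docs), loading is asynchronous, so you get back a task ID you can poll:

```json
POST /_plugins/_ml/models/<model_id>/_load
```

```json
GET /_plugins/_ml/tasks/<task_id>
```

The task response reports the state of the load, so you can confirm the model is ready on the ML nodes before sending inference or neural search requests.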

@Navneet Thanks for sharing this information. Just sharing one update: we have supported running models on GPU machines since OpenSearch 2.6. Read more here: GPU acceleration - OpenSearch documentation

@gsingers For this question, "that models get replicated across all the ML nodes?": it depends. If you load the model via POST /_plugins/_ml/models/<model_id>/_load, the model will be deployed to all ML nodes. You can also deploy the model to specific nodes by providing node IDs, like this:

POST /_plugins/_ml/models/<model_id>/_load
{
    "node_ids": ["<node_id1>", "<node_id2>"]
}

@ylwu Thanks for the correction. I was under the impression it was not released yet; maybe I missed the release notes.
