I’m wondering if anyone has experience, docs, or benchmarks related to how OpenSearch generally performs with ML-related data such as models and dense vector data. Specifically, I’m looking for info on how that data is handled with regard to sharding and replication (if applicable), as well as memory, storage, and other things to be aware of. I know some of it will depend on the data itself, but I’m looking for general guidelines to consider. Apologies if this is too vague; I’ve poked around a bit in the docs and searched generally, but I’m not finding a whole lot just yet.
Hi,
I can help you get some answers to your question.
I’m adding some details below; let me know if you need anything more specific and I can share that too.
In OpenSearch, the recommendation for ML workloads is to have dedicated ML nodes in the cluster. Whenever there is an ML request, it is routed to the ML nodes, so indexing and other non-ML workloads aren’t impacted when you do ML work in the cluster. A minimal configuration sketch follows below.
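Here is a rough sketch of that setup, assuming a self-managed cluster (the ml node role and the only_run_on_ml_node setting come from ML Commons; check your version’s docs for exact defaults):

# opensearch.yml on the dedicated ML node
node.roles: [ ml ]

# optional cluster setting to keep ML tasks off data nodes
PUT _cluster/settings
{
  "persistent": {
    "plugins.ml_commons.only_run_on_ml_node": true
  }
}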
Also, ML models currently run on CPU machines, but the team is adding the capability to run ML workloads on GPU-based machines as well. This is an upcoming feature; you can always check progress here: Issues · opensearch-project/ml-commons · GitHub
More details about ML capabilities can be found here: About ML Commons - OpenSearch documentation
One ML use case you might be interested in is semantic search via the Neural Search plugin: Neural Search plugin - OpenSearch documentation
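To give a sense of what that looks like at query time, here is a sketch of a neural query (the index name my-nlp-index, the field passage_embedding, and the model_id are placeholders for illustration):

GET /my-nlp-index/_search
{
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "wild west",
        "model_id": "<model_id>",
        "k": 10
      }
    }
  }
}

At query time the plugin runs the query text through the model on the ML nodes to produce a vector, then performs a k-NN search against the embedding field.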
Thanks, that is helpful. I’m assuming, based loosely on my reading of the docs, that models get replicated across all the ML nodes?
I’m loosely familiar with the new Neural Search plugin, but not so sure how to make it perform or scale.
Yes, the model is replicated across the ML nodes only. Once a model is uploaded into the OpenSearch cluster, you need to load it before you can use it. Loading results in the model being loaded into the RAM of every ML node. So yes, it gets replicated and loaded on all ML nodes.
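To make that flow concrete, here is a sketch using one of the pretrained models (the model name and version are illustrative; check the pretrained models list in the docs, and note that newer releases rename these APIs to _register and _deploy):

POST /_plugins/_ml/models/_upload
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L6-v2",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}

Upload returns a task id; once the task completes you get a model_id, which you then load:

POST /_plugins/_ml/models/<model_id>/_load

Both calls are asynchronous, so you can poll GET /_plugins/_ml/tasks/<task_id> to check progress.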
@Navneet Thanks for sharing this information. Just one update to share: we support running models on GPU machines as of OpenSearch 2.6. Read more: GPU acceleration - OpenSearch documentation
@gsingers On the question “that models get replicated across all the ML nodes?”: it depends. If you load the model via POST /_plugins/_ml/models/<model_id>/_load, the model will be deployed to all ML nodes. You can also deploy the model to specific nodes by providing node ids, like this:

POST /_plugins/_ml/models/<model_id>/_load
{
  "node_ids": ["<node_id1>", "<node_id2>"]
}
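If it helps, once a model is loaded you can check which nodes it actually landed on with the ML profile API (a sketch; the exact response shape may vary by version):

GET /_plugins/_ml/profile/models/<model_id>

The response lists per-node state for the model, so you can confirm whether it is loaded on all ML nodes or only the ones you targeted.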
@ylwu Thanks for the correction. I was under the impression it was not released; maybe I missed the release notes.