Hi,
Is there a way to enable/disable Machine Learning features on a specific node?
We want to assign those ML nodes to specific machines with adequate resources (CPU/memory).
For the moment we just uninstall the ML plugin, but this requires modifying the image.
Hi @wassim.dhib,
Thanks for your question. We don’t support such a feature right now, but it would be a good one to add in the future, so I have added it to our roadmap. We don’t have an ETA for it yet; please follow our announcements for the latest release content.
Thanks,
Yizhe
I wanted to follow up on this thread. We shipped this functionality in 2.1. Would love feedback on how the feature has benefited you.
Opened 12:31 AM UTC, 30 Sep 2021 · Closed 09:39 PM UTC, 7 Jul 2022 · Labels: enhancement, v2.1.0
**Is your feature request related to a problem?**
We released the ml-commons plugin… in OpenSearch 1.3. It supports model training and prediction. ML models generally consume more resources, especially during training. The community wants to support bigger ML models, which might require more resources and special hardware such as GPUs.
Because OpenSearch doesn’t support ML nodes, we dispatch ML tasks to data nodes only. That means if a user wants to train a large model, they need to scale up all data nodes, which can be costly. ML tasks also consume shared resources on data nodes, which may impact core search and indexing functions.
**What solution would you like?**
Support a dedicated ML node so that users don’t need to scale up their data nodes at all. Instead, they configure a new ML node (with different settings or a more powerful instance type) and add it to the cluster via the YAML file (requires a cluster restart). This lets users separate resource usage more cleanly by running ML tasks on a dedicated node, reducing the impact on other critical tasks like search and ingestion.
OpenSearch core checks node roles when a node starts. If a role is not one of the built-in roles (such as `data`), it throws an exception and the node can't start. To support a dedicated ML node, we have to remove this limitation in OpenSearch core. That is done in this PR, which adds support for dynamic node roles in OpenSearch: https://github.com/opensearch-project/OpenSearch/pull/3436.
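To make the startup check concrete, here is a minimal Python sketch; it is not OpenSearch's actual (Java) code. The role set is only an illustrative subset of the built-in roles, and `validate_roles`, `UnknownRoleError` and the `allow_dynamic_roles` flag are hypothetical names, with the flag simply contrasting the behavior before and after the linked PR.

```python
from typing import List

# Illustrative subset of built-in roles (assumption, not the full list).
BUILT_IN_ROLES = frozenset({"cluster_manager", "data", "ingest", "remote_cluster_client"})


class UnknownRoleError(Exception):
    """Hypothetical error: a node declared a role the cluster doesn't know."""


def validate_roles(roles: List[str], allow_dynamic_roles: bool) -> List[str]:
    """Check a node's configured roles at startup.

    Returns the non-built-in (dynamic) roles the node would start with,
    or raises UnknownRoleError when dynamic roles are not allowed.
    """
    dynamic = [role for role in roles if role not in BUILT_IN_ROLES]
    if dynamic and not allow_dynamic_roles:
        # Old behavior: any non-built-in role such as "ml" aborts startup.
        raise UnknownRoleError(f"unknown role(s): {dynamic}")
    # New behavior: non-built-in roles are kept as dynamic roles.
    return dynamic
```

With dynamic roles allowed, `validate_roles(["data", "ml"], allow_dynamic_roles=True)` returns `["ml"]`; with them disallowed, the same call raises, which is the "node can't start" case described above.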
With that in place, we can enhance the ml-commons code to dispatch tasks to `ml` nodes first, falling back to data nodes if there are no `ml` nodes.
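The dispatch policy above can be sketched in Python as follows. This is an illustration under assumptions, not ml-commons code: the input is assumed to look like the `nodes` object of a `GET _nodes` response (node ID mapped to node info with a `roles` list), and `select_task_nodes` is a hypothetical name.

```python
from typing import Dict, List


def select_task_nodes(nodes: Dict[str, dict]) -> List[str]:
    """Return IDs of nodes eligible to run an ML task.

    Prefers nodes with the `ml` role; if the cluster has none,
    falls back to nodes with the `data` role.
    """
    ml_nodes = [node_id for node_id, info in nodes.items()
                if "ml" in info.get("roles", [])]
    if ml_nodes:
        return sorted(ml_nodes)
    # No dedicated ML node: fall back to data nodes, as before this change.
    return sorted(node_id for node_id, info in nodes.items()
                  if "data" in info.get("roles", []))


cluster = {
    "n1": {"name": "data-1", "roles": ["data", "ingest"]},
    "n2": {"name": "ml-1", "roles": ["ml"]},
}
print(select_task_nodes(cluster))  # ['n2']
```

Returning an empty list when the cluster has neither `ml` nor `data` nodes leaves it to the caller to decide how to fail.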
**Do you have any additional context?**
[Original Proposal](https://github.com/opensearch-project/OpenSearch/issues/2877)