Support dedicated ML node

We released a new ML plugin, ml-commons, in 1.3, and we plan to add more models.

ML models are generally resource-intensive, especially during training. We are going to support bigger ML models, which might require more resources and special hardware such as GPUs. Because OpenSearch doesn’t support ML nodes, we dispatch ML tasks to data nodes only. That means a user who wants to train a big model has to scale up all data nodes, which is costly and hard to justify. If we support dedicated ML nodes, users don’t need to scale up their data nodes at all; they just configure a new ML node (with different settings and a more powerful instance type) and add it to the cluster. Running ML tasks on dedicated nodes also isolates resource usage better, which reduces the impact on other tasks such as search and ingestion.
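To make the dispatch idea concrete, here is a minimal Python sketch of how task routing could prefer dedicated ML nodes and fall back to data nodes when none exist. This is not ml-commons code: the `Node` class, the `eligible_nodes` function, and the role name `"ml"` are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical view of a cluster node: a name plus its set of roles."""
    name: str
    roles: frozenset


def eligible_nodes(nodes, ml_role="ml", fallback_role="data"):
    """Return the nodes an ML task could be dispatched to.

    Prefer nodes carrying the (assumed) dedicated ML role; if the cluster
    has none, fall back to data nodes, which mirrors the current behavior
    described above.
    """
    ml_nodes = [n for n in nodes if ml_role in n.roles]
    if ml_nodes:
        return ml_nodes
    return [n for n in nodes if fallback_role in n.roles]


# Example cluster: two data nodes and one dedicated ML node.
cluster = [
    Node("data-1", frozenset({"data", "ingest"})),
    Node("data-2", frozenset({"data"})),
    Node("ml-1", frozenset({"ml"})),
]

print([n.name for n in eligible_nodes(cluster)])
```

With the ML node present, only `ml-1` is eligible, so the data nodes are left alone for search and ingestion. Remove it from the list and the same call returns the two data nodes.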

More generally, we could add a “computation” node for computation-intensive tasks like ML. We might also build a more general solution, such as assigning or changing node roles/tags on the fly. See the GitHub issue “Support dynamic node role” (Issue #2877, opensearch-project/OpenSearch) for more details. Suggestions and questions are welcome! To keep the discussion in one place, please post them on the GitHub issue directly.

Following up on what @ylwu mentioned earlier: we closed the loop on ML nodes in 2.1.