OpenSearch 2.11.1
Cluster configuration:
3 master nodes
3 ingest nodes
3 data nodes
3 ML nodes
The cluster is deployed on Kubernetes.
opensearch.yaml has the following section:
plugins.ml_commons.only_run_on_ml_node: true
plugins.ml_commons.task_dispatch_policy: round_robin
plugins.ml_commons.max_ml_task_per_node: 10
plugins.ml_commons.max_model_on_node: 10
plugins.ml_commons.sync_up_job_interval_in_seconds: 3
plugins.ml_commons.monitoring_request_count: 100
plugins.ml_commons.max_register_model_tasks_per_node: 10
plugins.ml_commons.max_deploy_model_tasks_per_node: 10
plugins.ml_commons.allow_registering_model_via_url: false
plugins.ml_commons.native_memory_threshold: 90
plugins.ml_commons.model_auto_redeploy.enable: true
plugins.ml_commons.model_auto_redeploy.lifetime_retry_times: 5
For some reason, whenever a non-ML node is restarted, the cluster tries to deploy a model on it, and the model status becomes Partially responding.
Interestingly, it doesn't matter how many nodes are restarted: only one (the last one) is shown in the Model status UI.
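To double-check what the UI reports, the ML profile could be compared against node roles. Below is a minimal Python sketch, not a verified diagnostic. It assumes the JSON from GET /_nodes has the shape {"nodes": {id: {"name": ..., "roles": [...]}}} and the JSON from GET /_plugins/_ml/profile/models has the shape {"nodes": {id: {"models": {model_id: {...}}}}}. The function name and the sample data are hypothetical.

```python
from typing import Dict, List


def non_ml_nodes_hosting_models(nodes_info: dict, ml_profile: dict) -> Dict[str, List[str]]:
    """Map node name -> model IDs for nodes that report models but lack the 'ml' role."""
    offenders: Dict[str, List[str]] = {}
    for node_id, profile in ml_profile.get("nodes", {}).items():
        model_ids = sorted(profile.get("models", {}))
        if not model_ids:
            continue  # this node reports no models in the profile
        node = nodes_info.get("nodes", {}).get(node_id, {})
        if "ml" not in node.get("roles", []):
            # Fall back to the node ID if the name is unknown.
            offenders[node.get("name", node_id)] = model_ids
    return offenders


# Hypothetical sample data (assumed response shapes, not real cluster output).
sample_nodes = {
    "nodes": {
        "a": {"name": "ml-0", "roles": ["ml"]},
        "b": {"name": "data-1", "roles": ["data"]},
    }
}
sample_profile = {
    "nodes": {
        "a": {"models": {"m1": {}}},
        "b": {"models": {"m1": {}}},
    }
}

print(non_ml_nodes_hosting_models(sample_nodes, sample_profile))
# → {'data-1': ['m1']}
```

An empty result would mean that, according to the profile, only nodes with the ml role report models.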
Is there any option that prevents this behavior?