Opensearch-ml TransportError

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
MacBook Pro M1 (macOS Ventura 13.0 (22A8380))

opensearch-dsl           2.0.1
opensearch-py            2.1.1
opensearch-py-ml         1.0.0

Describe the issue:
I’m trying to follow along with this guide but am having trouble at the load-model step.

load_model_output = ml_client.load_model(model_id)

However, the load fails with a resource error:

TransportError                            Traceback (most recent call last)
Cell In[23], line 1
----> 1 load_model_output = ml_client.load_model(model_id)
      2 print(load_model_output)

File ~/Desktop/super-search/venv/lib/python3.10/site-packages/opensearch_py_ml/ml_commons/ml_commons_client.py:78, in MLCommonClient.load_model(self, model_id)
     67 """
     68 This method loads model into opensearch cluster using ml-common plugin's load model api
     69 
   (...)
     73 :rtype: object
     74 """
     76 API_URL = f"{ML_BASE_URI}/models/{model_id}/_load"
---> 78 return self._client.transport.perform_request(
     79     method="POST",
     80     url=API_URL,
     81 )

File ~/Desktop/super-search/venv/lib/python3.10/site-packages/opensearchpy/transport.py:408, in Transport.perform_request(self, method, url, headers, params, body)
    406             raise e
    407     else:
--> 408         raise e
    410 else:
    411     # connection didn't fail, confirm it's live status
...
--> 301 raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
    302     status_code, error_message, additional_info
    303 )

TransportError: TransportError(500, 'm_l_resource_not_found_exception', 'no eligible node found')

Am I supposed to have a specific multi-CPU setup to use the ml-commons portion of the demo? I believe the model upload portion was successful, because I can get the model info and status.
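As a sanity check, the model state can be inspected through the ml-commons GET-model endpoint before attempting the load. This is a minimal sketch: the helper `model_info_url` is my own, not part of opensearch-py-ml, and the commented-out request assumes a local, security-disabled cluster on port 9200.

```python
# Minimal sketch: inspect a model's state via the ml-commons REST API before
# calling load_model. The helper below only builds the endpoint path; the
# actual request (commented out) assumes a local cluster on localhost:9200
# with the security plugin disabled.

def model_info_url(model_id: str) -> str:
    """Return the ml-commons GET-model endpoint for the given model id."""
    return f"/_plugins/_ml/models/{model_id}"

# from opensearchpy import OpenSearch
# client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}], use_ssl=False)
# info = client.transport.perform_request("GET", model_info_url(model_id))
# print(info.get("model_state"))  # "UPLOADED" would indicate a successful upload

print(model_info_url("example-model-id"))
```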

Configuration:
Running opensearch cluster with docker:

version: "3"
services:
  opensearch-node:
    image: opensearchproject/opensearch:latest
    container_name: opensearch-node
    environment:
      - discovery.type=single-node
      - "DISABLE_INSTALL_DEMO_CONFIG=true"
      - "DISABLE_SECURITY_PLUGIN=true"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - 9200:9200

  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:latest
    container_name: opensearch-dashboards
    ports:
      - 5601:5601
    expose:
      - "5601"
    environment:
      - "DISABLE_SECURITY_DASHBOARDS_PLUGIN=true"
      - "OPENSEARCH_HOSTS=http://opensearch-node:9200"

Relevant Logs or Screenshots:

I realized I didn’t have an ML node in the cluster, so I disabled the ml-commons requirement to run only on a dedicated ML node, like so:

version: "3"
services:
  opensearch-node:
    image: opensearchproject/opensearch:latest
    container_name: opensearch-node
    environment:
      - discovery.type=single-node
      - "DISABLE_INSTALL_DEMO_CONFIG=true"
      - "DISABLE_SECURITY_PLUGIN=true"
      - plugins.ml_commons.only_run_on_ml_node=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - 9200:9200

  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:latest
    container_name: opensearch-dashboards
    ports:
      - 5601:5601
    expose:
      - "5601"
    environment:
      - "DISABLE_SECURITY_DASHBOARDS_PLUGIN=true"
      - "OPENSEARCH_HOSTS=http://opensearch-node:9200"

And now I can load a model! :slight_smile:
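For anyone who doesn’t want to restart the container, the same flag can also be flipped at runtime through the cluster settings API. This sketch only builds the request body; the commented-out call assumes an opensearch-py client pointed at the local cluster.

```python
# Sketch: toggle plugins.ml_commons.only_run_on_ml_node at runtime instead of
# baking it into docker-compose. Only the settings body is built here; the
# commented-out call assumes an opensearch-py client connected to the cluster.

def only_run_on_ml_node_body(enabled: bool) -> dict:
    """Persistent cluster-settings body for the ml-commons node-eligibility flag."""
    return {"persistent": {"plugins.ml_commons.only_run_on_ml_node": enabled}}

# client.transport.perform_request(
#     "PUT", "/_cluster/settings", body=only_run_on_ml_node_body(False)
# )

print(only_run_on_ml_node_body(False))
```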

Can someone advise what is wrong with the setup below? I was unable to run two nodes (dedicating one specifically to ML) because the cluster could not elect a cluster manager, even though I explicitly set it to opensearch-node1. Any ideas?

version: "3"
services:
  opensearch-node1:
    image: opensearchproject/opensearch:latest
    container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster # Name the cluster
      - discovery.seed_hosts=opensearch-node1,opensearch-ml-node      
      - cluster.initial_cluster_manager_nodes=opensearch-node1 # Nodes eligible to serve as cluster manager
      - "DISABLE_INSTALL_DEMO_CONFIG=true"
      - "DISABLE_SECURITY_PLUGIN=true"
      - bootstrap.memory_lock=true # Disable JVM heap memory swapping
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # Set min and max JVM heap sizes to at least 50% of system RAM

    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - 9200:9200 # REST API
      - 9600:9600 # Performance Analyzer
    volumes:
      - opensearch-data1:/usr/share/opensearch/data # Creates volume called opensearch-data1 and mounts it to the container
    networks:
      - opensearch-net # All of the containers will join the same Docker bridge network
  opensearch-ml-node:
    image: opensearchproject/opensearch:latest # This should be the same image used for opensearch-node1 to avoid issues
    container_name: opensearch-ml-node
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-ml-node
      - node.roles=[ ml ]
      - discovery.seed_hosts=opensearch-node1,opensearch-ml-node
      - cluster.initial_cluster_manager_nodes=opensearch-node1
      - "DISABLE_SECURITY_PLUGIN=true"
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # Set min and max JVM heap sizes to at least 50% of system RAM

    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data2:/usr/share/opensearch/data
    networks:
      - opensearch-net

volumes:
  opensearch-data1:
  opensearch-data2:

networks:
  opensearch-net:

Have you tried manually enabling the cluster manager with the following?

@nabs I created a sample docker-compose.yml file in this PR: “add docker-compose file for starting cluster with dedicated ML node” (opensearch-project/ml-commons #799 on GitHub). You can try it.

I suggest disabling the native memory circuit breaker, since this Docker ML node has limited memory. You can also disable it by running this after starting the cluster:

PUT _cluster/settings
{
  "persistent" : {
    "plugins.ml_commons.native_memory_threshold" : 100 
  }
}
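The same settings call can also be issued from the Python client. A sketch: only the body is constructed here, and the commented-out request assumes an opensearch-py client connected to the cluster.

```python
# Sketch: the PUT _cluster/settings call above, issued from opensearch-py.
# A threshold of 100 effectively disables the native memory circuit breaker.

def native_memory_threshold_body(threshold_percent: int) -> dict:
    """Persistent settings body for the ml-commons native memory threshold."""
    return {
        "persistent": {"plugins.ml_commons.native_memory_threshold": threshold_percent}
    }

# client.transport.perform_request(
#     "PUT", "/_cluster/settings", body=native_memory_threshold_body(100)
# )

print(native_memory_threshold_body(100))
```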

BTW, starting with 2.6 we published some pre-trained models, which may save you the effort of tracing models yourself: Pretrained models - OpenSearch documentation.
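If it helps, uploading one of those pre-trained models looks roughly like the sketch below. The model name and version are illustrative only (check the pretrained-models documentation for the exact values), and the commented-out request assumes an opensearch-py client.

```python
# Sketch: request body for uploading an OpenSearch-provided pre-trained model
# via the ml-commons _upload API. The model name/version passed in are
# illustrative; consult the pretrained-models docs for the real identifiers.

def pretrained_upload_body(name: str, version: str) -> dict:
    """Upload body for an OpenSearch-provided TorchScript model."""
    return {"name": name, "version": version, "model_format": "TORCH_SCRIPT"}

# client.transport.perform_request(
#     "POST",
#     "/_plugins/_ml/models/_upload",
#     body=pretrained_upload_body("huggingface/sentence-transformers/...", "1.0.1"),
# )

print(pretrained_upload_body("example/model-name", "1.0.0"))
```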

If you see any issues or have any suggestions/feature requests, please don’t hesitate to tell us. We are glad to help.

