How to register a local custom model?

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.17

Describe the issue:
I'm trying to register a custom model from Hugging Face, but I'm having some issues with the URL. I created a zip file containing all the files from the onnx folder of intfloat/multilingual-e5-large at main. Then I created a volume to mount this zip file into the OpenSearch Docker container. As the URL for the model I simply wanted to use this local file path, but that didn't work. I also tried uploading the file to my Google Drive and using the Google Drive link as the URL, but then I'm getting:
"error": "zip END header not found"

I'm guessing that I understood this URL, and the file that is expected to be loaded from there, completely wrong. Can someone please help me out here?
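For reference, "zip END header not found" means the bytes OpenSearch downloaded were not a valid zip archive at all; a Google Drive share link typically returns an HTML page rather than the raw file, which would produce exactly this error. A minimal check, using only the standard library, to verify that a file really is a zip before pointing the register API at it (the path below is from this thread and is illustrative):

```python
import zipfile

def looks_like_zip(path: str) -> bool:
    """True if the file ends in a valid zip central directory
    (the 'END header' the error message refers to)."""
    return zipfile.is_zipfile(path)

# e.g. looks_like_zip("/usr/share/opensearch/models/multilingual-e5-large.zip")
# An HTML error page saved with a .zip extension fails this check.
```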

Here is the request to register the model:
{
  "name": "multilingual-e5-large",
  "version": "1.0.0",
  "description": "Multilingual E5-Large",
  "model_format": "ONNX",
  "model_group_id": "_Gjj3pIBJrX4nqy0y2ST",
  "model_content_hash_value": "9aaa222c5a6529ae4a69ab6925c863af595d106edf565434c8280777b3126d64",
  "model_config": {
    "model_type": "xlm-roberta",
    "embedding_dimension": 1024,
    "framework_type": "sentence_transformers",
    "all_config": "{\"_name_or_path\":\"tmp/\",\"architectures\":[\"XLMRobertaModel\"],\"attention_probs_dropout_prob\":0.1,\"bos_token_id\":0,\"classifier_dropout\":null,\"eos_token_id\":2,\"hidden_act\":\"gelu\",\"hidden_dropout_prob\":0.1,\"hidden_size\":1024,\"initializer_range\":0.02,\"intermediate_size\":4096,\"layer_norm_eps\":0.00001,\"max_position_embeddings\":514,\"model_type\":\"xlm-roberta\",\"num_attention_heads\":16,\"num_hidden_layers\":24,\"output_past\":true,\"pad_token_id\":1,\"position_embedding_type\":\"absolute\",\"torch_dtype\":\"float32\",\"transformers_version\":\"4.29.0\",\"type_vocab_size\":1,\"use_cache\":true,\"vocab_size\":250002}"
  },
  "url": "/usr/share/opensearch/models/multilingual-e5-large.zip"
}
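As a side note, the model_content_hash_value in this request must be the SHA-256 hex digest of the exact zip being registered; if the archive is rebuilt, the hash changes. A minimal sketch to compute it (the path is the one from the request above):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large model zips
    don't have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# print(sha256_of("/usr/share/opensearch/models/multilingual-e5-large.zip"))
```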

You can try to use opensearch-py-ml.

Please follow this notebook: Demo Notebook to trace Sentence Transformers model — Opensearch-py-ml 1.1.0 documentation

Hey! Thanks for the answer. I had a couple of problems and could fix most of them, but now I'm stuck again at the model deployment step. I basically used the same code as the notebook you linked; I only changed the model itself. In the last step, when I try to deploy the model, I'm getting:

.venv/lib/python3.10/site-packages/opensearch_py_ml/ml_commons/ml_commons_client.py:357, in MLCommonClient.deploy_model(self, model_id, wait_until_deployed)
    355         print("Model deployed only partially")
    356     else:
--> 357         raise Exception("Model deployment failed")
    359 return self._get_task_info(task_id)

Then I tried to register the model only, without deployment (setting the deploy_model param to False), and this worked. But when I then tried to deploy the model manually by calling:

https://localhost:9200/_plugins/_ml/models/aC9O-JIBJ8Vsn8aNmVZ-/_deploy

I'm getting:

{
    "model_id": "A51B-JIBVoJr-PiFnVNy",
    "task_type": "DEPLOY_MODEL",
    "function_name": "TEXT_EMBEDDING",
    "state": "FAILED",
    "worker_node": [
        "lNpYAlQ6T0Wu5ymYW4OQ4g"
    ],
    "create_time": 1730741918213,
    "last_update_time": 1730741921643,
    "error": "{\"lNpYAlQ6T0Wu5ymYW4OQ4g\":\"Error code - ORT_FAIL - message: Deserialize tensor onnx::MatMul_3915 failed.GetFileLength for /usr/share/opensearch/data/ml_cache/models_cache/models/A51B-JIBVoJr-PiFnVNy/9/intfloat/multilingual-e5-large/onnx__MatMul_3915 failed:Invalid fd was supplied: -1\"}",
    "is_async": true
}
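The ORT_FAIL message points at a file named onnx__MatMul_3915 that the runtime could not open. ONNX exports of large models often store big weight tensors as external-data files that must be packaged next to model.onnx inside the zip; if they are missing or under a different relative path, loading fails like this. A minimal sketch to inspect what actually ended up in the archive (the path is illustrative):

```python
import zipfile

def list_zip(path: str) -> list[str]:
    """List archive entries so you can confirm the external-data
    files (e.g. onnx__MatMul_*) sit alongside model.onnx."""
    with zipfile.ZipFile(path) as z:
        return z.namelist()

# for name in list_zip("/usr/share/opensearch/models/multilingual-e5-large.zip"):
#     print(name)
```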

I tried to use TORCH_SCRIPT instead of ONNX, but I'm getting the same error from opensearch-py-ml, and if I try to deploy the model manually:

{
    "model_id": "aC9O-JIBJ8Vsn8aNmVZ-",
    "task_type": "DEPLOY_MODEL",
    "function_name": "TEXT_EMBEDDING",
    "state": "FAILED",
    "worker_node": [
        "lNpYAlQ6T0Wu5ymYW4OQ4g"
    ],
    "create_time": 1730742866161,
    "last_update_time": 1730742933372,
    "error": "{\"lNpYAlQ6T0Wu5ymYW4OQ4g\":\"\\nUnknown builtin op: aten::scaled_dot_product_attention.\\nHere are some suggestions: \\n\\taten::_scaled_dot_product_attention\\n\\nThe original call is:\\n  File \\\"code/__torch__/transformers/models/xlm_roberta/modeling_xlm_roberta/___torch_mangle_885.py\\\", line 38\\n    x1 = torch.view(_10, [_12, int(_13), 16, 64])\\n    value_layer = torch.permute(x1, [0, 2, 1, 3])\\n    attn_output = torch.scaled_dot_product_attention(query_layer, key_layer, value_layer, attention_mask)\\n                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE\\n    attn_output0 = torch.transpose(attn_output, 1, 2)\\n    input = torch.reshape(attn_output0, [_0, _1, 1024])\\n\"}",
    "is_async": true
}

Update: I found a workaround: exporting the model to ONNX with optimum-cli and then repeating the steps above.
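In case it helps anyone else, the export step of that workaround could look roughly like this (the output directory name is my own choice, not from the thread):

```shell
# Export intfloat/multilingual-e5-large to ONNX with Hugging Face Optimum
pip install "optimum[exporters]"
optimum-cli export onnx --model intfloat/multilingual-e5-large multilingual-e5-large-onnx/
```

The exported directory can then be zipped and registered as described above.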
