Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
Host OS: macOS Sonoma 14.5
OpenSearch (response from GET /):
{
  "name": "opensearch-node1",
  "cluster_name": "opensearch-cluster",
  "cluster_uuid": "_Q9SlIo8RHirvzb-r0CkTA",
  "version": {
    "distribution": "opensearch",
    "number": "2.15.0",
    "build_type": "tar",
    "build_hash": "61dbcd0795c9bfe9b81e5762175414bc38bbcadf",
    "build_date": "2024-06-20T03:27:32.562036890Z",
    "build_snapshot": false,
    "lucene_version": "9.10.0",
    "minimum_wire_compatibility_version": "7.10.0",
    "minimum_index_compatibility_version": "7.0.0"
  },
  "tagline": "The OpenSearch Project: https://opensearch.org/"
}
Describe the issue:
Step 1) Use the Python ML library (opensearch-py-ml) to export a pretrained model from Hugging Face directly as TorchScript:
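from opensearch_py_ml.ml_models import SentenceTransformerModel

# dt and tracerSentences are defined elsewhere in the script
st = SentenceTransformerModel(folder_path="work/export/pretrain/ts/" + dt, overwrite=True)
st.save_as_pt(sentences=tracerSentences,
              model_id="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")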
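Before registering, one way to check whether the exported archive contains the op that the server later rejects is to scan the serialized TorchScript code inside the zip. This is a minimal, standard-library-only sketch under two assumptions: the zip written by save_as_pt contains the traced `.pt` file, and that `.pt` file is itself a zip archive whose `code/` entries hold the TorchScript source (consistent with the `code/.../modeling_bert.py` path in the server log below). `find_op_in_export` is a hypothetical helper, not part of opensearch-py-ml.

```python
import io
import zipfile


def _scan_torchscript(archive: zipfile.ZipFile, op_name: str) -> list:
    """Return names of TorchScript source entries that mention op_name."""
    hits = []
    for name in archive.namelist():
        # Assumption: serialized TorchScript source lives under a "code/" directory.
        if "code/" in name and name.endswith(".py"):
            text = archive.read(name).decode("utf-8", errors="replace")
            if op_name in text:
                hits.append(name)
    return hits


def find_op_in_export(zip_path, op_name="scaled_dot_product_attention"):
    """Scan every .pt entry of an exported model zip for op_name.

    Returns a dict mapping each .pt entry to the code files that use the op.
    """
    results = {}
    with zipfile.ZipFile(zip_path) as outer:
        for entry in outer.namelist():
            if entry.endswith(".pt"):
                # The .pt file is itself a zip archive, so open it from memory.
                inner_bytes = io.BytesIO(outer.read(entry))
                with zipfile.ZipFile(inner_bytes) as inner:
                    results[entry] = _scan_torchscript(inner, op_name)
    return results
```

For example, `find_op_in_export("work/export/pt/paraphrase-multilingual-MiniLM-L12-v2.zip")`. A non-empty list means the traced graph calls `scaled_dot_product_attention`, which the server log reports as an unknown builtin op (it only suggests `aten::_scaled_dot_product_attention`).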
Step 2) Use the Python library to register (and deploy) the model:
# ml_client is an opensearch_py_ml MLCommonClient instance
model_config_path = "config/torchscript.json"
model_path = "work/export/pt/paraphrase-multilingual-MiniLM-L12-v2.zip"
model_id_file_system = ml_client.register_model(model_path, model_config_path, isVerbose=True)
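The upload log below reports 49 chunks, and the configuration lists `model_content_size_in_bytes` as 488,121,112. That count is consistent with a chunk size of 10,000,000 bytes (488,121,112 / 10,000,000 is about 48.8, rounded up to 49); this chunk size is inferred from the numbers in this post, not confirmed from the opensearch-py-ml source. Likewise, the 64-hex-character `model_content_hash_value` looks like a SHA-256 digest, which is also an assumption. This sketch recomputes all three values for a local zip so they can be compared against the registered metadata; `describe_model_file` and `ASSUMED_CHUNK_SIZE` are hypothetical names.

```python
import hashlib
import os

# Assumption: 10,000,000-byte chunks, inferred from "chunk 49 of 49" for a 488,121,112-byte file.
ASSUMED_CHUNK_SIZE = 10_000_000


def chunk_count(size_in_bytes: int, chunk_size: int = ASSUMED_CHUNK_SIZE) -> int:
    """Number of chunks needed to upload size_in_bytes (the last one may be partial)."""
    return (size_in_bytes + chunk_size - 1) // chunk_size


def describe_model_file(path: str, chunk_size: int = ASSUMED_CHUNK_SIZE) -> dict:
    """Return size, SHA-256 hex digest (assumed hash type), and chunk count for a model zip."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB blocks so a ~488 MB model is never fully loaded into memory.
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    size = os.path.getsize(path)
    return {
        "model_content_size_in_bytes": size,
        "model_content_hash_value": digest.hexdigest(),
        "chunks": chunk_count(size, chunk_size),
    }
```

If the recomputed size and hash match the configuration, the upload itself is probably not the problem, and the failure lies in loading the TorchScript file on the server.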
The model registers, but it fails to deploy, seemingly because of an outdated module for modeling_bert.py. The client output is below, followed by the server console log further down.
uploading chunk 48 of 49
Model id: {'status': 'Uploaded'}
uploading chunk 49 of 49
Model id: {'status': 'Uploaded'}
Model registered successfully
Traceback (most recent call last):
  File "/Users/AIUEQ92/ngrepos/search/ml_mini/source/MLCluster-RegModelTorchScript.py", line 46, in <module>
    model_id_file_system = ml_client.register_model(model_path, model_config_path, isVerbose=True)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/opensearch_py_ml/ml_commons/ml_commons_client.py", line 157, in register_model
    self.deploy_model(model_id, wait_until_deployed=wait_until_deployed)
  File "/opt/homebrew/lib/python3.11/site-packages/opensearch_py_ml/ml_commons/ml_commons_client.py", line 357, in deploy_model
    raise Exception("Model deployment failed")
Exception: Model deployment failed
The same steps work fine with ONNX. My customized version of the model also deploys fine as ONNX, but as TorchScript it fails with the same error as the pretrained model.
Configuration:
{
  "name": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
  "version": "1.0.1",
  "model_group_id": "BDCX6pABM3bRBCyhVgBd",
  "description": "This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search.",
  "model_task_type": "TEXT_EMBEDDING",
  "model_format": "TORCH_SCRIPT",
  "model_content_size_in_bytes": 488121112,
  "model_content_hash_value": "ba760e701ccdf9b0d5742febbe4a3fbfe3190f01416b9a9f983ff06fd6a045ed",
  "model_config": {
    "model_type": "bert",
    "embedding_dimension": 384,
    "framework_type": "sentence_transformers",
    "all_config": "{\"_name_or_path\":\"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2\",\"architectures\":[\"BertModel\"],\"attention_probs_dropout_prob\":0.1,\"gradient_checkpointing\":false,\"hidden_act\":\"gelu\",\"hidden_dropout_prob\":0.1,\"hidden_size\":384,\"initializer_range\":0.02,\"intermediate_size\":1536,\"layer_norm_eps\":1e-12,\"max_position_embeddings\":512,\"model_type\":\"bert\",\"num_attention_heads\":12,\"num_hidden_layers\":12,\"pad_token_id\":0,\"position_embedding_type\":\"absolute\",\"torch_dtype\": \"float32\",\"transformers_version\":\"4.41.1\",\"type_vocab_size\":2,\"use_cache\":true,\"vocab_size\":250037}"
  },
  "created_time": 1676326534702
}
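Because `all_config` is a JSON document stored as a string inside the JSON configuration, it has to be decoded twice. A minimal sketch of a consistency check on a configuration shaped like the one above: it decodes `all_config`, confirms that `embedding_dimension` matches the model's `hidden_size` (both 384 here), and reports the `transformers_version` recorded at export time (4.41.1 here), which is worth comparing with the library versions used for tracing. `check_model_config` is a hypothetical helper.

```python
import json


def check_model_config(config_text: str) -> dict:
    """Decode a model config whose model_config.all_config is a nested JSON string."""
    config = json.loads(config_text)
    model_config = config["model_config"]
    # all_config is itself serialized JSON, so it needs a second json.loads.
    all_config = json.loads(model_config["all_config"])
    return {
        "model_format": config.get("model_format"),
        "transformers_version": all_config.get("transformers_version"),
        "architectures": all_config.get("architectures", []),
        "dimension_matches": model_config["embedding_dimension"] == all_config.get("hidden_size"),
    }
```

Running it on the configuration file (for example `check_model_config(open("config/torchscript.json").read())`, assuming that file holds this configuration) would also fail fast if the inner quotes were not escaped, since the outer `json.loads` would reject the document.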
Relevant Logs or Screenshots:
2024-07-29 09:53:26 Caused by: ai.djl.engine.EngineException:
2024-07-29 09:53:26 Unknown builtin op: aten::scaled_dot_product_attention.
2024-07-29 09:53:26 Here are some suggestions:
2024-07-29 09:53:26 aten::_scaled_dot_product_attention
2024-07-29 09:53:26
2024-07-29 09:53:26 The original call is:
2024-07-29 09:53:26 File "code/torch/transformers/models/bert/modeling_bert.py", line 181
2024-07-29 09:53:26 x1 = torch.view(_40, [_42, int(_43), 12, 32])
2024-07-29 09:53:26 value_layer = torch.permute(x1, [0, 2, 1, 3])
2024-07-29 09:53:26 attn_output = torch.scaled_dot_product_attention(query_layer, key_layer, value_layer, attention_mask)
2024-07-29 09:53:26 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
2024-07-29 09:53:26 attn_output0 = torch.transpose(attn_output, 1, 2)
2024-07-29 09:53:26 input = torch.reshape(attn_output0, [_30, _31, 384])
2024-07-29 09:53:26
2024-07-29 09:53:26 at ai.djl.pytorch.jni.PyTorchLibrary.moduleLoad(Native Method) ~[?:?]
2024-07-29 09:53:26 at ai.djl.pytorch.jni.JniUtils.loadModule(JniUtils.java:1756) ~[?:?]
2024-07-29 09:53:26 at ai.djl.pytorch.engine.PtModel.load(PtModel.java:99) ~[?:?]
2024-07-29 09:53:26 at ai.djl.repository.zoo.BaseModelLoader.loadModel(BaseModelLoader.java:166) ~[?:?]
2024-07-29 09:53:26 at ai.djl.repository.zoo.Criteria.loadModel(Criteria.java:174) ~[?:?]
2024-07-29 09:53:26 at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:217) ~[?:?]
2024-07-29 09:53:26 at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:286) ~[?:?]
2024-07-29 09:53:26 ... 14 more
2024-07-29 09:53:26 [2024-07-29T13:53:26,145][INFO ][o.o.m.a.d.TransportDeployModelOnNodeAction] [opensearch-node1] deploy model task done e9rE_pAB0mrnfbzNUcov
2024-07-29 09:53:26 [2024-07-29T13:53:26,160][ERROR][o.o.m.a.f.TransportForwardAction] [opensearch-node1] deploy model failed on all nodes, model id: etrE_pAB0mrnfbzNEsqo
2024-07-29 09:53:26 [2024-07-29T13:53:26,160][INFO ][o.o.m.a.f.TransportForwardAction] [opensearch-node1] deploy model done with state: DEPLOY_FAILED, model id: etrE_pAB0mrnfbzNEsqo
2024-07-29 09:53:58 [2024-07-29T13:53:58,747][INFO ][o.o.i.b.HierarchyCircuitBreakerService] [opensearch-node1] attempting to trigger G1GC due to high heap usage [510806168]
2024-07-29 09:53:58 [2024-07-29T13:53:58,759][INFO ][o.o.i.b.HierarchyCircuitBreakerService] [opensearch-node1] GC did bring memory usage down, before [510806168], after [236222352], allocations [23], duration [12]
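When a console log contains several deploy attempts, it can help to extract just the rejected op and the final deploy state per model. This is a minimal sketch using regular expressions written against the exact lines above (the `Unknown builtin op:` line and the `deploy model done with state: ..., model id: ...` line); other OpenSearch versions may word these messages differently. `summarize_deploy_log` is a hypothetical helper.

```python
import re

# Patterns match the message text in the log above; the timestamp prefix is ignored by search().
UNKNOWN_OP = re.compile(r"Unknown builtin op: (\S+?)\.?$")
DEPLOY_STATE = re.compile(r"deploy model done with state: (\w+), model id: (\S+)")


def summarize_deploy_log(log_text: str) -> dict:
    """Collect unknown TorchScript ops and the final deploy state for each model id."""
    unknown_ops = []
    states = {}
    for line in log_text.splitlines():
        line = line.rstrip()
        op = UNKNOWN_OP.search(line)
        if op and op.group(1) not in unknown_ops:
            unknown_ops.append(op.group(1))
        state = DEPLOY_STATE.search(line)
        if state:
            # A later line for the same model id overwrites an earlier one.
            states[state.group(2)] = state.group(1)
    return {"unknown_ops": unknown_ops, "deploy_states": states}
```

On the log above this should report `aten::scaled_dot_product_attention` as the unknown op and `DEPLOY_FAILED` for model id `etrE_pAB0mrnfbzNEsqo`.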