Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch (docker-compose cluster running on macOS arm64)
{
"distribution": "opensearch",
"number": "2.12.0",
"build_type": "tar",
"build_hash": "2c355ce1a427e4a528778d4054436b5c4b756221",
"build_date": "2024-02-20T02:20:12.084014282Z",
"build_snapshot": false,
"lucene_version": "9.9.2",
"minimum_wire_compatibility_version": "7.10.0",
"minimum_index_compatibility_version": "7.0.0"
}
Local Python 3.11.8 environment:
opensearch-py==2.4.2
opensearch-py-ml==1.1.0
onnx==1.15.0
onnxruntime==1.17.1
torch==2.2.1
sentence-transformers==2.5.1
transformers==4.38.2
Describe the issue:
When attempting to deploy a custom ONNX model, OpenSearch throws an exception saying it cannot locate the onnxruntime native library.
Caused by: java.lang.ExceptionInInitializerError: Exception java.lang.UnsatisfiedLinkError: no onnxruntime in java.library.path: :/usr/share/opensearch/plugins/opensearch-knn/lib:/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
Full description
I want to load a Hugging Face model (microsoft/BiomedNLP-KRISSBERT-PubMed-UMLS-EL) into OpenSearch for text embeddings.
I am running the following Python code to convert the model to ONNX and register it in OpenSearch. The model registers successfully, but the deployment step fails.
from opensearchpy import OpenSearch
from opensearch_py_ml.ml_models import SentenceTransformerModel
from opensearch_py_ml.ml_commons import MLCommonClient
# create_client() is a helper defined elsewhere (not shown); it returns an OpenSearch client
os_client = create_client()
ml_client = MLCommonClient(os_client)
# manually created model group
model_group_id = "..."
# test registering a model
model_hf_id = "microsoft/BiomedNLP-KRISSBERT-PubMed-UMLS-EL"
folder_path = "./models"
embedding_model = SentenceTransformerModel(model_id=model_hf_id, folder_path=folder_path, overwrite=True)
model_path_onnx = embedding_model.save_as_onnx(model_id=model_hf_id)
model_config_path_onnx = embedding_model.make_model_config_json(model_format="ONNX")
model_id = ml_client.register_model(model_path_onnx, model_config_path_onnx, isVerbose=True, model_group_id=model_group_id)
Logs for the local python code:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Creating folder models/onnx
Using framework PyTorch: 2.2.1
Found input input_ids with shape: {0: 'batch', 1: 'sequence'}
Found input token_type_ids with shape: {0: 'batch', 1: 'sequence'}
Found input attention_mask with shape: {0: 'batch', 1: 'sequence'}
Found output output_0 with shape: {0: 'batch', 1: 'sequence'}
Found output output_1 with shape: {0: 'batch'}
Ensuring inputs are in correct order
position_ids is not present in the generated input list.
Generated inputs order: ['input_ids', 'attention_mask', 'token_type_ids']
zip file is saved to ./models/BiomedNLP-KRISSBERT-PubMed-UMLS-EL.zip
No sentence-transformers model found with name microsoft/BiomedNLP-KRISSBERT-PubMed-UMLS-EL. Creating a new one with MEAN pooling.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
ml-commons_model_config.json file is saved at : ./models/ml-commons_model_config.json
Total number of chunks 44
Sha1 value of the model file: 95ebbcb89d0c9883749a266f123fbbd34b8a67ce6dd3bfeeea02681aa01b2be7
Model meta data was created successfully. Model Id: 6lXVHo4BOkNyivbcAu-Y
uploading chunk 1 of 44
Model id: {'status': 'Uploaded'}
uploading chunk 2 of 44
Model id: {'status': 'Uploaded'}
...
uploading chunk 44 of 44
Model id: {'status': 'Uploaded'}
Model registered successfully
...
File "lib/python3.11/site-packages/opensearch_py_ml/ml_commons/ml_commons_client.py", line 157, in register_model
self.deploy_model(model_id, wait_until_deployed=wait_until_deployed)
File "lib/python3.11/site-packages/opensearch_py_ml/ml_commons/ml_commons_client.py", line 357, in deploy_model
raise Exception("Model deployment failed")
Exception: Model deployment failed
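The traceback shows that register_model calls deploy_model internally, so a deployment failure surfaces as a failure of the whole call even though registration succeeded. A minimal sketch of separating the two steps follows. It assumes that register_model in opensearch-py-ml accepts a deploy_model keyword and that deploy_model accepts wait_until_deployed (the latter appears in the traceback); the helper name register_then_deploy is hypothetical.

```python
def register_then_deploy(ml_client, model_path, config_path, model_group_id):
    """Register a model without deploying it, then deploy as a separate step.

    ml_client is expected to behave like opensearch_py_ml's MLCommonClient.
    Returns (model_id, status) so the model id survives a deployment failure.
    """
    model_id = ml_client.register_model(
        model_path,
        config_path,
        model_group_id=model_group_id,
        isVerbose=True,
        deploy_model=False,  # assumption: skips the implicit deploy seen in the traceback
    )
    try:
        ml_client.deploy_model(model_id, wait_until_deployed=True)
    except Exception as exc:  # the client raises a bare Exception on failure
        return model_id, f"deploy failed: {exc}"
    return model_id, "deployed"
```

With this split, the registered model id is still available for a later redeploy attempt once the server-side problem is resolved.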
Checking the OpenSearch docker-compose logs:
opensearch-node1 | [2024-03-08T16:11:30,813][INFO ][o.o.m.a.u.MLModelChunkUploader] [opensearch-node1] Index model successful for 6lXVHo4BOkNyivbcAu-Y for chunk number 44
opensearch-node1 | [2024-03-08T16:11:30,834][INFO ][o.o.m.a.d.TransportDeployModelAction] [opensearch-node1] Will deploy model on these nodes: 7CRPceCOT16B8-dO2sNqlQ
opensearch-ml1 | [2024-03-08T16:11:30,898][ERROR][o.o.m.m.MLModelManager ] [opensearch-ml1] No controller is deployed because the model 6lXVHo4BOkNyivbcAu-Y is expected not having an enabled model controller. Please use the create model controller api to create one if this is unexpected.
opensearch-ml1 | [2024-03-08T16:11:34,830][ERROR][o.o.m.e.a.DLModel ] [opensearch-ml1] Failed to deploy model 6lXVHo4BOkNyivbcAu-Y
opensearch-ml1 | java.lang.NoClassDefFoundError: Could not initialize class ai.onnxruntime.OrtEnvironment$ThreadingOptions
opensearch-ml1 | at ai.djl.onnxruntime.engine.OrtEngine.<init>(OrtEngine.java:44) ~[onnxruntime-engine-0.21.0.jar:?]
opensearch-ml1 | at ai.djl.onnxruntime.engine.OrtEngine.newInstance(OrtEngine.java:64) ~[onnxruntime-engine-0.21.0.jar:?]
opensearch-ml1 | at ai.djl.onnxruntime.engine.OrtEngineProvider.getEngine(OrtEngineProvider.java:40) ~[onnxruntime-engine-0.21.0.jar:?]
opensearch-ml1 | at ai.djl.engine.Engine.getEngine(Engine.java:187) ~[api-0.21.0.jar:?]
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:185) ~[opensearch-ml-algorithms-2.12.0.0.jar:?]
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:280) [opensearch-ml-algorithms-2.12.0.0.jar:?]
opensearch-ml1 | at java.base/java.security.AccessController.doPrivileged(AccessController.java:571) [?:?]
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:247) [opensearch-ml-algorithms-2.12.0.0.jar:?]
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:139) [opensearch-ml-algorithms-2.12.0.0.jar:?]
opensearch-ml1 | at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:125) [opensearch-ml-algorithms-2.12.0.0.jar:?]
opensearch-ml1 | at org.opensearch.ml.model.MLModelManager.lambda$deployModel$51(MLModelManager.java:1020) [opensearch-ml-2.12.0.0.jar:2.12.0.0]
opensearch-ml1 | at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.12.0.jar:2.12.0]
opensearch-ml1 | at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$72(MLModelManager.java:1553) [opensearch-ml-2.12.0.0.jar:2.12.0.0]
opensearch-ml1 | at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.12.0.jar:2.12.0]
opensearch-ml1 | at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.12.0.jar:2.12.0]
opensearch-ml1 | at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:913) [opensearch-2.12.0.jar:2.12.0]
opensearch-ml1 | at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0]
opensearch-ml1 | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
opensearch-ml1 | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
opensearch-ml1 | at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
opensearch-ml1 | Caused by: java.lang.ExceptionInInitializerError: Exception java.lang.UnsatisfiedLinkError: no onnxruntime in java.library.path: :/usr/share/opensearch/plugins/opensearch-knn/lib:/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib [in thread "opensearch[opensearch-ml1][opensearch_ml_deploy][T#5]"]
opensearch-ml1 | at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2458) ~[?:?]
opensearch-ml1 | at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:916) ~[?:?]
opensearch-ml1 | at java.base/java.lang.System.loadLibrary(System.java:2063) ~[?:?]
opensearch-ml1 | at ai.onnxruntime.OnnxRuntime.load(OnnxRuntime.java:338) ~[onnxruntime_gpu-1.14.0.jar:1.14.0]
opensearch-ml1 | at ai.onnxruntime.OnnxRuntime.init(OnnxRuntime.java:139) ~[onnxruntime_gpu-1.14.0.jar:1.14.0]
opensearch-ml1 | at ai.onnxruntime.OrtEnvironment$ThreadingOptions.<clinit>(OrtEnvironment.java:353) ~[onnxruntime_gpu-1.14.0.jar:1.14.0]
opensearch-ml1 | ... 20 more
opensearch-ml1 | [2024-03-08T16:11:34,869][ERROR][o.o.m.m.MLModelManager ] [opensearch-ml1] Failed to retrieve model 6lXVHo4BOkNyivbcAu-Y
opensearch-ml1 | org.opensearch.ml.common.exception.MLException: Failed to deploy model 6lXVHo4BOkNyivbcAu-Y
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:294) ~[?:?]
opensearch-ml1 | at java.base/java.security.AccessController.doPrivileged(AccessController.java:571) ~[?:?]
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:247) ~[?:?]
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:139) ~[?:?]
opensearch-ml1 | at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:125) ~[?:?]
opensearch-ml1 | at org.opensearch.ml.model.MLModelManager.lambda$deployModel$51(MLModelManager.java:1020) ~[?:?]
opensearch-ml1 | at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.12.0.jar:2.12.0]
opensearch-ml1 | at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$72(MLModelManager.java:1553) [opensearch-ml-2.12.0.0.jar:2.12.0.0]
opensearch-ml1 | at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.12.0.jar:2.12.0]
opensearch-ml1 | at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.12.0.jar:2.12.0]
opensearch-ml1 | at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:913) [opensearch-2.12.0.jar:2.12.0]
opensearch-ml1 | at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0]
opensearch-ml1 | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
opensearch-ml1 | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
opensearch-ml1 | at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
opensearch-ml1 | Caused by: java.lang.NoClassDefFoundError: Could not initialize class ai.onnxruntime.OrtEnvironment$ThreadingOptions
opensearch-ml1 | at ai.djl.onnxruntime.engine.OrtEngine.<init>(OrtEngine.java:44) ~[?:?]
opensearch-ml1 | at ai.djl.onnxruntime.engine.OrtEngine.newInstance(OrtEngine.java:64) ~[?:?]
opensearch-ml1 | at ai.djl.onnxruntime.engine.OrtEngineProvider.getEngine(OrtEngineProvider.java:40) ~[?:?]
opensearch-ml1 | at ai.djl.engine.Engine.getEngine(Engine.java:187) ~[?:?]
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:185) ~[?:?]
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:280) ~[?:?]
opensearch-ml1 | ... 14 more
opensearch-ml1 | Caused by: java.lang.ExceptionInInitializerError: Exception java.lang.UnsatisfiedLinkError: no onnxruntime in java.library.path: :/usr/share/opensearch/plugins/opensearch-knn/lib:/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib [in thread "opensearch[opensearch-ml1][opensearch_ml_deploy][T#5]"]
opensearch-ml1 | at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2458) ~[?:?]
opensearch-ml1 | at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:916) ~[?:?]
opensearch-ml1 | at java.base/java.lang.System.loadLibrary(System.java:2063) ~[?:?]
opensearch-ml1 | at ai.onnxruntime.OnnxRuntime.load(OnnxRuntime.java:338) ~[?:?]
opensearch-ml1 | at ai.onnxruntime.OnnxRuntime.init(OnnxRuntime.java:139) ~[?:?]
opensearch-ml1 | at ai.onnxruntime.OrtEnvironment$ThreadingOptions.<clinit>(OrtEnvironment.java:353) ~[?:?]
opensearch-ml1 | at ai.djl.onnxruntime.engine.OrtEngine.<init>(OrtEngine.java:44) ~[?:?]
opensearch-ml1 | at ai.djl.onnxruntime.engine.OrtEngine.newInstance(OrtEngine.java:64) ~[?:?]
opensearch-ml1 | at ai.djl.onnxruntime.engine.OrtEngineProvider.getEngine(OrtEngineProvider.java:40) ~[?:?]
opensearch-ml1 | at ai.djl.engine.Engine.getEngine(Engine.java:187) ~[?:?]
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:185) ~[?:?]
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:280) ~[?:?]
opensearch-ml1 | ... 14 more
opensearch-node1 | [2024-03-08T16:11:34,883][ERROR][o.o.m.a.f.TransportForwardAction] [opensearch-node1] deploy model failed on all nodes, model id: 6lXVHo4BOkNyivbcAu-Y
opensearch-node1 | [2024-03-08T16:11:34,883][INFO ][o.o.m.a.f.TransportForwardAction] [opensearch-node1] deploy model done with state: DEPLOY_FAILED, model id: 6lXVHo4BOkNyivbcAu-Y
opensearch-ml1 | [2024-03-08T16:11:34,884][INFO ][o.o.m.a.d.TransportDeployModelOnNodeAction] [opensearch-ml1] deploy model task done 61XVHo4BOkNyivbceu-U
It appears that the necessary onnxruntime native libraries are not installed (or not properly configured) in the Docker images.
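The stack traces are long, but the useful part is the UnsatisfiedLinkError line, which names the missing library and the directories the JVM searched. A small sketch for pulling that out of saved log lines is below; the function name and the regular expression are illustrative and were written against the log format shown above, not against any official format specification.

```python
import re

# Matches e.g. "UnsatisfiedLinkError: no onnxruntime in java.library.path: :/a:/b"
LINK_ERROR = re.compile(
    r"UnsatisfiedLinkError: no (?P<lib>\S+) in java\.library\.path: (?P<path>\S+)"
)


def missing_native_libs(log_lines):
    """Return {library name: [searched directories]} for each distinct link error."""
    found = {}
    for line in log_lines:
        match = LINK_ERROR.search(line)
        if match:
            # The path starts with ':' in these logs, so drop empty segments.
            dirs = [d for d in match.group("path").split(":") if d]
            found.setdefault(match.group("lib"), dirs)
    return found
```

Feeding it the output of the docker-compose logs gives one entry per missing library, which makes it easier to see when the error changes from one library to another.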
Attempted fix
I didn’t see any libonnxruntime files in the lib directories, so I decided to copy an onnx runtime library (libonnxruntime.so.1.17.1) into /usr/share/opensearch/plugins/opensearch-knn/lib:
[opensearch@33bd679ce232 opensearch-knn]$ ls /usr/share/opensearch/plugins/opensearch-knn/lib
libgomp.so.1 libonnxruntime.so.1.17.1 libopensearchknn_faiss.so
libonnxruntime.so libopensearchknn_common.so libopensearchknn_nmslib.so
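On Linux, the JVM maps a name passed to System.loadLibrary to the file name lib<name>.so and looks for it in each java.library.path directory. The sketch below mirrors that lookup so a directory listing like the one above can be checked for a given library name. The function names are hypothetical, and the check only covers the plain-file case on Linux.

```python
from pathlib import Path


def linux_library_filename(name):
    """File name the JVM looks for on Linux, e.g. 'onnxruntime' -> 'libonnxruntime.so'."""
    return f"lib{name}.so"


def find_library(name, search_dirs):
    """Return the full paths in search_dirs that contain the mapped library file."""
    target = linux_library_filename(name)
    return [
        str(Path(directory) / target)
        for directory in search_dirs
        if (Path(directory) / target).is_file()
    ]
```

Run inside the container with the directories from the error message, an empty result for a name means System.loadLibrary would fail for it in the same way.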
When I did this and tried to redeploy the model, I got a different error:
opensearch-ml1 | [2024-03-08T16:38:36,909][ERROR][o.o.m.e.a.DLModel ] [opensearch-ml1] Failed to deploy model 6lXVHo4BOkNyivbcAu-Y
opensearch-ml1 | java.lang.UnsatisfiedLinkError: no onnxruntime4j_jni in java.library.path: :/usr/share/opensearch/plugins/opensearch-knn/lib:/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
opensearch-ml1 | at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2458) ~[?:?]
opensearch-ml1 | at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:916) ~[?:?]
opensearch-ml1 | at java.base/java.lang.System.loadLibrary(System.java:2063) ~[?:?]
opensearch-ml1 | at ai.onnxruntime.OnnxRuntime.load(OnnxRuntime.java:338) ~[onnxruntime_gpu-1.14.0.jar:1.14.0]
opensearch-ml1 | at ai.onnxruntime.OnnxRuntime.init(OnnxRuntime.java:140) ~[onnxruntime_gpu-1.14.0.jar:1.14.0]
opensearch-ml1 | at ai.onnxruntime.OrtEnvironment$ThreadingOptions.<clinit>(OrtEnvironment.java:353) ~[onnxruntime_gpu-1.14.0.jar:1.14.0]
opensearch-ml1 | at ai.djl.onnxruntime.engine.OrtEngine.<init>(OrtEngine.java:44) ~[onnxruntime-engine-0.21.0.jar:?]
opensearch-ml1 | at ai.djl.onnxruntime.engine.OrtEngine.newInstance(OrtEngine.java:64) ~[onnxruntime-engine-0.21.0.jar:?]
opensearch-ml1 | at ai.djl.onnxruntime.engine.OrtEngineProvider.getEngine(OrtEngineProvider.java:40) ~[onnxruntime-engine-0.21.0.jar:?]
opensearch-ml1 | at ai.djl.engine.Engine.getEngine(Engine.java:187) ~[api-0.21.0.jar:?]
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:185) ~[opensearch-ml-algorithms-2.12.0.0.jar:?]
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:280) [opensearch-ml-algorithms-2.12.0.0.jar:?]
opensearch-ml1 | at java.base/java.security.AccessController.doPrivileged(AccessController.java:571) [?:?]
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:247) [opensearch-ml-algorithms-2.12.0.0.jar:?]
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:139) [opensearch-ml-algorithms-2.12.0.0.jar:?]
opensearch-ml1 | at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:125) [opensearch-ml-algorithms-2.12.0.0.jar:?]
opensearch-ml1 | at org.opensearch.ml.model.MLModelManager.lambda$deployModel$51(MLModelManager.java:1020) [opensearch-ml-2.12.0.0.jar:2.12.0.0]
opensearch-ml1 | at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.12.0.jar:2.12.0]
opensearch-ml1 | at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$72(MLModelManager.java:1553) [opensearch-ml-2.12.0.0.jar:2.12.0.0]
opensearch-ml1 | at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.12.0.jar:2.12.0]
opensearch-ml1 | at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.12.0.jar:2.12.0]
opensearch-ml1 | at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:913) [opensearch-2.12.0.jar:2.12.0]
opensearch-ml1 | at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0]
opensearch-ml1 | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
opensearch-ml1 | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
opensearch-ml1 | at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
opensearch-ml1 | [2024-03-08T16:38:36,933][ERROR][o.o.m.m.MLModelManager ] [opensearch-ml1] Failed to retrieve model 6lXVHo4BOkNyivbcAu-Y
opensearch-ml1 | org.opensearch.ml.common.exception.MLException: Failed to deploy model 6lXVHo4BOkNyivbcAu-Y
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:294) ~[?:?]
opensearch-ml1 | at java.base/java.security.AccessController.doPrivileged(AccessController.java:571) ~[?:?]
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:247) ~[?:?]
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:139) ~[?:?]
opensearch-ml1 | at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:125) ~[?:?]
opensearch-ml1 | at org.opensearch.ml.model.MLModelManager.lambda$deployModel$51(MLModelManager.java:1020) ~[?:?]
opensearch-ml1 | at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.12.0.jar:2.12.0]
opensearch-ml1 | at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$72(MLModelManager.java:1553) [opensearch-ml-2.12.0.0.jar:2.12.0.0]
opensearch-ml1 | at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.12.0.jar:2.12.0]
opensearch-ml1 | at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.12.0.jar:2.12.0]
opensearch-ml1 | at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:913) [opensearch-2.12.0.jar:2.12.0]
opensearch-ml1 | at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0]
opensearch-ml1 | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
opensearch-ml1 | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
opensearch-ml1 | at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
opensearch-ml1 | Caused by: java.lang.UnsatisfiedLinkError: no onnxruntime4j_jni in java.library.path: :/usr/share/opensearch/plugins/opensearch-knn/lib:/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
opensearch-ml1 | at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2458) ~[?:?]
opensearch-ml1 | at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:916) ~[?:?]
opensearch-ml1 | at java.base/java.lang.System.loadLibrary(System.java:2063) ~[?:?]
opensearch-ml1 | at ai.onnxruntime.OnnxRuntime.load(OnnxRuntime.java:338) ~[?:?]
opensearch-ml1 | at ai.onnxruntime.OnnxRuntime.init(OnnxRuntime.java:140) ~[?:?]
opensearch-ml1 | at ai.onnxruntime.OrtEnvironment$ThreadingOptions.<clinit>(OrtEnvironment.java:353) ~[?:?]
opensearch-ml1 | at ai.djl.onnxruntime.engine.OrtEngine.<init>(OrtEngine.java:44) ~[?:?]
opensearch-ml1 | at ai.djl.onnxruntime.engine.OrtEngine.newInstance(OrtEngine.java:64) ~[?:?]
opensearch-ml1 | at ai.djl.onnxruntime.engine.OrtEngineProvider.getEngine(OrtEngineProvider.java:40) ~[?:?]
opensearch-ml1 | at ai.djl.engine.Engine.getEngine(Engine.java:187) ~[?:?]
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:185) ~[?:?]
opensearch-ml1 | at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:280) ~[?:?]
opensearch-ml1 | ... 14 more
opensearch-node1 | [2024-03-08T16:38:36,945][ERROR][o.o.m.a.f.TransportForwardAction] [opensearch-node1] deploy model failed on all nodes, model id: 6lXVHo4BOkNyivbcAu-Y
opensearch-node1 | [2024-03-08T16:38:36,945][INFO ][o.o.m.a.f.TransportForwardAction] [opensearch-node1] deploy model done with state: DEPLOY_FAILED, model id: 6lXVHo4BOkNyivbcAu-Y
It seems that adding the libonnxruntime.so file to the lib directory fixed part of the issue. But now there is a different dependency, onnxruntime4j_jni, that I am not sure how to include. When looking at the onnxruntime release, I can’t find this dependency. I think it has to do with the Java build/dependency (JNI = Java Native Interface). But I’m not a Java expert, so I’m stuck here.
I don’t think this is a root-cause fix at all, but it shows that including the runtime .so file is on the right track: the onnxruntime is not configured properly.
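One thing that may be worth checking: to my understanding, the ONNX Runtime Java artifact normally carries its native libraries (including the JNI one) inside the jar, in per-platform folders, and only falls back to java.library.path when the jar has no folder for the current platform. The sketch below lists which platforms a jar bundles. The resource prefix ai/onnxruntime/native/ is an assumption about the jar layout, and the function name is hypothetical.

```python
import zipfile

# Assumed layout inside the jar: ai/onnxruntime/native/<os>-<arch>/<library file>
NATIVE_PREFIX = "ai/onnxruntime/native/"


def bundled_native_platforms(jar_path):
    """Map each platform folder (e.g. 'linux-x64') to the native files under it."""
    platforms = {}
    with zipfile.ZipFile(jar_path) as jar:
        for entry in jar.namelist():
            if not entry.startswith(NATIVE_PREFIX) or entry.endswith("/"):
                continue
            platform, _, filename = entry[len(NATIVE_PREFIX):].partition("/")
            if filename:
                platforms.setdefault(platform, []).append(filename)
    return platforms
```

Running this against the onnxruntime_gpu-1.14.0.jar named in the stack trace and comparing the result with the container's architecture (uname -m) would show whether the jar ships natives for an arm64 Linux container at all.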
Configuration:
The docker-compose cluster is set up with 2 search nodes and 1 ML node. I can provide my full docker-compose.yaml file if needed.
{
"persistent": {
"plugins": {
"ml_commons": {
"task_dispatch_policy": "round_robin",
"monitoring_request_count": "100",
"max_model_on_node": "20",
"sync_up_job_interval_in_seconds": "3",
"max_ml_task_per_node": "10",
"only_run_on_ml_node": "true",
"ml_task_timeout_in_seconds": "600",
"model_access_control_enabled": "true",
"native_memory_threshold": "100",
"allow_registering_model_via_local_file": "true",
"allow_registering_model_via_url": "true"
},
"index_state_management": {
"template_migration": {
"control": "-1"
}
}
}
},
"transient": {}
}
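For comparing these settings against documentation, which usually refers to them by dotted name (for example plugins.ml_commons.only_run_on_ml_node), the nested JSON can be flattened. This is a generic sketch, not an OpenSearch API; flatten_settings is a hypothetical helper.

```python
def flatten_settings(settings, prefix=""):
    """Flatten nested settings into {'a.b.c': value} form; empty dicts yield nothing."""
    flat = {}
    for key, value in settings.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_settings(value, dotted))
        else:
            flat[dotted] = value
    return flat
```

Applied to the "persistent" block above, this yields keys such as plugins.ml_commons.only_run_on_ml_node with the value "true", which matches the single ML node being the only deployment target in the logs.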
Relevant Logs or Screenshots:
Logs were posted in the description.