For clusters in a corporate setting, internet access is often restricted with an egress firewall.
However, the ML Commons plugin needs internet access to download dependencies, even when using a local model.
It would be good to improve the user experience in this situation. Some ideas:
- Document the network behaviour of the plugin, so that users can accommodate it (e.g., by whitelisting known dependencies; or, if the dependencies are unpredictable, we could advise against using this plugin in environments with restricted network access)
- Provide a way to avoid downloading dependencies during model deployment (eg, is it possible to package dependencies?)
- Improve logging so that, if downloading dependencies fails, it is clear which URL was unreachable - this would make it easier to update the whitelist.
I see this behaviour when using the `all-MiniLM-L12-v2` model locally on OpenSearch 2.11.1, using the TorchScript model file and config from the [list of pre-trained models](https://opensearch.org/docs/latest/ml-commons-plugin/pretrained-models/#sentence-transformers), deploying from a local zip file with the steps from `opensearch-py-ml`'s [demo notebook](https://opensearch-project.github.io/opensearch-py-ml/examples/demo_ml_commons_integration.html). The suggestions below are based on my experience with this setup, but I'm not sure whether the ONNX model has different dependencies from the TorchScript model, or whether other models have different dependencies (e.g., whether `all-mpnet-base-v2` has different dependencies from `all-MiniLM-L12-v2`).
**Packaging**
When using a local TorchScript model on a server with restricted internet access, deploying the model fails if the server cannot access `publish.djl.ai`. Among the ml-commons dependencies, this URL is referenced by the `pytorch-engine` library, which appears in the stack traces below.
It might be possible to [package a fat jar with dependencies](https://github.com/deepjavalibrary/djl-demo/tree/master/development/fatjar) to avoid this issue? This was [previously discussed in the OpenSearch forums](https://forum.opensearch.org/t/model-deployment-failure-with-ml-commons-plugin-in-internet-disabled-environment/15428).
**Documentation**
It would be useful to document:
- Which domains need to be whitelisted for the ML plugin to function (or if the list of dependencies is not easily predictable and varies depending on which model is used, we could document that)
- Under what circumstances the plugin needs network access (only at deploy time?)
Currently, the plugin appears to need network access to the following URLs when deploying, even when using a local model:
- publish.djl.ai/pytorch
- mlrepo.djl.ai (lack of access to this doesn't prevent this model from deploying, but generates several warnings in OpenSearch logs like `[WARN ][a.d.h.z.HfModelZoo ] [ip-172-31-58-14.ec2.internal] Failed to download Huggingface model zoo index: NLP.FILL_MASK`; not sure if this has consequences later)
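
To check a whitelist before attempting a deploy, something like the following could be run from the OpenSearch node. This is only a sketch: the two hostnames are the ones I observed for this model, port 443 is assumed because the stack traces show HTTPS connections, and `probe` is a hypothetical helper name rather than part of any OpenSearch tooling.

```python
import socket

# Hosts observed in this issue; other models or engines may need more.
HOSTS = ["publish.djl.ai", "mlrepo.djl.ai"]


def probe(host, port=443, timeout=5.0, connect=socket.create_connection):
    """Classify reachability of host:port as 'ok', 'dns-failure', 'timeout' or 'error: ...'."""
    try:
        conn = connect((host, port), timeout=timeout)
    except socket.gaierror:
        # Name resolution failed, e.g. DNS is blocked by the firewall.
        return "dns-failure"
    except (socket.timeout, TimeoutError):
        # Typical when the firewall silently drops packets.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. connection refused or network unreachable.
        return f"error: {exc}"
    conn.close()
    return "ok"
```

Calling `probe(host)` for each entry in `HOSTS` gives a quick per-host result. A `timeout` result corresponds to the "Connection timed out" case in the logs further down, and `dns-failure` to the `UnknownHostException` case. Note this only tests the TCP connection, not whether a specific URL path is permitted by a proxy.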
**Logging**
Another way to improve this experience would be to log more information when there is a failure downloading dependencies.
When deploying a local model, if an egress firewall is configured to drop packets to destinations that are not explicitly permitted, we get an error that doesn't tell us which destination we were trying to reach - from this, it is not obvious what address needs to be whitelisted. Here are the OpenSearch logs when deploying a local model under these circumstances:
```
[2024-01-23T00:10:53,793][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 1
[2024-01-23T00:10:54,582][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 2
[2024-01-23T00:10:55,342][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 3
[2024-01-23T00:10:55,922][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 4
[2024-01-23T00:10:56,444][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 5
[2024-01-23T00:10:56,997][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 6
[2024-01-23T00:10:57,481][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 7
[2024-01-23T00:10:57,840][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 8
[2024-01-23T00:10:58,215][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 9
[2024-01-23T00:10:58,612][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 10
[2024-01-23T00:10:58,988][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 11
[2024-01-23T00:10:59,399][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 12
[2024-01-23T00:10:59,786][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 13
[2024-01-23T00:10:59,977][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 14
[2024-01-23T00:11:00,014][INFO ][o.o.m.a.d.TransportDeployModelAction] [ip-172-31-62-254.ec2.internal] Will deploy model on these nodes: Q6DHrMfSTRyIEHJNDCnCsw
[2024-01-23T00:11:04,963][WARN ][a.d.u.c.CudaUtils ] [ip-172-31-62-254.ec2.internal] Access denied during loading cudart library.
[2024-01-23T00:11:29,623][INFO ][o.o.m.c.MLSyncUpCron ] [ip-172-31-62-254.ec2.internal] Refresh model state: {786nM40BUDoVia3UznyW=DEPLOY_FAILED}
[2024-01-23T00:11:39,584][INFO ][o.o.i.i.ManagedIndexCoordinator] [ip-172-31-62-254.ec2.internal] Cancel background move metadata process.
[2024-01-23T00:11:39,585][INFO ][o.o.i.i.ManagedIndexCoordinator] [ip-172-31-62-254.ec2.internal] Performing move cluster state metadata.
[2024-01-23T00:11:39,585][INFO ][o.o.i.i.MetadataService ] [ip-172-31-62-254.ec2.internal] Move metadata has finished.
[2024-01-23T00:11:39,618][INFO ][o.o.m.c.MLSyncUpCron ] [ip-172-31-62-254.ec2.internal] Refresh model state: {786nM40BUDoVia3UznyW=DEPLOYING}
[2024-01-23T00:11:59,623][INFO ][o.o.m.c.MLSyncUpCron ] [ip-172-31-62-254.ec2.internal] Refresh model state: {786nM40BUDoVia3UznyW=DEPLOY_FAILED}
[2024-01-23T00:12:09,622][INFO ][o.o.m.c.MLSyncUpCron ] [ip-172-31-62-254.ec2.internal] Refresh model state: {786nM40BUDoVia3UznyW=DEPLOYING}
[2024-01-23T00:12:29,621][INFO ][o.o.m.c.MLSyncUpCron ] [ip-172-31-62-254.ec2.internal] Refresh model state: {786nM40BUDoVia3UznyW=DEPLOY_FAILED}
[2024-01-23T00:12:39,625][INFO ][o.o.m.c.MLSyncUpCron ] [ip-172-31-62-254.ec2.internal] Refresh model state: {786nM40BUDoVia3UznyW=DEPLOYING}
[2024-01-23T00:13:09,624][INFO ][o.o.m.c.MLSyncUpCron ] [ip-172-31-62-254.ec2.internal] Refresh model state: {786nM40BUDoVia3UznyW=DEPLOY_FAILED}
[2024-01-23T00:13:14,922][ERROR][o.o.m.e.a.DLModel ] [ip-172-31-62-254.ec2.internal] Failed to deploy model 786nM40BUDoVia3UznyW
ai.djl.engine.EngineException: Failed to save pytorch index file
at ai.djl.pytorch.jni.LibUtils.downloadPyTorch(LibUtils.java:403) ~[pytorch-engine-0.21.0.jar:?]
at ai.djl.pytorch.jni.LibUtils.findNativeLibrary(LibUtils.java:286) ~[pytorch-engine-0.21.0.jar:?]
at ai.djl.pytorch.jni.LibUtils.getLibTorch(LibUtils.java:89) ~[pytorch-engine-0.21.0.jar:?]
at ai.djl.pytorch.jni.LibUtils.loadLibrary(LibUtils.java:77) ~[pytorch-engine-0.21.0.jar:?]
at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:53) ~[pytorch-engine-0.21.0.jar:?]
at ai.djl.pytorch.engine.PtEngineProvider.getEngine(PtEngineProvider.java:40) ~[pytorch-engine-0.21.0.jar:?]
at ai.djl.engine.Engine.getEngine(Engine.java:187) ~[api-0.21.0.jar:?]
at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:185) ~[opensearch-ml-algorithms-2.11.1.0.jar:?]
at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:275) [opensearch-ml-algorithms-2.11.1.0.jar:?]
at java.security.AccessController.doPrivileged(AccessController.java:569) [?:?]
at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:242) [opensearch-ml-algorithms-2.11.1.0.jar:?]
at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:138) [opensearch-ml-algorithms-2.11.1.0.jar:?]
at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:125) [opensearch-ml-algorithms-2.11.1.0.jar:?]
at org.opensearch.ml.model.MLModelManager.lambda$deployModel$52(MLModelManager.java:1003) [opensearch-ml-2.11.1.0.jar:2.11.1.0]
at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.11.1.jar:2.11.1]
at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$58(MLModelManager.java:1123) [opensearch-ml-2.11.1.0.jar:2.11.1.0]
at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.11.1.jar:2.11.1]
at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.11.1.jar:2.11.1]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.11.1.jar:2.11.1]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.11.1.jar:2.11.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.net.ConnectException: Connection timed out
at sun.nio.ch.Net.connect0(Native Method) ~[?:?]
at sun.nio.ch.Net.connect(Net.java:579) ~[?:?]
at sun.nio.ch.Net.connect(Net.java:568) ~[?:?]
at sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:593) ~[?:?]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327) ~[?:?]
at java.net.Socket.connect(Socket.java:633) ~[?:?]
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:304) ~[?:?]
at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:174) ~[?:?]
at sun.net.NetworkClient.doConnect(NetworkClient.java:183) ~[?:?]
at sun.net.www.http.HttpClient.openServer(HttpClient.java:533) ~[?:?]
at sun.net.www.http.HttpClient.openServer(HttpClient.java:638) ~[?:?]
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:266) ~[?:?]
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:380) ~[?:?]
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:193) ~[?:?]
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1242) ~[?:?]
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1128) ~[?:?]
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:179) ~[?:?]
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1665) ~[?:?]
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589) ~[?:?]
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224) ~[?:?]
at java.net.URL.openStream(URL.java:1161) ~[?:?]
at ai.djl.util.Utils.openUrl(Utils.java:461) ~[api-0.21.0.jar:?]
at ai.djl.util.Utils.openUrl(Utils.java:445) ~[api-0.21.0.jar:?]
at ai.djl.pytorch.jni.LibUtils.downloadPyTorch(LibUtils.java:398) ~[pytorch-engine-0.21.0.jar:?]
... 22 more
[2024-01-23T00:13:14,969][ERROR][o.o.m.m.MLModelManager ] [ip-172-31-62-254.ec2.internal] Failed to retrieve model 786nM40BUDoVia3UznyW
org.opensearch.ml.common.exception.MLException: Failed to deploy model 786nM40BUDoVia3UznyW
at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:289) ~[?:?]
at java.security.AccessController.doPrivileged(AccessController.java:569) ~[?:?]
at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:242) ~[?:?]
at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:138) ~[?:?]
at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:125) ~[?:?]
at org.opensearch.ml.model.MLModelManager.lambda$deployModel$52(MLModelManager.java:1003) ~[?:?]
at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.11.1.jar:2.11.1]
at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$58(MLModelManager.java:1123) [opensearch-ml-2.11.1.0.jar:2.11.1.0]
at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.11.1.jar:2.11.1]
at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.11.1.jar:2.11.1]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.11.1.jar:2.11.1]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.11.1.jar:2.11.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: ai.djl.engine.EngineException: Failed to save pytorch index file
at ai.djl.pytorch.jni.LibUtils.downloadPyTorch(LibUtils.java:403) ~[?:?]
at ai.djl.pytorch.jni.LibUtils.findNativeLibrary(LibUtils.java:286) ~[?:?]
at ai.djl.pytorch.jni.LibUtils.getLibTorch(LibUtils.java:89) ~[?:?]
at ai.djl.pytorch.jni.LibUtils.loadLibrary(LibUtils.java:77) ~[?:?]
at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:53) ~[?:?]
at ai.djl.pytorch.engine.PtEngineProvider.getEngine(PtEngineProvider.java:40) ~[?:?]
at ai.djl.engine.Engine.getEngine(Engine.java:187) ~[?:?]
at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:185) ~[?:?]
at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:275) ~[?:?]
... 14 more
Caused by: java.net.ConnectException: Connection timed out
at sun.nio.ch.Net.connect0(Native Method) ~[?:?]
at sun.nio.ch.Net.connect(Net.java:579) ~[?:?]
at sun.nio.ch.Net.connect(Net.java:568) ~[?:?]
at sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:593) ~[?:?]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327) ~[?:?]
at java.net.Socket.connect(Socket.java:633) ~[?:?]
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:304) ~[?:?]
at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:174) ~[?:?]
at sun.net.NetworkClient.doConnect(NetworkClient.java:183) ~[?:?]
at sun.net.www.http.HttpClient.openServer(HttpClient.java:533) ~[?:?]
at sun.net.www.http.HttpClient.openServer(HttpClient.java:638) ~[?:?]
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:266) ~[?:?]
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:380) ~[?:?]
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:193) ~[?:?]
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1242) ~[?:?]
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1128) ~[?:?]
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:179) ~[?:?]
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1665) ~[?:?]
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589) ~[?:?]
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224) ~[?:?]
at java.net.URL.openStream(URL.java:1161) ~[?:?]
at ai.djl.util.Utils.openUrl(Utils.java:461) ~[?:?]
at ai.djl.util.Utils.openUrl(Utils.java:445) ~[?:?]
at ai.djl.pytorch.jni.LibUtils.downloadPyTorch(LibUtils.java:398) ~[?:?]
at ai.djl.pytorch.jni.LibUtils.findNativeLibrary(LibUtils.java:286) ~[?:?]
at ai.djl.pytorch.jni.LibUtils.getLibTorch(LibUtils.java:89) ~[?:?]
at ai.djl.pytorch.jni.LibUtils.loadLibrary(LibUtils.java:77) ~[?:?]
at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:53) ~[?:?]
at ai.djl.pytorch.engine.PtEngineProvider.getEngine(PtEngineProvider.java:40) ~[?:?]
at ai.djl.engine.Engine.getEngine(Engine.java:187) ~[?:?]
at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:185) ~[?:?]
at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:275) ~[?:?]
... 14 more
[2024-01-23T00:13:14,981][ERROR][o.o.m.a.f.TransportForwardAction] [ip-172-31-62-254.ec2.internal] deploy model failed on all nodes, model id: 786nM40BUDoVia3UznyW
[2024-01-23T00:13:14,981][INFO ][o.o.m.a.f.TransportForwardAction] [ip-172-31-62-254.ec2.internal] deploy model done with state: DEPLOY_FAILED, model id: 786nM40BUDoVia3UznyW
[2024-01-23T00:13:14,983][INFO ][o.o.m.a.d.TransportDeployModelOnNodeAction] [ip-172-31-62-254.ec2.internal] deploy model task done 8M6nM40BUDoVia3U7nw0
```
In this situation, `GET /_plugins/_ml/models/<model-id>` tells us the deploy failed, but does not provide a reason. (Not sure if the task API would provide more info - I couldn't see how to get opensearch-py-ml to give me the task ID.)
```
{
"name": "sentence-transformers/all-MiniLM-L12-v2",
"model_group_id": "pWZUEo0BgFhXOXZgeEi_",
"algorithm": "TEXT_EMBEDDING",
"model_version": "11",
"model_format": "TORCH_SCRIPT",
"model_state": "DEPLOY_FAILED",
"model_content_size_in_bytes": 134568911,
"model_content_hash_value": "f8012a4e6b5da1f556221a12160d080157039f077ab85a5f6b467a47247aad49",
"model_config": {
"model_type": "bert",
"embedding_dimension": 384,
"framework_type": "SENTENCE_TRANSFORMERS",
"all_config": "{\"_name_or_path\":\"microsoft/MiniLM-L12-H384-uncased\",\"attention_probs_dropout_prob\":0.1,\"gradient_checkpointing\":false,\"hidden_act\":\"gelu\",\"hidden_dropout_prob\":0.1,\"hidden_size\":384,\"initializer_range\":0.02,\"intermediate_size\":1536,\"layer_norm_eps\":1e-12,\"max_position_embeddings\":512,\"model_type\":\"bert\",\"num_attention_heads\":12,\"num_hidden_layers\":12,\"pad_token_id\":0,\"position_embedding_type\":\"absolute\",\"transformers_version\":\"4.8.2\",\"type_vocab_size\":2,\"use_cache\":true,\"vocab_size\":30522}"
},
"created_time": 1705968651923,
"last_updated_time": 1705968794982,
"last_deployed_time": 1705968794981,
"total_chunks": 14,
"planning_worker_node_count": 1,
"current_worker_node_count": 0,
"planning_worker_nodes": [
"Q6DHrMfSTRyIEHJNDCnCsw"
],
"deploy_to_all_nodes": true
}
```
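
On the task API question above: one possible way to find the task without knowing its ID might be to search ML tasks by model ID. The sketch below assumes an `opensearch-py` client, the ML Commons task search endpoint (`/_plugins/_ml/tasks/_search`), and that task documents carry `model_id`, `state` and `error` fields; I have not confirmed those field names on 2.11.1, so treat them as assumptions. `find_task_errors` is a hypothetical helper name.

```python
def deploy_task_query(model_id, size=5):
    """Build a search body that filters ML tasks by model ID (assumed field name)."""
    return {
        "query": {"bool": {"filter": [{"term": {"model_id": model_id}}]}},
        "size": size,
    }


def find_task_errors(client, model_id):
    """Return (task_id, state, error) tuples for tasks that reference model_id.

    `client` is expected to be an opensearch-py OpenSearch instance; only
    client.transport.perform_request is used.
    """
    resp = client.transport.perform_request(
        "POST", "/_plugins/_ml/tasks/_search", body=deploy_task_query(model_id)
    )
    hits = resp.get("hits", {}).get("hits", [])
    return [
        (hit["_id"], hit["_source"].get("state"), hit["_source"].get("error"))
        for hit in hits
    ]
```

If the task document does include an `error` field, it would be worth checking whether it contains anything more specific than the model state shown above.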
Note that the above assumes DNS is permitted. If the egress firewall also blocks DNS, the error is more useful and does contain the domain that needs to be whitelisted:
```
[2024-01-18T05:41:30,534][ERROR][o.o.m.e.a.DLModel ] [ip-172-31-58-14.ec2.internal] Failed to deploy model W9EWG40Blv3ldtU8hMVo
ai.djl.engine.EngineException: Failed to save pytorch index file
at ai.djl.pytorch.jni.LibUtils.downloadPyTorch(LibUtils.java:403) ~[pytorch-engine-0.21.0.jar:?]
at ai.djl.pytorch.jni.LibUtils.findNativeLibrary(LibUtils.java:286) ~[pytorch-engine-0.21.0.jar:?]
at ai.djl.pytorch.jni.LibUtils.getLibTorch(LibUtils.java:89) ~[pytorch-engine-0.21.0.jar:?]
at ai.djl.pytorch.jni.LibUtils.loadLibrary(LibUtils.java:77) ~[pytorch-engine-0.21.0.jar:?]
at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:53) ~[pytorch-engine-0.21.0.jar:?]
at ai.djl.pytorch.engine.PtEngineProvider.getEngine(PtEngineProvider.java:40) ~[pytorch-engine-0.21.0.jar:?]
at ai.djl.engine.Engine.getEngine(Engine.java:187) ~[api-0.21.0.jar:?]
at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:185) ~[opensearch-ml-algorithms-2.11.1.0.jar:?]
at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:275) [opensearch-ml-algorithms-2.11.1.0.jar:?]
at java.security.AccessController.doPrivileged(AccessController.java:569) [?:?]
at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:242) [opensearch-ml-algorithms-2.11.1.0.jar:?]
at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:138) [opensearch-ml-algorithms-2.11.1.0.jar:?]
at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:125) [opensearch-ml-algorithms-2.11.1.0.jar:?]
at org.opensearch.ml.model.MLModelManager.lambda$deployModel$52(MLModelManager.java:1003) [opensearch-ml-2.11.1.0.jar:2.11.1.0]
at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.11.1.jar:2.11.1]
at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$58(MLModelManager.java:1123) [opensearch-ml-2.11.1.0.jar:2.11.1.0]
at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.11.1.jar:2.11.1]
at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.11.1.jar:2.11.1]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.11.1.jar:2.11.1]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.11.1.jar:2.11.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.net.UnknownHostException: publish.djl.ai
at sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:572) ~[?:?]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327) ~[?:?]
at java.net.Socket.connect(Socket.java:633) ~[?:?]
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:304) ~[?:?]
at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:174) ~[?:?]
at sun.net.NetworkClient.doConnect(NetworkClient.java:183) ~[?:?]
at sun.net.www.http.HttpClient.openServer(HttpClient.java:533) ~[?:?]
at sun.net.www.http.HttpClient.openServer(HttpClient.java:638) ~[?:?]
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:266) ~[?:?]
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:380) ~[?:?]
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:193) ~[?:?]
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1242) ~[?:?]
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1128) ~[?:?]
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:179) ~[?:?]
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1665) ~[?:?]
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589) ~[?:?]
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224) ~[?:?]
at java.net.URL.openStream(URL.java:1161) ~[?:?]
at ai.djl.util.Utils.openUrl(Utils.java:461) ~[api-0.21.0.jar:?]
at ai.djl.util.Utils.openUrl(Utils.java:445) ~[api-0.21.0.jar:?]
at ai.djl.pytorch.jni.LibUtils.downloadPyTorch(LibUtils.java:398) ~[pytorch-engine-0.21.0.jar:?]
... 22 more
```
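
To illustrate the difference between the two failure modes, here is a rough sketch of pulling the `Caused by:` lines out of a log excerpt. It assumes the log format shown above; `blocked_hosts` is a hypothetical helper, not an existing tool. It returns the hostnames named by `UnknownHostException` and counts the `ConnectException` causes, which carry no hostname at all.

```python
import re

# Matches lines such as "Caused by: java.net.UnknownHostException: publish.djl.ai"
CAUSE = re.compile(r"^\s*Caused by: ([\w.$]+)(?:: (.*))?$")


def blocked_hosts(log_text):
    """Return (sorted hostnames from DNS failures, count of host-less connect failures)."""
    hosts = set()
    anonymous = 0
    for line in log_text.splitlines():
        match = CAUSE.match(line)
        if not match:
            continue
        exc, msg = match.group(1), (match.group(2) or "").strip()
        if exc == "java.net.UnknownHostException" and msg:
            # The exception message is the hostname that failed to resolve.
            hosts.add(msg)
        elif exc == "java.net.ConnectException":
            # The message is only "Connection timed out"; no destination is logged.
            anonymous += 1
    return sorted(hosts), anonymous
```

Run against the first log excerpt, this finds no hostnames and only host-less `ConnectException` causes, which is exactly the gap that better logging in the plugin would close.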