Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch - 2.8.0
Server OS - Oracle Linux Server 7.9
Describe the issue:
I’m trying to run an OpenSearch cluster with the ml-commons plugin in an environment where internet access is disabled. When I deploy a PyTorch model from a local file, the ml-commons plugin tries to download the PyTorch engine (the native library) from the internet, and the deployment fails. Is there a way to build ml-commons so that all of its dependencies are already bundled in the tar/zip that is plugged into the OpenSearch image? Can we create some kind of fat jar so that nothing is fetched from the internet at runtime?
API used:
The issue occurs at deployment, after the model has been registered successfully with the API call below:
POST _plugins/_ml/models/_register
{
"name": "sentence-transformers/all-MiniLM-L6-v2",
"model_group_id": "tcV5fsf43751lzeNBC-wj",
"version": "1.0.1",
"model_format": "TORCH_SCRIPT",
"model_content_hash_value": "c15f0d2e62d872be5b5bc6c84d2e0f4921541e29fefbef51d59cc10a8ae30e0f",
"model_config": {
"model_type": "bert",
"embedding_dimension": 384,
"framework_type": "sentence_transformers"
},
"url": "file:///usr/data/model/sentence-transformers_all-MiniLM-L6-v2-1.0.1-torch_script.zip"
}
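To rule out a problem with the local zip itself, the file can be checked against the hash in the request before registering. This is only a minimal sketch: it assumes model_content_hash_value is the SHA-256 hex digest of the zip file, and the function names are hypothetical helpers, not part of ml-commons.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large model zip is never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_registered_hash(path, expected_hex):
    """Compare the file's digest with the hash value used in the register call.

    Assumption: the expected value is a SHA-256 hex digest of the whole file.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Calling matches_registered_hash with the path /usr/data/model/sentence-transformers_all-MiniLM-L6-v2-1.0.1-torch_script.zip and the hash value from the request should return True if the file is intact. Registration succeeds in my case, so the failure appears to come from the engine download, not from the model file.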
Relevant Logs or Screenshots:
ai.djl.engine.EngineException: Failed to load PyTorch native library
at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:82)
at ai.djl.pytorch.engine.PtEngineProvider.getEngine(PtEngineProvider.java:40)
at ai.djl.engine.Engine.getEngine(Engine.java:181)
at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:209)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:176)
at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:135)
at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:116)
at org.opensearch.ml.model.MLModelManager.lambda$deployModel$29(MLModelManager.java:671)
at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:80)
at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$35(MLModelManager.java:776)
at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:80)
at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:806)
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IllegalStateException: Failed to save pytorch index file
at ai.djl.pytorch.jni.LibUtils.downloadPyTorch(LibUtils.java:399)
at ai.djl.pytorch.jni.LibUtils.findNativeLibrary(LibUtils.java:278)
at ai.djl.pytorch.jni.LibUtils.getLibTorch(LibUtils.java:87)
at ai.djl.pytorch.jni.LibUtils.loadLibrary(LibUtils.java:75)
at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:53)
... 17 more
Caused by: java.net.ConnectException: Connection timed out (Connection timed out)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.base/java.net.Socket.connect(Socket.java:608)
at org.bouncycastle.jsse.provider.ProvSSLSocketDirect.connect(ProvSSLSocketDirect.java:170)
at java.base/java.net.Socket.connect(Socket.java:557)
at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:182)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:510)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:605)
at java.base/sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:265)
at java.base/sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:372)
at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:207)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)
at java.base/sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1071)
at java.base/sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1069)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/java.security.AccessController.doPrivilegedWithCombiner(AccessController.java:795)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1068)
at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:193)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1592)
at java.base/sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1512)
at java.base/sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1510)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/java.security.AccessController.doPrivilegedWithCombiner(AccessController.java:795)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1509)
at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:250)
at java.base/java.net.URL.openStream(URL.java:1165)
at ai.djl.util.Utils.openUrl(Utils.java:459)
at ai.djl.util.Utils.openUrl(Utils.java:443)
at ai.djl.pytorch.jni.LibUtils.downloadPyTorch(LibUtils.java:394)
... 21 more
Failed to deploy model zS8RtIkBdyx9l2hZOMXg
Below are some useful articles I found: