Offline deployment of pretrained ML models in OpenSearch 2.18.0 — resolving DJL JNI library loading failure

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

  1. OpenSearch: 2.18.0
  2. ML Commons Plugin: 2.18.0.0
  3. Server OS: Linux (x86_64)
  4. Python: 3.9
  5. PyTorch: 2.2.2
  6. DJL: 0.28.0

Describe the issue:

While deploying pretrained sentence-transformer models on an air-gapped server, the registration step succeeded using an internal mirror. Deployment failed, however, because the plugin tried to download the DJL JNI libraries from the internet, resulting in a libdjl_torch.so access error.

Configuration:

  1. Internal mirror configured for model zip files.

  2. Environment variables set for PyTorch and DJL offline mode.

  3. Node configured with ml role.

  4. Environment variables set:

    export PYTORCH_LIBRARY_PATH=$HOME/.local/lib/python3.9/site-packages/torch/lib/
    export PYTORCH_VERSION=2.2.2
    export PYTORCH_FLAVOR=cpu
    export ENGINE_CACHE_DIR=$HOME/.djl.ai/
    export DJL_OFFLINE=true
    

Relevant Logs or Screenshots:

Deployment initially failed with:

EngineException: Cannot download jni files: https://publish.djl.ai/pytorch/1.13.1/jnilib/0.21.0/linux-x86_64/cpu/libdjl_torch.so

Previous related post (now closed): Offline deployment of pretrained models

Update / Resolution:

To resolve the DJL JNI library loading failure in offline mode, follow these steps:

1. On a local laptop (or any internet-connected machine), download the matching JNI library and native PyTorch JAR:

curl -O https://publish.djl.ai/pytorch/2.2.2/jnilib/0.28.0/linux-x86_64/cpu/libdjl_torch.so
curl -O https://repo1.maven.org/maven2/ai/djl/pytorch/pytorch-native-cpu/2.2.2/pytorch-native-cpu-2.2.2-linux-x86_64.jar
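When moving artifacts into an air-gapped environment, it can help to record checksums on the internet-connected machine and re-verify them after the copy. A minimal sketch (the helper names and the `djl-artifacts.sha256` manifest name are my own, not part of the plugin):

```shell
# Record SHA-256 checksums of the downloaded artifacts into a manifest,
# then verify the manifest on the destination host after copying.
record_checksums() {
    sha256sum "$@" > djl-artifacts.sha256
}

verify_checksums() {
    # Exits non-zero if any file is missing or its checksum differs
    sha256sum -c djl-artifacts.sha256
}
```

Run `record_checksums libdjl_torch.so pytorch-native-cpu-2.2.2-linux-x86_64.jar` before the transfer and `verify_checksums` in the target directory afterwards to catch corruption in transit.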

2. Copy the downloaded files to your OpenSearch host and place them in:

mkdir ~/opensearch/os_on_9200/opensearch-2.18.0/config/.djl.ai/
cd ~/opensearch/os_on_9200/opensearch-2.18.0/config/.djl.ai/
cp /tmp/pytorch-native-cpu-2.2.2-linux-x86_64.jar .
cp /tmp/libdjl_torch.so .
unzip pytorch-native-cpu-2.2.2-linux-x86_64.jar
cp pytorch/cpu/linux-x86_64/* .

3. Ensure these library files are present and that all of their linked dependencies resolve:

~/opensearch/os_on_9200/opensearch-2.18.0/config/.djl.ai]$ll
total 595456
-rwxr-----. 1 euser euser   1099488 Sep 29 19:58 libc10.so
-rwxr-----. 1 euser euser   4037888 Sep 29 19:58 libdjl_torch.so
-rw-r-----. 1 euser euser    283265 Sep 29 19:58 libgomp-98b21ff3.so.1
-rwxr-----. 1 euser euser 477712785 Sep 29 19:58 libtorch_cpu.so
-rwxr-----. 1 euser euser      7584 Sep 29 19:58 libtorch.so

~/opensearch/os_on_9200/opensearch-2.18.0/config/.djl.ai]$ldd libdjl_torch.so
	linux-vdso.so.1 (0x00007ffc2d3b5000)
	libc10.so => /home/euser/opensearch/os_on_9200/opensearch-2.18.0/config/.djl.ai/libc10.so (0x00007f8b7631f000)
	libtorch_cpu.so => /home/euser/opensearch/os_on_9200/opensearch-2.18.0/config/.djl.ai/libtorch_cpu.so (0x00007f8b5ec00000)
	libtorch.so => /home/euser/opensearch/os_on_9200/opensearch-2.18.0/config/.djl.ai/libtorch.so (0x00007f8b76641000)
	libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f8b5e800000)
	libm.so.6 => /usr/lib64/libm.so.6 (0x00007f8b5eb25000)
	libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007f8b76305000)
	libc.so.6 => /usr/lib64/libc.so.6 (0x00007f8b5e400000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f8b76646000)
	libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007f8b76300000)
	librt.so.1 => /usr/lib64/librt.so.1 (0x00007f8b762fb000)
	libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007f8b762f6000)
	libgomp-98b21ff3.so.1 => /home/euser/opensearch/os_on_9200/opensearch-2.18.0/config/.djl.ai/libgomp-98b21ff3.so.1 (0x00007f8b5eade000)

4. Update your ~/.profile with:

export ENGINE_CACHE_DIR=$HOME/opensearch/os_on_9200/opensearch-2.18.0/config/.djl.ai/
export DJL_OFFLINE=true
export DJL_CACHE_DIR=$HOME/opensearch/os_on_9200/opensearch-2.18.0/config/.djl.ai/
export PYTORCH_LIBRARY_PATH=$HOME/opensearch/os_on_9200/opensearch-2.18.0/config/.djl.ai/
export PYTORCH_VERSION=2.2.2
export PYTORCH_FLAVOR=cpu
export LD_LIBRARY_PATH=/usr/lib64:/lib64:/usr/lib:/lib:$HOME/opensearch/os_on_9200/opensearch-2.18.0/config/.djl.ai
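Since these variables must be visible to the shell that actually launches OpenSearch, a quick check before starting the node can save a restart cycle. A small bash sketch (variable names taken from the profile above; the `check_env` helper is my own):

```shell
# Report whether each offline-mode variable is set in the current shell.
# Uses bash indirect expansion (${!v}) to look up a variable by name.
check_env() {
    local v
    for v in "$@"; do
        if [ -n "${!v}" ]; then
            echo "$v=${!v}"
        else
            echo "$v is UNSET"
        fi
    done
}

check_env DJL_OFFLINE DJL_CACHE_DIR ENGINE_CACHE_DIR \
          PYTORCH_LIBRARY_PATH PYTORCH_VERSION PYTORCH_FLAVOR
```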

5. Add Java Security Manager permissions:

grant {
    permission java.io.FilePermission "/home/euser/opensearch/os_on_9200/opensearch-2.18.0/config/.djl.ai/tokenizers", "read,write,delete";
    permission java.io.FilePermission "/home/euser/opensearch/os_on_9200/opensearch-2.18.0/config/.djl.ai/tokenizers/-", "read,write,delete";
};

6. Restart the nodes:

Restart every node in the cluster so the new environment variables and security permissions take effect.
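After the restart, the model can be redeployed and its state polled through the ML Commons REST API. A sketch, assuming the cluster listens on localhost:9200 over plain HTTP; the model ID below is a placeholder for whatever the earlier `_register` call returned:

```shell
# Placeholder model ID -- substitute the one returned by _register
MODEL_ID="your_model_id_here"

# Build the ML Commons deploy endpoint for a given model ID
deploy_url() {
    printf 'http://localhost:9200/_plugins/_ml/models/%s/_deploy' "$1"
}

# Trigger deployment, then fetch the model document; a successful offline
# deployment reports "model_state": "DEPLOYED" with no JNI download attempts
# in the logs. (Commands commented out so the sketch runs without a cluster.)
# curl -XPOST "$(deploy_url "$MODEL_ID")"
# curl -XGET "http://localhost:9200/_plugins/_ml/models/$MODEL_ID"
```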

After these steps, the model deployment completed successfully in offline mode. Hope this helps others facing similar issues!

That’s awesome! Would you mind contributing this to our ml-commons repo? It would be great if you added a doc to our documentation section.