Offline deployment of pretrained models

OpenSearch 2.12.0:

Attempting to deploy the pretrained models on a server which has no internet access:
Model being deployed: https://artifacts.opensearch.org/models/ml-models/huggingface/sentence-transformers/all-distilroberta-v1/1.0.1/torch_script/sentence-transformers_all-distilroberta-v1-1.0.1-torch_script.zip

Followed the steps below, as described in the Set up an ML language model doc:

  • Prerequisite settings
PUT _cluster/settings
{
  "persistent": {
    "plugins": {
      "ml_commons": {
        "only_run_on_ml_node": "true",
        "model_access_control_enabled": "true",
        "native_memory_threshold": "99",
        "allow_registering_model_via_url": "true"
      }
    }
  }
}
  • Register a model group
POST /_plugins/_ml/model_groups/_register
{
  "name": "ml_model_group_sentence_transformers",
  "description": "A model group for sentence transformer",
  "access_mode": "public"
}
  • Check model group
GET _plugins/_ml/model_groups/jFICf48BvNyDZcmaRMm1

{
  "name": "ml_model_group_sentence_transformers",
  "latest_version": 12,
  "description": "A model group for sentence transformer",
  "owner": {
    "name": "admin",
    "backend_roles": [
      "admin"
    ],
    "roles": [
      "own_index",
      "all_access"
    ],
    "custom_attribute_names": [],
    "user_requested_tenant": "admin_tenant"
  },
  "access": "public",
  "created_time": 1715822806196,
  "last_updated_time": 1716260811689
}
  • Register a model.
POST /_plugins/_ml/models/_register
{
  "name": "huggingface/sentence-transformers/all-distilroberta-v1",
  "version": "1.0.1",
  "model_group_id": "IE5fX48BvNyDZcmaO4Wy",
  "model_format": "TORCH_SCRIPT"
}

  • This step was failing because the plugin was trying to download the model from the internet.
{
  "task_type": "REGISTER_MODEL",
  "function_name": "TEXT_EMBEDDING",
  "state": "FAILED",
  "worker_node": [
    "_dpa206sRXeuoA6LVIVvgA"
  ],
  "create_time": 1715292140960,
  "last_update_time": 1715292273277,
  "error": "Connection timed out",
  "is_async": true
}

So I had to configure an internal mirror for the ML model artifacts, after which the following worked.

  • Registering the model again
POST /_plugins/_ml/models/_register
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L6-v2",
  "version": "1.0.1",
  "description": "This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search.",
  "model_task_type": "TEXT_EMBEDDING",
  "model_format": "TORCH_SCRIPT",
  "model_content_size_in_bytes": 91790008,
  "model_content_hash_value": "c15f0d2e62d872be5b5bc6c84d2e0f4921541e29fefbef51d59cc10a8ae30e0f",
  "model_config": {
    "model_type": "bert",
    "embedding_dimension": 384,
    "framework_type": "sentence_transformers",
    "all_config": """{"_name_or_path":"nreimers/MiniLM-L6-H384-uncased","architectures":["BertModel"],"attention_probs_dropout_prob":0.1,"gradient_checkpointing":false,"hidden_act":"gelu","hidden_dropout_prob":0.1,"hidden_size":384,"initializer_range":0.02,"intermediate_size":1536,"layer_norm_eps":1e-12,"max_position_embeddings":512,"model_type":"bert","num_attention_heads":12,"num_hidden_layers":6,"pad_token_id":0,"position_embedding_type":"absolute","transformers_version":"4.8.2","type_vocab_size":2,"use_cache":true,"vocab_size":30522}"""
  },
  "created_time": 1676328997102,
  "url": "https://some-internal-mirror.com/opensearch/models/ml-models/huggingface/sentence-transformers/all-MiniLM-L6-v2/1.0.1/torch_script/sentence-transformers_all-MiniLM-L6-v2-1.0.1-torch_script.zip",
  "model_group_id": "jFICf48BvNyDZcmaRMm1"
}
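When registering from a mirror URL, the declared `model_content_size_in_bytes` and `model_content_hash_value` have to match the mirrored zip (the hash is the SHA-256 of the file). A minimal sketch to compute both fields for a locally mirrored zip; `model_content_fields` is a hypothetical helper name:

```python
# Sketch: compute the size and SHA-256 fields declared in the _register request
# for a locally mirrored model zip.
import hashlib
import os

def model_content_fields(path: str) -> dict:
    """Return model_content_size_in_bytes and model_content_hash_value for a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # hash in 1 MiB chunks so large model zips don't need to fit in memory
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "model_content_size_in_bytes": os.path.getsize(path),
        "model_content_hash_value": digest.hexdigest(),
    }
```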
  • Model registration completed
{
  "model_id": "kaxYmY8BPuGHzr9Two-b",
  "task_type": "REGISTER_MODEL",
  "function_name": "TEXT_EMBEDDING",
  "state": "COMPLETED",
  "worker_node": [
    "bXBS993MSreq8GU4eW8dhw"
  ],
  "create_time": 1716264682119,
  "last_update_time": 1716264695414,
  "is_async": true
}
  • The model deployment step does not work:
POST /_plugins/_ml/models/kaxYmY8BPuGHzr9Two-b/_deploy

GET /_plugins/_ml/tasks/BltZmY8BvNyDZcmadPbu
  • Initially the failure was due to the plugin's inability to download the PyTorch libraries. I then installed PyTorch using pip and added environment variables pointing to its location:
export PYTORCH_LIBRARY_PATH=$HOME/.local/lib/python3.9/site-packages/torch/lib/
export PYTORCH_VERSION=1.13.1
export PYTORCH_FLAVOR=cpu

I also had to grant the Java Security Manager permission to read the above location.
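The grant looks roughly like the sketch below; which policy file it belongs in (and whether a `plugin-security.policy` override is needed) depends on how the Security Manager is configured for your deployment:

```
// Sketch of a policy grant; the path is the PYTORCH_LIBRARY_PATH location from above,
// and the trailing "-" matches all files beneath that directory.
grant {
    permission java.io.FilePermission "${user.home}/.local/lib/python3.9/site-packages/torch/lib/-", "read";
};
```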

  • Now the plugin is able to read the PyTorch shared objects, but it currently fails to load the DJL JNI library:

org.opensearch.ml.common.exception.MLException: Failed to deploy model kaxYmY8BPuGHzr9Two-b
at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:294) ~[?:?]
at java.base/java.security.AccessController.doPrivileged(AccessController.java:569) ~[?:?]
at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:247) ~[?:?]
at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:139) ~[?:?]
at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:125) ~[?:?]
at org.opensearch.ml.model.MLModelManager.lambda$deployModel$51(MLModelManager.java:1020) ~[?:?]
at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.12.0.jar:2.12.0]
at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$72(MLModelManager.java:1553) [opensearch-ml-2.12.0.0.jar:2.12.0.0]
at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.12.0.jar:2.12.0]
at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:913) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: ai.djl.engine.EngineException: Cannot download jni files: https://publish.djl.ai/pytorch/1.13.1/jnilib/0.21.0/linux-x86_64/cpu/libdjl_torch.so

  • I tried adding these additional environment variables, hoping that the PyTorch engine could locate the libraries (I placed libdjl_torch.so in that path); however, it did not work:

export ENGINE_CACHE_DIR=$HOME/.djl.ai/
export DJL_OFFLINE=true
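Concretely, the pre-seeding I attempted looks like the sketch below. The directory name is my guess at DJL's cache layout, inferred from the failing download URL in the stack trace (pytorch/1.13.1/.../linux-x86_64/cpu/libdjl_torch.so); the exact path and filename your DJL version expects may differ:

```shell
# Pre-seed DJL's native cache with a libdjl_torch.so fetched on a connected machine.
# Assumption: the <cache>/pytorch/<version>-<flavor>-<os-arch>/ layout is inferred
# from the download URL in the stack trace; verify against your DJL version.
CACHE_DIR="${ENGINE_CACHE_DIR:-$HOME/.djl.ai}"
JNI_DIR="$CACHE_DIR/pytorch/1.13.1-cpu-linux-x86_64"
mkdir -p "$JNI_DIR"
# copy the pre-fetched JNI library, if present in the working directory
if [ -f libdjl_torch.so ]; then
    cp libdjl_torch.so "$JNI_DIR/"
fi
```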

What is the procedure for loading these libraries in offline mode? Am I missing something here?


Hello ark202, have you been able to proceed any further in your endeavour to deploy an ML model on an offline system?
I am running into similar issues.

@ark202 @xprtslpr
Hi, this happens because OpenSearch deploys the model you registered using DJL (Deep Java Library).

As you can see in LibUtils.java in DJL (code), the _deploy API internally invokes the downloadPyTorch method if the DJL_OFFLINE environment variable or the ai.djl.offline system property is false (code).

I think @ark202 tried to download the PyTorch libraries before DJL does (the same approach as with OpenSearch's pretrained models), but the error logs show that loadModel in DLModel.java, which needs an internet connection, was invoked.

Can you try offline mode by injecting the DJL_OFFLINE environment variable and share your result?
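One caveat: if OpenSearch is started by systemd or a service wrapper, a DJL_OFFLINE exported in your shell may never reach the JVM. Assuming the ai.djl.offline system property behaves the same way (as the LibUtils check suggests), setting it in config/jvm.options is an alternative sketch:

```
# config/jvm.options — set the system property DJL's LibUtils checks
-Dai.djl.offline=true
```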
