NullPointerException at MLEngine.getRegisterModelPath() when registering local ONNX model in OpenSearch 2.19

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

Describe the issue:

I am attempting to register a local ONNX model (BGE-Small) from the filesystem in OpenSearch 2.19.
I followed the link below.

The registration fails with a NullPointerException in the path resolution logic.

Error:
NullPointerException at org.opensearch.ml.engine.MLEngine.getRegisterModelPath(MLEngine.java:78)

Steps to Reproduce:

  1. Enable Local File Registration

curl -X PUT "http://x.x.x.x:50140/_cluster/settings" \
  -H "Content-Type: application/json" \
  -d '{
    "persistent": {
      "plugins.ml_commons.allow_registering_model_via_local_file": "true",
      "plugins.ml_commons.only_run_on_ml_node": "false",
      "plugins.ml_commons.model_access_control_enabled": "false",
      "plugins.ml_commons.native_memory_threshold": "99"
    }
  }'

  2. Register Model

curl -X POST "http://x.x.x.x:50140/_plugins/_ml/models/_register" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "bge-small-local",
    "version": "1.0.0",
    "description": "BGE Small EN v1.5 loaded from local filesystem",
    "function_name": "TEXT_EMBEDDING",
    "model_format": "ONNX",
    "model_content_size_in_bytes": 77673724,
    "model_content_hash_value": "a7d620bc80dbc79b4d1770958e9a01dc3b96dd3f6e82e5d43f8fc9585df64fd7",
    "model_config": {
      "model_type": "bert",
      "embedding_dimension": 384,
      "framework_type": "sentence_transformers",
      "all_config": "{\"_name_or_path\":\"BAAI/bge-small-en-v1.5\",\"architectures\":[\"BertModel\"],\"attention_probs_dropout_prob\":0.1,\"classifier_dropout\":null,\"gradient_checkpointing\":false,\"hidden_act\":\"gelu\",\"hidden_dropout_prob\":0.1,\"hidden_size\":384,\"initializer_range\":0.02,\"intermediate_size\":1536,\"layer_norm_eps\":1e-12,\"max_position_embeddings\":512,\"model_type\":\"bert\",\"num_attention_heads\":12,\"num_hidden_layers\":12,\"pad_token_id\":0,\"position_embedding_type\":\"absolute\",\"transformers_version\":\"4.33.1\",\"type_vocab_size\":2,\"use_cache\":true,\"vocab_size\":30522}"
    },
    "model_path": "/home/sasalunkhe/bge-small-en-v1.5-onnx.zip"
  }'

Model File Details

Property             Value
File Path            /home/sasalunkhe/bge-small-en-v1.5-onnx.zip
File Permissions     -rw-r--r-- 1 opensearch opensearch 77673724 Jun 23 08:07 bge-small-en-v1.5-onnx.zip
File Size            77,673,724 bytes
SHA256 Hash          a7d620bc80dbc79b4d1770958e9a01dc3b96dd3f6e82e5d43f8fc9585df64fd7
Contents             model.onnx + tokenizer.json
Model Type           BGE-Small EN v1.5 (ONNX format)
Embedding Dimension  384

Full Stack Trace:
2026-06-23T08:23:14,658 orker][T#12] ERROR org.ope.ml.mod.MLModelManager - Failed to update model group appName:qoswaf_eng_sjc01_p16
java.lang.NullPointerException
at java.base/sun.nio.fs.UnixPath.normalizeAndCheck(Unknown Source)
at java.base/sun.nio.fs.UnixPath.<init>(Unknown Source)
at java.base/sun.nio.fs.UnixFileSystem.getPath(Unknown Source)
at java.base/java.nio.file.Path.resolve(Unknown Source)
at org.opensearch.ml.engine.MLEngine.getRegisterModelPath(MLEngine.java:78)
at org.opensearch.ml.engine.ModelHelper.lambda$downloadPrebuiltModelMetaList$2(ModelHelper.java:228)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at org.opensearch.ml.engine.ModelHelper.downloadPrebuiltModelMetaList(ModelHelper.java:226)
at org.opensearch.ml.model.MLModelManager.registerPrebuiltModel(MLModelManager.java:968)
at org.opensearch.ml.model.MLModelManager.uploadModel(MLModelManager.java:795)
at org.opensearch.ml.model.MLModelManager.lambda$registerMLModel$15(MLModelManager.java:517)

Questions:
Is there a known issue with local model registration in OpenSearch 2.19?

Please help me in resolving this issue.

Configuration:

  • OpenSearch 2.19
  • ML Commons enabled
  • Model: BGE-Small EN v1.5 (ONNX format)
  • File: /home/sasalunkhe/bge-small-en-v1.5-onnx.zip (77.6 MB)

Relevant Logs or Screenshots:

@santosh For a single _register call with a local file, can you try enabling allow_registering_model_via_url and referencing the file with a file:// URL? See the example below:

# 1. Enable URL-based registration (not allow_registering_model_via_local_file)
curl -X PUT "http://localhost:9200/_cluster/settings" \
  -H "Content-Type: application/json" \
  -d '{
    "persistent": {
      "plugins.ml_commons.allow_registering_model_via_url": "true",
      "plugins.ml_commons.only_run_on_ml_node": "false",
      "plugins.ml_commons.model_access_control_enabled": "false",
      "plugins.ml_commons.native_memory_threshold": "99"
    }
  }'

# 2. Register, with "version" included and the file referenced via a file:// url
curl -X POST "http://localhost:9200/_plugins/_ml/models/_register" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "bge-small-local",
    "version": "1.0.0",
    "description": "BGE Small EN v1.5 loaded from local filesystem",
    "function_name": "TEXT_EMBEDDING",
    "model_format": "ONNX",
    "model_content_size_in_bytes": 77673724,
    "model_content_hash_value": "a7d620bc80dbc79b4d1770958e9a01dc3b96dd3f6e82e5d43f8fc9585df64fd7",
    "model_config": { ... },
    "url": "file:///home/sasalunkhe/bge-small-en-v1.5-onnx.zip"
  }'

model_content_size_in_bytes and model_content_hash_value must match the real zip (on Linux: stat -c%s file.zip and sha256sum file.zip; on macOS: stat -f%z file.zip and shasum -a 256 file.zip), and the zip needs to be readable by the opensearch process user at that exact path.
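The two values mentioned above can be computed with a small helper like the sketch below. The function name model_zip_metadata is hypothetical (it is not part of OpenSearch); it counts bytes with wc -c and uses sha256sum, falling back to shasum -a 256 where only that tool is installed.

```shell
# Hypothetical helper (not part of OpenSearch): prints the byte size and
# SHA-256 hex digest of a model zip, labelled with the _register field names.
model_zip_metadata() {
  local zip="$1" size hash
  [ -f "$zip" ] && [ -r "$zip" ] || { echo "cannot read: $zip" >&2; return 1; }

  # Byte count; the arithmetic expansion strips padding some wc builds add.
  size=$(( $(wc -c < "$zip") ))

  # SHA-256 hex digest is the first field of the sha256sum / shasum output.
  if command -v sha256sum >/dev/null 2>&1; then
    hash=$(sha256sum "$zip" | awk '{print $1}')
  else
    hash=$(shasum -a 256 "$zip" | awk '{print $1}')
  fi

  printf 'model_content_size_in_bytes=%s\n' "$size"
  printf 'model_content_hash_value=%s\n' "$hash"
}

# Example: model_zip_metadata /home/sasalunkhe/bge-small-en-v1.5-onnx.zip
```

Running it as the user that owns the OpenSearch process (the file listing earlier in this thread suggests a user named opensearch, which is an assumption about this setup) also confirms that the zip is readable at that path.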

@Anthony, thanks for the reply. I tried the suggested steps.

http://x.x.x.x:50140/_plugins/_ml/models/_register
{
  "name": "bge-local",
  "version": "1.0.0",
  "description": "BGE loaded from local filesystem",
  "function_name": "TEXT_EMBEDDING",
  "model_format": "ONNX",
  //"is_local_model": true,
  "model_config": {
    "model_type": "xlm-roberta",
    "embedding_dimension": 8194,
    "framework_type": "sentence_transformers"
  },
  "model_content_size_in_bytes": 1840941330,
  "model_content_hash_value": "65d1520f8335f501fbcbc9cfef0be85dd4a616910b3b4ed29776233f21c42295",
  "url": "file:///home/sasalunkhe/bge-m3-onnx_new.zip"
}

Response:
{
  "task_id": "c_5TAJ8B6DVfCvr3U1Gv",
  "status": "CREATED"
}

I checked the status of the task ID immediately:

http://x.x.x.x:50140/_plugins/_ml/tasks/c_5TAJ8B6DVfCvr3U1Gv

Response:
{
  "error": {
    "root_cause": [
      {
        "type": "status_exception",
        "reason": "Failed to find task"
      }
    ],
    "type": "status_exception",
    "reason": "Failed to find task"
  },
  "status": 404
}

http://x.x.x.x:50140/_cluster/settings

{
  "transient": {
    "logger.org.opensearch.ml": "TRACE"
  }
}

After this, I see the log line below:

2026-06-25T20:06:18,056 gement][T#3] DEBUG act.syn.TransportSyncUpOnNodeAction - Found 1 models in cache folder: [B7tiAJ8BxUqOKIWlwmfF] appName:qoswaf_eng_sjc01_p16

I then tried to deploy the model using the model ID B7tiAJ8BxUqOKIWlwmfF from the log line above. (Please correct me if I am doing something wrong here: I did not get a model_id from the task status API, so I took the model ID B7tiAJ8BxUqOKIWlwmfF from the log and proceeded further.)

http://x.x.x.x:50140/_plugins/_ml/models/B7tiAJ8BxUqOKIWlwmfF/_deploy

After this, I saw the messages below in the log file.

2026-06-25T20:07:43,893 orker][T#14] INFO org.ope.ml.mod.MLModelManager - Successfully updated the model with ID: B7tiAJ8BxUqOKIWlwmfF appName:qoswaf_eng_sjc01_p16
2026-06-25T20:07:43,902 gement][T#1] DEBUG ep.TransportDeployModelOnNodeAction - start deploying model B7tiAJ8BxUqOKIWlwmfF appName:qoswaf_eng_sjc01_p16
2026-06-25T20:07:43,903 gement][T#1] DEBUG org.ope.ml.mod.MLModelCacheHelper - init model state for model B7tiAJ8BxUqOKIWlwmfF, state: DEPLOYING appName:qoswaf_eng_sjc01_p16
2026-06-25T20:07:43,905 gement][T#1] DEBUG org.ope.ml.tas.MLTaskManager - Task id: d_5lAJ8B6DVfCvr3QVHp, current running task DEPLOY_MODEL: 0 appName:qoswaf_eng_sjc01_p16
2026-06-25T20:07:43,918 orker][T#12] INFO org.ope.ml.tas.MLTaskManager - Successfully updated the task with ID: d_5lAJ8B6DVfCvr3QVHp appName:qoswaf_eng_sjc01_p16
2026-06-25T20:07:43,918 orker][T#12] DEBUG org.ope.ml.tas.MLTaskManager - Updated ML task successfully: OK, taskId: d_5lAJ8B6DVfCvr3QVHp, updatedFields: {state=RUNNING} appName:qoswaf_eng_sjc01_p16
2026-06-25T20:07:43,925 deploy][T#3] DEBUG org.ope.ml.mod.MLModelCacheHelper - Setting the quota flag for Model B7tiAJ8BxUqOKIWlwmfF appName:qoswaf_eng_sjc01_p16
2026-06-25T20:07:43,925 deploy][T#3] DEBUG org.ope.ml.mod.MLModelCacheHelper - Removing the rate limiter for Model B7tiAJ8BxUqOKIWlwmfF appName:qoswaf_eng_sjc01_p16
2026-06-25T20:07:43,925 deploy][T#3] DEBUG org.ope.ml.mod.MLModelCacheHelper - Removing the ML guard from Model B7tiAJ8BxUqOKIWlwmfF appName:qoswaf_eng_sjc01_p16
2026-06-25T20:07:43,925 deploy][T#3] DEBUG org.ope.ml.mod.MLModelManager - Model interface for model: B7tiAJ8BxUqOKIWlwmfF loaded into cache. appName:qoswaf_eng_sjc01_p16
2026-06-25T20:07:43,925 deploy][T#3] DEBUG org.ope.ml.mod.MLModelCacheHelper - Removing the ML Interface from Model B7tiAJ8BxUqOKIWlwmfF appName:qoswaf_eng_sjc01_p16
2026-06-25T20:07:43,930 deploy][T#3] ERROR org.ope.ml.mod.MLModelManager - No controller is deployed because the model B7tiAJ8BxUqOKIWlwmfF is expected not having an enabled model controller. Please use the create model controller api to create one if this is unexpected. appName:qoswaf_eng_sjc01_p16
2026-06-25T20:07:43,930 deploy][T#3] DEBUG org.ope.ml.mod.MLModelManager - No controller is deployed because the model B7tiAJ8BxUqOKIWlwmfF is expected not having an enabled model controller. appName:qoswaf_eng_sjc01_p16

===============

2026-06-25T20:09:49,199 deploy][T#6] WARN ai.djl.onn.eng.OrtEngine - CUDA is not supported OnnxRuntime engine: Error code - ORT_RUNTIME_EXCEPTION - message: /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1193 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory
appName:qoswaf_eng_sjc01_p16
2026-06-25T20:09:49,202 deploy][T#6] DEBUG org.ope.ml.eng.alg.DLModel - load model B7tiAJ8BxUqOKIWlwmfF to device 0: cpu() appName:qoswaf_eng_sjc01_p16
2026-06-25T20:09:49,262 deploy][T#6] WARN ai.djl.onn.eng.OrtEngine - CUDA is not supported OnnxRuntime engine: Error code - ORT_RUNTIME_EXCEPTION - message: /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1193 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory
appName:qoswaf_eng_sjc01_p16
2026-06-25T20:09:49,273 deploy][T#6] WARN ai.djl.pyt.jni.LibUtils - Override PyTorch version: 1.13.1. appName:qoswaf_eng_sjc01_p16
2026-06-25T20:09:49,339 deploy][T#6] ERROR org.ope.ml.eng.alg.DLModel - Failed to deploy model B7tiAJ8BxUqOKIWlwmfF appName:qoswaf_eng_sjc01_p16
java.lang.ExceptionInInitializerError
at ai.djl.onnxruntime.engine.OrtEngine.newBaseManager(OrtEngine.java:146)
at ai.djl.onnxruntime.engine.OrtEngine.newModel(OrtEngine.java:134)
at ai.djl.Model.newInstance(Model.java:99)
at ai.djl.repository.zoo.BaseModelLoader.createModel(BaseModelLoader.java:224)
at ai.djl.repository.zoo.BaseModelLoader.loadModel(BaseModelLoader.java:165)
at ai.djl.repository.zoo.Criteria.loadModel(Criteria.java:151)
at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:217)
at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:286)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:252)
at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:142)
at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:144)
at org.opensearch.ml.model.MLModelManager.lambda$deployModel$49(MLModelManager.java:1274)
at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82)
at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$77(MLModelManager.java:2150)
at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82)
at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1014)
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: ai.djl.engine.EngineException: Failed to save pytorch index file
at ai.djl.pytorch.jni.LibUtils.downloadPyTorch(LibUtils.java:429)
at ai.djl.pytorch.jni.LibUtils.findNativeLibrary(LibUtils.java:314)
at ai.djl.pytorch.jni.LibUtils.getLibTorch(LibUtils.java:93)
at ai.djl.pytorch.jni.LibUtils.loadLibrary(LibUtils.java:81)
at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:53)
at ai.djl.pytorch.engine.PtEngineProvider.getEngine(PtEngineProvider.java:41)
at ai.djl.engine.Engine.getEngine(Engine.java:190)
at ai.djl.onnxruntime.engine.OrtEngine.getAlternativeEngine(OrtEngine.java:83)
at ai.djl.ndarray.BaseNDManager.<init>(BaseNDManager.java:64)
at ai.djl.onnxruntime.engine.OrtNDManager.<init>(OrtNDManager.java:42)
at ai.djl.onnxruntime.engine.OrtNDManager.<init>(OrtNDManager.java:35)
at ai.djl.onnxruntime.engine.OrtNDManager$SystemManager.<init>(OrtNDManager.java:177)
at ai.djl.onnxruntime.engine.OrtNDManager.<clinit>(OrtNDManager.java:37)
... 22 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.base/java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.base/java.net.SocksSocketImpl.connect(Unknown Source)
at java.base/java.net.Socket.connect(Unknown Source)
at java.base/sun.security.ssl.SSLSocketImpl.connect(Unknown Source)
at java.base/sun.security.ssl.BaseSSLSocketImpl.connect(Unknown Source)
at java.base/sun.net.NetworkClient.doConnect(Unknown Source)
at java.base/sun.net.www.http.HttpClient.openServer(Unknown Source)
at java.base/sun.net.www.http.HttpClient.openServer(Unknown Source)
at java.base/sun.net.www.protocol.https.HttpsClient.<init>(Unknown Source)
at java.base/sun.net.www.protocol.https.HttpsClient.New(Unknown Source)
at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(Unknown Source)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(Unknown Source)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(Unknown Source)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
at ai.djl.util.Utils.openUrl(Utils.java:519)
at ai.djl.util.Utils.openUrl(Utils.java:498)
at ai.djl.util.Utils.openUrl(Utils.java:487)
at ai.djl.pytorch.jni.LibUtils.downloadPyTorch(LibUtils.java:424)
... 34 more

Please help me understand this stack trace and suggest a way to register and deploy a local ONNX model successfully. Any help would be appreciated.

Hi Team,

The above-mentioned issue is a blocker for me. Please help me resolve it.

@santosh is this an air-gapped environment?

Glad the registration fix worked! On the deployment failures, I tested this end-to-end in a local OpenSearch 2.19.5 docker environment, see findings below.

OpenSearch ML Commons uses DJL (Deep Java Library) as its ML runtime, even for an ONNX model with CPUExecutionProvider set. When framework_type: "sentence_transformers" is specified, DJL initialises both its ONNX Runtime engine and its PyTorch engine on first use, and it downloads these native libraries from the internet on first deployment:

Library                                                  Size
PyTorch 1.13.1 (libtorch_cpu.so + supporting libs)       ~162 MB
HuggingFace Tokenizers 0.20.3 (libtokenizers.so)         ~13 MB

If the node has no outbound internet access, this download fails with “Connection refused”, which is exactly what you saw.

These land in: $OPENSEARCH_DATA_DIR/ml_cache/pytorch/ and $OPENSEARCH_DATA_DIR/ml_cache/tokenizers/

DJL fully caches these libraries after the first download. Once those directories are populated, every subsequent deployment reuses them with no network calls at all. I confirmed this by deploying a second model immediately afterwards: it completed in ~1 second, with zero network activity and unchanged cache file timestamps.
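A quick way to check whether a node already has that cache is sketched below. The function name djl_cache_seeded is hypothetical, and the only thing it verifies is that the two directories named above exist and are non-empty; it does not validate the library files themselves.

```shell
# Hypothetical helper: succeeds only if both DJL cache directories named in
# this thread exist under the given OpenSearch data directory and each
# contains at least one entry.
djl_cache_seeded() {
  local data_dir="$1" sub dir
  for sub in pytorch tokenizers; do
    dir="$data_dir/ml_cache/$sub"
    if [ ! -d "$dir" ] || [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
      echo "missing or empty: $dir" >&2
      return 1
    fi
  done
  echo "DJL cache looks populated under $data_dir/ml_cache"
}

# Example (data path as used in the official Docker image):
# djl_cache_seeded /usr/share/opensearch/data
```

If the check fails on an air-gapped node, the first deployment will try to download the libraries and hit the same connection error.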

For air-gapped nodes, you can pre-seed the DJL cache before your first deployment. Two options:

Option 1: Copy from a machine that has internet

On any machine with internet access, spin up an OpenSearch 2.19 instance, deploy any ONNX text-embedding model once, then copy the cache directories to your air-gapped node:

$OPENSEARCH_DATA_DIR/ml_cache/pytorch/
$OPENSEARCH_DATA_DIR/ml_cache/tokenizers/

The exact subdirectory name depends on the CPU architecture: on x86_64 it will be 1.13.1-cpu-precxx11-linux-x86_64, and on ARM it will be 1.13.1-cpu-precxx11-linux-aarch64. The rest of the path is the same.
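The copy step for Option 1 could look like the sketch below. The function names pack_djl_cache and unpack_djl_cache are hypothetical, and how the archive is moved between the machines (scp, removable media, and so on) is left out. Both machines should have the same CPU architecture, since the cached subdirectory name is architecture-specific.

```shell
# Hypothetical helpers for moving the DJL cache between nodes.

# On the connected machine.
# $1 = OpenSearch data directory, $2 = archive file to create
pack_djl_cache() {
  tar -C "$1/ml_cache" -czf "$2" pytorch tokenizers
}

# On the air-gapped node.
# $1 = archive file, $2 = OpenSearch data directory
unpack_djl_cache() {
  mkdir -p "$2/ml_cache" && tar -C "$2/ml_cache" -xzf "$1"
}

# Example:
# pack_djl_cache   /usr/share/opensearch/data /tmp/djl-cache.tgz
# unpack_djl_cache /tmp/djl-cache.tgz /usr/share/opensearch/data
```

After unpacking, the files presumably need to be readable by the user running OpenSearch (an assumption, by analogy with the model zip), so a chown on the ml_cache directory may be required before restarting or deploying.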

Option 2: Docker volume mount (if running in Docker)

Pre-populate the cache on a connected machine and mount it into the air-gapped container:

services:
  opensearch:
    volumes:
      - ./djl-cache-pytorch:/usr/share/opensearch/data/ml_cache/pytorch
      - ./djl-cache-tokenizers:/usr/share/opensearch/data/ml_cache/tokenizers

Full working request (combining the registration fix + CPU provider)

# 1. Cluster settings — note allow_registering_model_via_url, not via_local_file
PUT _cluster/settings
{
  "persistent": {
    "plugins.ml_commons.allow_registering_model_via_url": "true",
    "plugins.ml_commons.only_run_on_ml_node": "false",
    "plugins.ml_commons.native_memory_threshold": "99"
  }
}

# 2. Register using "url" with file:// scheme (not "model_path") and include "version"
POST _plugins/_ml/models/_register
{
  "name": "bge-small-local",
  "version": "1.0.0",
  "function_name": "TEXT_EMBEDDING",
  "model_format": "ONNX",
  "model_content_size_in_bytes": <output of: stat -c%s your-model.zip>,
  "model_content_hash_value": "<output of: sha256sum your-model.zip>",
  "model_config": {
    "model_type": "bert",
    "embedding_dimension": 384,
    "framework_type": "sentence_transformers",
    "onnx_execution_providers": ["CPUExecutionProvider"]
  },
  "url": "file:///absolute/path/to/bge-small-en-v1.5-onnx.zip"
}

The onnx_execution_providers field tells DJL to skip GPU provider loading entirely (avoids the libcublasLt.so.11 error on CPU-only nodes). With the DJL cache pre-seeded, this should register and deploy successfully in a fully air-gapped environment.

Hope this helps

Hi @Anthony ,

Thanks for your reply. This will certainly help. Will try it out and let you know.

Best Regards,

Santosh