How to deploy an INT8 ONNX CPU-only model in OpenSearch ML Commons?

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

  • OpenSearch: 2.19.2 (Docker image opensearchproject/opensearch:latest)
  • Host OS: Windows 10
  • Docker: 28.3.2

Describe the issue:
I’m trying to upload the INT8 ONNX model from the Hugging Face repository dewdev/mdeberta-v3-base-squad2-onnx (specifically the file model_int8.onnx). The model appears in the Dashboards Pretrained models list, but its status is always “Not responding”. I’ve tried packaging the model with both config.json and tokenizer.json, and with config.json alone; both attempts fail. I’ve also tried the opensearch-py-ml client and a multipart upload sent from Postman directly to the REST API, with the same “Not responding” result. My goal is to run inference on CPU only (no GPU).
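
For reference, this is roughly how I poll the model after registering it (a minimal Python sketch; the model ID and admin password are placeholders, and the certificate is the self-signed demo one from the Docker image):

```python
# Status check I run after registering the model; model ID and admin password
# are placeholders, TLS verification is disabled for the demo certificate.
import requests

MODEL_ID = "<model_id_returned_by_register>"

resp = requests.get(
    f"https://localhost:9200/_plugins/_ml/models/{MODEL_ID}",
    auth=("admin", "<admin_password>"),
    verify=False,
)
print(resp.json().get("model_state"))  # never reaches DEPLOYED for me
```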

Configuration:

  • Model repository: dewdev/mdeberta-v3-base-squad2-onnx
  • Packaged files:
    • model_int8.onnx (INT8 quantized ONNX)
    • config.json
    • (optional) tokenizer.json
  • Upload methods tested:
    1. Dashboards UI → Upload model (zip archive)
    2. opensearch-py-ml’s register_pretrained_model / upload_model APIs (see the sketch after this list)
    3. Postman multipart POST to https://localhost:9200/_plugins/_ml/models/_upload (translated to Python at the end of this post)
  • Running in a Docker single-node cluster with ML Commons plugin installed
  • Desired execution providers: CPU only (plugins.ml_commons.onnx_runtime_execution_providers: ["CPUExecutionProvider"] in opensearch.yml)
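
A minimal sketch of what I run for method 2 above, assuming the stock Docker connection settings and that I have the register_model signature right; the zip and config file names come from my packaging attempt, and the admin password is a placeholder:

```python
# Roughly how I register the zipped model with opensearch-py-ml (method 2 above).
from opensearchpy import OpenSearch
from opensearch_py_ml.ml_commons import MLCommonClient

client = OpenSearch(
    hosts=[{"host": "localhost", "port": 9200}],
    http_auth=("admin", "<admin_password>"),
    use_ssl=True,
    verify_certs=False,  # self-signed demo certificate
)
ml_client = MLCommonClient(client)

# The zip contains model_int8.onnx (plus tokenizer.json in one attempt); the
# config JSON holds name, version, model_format and model_config.
model_id = ml_client.register_model(
    model_path="mdeberta-v3-base-squad2-int8.zip",
    model_config_path="model_config.json",
    deploy_model=True,
    isVerbose=True,
)
print(model_id)
```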

Relevant Logs or Screenshots:
```
CUDA is not supported OnnxRuntime engine:
Error code - ORT_RUNTIME_EXCEPTION - message:
onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library
libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory
```

  • My expectation is for the model to load and run on CPU only, but it never progresses past “Not responding”. Any guidance on configuring ML Commons or packaging/uploading the model correctly for CPU-only inference would be greatly appreciated.
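
For completeness, this is approximately the register request I send directly to the REST API (method 3, translated from Postman to Python; _register is, as I understand it, the current name for _upload). The URL-based variant below assumes plugins.ml_commons.allow_registering_model_via_url is enabled; the URL, hash, function_name, and model_config values are placeholders and guesses on my part:

```python
# Approximate register request (method 3 above). URL, hash, function_name and
# model_config are placeholders/guesses -- possibly where my packaging goes wrong.
import requests

body = {
    "name": "mdeberta-v3-base-squad2-int8",
    "version": "1.0.0",
    "model_format": "ONNX",
    "function_name": "QUESTION_ANSWERING",  # my guess at the right function name
    "model_config": {
        "model_type": "deberta_v2",
        "framework_type": "huggingface_transformers",
    },
    "url": "<URL where the zip is hosted>",
    "model_content_hash_value": "<sha256 of the zip>",
}

resp = requests.post(
    "https://localhost:9200/_plugins/_ml/models/_register",
    json=body,
    auth=("admin", "<admin_password>"),
    verify=False,  # self-signed demo certificate
)
print(resp.json())  # returns a task_id that I then poll under /_plugins/_ml/tasks/
```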