How to deploy an INT8 ONNX CPU-only model in OpenSearch ML Commons?

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

  • OpenSearch: 2.19.2 (Docker image opensearchproject/opensearch:latest)
  • Host OS: Windows 10
  • Docker: 28.3.2

Describe the issue:
I’m trying to upload the INT8 ONNX model from the Hugging Face repository dewdev/mdeberta-v3-base-squad2-onnx (specifically the file model_int8.onnx). The model appears in the Dashboards Pretrained models list, but its status is always “Not responding”. I’ve tried packaging the model with both config.json and tokenizer.json, and with config.json alone; both attempts fail. I’ve also tried the opensearch-py-ml client and a multipart upload sent from Postman directly to the REST API, with the same “Not responding” result. My goal is to run inference on CPU only (no GPU).
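
For reference, this is roughly how I poll the model after registering it (a minimal Python sketch; the model ID and admin password are placeholders, and the certificate is the self-signed demo one from the Docker image):

```python
# Status check I run after registering the model; model ID and admin password
# are placeholders, TLS verification is disabled for the demo certificate.
import requests

MODEL_ID = "<model_id_returned_by_register>"

resp = requests.get(
    f"https://localhost:9200/_plugins/_ml/models/{MODEL_ID}",
    auth=("admin", "<admin_password>"),
    verify=False,
)
print(resp.json().get("model_state"))  # never reaches DEPLOYED for me
```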

Configuration:

  • Model repository: dewdev/mdeberta-v3-base-squad2-onnx
  • Packaged files:
    • model_int8.onnx (INT8 quantized ONNX)
    • config.json
    • (optional) tokenizer.json
  • Upload methods tested:
    1. Dashboards UI → Upload model (zip archive)
    2. opensearch-py-ml’s register_pretrained_model / upload_model APIs (see the sketch after this list)
    3. Postman multipart POST to https://localhost:9200/_plugins/_ml/models/_upload (translated to Python at the end of this post)
  • Running in a Docker single-node cluster with ML Commons plugin installed
  • Desired execution providers: CPU only (plugins.ml_commons.onnx_runtime_execution_providers: ["CPUExecutionProvider"] in opensearch.yml)
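
A minimal sketch of what I run for method 2 above, assuming the stock Docker connection settings and that I have the register_model signature right; the zip and config file names come from my packaging attempt, and the admin password is a placeholder:

```python
# Roughly how I register the zipped model with opensearch-py-ml (method 2 above).
from opensearchpy import OpenSearch
from opensearch_py_ml.ml_commons import MLCommonClient

client = OpenSearch(
    hosts=[{"host": "localhost", "port": 9200}],
    http_auth=("admin", "<admin_password>"),
    use_ssl=True,
    verify_certs=False,  # self-signed demo certificate
)
ml_client = MLCommonClient(client)

# The zip contains model_int8.onnx (plus tokenizer.json in one attempt); the
# config JSON holds name, version, model_format and model_config.
model_id = ml_client.register_model(
    model_path="mdeberta-v3-base-squad2-int8.zip",
    model_config_path="model_config.json",
    deploy_model=True,
    isVerbose=True,
)
print(model_id)
```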

Relevant Logs or Screenshots:
```
CUDA is not supported OnnxRuntime engine:
Error code - ORT_RUNTIME_EXCEPTION - message:
onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library
libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory
```

  • My expectation is for the model to load and run on CPU only, but it never progresses past “Not responding”. Any guidance on configuring ML Commons or packaging/uploading the model correctly for CPU-only inference would be greatly appreciated.
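
For completeness, this is approximately the register request I send directly to the REST API (method 3, translated from Postman to Python; _register is, as I understand it, the current name for _upload). The URL-based variant below assumes plugins.ml_commons.allow_registering_model_via_url is enabled; the URL, hash, function_name, and model_config values are placeholders and guesses on my part:

```python
# Approximate register request (method 3 above). URL, hash, function_name and
# model_config are placeholders/guesses -- possibly where my packaging goes wrong.
import requests

body = {
    "name": "mdeberta-v3-base-squad2-int8",
    "version": "1.0.0",
    "model_format": "ONNX",
    "function_name": "QUESTION_ANSWERING",  # my guess at the right function name
    "model_config": {
        "model_type": "deberta_v2",
        "framework_type": "huggingface_transformers",
    },
    "url": "<URL where the zip is hosted>",
    "model_content_hash_value": "<sha256 of the zip>",
}

resp = requests.post(
    "https://localhost:9200/_plugins/_ml/models/_register",
    json=body,
    auth=("admin", "<admin_password>"),
    verify=False,  # self-signed demo certificate
)
print(resp.json())  # returns a task_id that I then poll under /_plugins/_ml/tasks/
```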