OpenSearch 2.9 ML Framework Model Upload Not Working

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OS 2.9/Windows 11 & Rocky Linux 8/Chromium Edge

Describe the issue:

Can we have a fully working example on 2.9?

I tried the steps in ML framework - OpenSearch documentation and executed the request below:

POST /_plugins/_ml/models/_upload
{
  "name": "all-MiniLM-L6-v2",
  "version": "1.0.0",
  "description": "test model",
  "model_format": "TORCH_SCRIPT",
  "model_config": {
    "model_type": "bert",
    "embedding_dimension": 384,
    "framework_type": "sentence_transformers"
  },
  "url": "https://github.com/opensearch-project/ml-commons/raw/2.x/ml-algorithms/src/test/resources/org/opensearch/ml/engine/algorithms/text_embedding/all-MiniLM-L6-v2_torchscript_sentence-transformer.zip?raw=true"
}

It has never worked for me; I get the error below in the log. In addition, when I load a model it sometimes disappears, and when I try to load it again it says that the same ID already exists!
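For reference, to see which models the cluster thinks are registered I can list them with the model search API (a minimal sketch; the match_all query and size are just illustrative):

POST /_plugins/_ml/models/_search
{
  "query": {
    "match_all": {}
  },
  "size": 100
}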

I appreciate your input if you got it working. By the way, it was working in previous versions (2.4 in particular).

Failed to index chunk file
java.security.PrivilegedActionException: null
at java.security.AccessController.doPrivileged(AccessController.java:573) ~[?:?]
at org.opensearch.ml.engine.ModelHelper.downloadAndSplit(ModelHelper.java:197) [opensearch-ml-algorithms-2.9.0.0.jar:?]
at org.opensearch.ml.model.MLModelManager.registerModel(MLModelManager.java:526) [opensearch-ml-2.9.0.0.jar:2.9.0.0]
at org.opensearch.ml.model.MLModelManager.lambda$registerModelFromUrl$19(MLModelManager.java:498) [opensearch-ml-2.9.0.0.jar:2.9.0.0]
at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:80) [opensearch-2.9.0.jar:2.9.0]
at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.9.0.jar:2.9.0]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.9.0.jar:2.9.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.9.0.jar:2.9.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.nio.file.NoSuchFileException: /opensearch/opensearch-2.9.0/data/ml_cache/models_cache/register/s1Y1jYkBxklcJkLMTjwU/1/all-MiniLM-L6-v2.zip

I am also using the below configs in addition to the defaults.
plugins.ml_commons.only_run_on_ml_node: false
plugins.ml_commons.allow_registering_model_via_url: true
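For what it's worth, I believe the same settings can also be applied dynamically through the cluster settings API instead of opensearch.yml, e.g.:

PUT /_cluster/settings
{
  "persistent": {
    "plugins.ml_commons.only_run_on_ml_node": false,
    "plugins.ml_commons.allow_registering_model_via_url": true
  }
}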

Can you please try:

POST /_plugins/_ml/models/_register
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}

We support a few pre-trained models: Pretrained models - OpenSearch documentation

Note that we renamed the API endpoint to _register (_upload is still supported for a while).

The updated API endpoints are listed in the documentation.

Thanks.

@dhrubo I am trying this in OS 2.8. Any help on this please?
POST /_plugins/_ml/models/_register
{
  "name": "huggingface/sentence-tramsformers/all-MiniLM-L12-v2",
  "model_group_id": "Mmtzj4kBUMHslIHqUQaR",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}
It gives me a task ID as the response, and when I use that task ID in a GET it says the connection timed out. Any reason why?
GET /_plugins/_ml/tasks/sBHHjokBS-NchEYQO9Nz
{
  "task_type": "DEPLOY_MODEL",
  "function_name": "TEXT_EMBEDDING",
  "state": "FAILED",
  "worker_node": [
    "sNgMP15nRkOtp8ZFFwAg1g"
  ],
  "create_time": 1690332460326,
  "last_update_time": 1690332588709,
  "error": "Connection timed out (Connection timed out)",
  "is_async": true
}

When I search the logs I see this on the data nodes:
[2023-07-26T00:08:47,306][ERROR][o.o.m.m.MLModelManager ] [opensearch-data-1] Failed to update model group
java.security.PrivilegedActionException: null
at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
at org.opensearch.ml.engine.ModelHelper.downloadPrebuiltModelMetaList(ModelHelper.java:166) ~[?:?]
at org.opensearch.ml.model.MLModelManager.registerPrebuiltModel(MLModelManager.java:510) ~[?:?]
at org.opensearch.ml.model.MLModelManager.uploadModel(MLModelManager.java:357) ~[?:?]
at org.opensearch.ml.model.MLModelManager.lambda$registerMLModel$11(MLModelManager.java:324) ~[?:?]
at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:80) ~[opensearch-2.8.0.jar:2.8.0]
at org.opensearch.action.support.TransportAction$1.onResponse(TransportAction.java:113) ~[opensearch-2.8.0.jar:2.8.0]
at org.opensearch.action.support.TransportAction$1.onResponse(TransportAction.java:107) ~[opensearch-2.8.0.jar:2.8.0]
at com.oracle.pic.opensearch.actions.MonitoringActionFilter.lambda$apply$0(MonitoringActionFilter.java:68) ~[?:?]
at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:80) ~[opensearch-2.8.0.jar:2.8.0]

Can you check whether the model group ID is valid? I also suggest trying OS 2.9, the latest released version.
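One way to check (a sketch, assuming the model group search API is available in your version) is to search for that model group ID:

POST /_plugins/_ml/model_groups/_search
{
  "query": {
    "ids": {
      "values": ["Mmtzj4kBUMHslIHqUQaR"]
    }
  }
}

If no document comes back, the model group ID is not valid on that cluster.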

Thanks @dhrubo, I can now register a model from a URL. However, I keep getting errors when registering my custom model from a local directory.

POST /_plugins/_ml/models/_register
{
  "name": "all-MiniLM-L6-v2-128",
  "version": "1.0.0",
  "description": "test model",
  "model_format": "TORCH_SCRIPT",
  "model_config": {
    "model_type": "bert",
    "embedding_dimension": 128,
    "framework_type": "sentence_transformers"
  },
  "url": "file://home/hasan/all-MiniLM-L6-v2-128.zip"
}

Failed to index chunk file
java.security.AccessControlException: access denied (“java.io.FilePermission” “/home/myuser/all-MiniLM-L6-v2-128.zip” “read”)
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:485) ~[?:?]
at java.security.AccessController.checkPermission(AccessController.java:1068) ~[?:?]
at java.lang.SecurityManager.checkPermission(SecurityManager.java:416) ~[?:?]
at java.lang.SecurityManager.checkRead(SecurityManager.java:756) ~[?:?]
at java.io.File.isDirectory(File.java:860) ~[?:?]
at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:78) ~[?:?]
at sun.net.www.protocol.file.FileURLConnection.initializeHeaders(FileURLConnection.java:106) ~[?:?]
at sun.net.www.protocol.file.FileURLConnection.getContentLengthLong(FileURLConnection.java:164) ~[?:?]
at ai.djl.training.util.DownloadUtils.download(DownloadUtils.java:73) ~[api-0.21.0.jar:?]
at ai.djl.training.util.DownloadUtils.download(DownloadUtils.java:52) ~[api-0.21.0.jar:?]
at org.opensearch.ml.engine.ModelHelper.lambda$downloadAndSplit$3(ModelHelper.java:203) ~[opensearch-ml-algorithms-2.9.0.0.jar:?]
at java.security.AccessController.doPrivileged(AccessController.java:56

From the log it looks like Java security settings are denying read access to the file /home/myuser/all-MiniLM-L6-v2-128.zip.

Java has a built-in security manager that, when enabled, can control actions like file I/O. In your case, it seems the security manager is denying read access to your file.

Can you please ensure that the file or directory has the correct permissions at the operating-system level? For example, on a Unix-like operating system you could use the chmod command to give read permission to all users:

chmod a+r /home/myuser/all-MiniLM-L6-v2-128.zip

You can also check out this notebook: Demo Notebook for MLCommons Integration — Opensearch-py-ml 1.0.0 documentation

The OpenSearch security manager doesn't allow reading this file: "url": "file://home/hasan/all-MiniLM-L6-v2-128.zip".

You can try uploading your model somewhere like GitHub or S3, then use that public URL to register the model.
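For example, something like this (the URL below is only a placeholder for wherever you host the zip; the model_config mirrors what you posted above, and plugins.ml_commons.allow_registering_model_via_url must be true):

POST /_plugins/_ml/models/_register
{
  "name": "all-MiniLM-L6-v2-128",
  "version": "1.0.0",
  "description": "test model",
  "model_format": "TORCH_SCRIPT",
  "model_config": {
    "model_type": "bert",
    "embedding_dimension": 128,
    "framework_type": "sentence_transformers"
  },
  "url": "https://example.com/models/all-MiniLM-L6-v2-128.zip"
}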

If you need to register a model file from your local machine, you can follow the demo notebook shared by @dhrubo: Demo Notebook for MLCommons Integration — Opensearch-py-ml 1.0.0 documentation

I noticed that OpenSearch downloads some djl packages after registering and loading a huggingface model. Is there a way to run completely offline? That is, have all required library files and models made available offline and placed in the correct folder?

@ylwu I followed the Python guide but I got the error below. It looks like opensearch-py-ml is not up to date on PyPI. I even removed and reinstalled it and got the same error.

ml_client.register_model(model_path, model_config_path_torch, isVerbose=True)

AttributeError: 'MLCommonClient' object has no attribute 'register_model'

I noticed that OpenSearch downloads some djl packages after registering and loading a huggingface model. Is there a way to run completely offline? That is, have all required library files and models made available offline and placed in the correct folder?

For now, no. To support different platforms, we have to dynamically download the library files. If this is a must-have, you can create a GitHub issue on the ml-commons repo: Issues · opensearch-project/ml-commons · GitHub

AttributeError: 'MLCommonClient' object has no attribute 'register_model'

@dhrubo Can you help take a look?

To unblock you, can you try a pretrained model? Pretrained models - OpenSearch documentation

POST /_plugins/_ml/models/_register
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}

Thanks, I will try to log an issue. But for now, can you point me to the location where these files get downloaded? My environment is Linux (CentOS/RedHat/Rocky Linux 8).

Hi, we just released a new version (1.1.0) of opensearch-py-ml. Can you please update it and give it a try?

Thanks
Dhrubo

You should see them in the <OS_HOME>/data/ml_cache folder.

But as far as I understand, ml_cache is a temporary location that can be cleared automatically. My point is, I want to make sure that subsequent calls to deploy the ML model will always be offline. Is that possible?

Only the first call downloads the library; it won't be deleted from the cache folder (unless you delete it manually), so subsequent calls will work offline.

The API is quite confusing! Now it is not working anymore. I cleaned the data directory, started again with the config below, and then executed the POSTs below.

######## End OpenSearch Security Demo Configuration ########
plugins.ml_commons.only_run_on_ml_node: false
plugins.ml_commons.allow_registering_model_via_local_file: true
plugins.ml_commons.allow_registering_model_via_url: true
node.roles: [cluster_manager, data, ingest, ml ]

POST /_plugins/_ml/models/_register
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}

It generates model ID ghP9kYkB_Tms6nnSMlbo.

POST /_plugins/_ml/models/ghP9kYkB_Tms6nnSMlbo/_deploy

Then, after deploying it, I get a failed deployment. Any idea how to fix it? Currently the only way it works for me is through the Python client! However, I have to do the conversions myself if I am going after Hugging Face models.

Downloading: 100% |========================================| all-MiniLM-L12-v2.zip
[2023-07-31T10:29:32,655][ERROR][o.o.m.m.MLModelManager ] [LAPTOP-C0J5I3L2] Failed to index chunk file
java.security.PrivilegedActionException: null
at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
at org.opensearch.ml.engine.ModelHelper.downloadAndSplit(ModelHelper.java:197) [opensearch-ml-algorithms-2.9.0.0.jar:?]
at org.opensearch.ml.model.MLModelManager.registerModel(MLModelManager.java:526) [opensearch-ml-2.9.0.0.jar:2.9.0.0]
at org.opensearch.ml.model.MLModelManager.lambda$registerModelFromUrl$19(MLModelManager.java:498) [opensearch-ml-2.9.0.0.jar:2.9.0.0]
at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:80) [opensearch-2.9.0.jar:2.9.0]
at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.9.0.jar:2.9.0]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.9.0.jar:2.9.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.9.0.jar:2.9.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.nio.file.NoSuchFileException: D:\my-files\opensearch-releases\opensearch-2.9.0\data\ml_cache\models_cache\register\NcvUqokBdU6kBA_MJEKP\1\huggingface\sentence-transformers\all-MiniLM-L12-v2.zip

I tried this on 2.9, and it works:

POST /_plugins/_ml/models/_register
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}

This API call will return a task ID. Sample response:

{
  "task_id" : "lcPmr4kB4eSCtCCDmCD8", 
  "status" : "CREATED"
}

Then use the get task API to retrieve the task information and find the model ID in the response:

GET _plugins/_ml/tasks/lcPmr4kB4eSCtCCDmCD8

Sample response

{
  "model_id": "fwDmr4kBotjijrTimerP",
  "task_type": "REGISTER_MODEL",
  "function_name": "TEXT_EMBEDDING",
  "state": "COMPLETED",
  "worker_node": [
    "yYG_ae4URcaSo3k59_IsFQ"
  ],
  "create_time": 1690873272569,
  "last_update_time": 1690873280963,
  "is_async": true
}

Then use the deploy API to deploy model fwDmr4kBotjijrTimerP:

POST _plugins/_ml/models/fwDmr4kBotjijrTimerP/_deploy

This API will return a task ID. Similarly, use the get task API to retrieve the task info and wait for the task status to change to COMPLETED. Then call the predict API:

POST _plugins/_ml/models/fwDmr4kBotjijrTimerP/_predict
{
  "text_docs": ["hello world"],
  "return_number": true,
  "target_response": ["sentence_embedding"]
}
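If the model is deployed correctly, the predict response should look roughly like this (a sample of the shape only, data values omitted; all-MiniLM-L12-v2 produces 384-dimension embeddings):

{
  "inference_results": [
    {
      "output": [
        {
          "name": "sentence_embedding",
          "data_type": "FLOAT32",
          "shape": [384],
          "data": [ ... ]
        }
      ]
    }
  ]
}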

One possible reason is that your network latency is too high and the ml-commons cron job (which runs every 10 seconds by default) cleaned up your local cache file before it was fully downloaded. I suggest tuning the cron job interval to a larger value; for example, the API call below increases it to 600 seconds.

PUT /_cluster/settings
{
  "persistent": {
    "plugins.ml_commons.sync_up_job_interval_in_seconds": 600
  }
}
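To confirm the new value took effect, you can read back the persistent cluster settings:

GET /_cluster/settings?flat_settings=true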
