Vectorizing big chunk of data returns errors

rathankalluri · February 8, 2024, 7:45am

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

OpenSearch 2.11.0, deployed directly on my Windows 11 machine (not a Docker deployment).

Describe the issue:
I have successfully deployed the ML model by following the Neural Plugin tutorial.

But when ingesting the data, I get this error:
The size of tensor a (2792) must match the size of tensor b (512) at non-singleton dimension 1
The model I used is huggingface/sentence-transformers/msmarco-distilbert-base-tas-b, taken from the documentation.

The one thing I could work out is that the document I am trying to vectorize is huge, around 13000 characters.
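The numbers in the error are consistent with the document tokenizing to 2792 tokens while the model accepts at most 512 positions. As a minimal sketch of what truncation does to such an input (the placeholder tokens and the `truncate_tokens` helper are hypothetical; the real model uses its own tokenizer, so actual token counts depend on the text):

```python
MAX_TOKENS = 512  # the limit that shows up as "tensor b (512)" in the error


def truncate_tokens(tokens, max_tokens=MAX_TOKENS):
    """Keep only the first max_tokens tokens; everything after is dropped."""
    return tokens[:max_tokens]


# Stand-in for a tokenized document: 2792 placeholder tokens, matching
# "tensor a (2792)". These are not real tokenizer output.
tokens = [f"tok{i}" for i in range(2792)]
truncated = truncate_tokens(tokens)

print(len(tokens), "->", len(truncated))  # prints "2792 -> 512"
```

Note that truncation makes the input fit, but text past the limit does not contribute to the resulting vector.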

Please let me know how I can vectorize a document this large.

dhrubo · February 9, 2024, 11:44pm

Hi @rathankalluri, I assume you are using version 1.0.1 of the huggingface/sentence-transformers/msmarco-distilbert-base-tas-b model, which does not have the truncation feature. Could you please use version 1.0.2 instead?

Please let me know if you still face the issue after switching to 1.0.2.
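For reference, a rough sketch of what registering version 1.0.2 might look like, written as a Python snippet that only builds and prints the request. It assumes the ML Commons register endpoint `/_plugins/_ml/models/_register` and the `TORCH_SCRIPT` model format from the pretrained-model documentation; the `build_register_request` helper is hypothetical, and depending on cluster settings a `model_group_id` field may also be needed.

```python
import json

MODEL_NAME = "huggingface/sentence-transformers/msmarco-distilbert-base-tas-b"


def build_register_request(version="1.0.2"):
    """Return the (path, body) for a model register call. Nothing is sent."""
    path = "/_plugins/_ml/models/_register"  # assumed ML Commons endpoint
    body = {
        "name": MODEL_NAME,
        "version": version,              # 1.0.2 is the version with truncation
        "model_format": "TORCH_SCRIPT",  # assumption: format used in the docs
    }
    return path, body


path, body = build_register_request()
print("POST", path)
print(json.dumps(body, indent=2))
```

The printed body can be pasted into Dev Tools or sent with any HTTP client. The newly registered model then has to be deployed, and the ingest pipeline pointed at the new model ID.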

rathankalluri · February 12, 2024, 4:10am

Thanks a lot @dhrubo … It worked!

system · April 12, 2024, 4:11am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
