Model weights for sparse encoders

Hi all,

Could you please provide us with the sparse encoder weights? I remember that in the related issue it was stated that the weights would be released. Could you also provide links to existing models we can try with the sparse encoding feature?

Thanks

We apologize for the late delivery. Please wait a few extra days until we finish the internal open-source release clearance. The weights are on their way.

Are there any other models that we can try out for now? I tried Intel/bert-base-uncased-sparse-90-unstructured-pruneofa from Hugging Face, but I could not upload it and got the response below:

POST /_plugins/_ml/models/_register
{
  "name": "Intel/bert-large-uncased-sparse-90-unstructured-pruneofa",
  "version": "1.0.1",
  "model_group_id": "XL4wQYsBDV2us3MwXfy9",
  "model_format": "TORCH_SCRIPT"
}

{
  "task_type": "REGISTER_MODEL",
  "function_name": "TEXT_EMBEDDING",
  "state": "FAILED",
  "worker_node": [
    "H-LKxNKoT52yZttSBnoLdA"
  ],
  "create_time": 1697608915265,
  "last_update_time": 1697608916467,
  "error": "This model is not in the pre-trained model list, please check your parameters.",
  "is_async": true
}

Thanks for your attention on sparse encoding! We allow customers to use their own models, but you need to follow the request body requirements for registering a model. You need to provide the URL of your TorchScript artifact zip file, along with the hash value of that artifact. The zip file needs to contain the TorchScript .pt file and the tokenizer JSON file. Judging from your model name, it is currently a PyTorch .bin file, so you need to convert it to TorchScript and register your own model like this:
{
  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v1",
  "version": "1.0.0",
  "description": "This is a neural sparse encoding model: It transfers text into sparse vector, and then extract nonzero index and value to entry and weights. It serves in both ingestion and search",
  "model_format": "TORCH_SCRIPT",
  "function_name": "SPARSE_ENCODING",
  "model_content_hash_value": "9a41adb6c13cf49a7e3eff91aef62ed5035487a6eca99c996156d25be2800a9a",
  "url": "https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1/1.0.0/torch_script/opensearch-neural-sparse-encoding-doc-v1-1.0.0-torch_script.zip"
}
This is one of the pretrained models. We provide a bi-encoder model, a doc model, and a tokenizer model.
We have already released these models, so you can use them. We will update our documentation on pretrained models and on how to upload your own sparse encoding model. Thanks.
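For reference, one of our pretrained models can be registered by name only, without a URL or hash. A minimal sketch for the doc model (use the exact version from the pretrained model list and your own model group ID; the values below are only placeholders):

POST /_plugins/_ml/models/_register
{
  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1",
  "version": "1.0.1",
  "model_group_id": "<your model group ID>",
  "model_format": "TORCH_SCRIPT"
}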

Thank you so much. It works now. Do you have any guide on how to fine-tune this model?

I got nice results; however, indexing was quite slow. Is there any way to make it faster?

Currently we do not plan to release any guidance on fine-tuning. We may release it in the future.

What do you mean by indexing? Are you using an ingest pipeline?

I did use a pipeline as per the documentation.

PUT /_ingest/pipeline/got-pipeline
{
  "description": "A sparse encoding ingest pipeline",
  "processors": [
    {
      "sparse_encoding": {
        "model_id": "DtpBSYsBgoSEiZHXptMY",
        "field_map": {
          "content": "embedding"
        }
      }
    }
  ]
}

PUT got_sparse
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text"
      },
      "embedding": {
        "type": "rank_features"
      }
    }
  },
  "settings": {
    "index": {
      "replication": {
        "type": "DOCUMENT"
      },
      "number_of_shards": "1",
      "default_pipeline": "got-pipeline",
      "number_of_replicas": "1"
    }
  }
}

PUT got_sparse/_doc/1
{
  "content": "kills Theon Greyjoy, and prepares to strike down Bran. However, the Night King is ambushed and killed by Arya Stark with the Valyrian steel dagger that Bran had previously given her (\"The Spoils of War\"), which causes both him and the other White Walkers to shatter and results in the complete obliteration of the Army of the Dead."
}

The above sometimes takes over 300 seconds and sometimes 500 seconds. Then I tried the same with a dense vector (using the sentence-transformers/msmarco-distilbert-cos-v5 model), as shown below.

PUT got_index_dense_v512_2/_doc/1
{
  "content": "kills Theon Greyjoy, and prepares to strike down Bran. However, the Night King is ambushed and killed by Arya Stark with the Valyrian steel dagger that Bran had previously given her (\"The Spoils of War\"), which causes both him and the other White Walkers to shatter and results in the complete obliteration of the Army of the Dead."
}

The above took 160 seconds!

One other thing I noticed is that if I try direct inference using the ml-commons API, the sparse model gives me results faster. The request below uses the sparse model and returned results in 378 seconds.

POST /_plugins/_ml/models/DtpBSYsBgoSEiZHXptMY/_predict
{
  "text_docs": ["kills Theon Greyjoy, and prepares to strike down Bran. However, the Night King is ambushed and killed by Arya Stark with the Valyrian steel dagger that Bran had previously given her (\"The Spoils of War\"), which causes both him and the other White Walkers to shatter and results in the complete obliteration of the Army of the Dead."]
}

Then I tried the same with the SBERT model, and it returned in 444 seconds!

POST /_plugins/_ml/models/Ado3SYsBgoSEiZHXidNO/_predict
{
  "text_docs": ["kills Theon Greyjoy, and prepares to strike down Bran. However, the Night King is ambushed and killed by Arya Stark with the Valyrian steel dagger that Bran had previously given her (\"The Spoils of War\"), which causes both him and the other White Walkers to shatter and results in the complete obliteration of the Army of the Dead."]
}

Am I doing something wrong above? Is there anything I can do to make it go faster?

Note that this is currently running on a single CPU-only machine.

Can you provide some detailed numbers, such as latency or throughput?

Since the sparse encoder conducts deep language model inference at ingestion time, the computation cost is high; we usually employ extra ML nodes for such computation.
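If you have dedicated ML nodes, something like the following ML Commons cluster setting should keep model inference on the ML nodes only (please double-check the settings documentation for your version):

PUT /_cluster/settings
{
  "persistent": {
    "plugins.ml_commons.only_run_on_ml_node": true
  }
}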

BTW, would you share your hardware configuration?

Our encoding model is twice as large as a BERT base model, so it will have higher latency than the distilbert-cos-v5 model. But it is unusual for inference to take this much time.

What is the maximum text length the OpenSearch sparse model can handle? I know, for instance, that BERT-based models can handle 512 tokens at a time and the remaining tokens are ignored. Is that the case for this model?

Yes. We truncate the input to 512 tokens.

Is that done automatically behind the scenes? I mean, if I have long text, say 3000 tokens, what would the sparse vector represent in that case? Would it represent the whole 3000 tokens, or only the first 512 with the rest ignored?

Yes, only the first 512; the rest are ignored. The truncation is done inside our model.

The current test I did was on a laptop but I am planning to test it on a bigger cluster.

I was expecting some slowness compared to a normal Lucene index, but the difference in ingestion time is too big: Lucene finishes in seconds, while sparse fields take several minutes for the same data.

I understand your scenario. Sparse encoding still needs to run model inference on the data, just like text embedding, so it takes a lot of time. But if you choose the amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1 model at ingestion and the amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1 model at search, you can get accuracy comparable to semantic search with text embedding while keeping latency small, like Lucene. Please see the model list here: Pretrained models - OpenSearch documentation.
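For example, after you register and deploy the tokenizer model, the search request against your index looks roughly like this (replace the model ID with your deployed tokenizer model ID; the query text is just an example):

GET got_sparse/_search
{
  "query": {
    "neural_sparse": {
      "embedding": {
        "query_text": "who killed the Night King",
        "model_id": "<your tokenizer model ID>"
      }
    }
  }
}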

The amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1 model (https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1/1.0.0/torch_script/opensearch-neural-sparse-tokenizer-v1-1.0.0.zip) failed to deploy. I checked the zip file; it doesn't contain a .pt file.

Can you please check and upload a correct model?

That's because the tokenizer only contains a JSON file for tokenizing. I think you may have used the wrong function name. You should use "SPARSE_TOKENIZE" when registering the tokenizer model.
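With that function name, the register request for the tokenizer artifact looks roughly like this (the model group ID and hash value below are placeholders; compute the SHA-256 of the downloaded zip yourself):

POST /_plugins/_ml/models/_register
{
  "name": "amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1",
  "version": "1.0.0",
  "model_group_id": "<your model group ID>",
  "model_format": "TORCH_SCRIPT",
  "function_name": "SPARSE_TOKENIZE",
  "model_content_hash_value": "<sha256 of the zip>",
  "url": "https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1/1.0.0/torch_script/opensearch-neural-sparse-tokenizer-v1-1.0.0.zip"
}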

You are right. I used the wrong function name. Thanks.