Reducing embedding size in ml-commons

Hi all,

I was wondering if there is an option to reduce the embedding size in ml-commons. Documents in most cases contain many sentences and pages, which leads to a huge index size if 700- or even 300-dimensional embeddings are used.

An example was shared in the community earlier showing how to deploy/serve an SBERT model with its default embedding size on OpenSearch to serve text embeddings. I was wondering if ml-commons has (or will have) options to reduce the default size using techniques such as PCA, t-SNE (t-SNE-Java), LDA, autoencoders, etc.

Hello!

The index size should not scale with the length of the documents, since only one vector is stored per document. Recall that we apply a pooling layer at the end, which results in a single vector. At fp32 precision, a 768-dimensional vector takes about 768 * 4 bytes ≈ 3 KB per document.
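
To make the arithmetic concrete, here is a minimal sketch (the sentence-transformers library and the 768-dimensional all-mpnet-base-v2 model are my own assumptions for illustration, not necessarily what ml-commons serves) showing that a document of any length is pooled into one fixed-size vector, and what that vector costs in bytes:

```python
# Minimal sketch: one pooled vector per document, regardless of document length.
# Assumes the sentence-transformers package and all-mpnet-base-v2 (768 dims);
# ml-commons may serve a different model.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

short_doc = "A single sentence."
long_doc = " ".join(["Another sentence in a much longer document."] * 200)

# encode() runs the transformer plus a pooling layer, so each input comes back
# as a single fixed-size vector regardless of its length (inputs beyond the
# model's max sequence length are truncated).
vectors = model.encode([short_doc, long_doc])
print(vectors.shape)          # (2, 768)

dim = vectors.shape[1]
bytes_per_doc = dim * 4       # fp32 = 4 bytes per dimension
print(f"{bytes_per_doc} bytes ~= {bytes_per_doc / 1024:.1f} KiB per document")
```

Note that truncation of very long inputs is part of why chunking long documents (discussed below) can still matter.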

Nevertheless, PCA is a great technique and we can definitely add it to our near-term goals. Thank you for the suggestion. That said, I imagine that fine-tuning a smaller model with fewer embedding dimensions might work better than using a large model followed by PCA, but I am not sure.
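
As a rough sketch of what such a PCA step could look like today, outside of ml-commons (scikit-learn is my choice for illustration, not something ml-commons ships): fit PCA offline on a sample of corpus embeddings, then apply the same fitted transform to both documents and queries.

```python
# Sketch: reduce 768-d embeddings to 128-d with PCA (scikit-learn assumed).
# The same fitted transform must be applied to documents and queries alike.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for real SBERT vectors of a corpus sample.
corpus_embeddings = rng.standard_normal((10_000, 768)).astype("float32")

pca = PCA(n_components=128)
pca.fit(corpus_embeddings)                      # fit once, offline

reduced_docs = pca.transform(corpus_embeddings)  # index these 128-d vectors
query_embedding = rng.standard_normal((1, 768)).astype("float32")
reduced_query = pca.transform(query_embedding)   # project queries the same way

print(reduced_docs.shape, reduced_query.shape)   # (10000, 128) (1, 128)
print(f"explained variance kept: {pca.explained_variance_ratio_.sum():.2f}")
```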


One pooled vector is not ideal for a large multi-page document. Suppose you are indexing books: then you will need at least one vector per page, though I would go down to the paragraph level.
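
To make that concrete, a minimal chunking sketch (the blank-line splitting heuristic and the sentence-transformers model are my assumptions): index one vector per paragraph and keep a reference back to the source book.

```python
# Sketch: one embedding per paragraph instead of one per book.
# Splitting on blank lines is a simple heuristic; real documents may need a
# smarter chunker (pages, sections, sliding windows, ...).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

book_text = """First paragraph of the book...

Second paragraph of the book...

Third paragraph of the book..."""

paragraphs = [p.strip() for p in book_text.split("\n\n") if p.strip()]
vectors = model.encode(paragraphs)

# Each chunk becomes its own searchable document, linked back to the source.
docs = [
    {"book_id": "my-book", "chunk_id": i, "text": p, "embedding": v.tolist()}
    for i, (p, v) in enumerate(zip(paragraphs, vectors))
]
print(len(docs), "chunks,", len(docs[0]["embedding"]), "dimensions each")
```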

As for fine-tuning, its cost is outside the operational scope since it is done offline.

True, for very long documents one can benefit from multiple vectors.

I’d still recommend using a fine-tuned small model, or a fine-tuned linear layer (i.e., a simple encoding layer) on top of a large model that projects to a smaller dimension.
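
To illustrate what I mean by a fine-tuned linear layer, here is a PyTorch sketch under my own assumptions (the dimensions, the in-batch contrastive loss, and the random stand-in embeddings are all illustrative, not a prescribed recipe):

```python
# Sketch: a small trainable projection head on top of a frozen 768-d encoder,
# trained so that (query, passage) pairs stay close in the 128-d space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    def __init__(self, in_dim: int = 768, out_dim: int = 128):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unit-length 128-d vectors, ready for cosine-similarity search.
        return F.normalize(self.linear(x), dim=-1)

head = ProjectionHead()
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

# Stand-ins for frozen encoder outputs of 32 paired queries and passages.
query_emb = torch.randn(32, 768)
passage_emb = torch.randn(32, 768)

# In-batch contrastive loss: the i-th query should match the i-th passage.
q = head(query_emb)
p = head(passage_emb)
logits = q @ p.T / 0.05                      # temperature-scaled similarities
loss = F.cross_entropy(logits, torch.arange(32))
loss.backward()
optimizer.step()
print(loss.item())
```

In practice the large base encoder would stay frozen (or be lightly fine-tuned) and only this head would be trained on in-domain pairs.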

If we use something like PCA to reduce the dimensions, it is not clear whether relevant (query, passage) pairs will stay close together in the lower-dimensional space after projection. t-SNE is better for such cases (since it preserves local structure), but t-SNE cannot be applied to new vectors at runtime without additional effort.

I actually tried SBERT with PCA down to 128 dimensions and the results were pretty close to the 768-dimensional model. 64 wasn’t that good, though. I used GOT as the dataset.
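
For anyone who wants to run a rough comparison like this themselves, here is a sketch of how one might measure the overlap between the full-dimensional ranking and the PCA-reduced ranking (the random embeddings are stand-ins and this is not my exact setup; swap in real SBERT vectors and queries):

```python
# Sketch: compare top-k retrieval from 768-d vectors vs. 128-d PCA vectors.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
doc_emb = normalize(rng.standard_normal((5_000, 768)))
query_emb = normalize(rng.standard_normal((100, 768)))

pca = PCA(n_components=128).fit(doc_emb)
doc_red = normalize(pca.transform(doc_emb))
query_red = normalize(pca.transform(query_emb))

def top_k(queries, docs, k):
    # Cosine similarity (rows are unit norm), keep the k best docs per query.
    scores = queries @ docs.T
    return np.argsort(-scores, axis=1)[:, :k]

k = 10
full = top_k(query_emb, doc_emb, k)
reduced = top_k(query_red, doc_red, k)

overlap = np.mean([len(set(a) & set(b)) / k for a, b in zip(full, reduced)])
print(f"average top-{k} overlap: {overlap:.2f}")
```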

Thanks for sharing. I have tried dimensionality reduction (since contextualized representations often reside in a narrow cone; see “How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings” in the ACL Anthology) and found the results not so bad, but I have not tested it extensively and so can’t vouch for their quality!


Interesting work, thanks for sharing. I think that even if the results are not as good as the default model’s, it is still worth having a feature in ml-commons to serve a reduced model. It can be very useful in the books scenario, when you are searching within the book and not only against abstracts.

@asfoorial, have you tried enabling PQ on faiss indexes (see “k-NN Index” in the OpenSearch documentation)? PQ can help reduce memory requirements by trading off accuracy. We are also actively developing quantization capabilities for 8-bit vector encodings.
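
For reference, a sketch of the faiss IVF+PQ flow as I read it from the k-NN docs (the endpoint paths follow the documentation; the index names, model id, and parameter values below are illustrative, and the training index must already hold sample vectors):

```python
# Sketch of the faiss IVF+PQ setup: 1) train a model on sample vectors,
# 2) create an index whose knn_vector field references that model.
import requests

HOST = "http://localhost:9200"  # assumption: local cluster, security disabled

# 1) Train an IVF+PQ model from vectors in a small training index.
train_body = {
    "training_index": "train-vectors",      # illustrative index of sample vectors
    "training_field": "embedding",
    "dimension": 768,
    "description": "IVF + PQ with 8 sub-vectors",
    "method": {
        "name": "ivf",
        "engine": "faiss",
        "space_type": "l2",
        "parameters": {
            "nlist": 128,
            "encoder": {"name": "pq", "parameters": {"m": 8, "code_size": 8}},
        },
    },
}
print(requests.post(f"{HOST}/_plugins/_knn/models/pq-model/_train",
                    json=train_body).json())

# 2) Create the search index; the vector field just points at the trained model.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {"embedding": {"type": "knn_vector", "model_id": "pq-model"}}
    },
}
print(requests.put(f"{HOST}/my-pq-index", json=index_body).json())
```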

@dylan thanks. PQ is great, but it only reduces memory usage, not disk usage. Looking forward to seeing and testing the future quantization developments.
