Reducing the Disk space of Indexes for Vector embeddings through pretrained models

tejashu · February 10, 2025, 3:52am

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.18

Describe the issue:
I wanted to know if there is a way to reduce the on-disk storage of vector embeddings by converting them to byte or 16-bit vectors when I generate the embeddings with the pretrained models provided by Hugging Face.
I tried applying the Faiss sq16 scalar quantization method, but it did not convert the fp32 vectors to fp16 vectors, so the on-disk size remained the same.

Could you please help me configure this?

navtat · February 10, 2025, 6:32pm

@tejashu the Faiss SQfp16 quantization technique serializes the fp32 vectors into fp16 internally, which reduces memory requirements by up to 50%. Disk usage also drops slightly, but not by 50% like memory, because the source (original document) you ingest is saved in .fdt segment files, and those consume roughly the same amount of disk space whether you use fp32 or any of the quantization techniques. For instance, in my previous experiments comparing fp32 and SQfp16 on a 100M-vector, 768-dimension dataset, disk usage dropped by approximately 17%.
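For reference, here is a minimal sketch of an index body that enables Faiss SQfp16, written as a Python dict that could be sent as the body of `PUT /<index-name>`. The field name `my_vector`, the dimension of 768, and the `l2` space type are assumptions for illustration; check the k-NN quantization documentation for your OpenSearch version before relying on it. The vectors you ingest are still fp32; the fp16 conversion happens inside the index, and the stored source keeps whatever you sent.

```python
import json

# Hypothetical field name "my_vector" and dimension 768; adjust to your model.
sqfp16_index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "my_vector": {
                "type": "knn_vector",
                "dimension": 768,
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",
                    "space_type": "l2",
                    "parameters": {
                        # Faiss scalar quantizer: fp32 input, encoded as fp16 in the index
                        "encoder": {"name": "sq", "parameters": {"type": "fp16"}}
                    },
                },
            }
        }
    },
}

# Print the JSON body to send with PUT /<index-name>
print(json.dumps(sqfp16_index_body, indent=2))
```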

If you want to reduce disk usage further, quantize the fp32 vectors into byte-sized vectors yourself and ingest them using the Lucene byte vector feature. Even so, disk usage won't drop by 75%, because the overall savings depend on the size of the source.
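Quantizing on the client side means choosing a scheme yourself. The sketch below uses one simple option, a linear (min-max) mapping of each component into the signed byte range [-128, 127] after clipping. The clipping bounds are assumptions: unit-normalized embeddings have every component in [-1, 1], but a tighter range taken from your own data may preserve more precision. Whether this scheme keeps acceptable recall for a given Hugging Face model is something to measure, not assume.

```python
def quantize_to_int8(vector, clip_min=-1.0, clip_max=1.0):
    """Linearly map floats into [-128, 127] after clipping to [clip_min, clip_max].

    clip_min/clip_max are assumptions; derive them from your own embeddings.
    """
    if clip_max <= clip_min:
        raise ValueError("clip_max must be greater than clip_min")
    scale = 255.0 / (clip_max - clip_min)
    out = []
    for x in vector:
        clipped = min(max(x, clip_min), clip_max)
        # Shift into [0, 255], round to an integer, then shift into [-128, 127]
        out.append(int(round((clipped - clip_min) * scale)) - 128)
    return out


# Hypothetical 3-dimensional embedding, just to show the document shape
doc = {"text": "example passage", "my_vector": quantize_to_int8([0.12, -0.5, 0.9])}
print(doc)
```

The same function (with the same bounds) would have to be applied to query embeddings, since a byte index expects byte query vectors.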

Note: the 50% or 75% savings mentioned in the documentation and blogs refer to the reduction in memory consumption, not storage.
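The 50% and 75% figures follow directly from bytes per component (4 for fp32, 2 for fp16, 1 for byte). The sketch below works through that arithmetic for the 768-dimension case, then shows how the fraction shrinks once per-document data that quantization does not touch (such as the source) is included. The `other_bytes` value in the last line is a hypothetical placeholder, not a measurement, and is not meant to reproduce the 17% figure mentioned above.

```python
DIM = 768
BYTES_PER_COMPONENT = {"fp32": 4, "fp16": 2, "byte": 1}


def vector_bytes(dim, dtype):
    """Raw size of one vector, ignoring any index or graph overhead."""
    return dim * BYTES_PER_COMPONENT[dtype]


def saving(before, after):
    """Fractional reduction going from `before` bytes to `after` bytes."""
    return 1 - after / before


def disk_saving(vec_before, vec_after, other_bytes):
    """Reduction once unchanged per-document bytes (e.g. the source) are counted."""
    return 1 - (vec_after + other_bytes) / (vec_before + other_bytes)


fp32 = vector_bytes(DIM, "fp32")  # 3072
fp16 = vector_bytes(DIM, "fp16")  # 1536
int8 = vector_bytes(DIM, "byte")  # 768

print(saving(fp32, fp16))  # 0.5
print(saving(fp32, int8))  # 0.75
# Hypothetical: 3072 extra bytes per document that quantization leaves untouched
print(disk_saving(fp32, fp16, other_bytes=3072))  # 0.25
```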

tejashu · February 14, 2025, 5:58am

@navtat Thanks for the response.

I wanted to know if there is a way to do hybrid search with byte vectors indexed in OpenSearch.
Also, could you please let me know why byte quantization is not handled internally by OpenSearch itself, given that byte vectors are supported?

Thanks in advance,
Tejas

navtat · February 14, 2025, 7:51pm

Yes, AFAIK you should be able to do hybrid search on an index with byte vectors.

OpenSearch vector search supports online quantization, which accepts fp32 vectors and quantizes them into byte vectors (currently with the Lucene engine; Faiss support is planned from 3.0.0). However, this consumes extra disk space because it has to store both the fp32 vectors and the quantized byte vectors. I didn't suggest this earlier because you are concerned about disk space.
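Assuming "online quantization" here refers to the Lucene scalar quantization encoder, a minimal sketch of the mapping looks like the dict below. The field name, dimension, and HNSW parameters are illustrative assumptions. Because Lucene keeps the fp32 vectors alongside the quantized ones, this reduces memory rather than disk.

```python
import json

# Hypothetical field name and dimension; HNSW parameters are illustrative only.
lucene_sq_index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "my_vector": {
                "type": "knn_vector",
                "dimension": 768,
                "method": {
                    "name": "hnsw",
                    "engine": "lucene",
                    "space_type": "l2",
                    "parameters": {
                        # Lucene scalar quantizer: fp32 vectors are ingested and
                        # quantized at index time; the fp32 copies are also kept
                        "encoder": {"name": "sq"},
                        "ef_construction": 256,
                        "m": 8,
                    },
                },
            }
        }
    },
}

print(json.dumps(lucene_sq_index_body, indent=2))
```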

tejashu · February 17, 2025, 3:02am

Could you please give an example of how to form a hybrid search query with byte vectors? If any documentation is available, that would also be helpful.

I want to store only byte vectors, not both fp32 and byte.

Thanks in advance

navtat · February 17, 2025, 4:53am

I don’t think the query is any different for byte vectors. This is the documentation for hybrid search: Hybrid search - OpenSearch Documentation
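A minimal sketch of what that could look like, written as Python dicts: a search pipeline with a normalization processor (hybrid scores need one), and a hybrid query that combines a `match` clause with a `knn` clause whose vector is a list of bytes. The pipeline name `nlp-search-pipeline`, the fields `text` and `my_vector`, and the weights are assumptions for illustration.

```python
import json

# Body for PUT /_search/pipeline/nlp-search-pipeline (hypothetical pipeline name)
search_pipeline = {
    "description": "Normalize and combine lexical and k-NN scores",
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {"technique": "min_max"},
                "combination": {
                    "technique": "arithmetic_mean",
                    "parameters": {"weights": [0.3, 0.7]},
                },
            }
        }
    ],
}


def hybrid_query(text, byte_vector, k=5):
    """Build a hybrid query body; byte_vector must hold ints in [-128, 127]."""
    if any(not isinstance(v, int) or not -128 <= v <= 127 for v in byte_vector):
        raise ValueError("byte vector components must be ints in [-128, 127]")
    return {
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {"text": {"query": text}}},
                    {"knn": {"my_vector": {"vector": byte_vector, "k": k}}},
                ]
            }
        }
    }


# Body for GET /<index-name>/_search?search_pipeline=nlp-search-pipeline
print(json.dumps(hybrid_query("disk usage", [12, -3, 127]), indent=2))
```

The query vector must be quantized with the same scheme used at ingest time; otherwise the k-NN scores are not comparable.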

If you don’t want to store both fp32 and byte vectors on disk, use byte vectors: k-NN vector - OpenSearch Documentation
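A minimal sketch of a byte-vector mapping with the Lucene engine, as a Python dict. Setting `data_type` to `byte` means the index accepts only integer components in [-128, 127], so the fp32 embeddings have to be quantized before ingestion. The field names, dimension, and HNSW parameters are illustrative assumptions.

```python
import json

# Hypothetical field names and dimension; HNSW parameters are illustrative only.
byte_index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "my_vector": {
                "type": "knn_vector",
                "dimension": 768,
                # Store and search signed 8-bit components instead of fp32
                "data_type": "byte",
                "method": {
                    "name": "hnsw",
                    "engine": "lucene",
                    "space_type": "l2",
                    "parameters": {"ef_construction": 100, "m": 16},
                },
            },
        }
    },
}

print(json.dumps(byte_index_body, indent=2))
```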

system · April 18, 2025, 4:54am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Replies	Views	Activity
Any way of reducing the disk space for embbeding vectors in OpenSearch 3.0 · Machine Learning · releases, discuss	13	53	July 3, 2025
Lucene InBuilt Scalar Quantization · k-NN	16	102	October 15, 2024
Byte size vector with neural search on the fly · k-NN	3	334	December 24, 2023
How does `osknnqstate` file work for reducing memory when knn searching? · k-NN · discuss, troubleshoot, configure, index-management	1	18	January 26, 2025
Knn_vector takes too much disk space · k-NN	4	932	November 15, 2024