Describe the issue:
I would like to know whether there is a way to reduce the on-disk storage of vector embeddings by converting them to byte or 16-bit vectors when I generate the embeddings with the pre-trained models provided by Hugging Face.
I tried applying the Faiss sqfp16 scalar quantization method, but it did not convert the fp32 vectors to fp16 vectors on disk, so the on-disk size remained the same.
@tejashu the Faiss sqfp16 quantization technique encodes the fp32 vectors as fp16 internally, which reduces memory requirements by up to 50%. Storage consumption also drops slightly, but not by 50% as memory does, because the ingested source document is saved in a .fdt segment file that takes up roughly the same disk space whether you use fp32 or a quantization technique. For instance, in my previous experiments comparing fp32 and sqfp16, disk usage dropped by approximately 17% for a 100M-vector, 768-dimension dataset.
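To put rough numbers on the vector part alone, here is a small back-of-the-envelope sketch. It only counts the raw vector values for the 100M x 768 dataset mentioned above and ignores the stored source (.fdt), graph structures, and any other index overhead, so it is not a prediction of actual disk usage. The helper name `raw_vector_bytes` is just for illustration.

```python
def raw_vector_bytes(num_vectors: int, dimension: int, bytes_per_value: int) -> int:
    """Size of the raw vector values only (no source document, no graph, no metadata)."""
    return num_vectors * dimension * bytes_per_value


NUM_VECTORS = 100_000_000  # 100M vectors, as in the experiment mentioned above
DIMENSION = 768

# Bytes used per vector component for each representation.
sizes = {
    name: raw_vector_bytes(NUM_VECTORS, DIMENSION, width)
    for name, width in [("fp32", 4), ("fp16", 2), ("byte", 1)]
}

for name, size in sizes.items():
    saving = 1 - size / sizes["fp32"]
    print(f"{name}: {size / 1e9:.1f} GB raw vectors ({saving:.0%} smaller than fp32)")
```

The 50% and 75% figures show up only for the vector values themselves. The total on disk shrinks by less because the source document is stored alongside them.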
If you want to reduce disk usage further, try quantizing the fp32 vectors into byte-sized vectors yourself and ingesting them using the Lucene byte vector feature. Even so, disk usage will not drop by 75%, because the total also depends on the size of the source file.
Note: The 50% or 75% savings mentioned in the documentation and blogs refer to the reduction in memory consumption, not storage.
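As a rough illustration of quantizing on the client side before ingestion, here is a minimal sketch of linear (min/max) scalar quantization into the signed byte range [-128, 127]. This is one simple scheme picked for illustration, not the method OpenSearch or Faiss uses internally, and the function names are hypothetical. It is written in plain Python for clarity; with real embeddings you would probably do the same thing with NumPy arrays. Quantization loses precision, so check recall on your own data.

```python
def fit_range(vectors):
    """Find the global min and max over a sample of fp32 embeddings."""
    lo = min(min(v) for v in vectors)
    hi = max(max(v) for v in vectors)
    return lo, hi


def quantize_to_byte(vector, lo, hi):
    """Linearly map each value from [lo, hi] to an integer in [-128, 127]."""
    if hi == lo:
        # Degenerate range: every value is identical, so map everything to 0.
        return [0 for _ in vector]
    scale = 255.0 / (hi - lo)
    quantized = []
    for x in vector:
        q = round((x - lo) * scale) - 128
        # Clamp so values outside the fitted range still fit in a signed byte.
        quantized.append(max(-128, min(127, q)))
    return quantized


# Toy embeddings standing in for model output (made-up values).
embeddings = [[-1.0, 0.0, 1.0], [0.5, -0.5, 0.25]]
lo, hi = fit_range(embeddings)
byte_vectors = [quantize_to_byte(v, lo, hi) for v in embeddings]
print(byte_vectors)
```

The same `lo` and `hi` have to be reused for query vectors at search time, otherwise the query and the indexed vectors end up on different scales.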
Is there a way to do hybrid search with byte vectors indexed in OpenSearch?
Also, could you please let me know why byte quantization is not handled internally by OpenSearch itself, given that it supports byte vectors?
Yes, AFAIK you should be able to do hybrid search on an index with byte vectors.
OpenSearch vector search supports online quantization, which accepts fp32 vectors and quantizes them into byte vectors (using the Lucene engine for now; Faiss support will follow from 3.0.0), but this consumes extra disk space because it has to store both the fp32 vectors and the quantized byte vectors. I didn't suggest this earlier because you are concerned about disk space.
Could you please give an example of how to form the query for hybrid search with byte vectors? If any documentation is available, that would be helpful.
I want to store only byte vectors, not both fp32 and byte.
I don’t think the query will be any different for byte vectors. This is the documentation for hybrid search: Hybrid search - OpenSearch Documentation
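For reference, here is a sketch of what the request bodies could look like, built as Python dicts and printed as JSON. It follows my understanding of the documented syntax for byte vectors (`data_type: byte` with the Lucene engine), the normalization processor, and the `hybrid` query. The index name, pipeline name, field names, dimension, and weights are all made up for illustration, so please verify against the documentation for your OpenSearch version.

```python
import json

# Hypothetical names, chosen only for this example.
INDEX_NAME = "my-byte-index"
PIPELINE_NAME = "my-hybrid-pipeline"

# Index with a text field and a byte-typed knn_vector field (Lucene engine).
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "embedding": {
                "type": "knn_vector",
                "dimension": 3,  # toy dimension; use your model's dimension
                "data_type": "byte",
                "method": {"name": "hnsw", "space_type": "l2", "engine": "lucene"},
            },
        }
    },
}

# Search pipeline that normalizes and combines the sub-query scores.
pipeline_body = {
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {"technique": "min_max"},
                "combination": {
                    "technique": "arithmetic_mean",
                    "parameters": {"weights": [0.3, 0.7]},  # assumed weights
                },
            }
        }
    ]
}


def build_hybrid_query(text, byte_vector, k=5):
    """Combine a lexical match query with a k-NN query on the byte vector field."""
    if any(not isinstance(v, int) or not -128 <= v <= 127 for v in byte_vector):
        raise ValueError("byte vectors must contain integers in [-128, 127]")
    return {
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {"text": {"query": text}}},
                    {"knn": {"embedding": {"vector": byte_vector, "k": k}}},
                ]
            }
        }
    }


search_body = build_hybrid_query("wild west", [12, -7, 101])

print(f"PUT /{INDEX_NAME}\n{json.dumps(index_body, indent=2)}")
print(f"PUT /_search/pipeline/{PIPELINE_NAME}\n{json.dumps(pipeline_body, indent=2)}")
print(f"GET /{INDEX_NAME}/_search?search_pipeline={PIPELINE_NAME}\n{json.dumps(search_body, indent=2)}")
```

The only byte-specific parts are `data_type` in the mapping and the requirement that the query vector contains integers in [-128, 127]. The hybrid query itself has the same shape as it would for fp32 vectors.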