OpenSearch 2.16.0 introduced built-in scalar quantization, which stores and retrieves quantized vectors automatically. However, I spotted the statement below in the documentation. I was expecting this feature to shrink disk space at the cost of a small decrease in recall, but according to the documentation it actually increases disk utilization! If that is the case, then what is the use of this feature? And is there a way to keep only the quantized vector and discard the original?
k-NN vector quantization - OpenSearch Documentation
“Quantization can decrease the memory footprint by a factor of 4 in exchange for some loss in recall. Additionally, quantization slightly increases disk usage because it requires storing both the raw input vectors and the quantized vectors.”
@asfoorial Thanks for your question. We need to store the raw vectors along with the quantized vectors because during a force merge, or if the distribution of the ingested data changes over time, the quantiles will change and we may need to recompute the quantiles and requantize the data (using the raw vectors).
The motive of this feature is to reduce memory utilization and quantize the data dynamically. This is unlike the Lucene byte vector feature, where users need to quantize the data into the byte range before ingesting it into the index, which becomes a bottleneck if the min/max values or the distribution of their input float data change over time.
Also, memory is expensive compared to disk. But if you are concerned about disk usage, then this feature comes with a tradeoff.
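For reference, a minimal mapping sketch for enabling the built-in Lucene scalar quantizer via the `sq` encoder, based on the documentation page linked above (the index name, field name, and dimension here are just placeholders):

```json
PUT /my-vector-index
{
  "settings": {
    "index": { "knn": true }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "space_type": "l2",
          "parameters": {
            "encoder": { "name": "sq" },
            "ef_construction": 256,
            "m": 16
          }
        }
      }
    }
  }
}
```

With this mapping you ingest ordinary float vectors, and the quantization to bytes happens inside OpenSearch at ingestion time.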
My search can combine both. But let's assume I am only using semantic search: is there a way to avoid storing the original vector and keep only the quantized version, just as is done in Faiss scalar quantization? I noticed around a 25% reduction in storage size there.
Then you should quantize the vectors into byte-sized vectors before ingesting them into the index and use the Lucene byte vector feature. This helps reduce both memory and disk usage, besides reducing latencies, at the cost of a drop in recall.
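In case it helps, here is a rough sketch of that approach (index name, field name, and dimension are placeholders). The field is declared with `data_type: byte`, and each ingested vector must already contain integers in the [-128, 127] range; the quantization scheme (for example, min/max scaling) is up to you:

```json
PUT /my-byte-index
{
  "settings": {
    "index": { "knn": true }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "data_type": "byte",
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "space_type": "l2"
        }
      }
    }
  }
}
```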
I think @asfoorial can reduce disk usage via Lucene's byte_vector, but since the Cohere API needs an external internet connection to https://api.cohere.ai/v1/embed, an offline environment cannot use this method.
@navtat
From version 2.16,
the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the Lucene byte vector, which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer quantizes input vectors in OpenSearch during ingestion.
When defining an index for vector search, you can omit the _source of the vector field using the "excludes" and "recovery_source_excludes" settings. Lucene-based engines (OpenSearch, Elasticsearch) store the original document in the _source field, which makes it easy to reindex existing indices.
If you are okay with giving up this convenience in order to reduce disk space, you can use these settings in the index mapping, as sketched below.
(WARN: Disabling the _recovery_source may lead to failures during peer-to-peer recovery.)
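Something like the following, assuming the mapping layout places both exclusion lists under _source (index and field names are just examples, and the sq encoder is optional here):

```json
PUT /my-vector-index
{
  "settings": {
    "index": { "knn": true }
  },
  "mappings": {
    "_source": {
      "excludes": ["my_vector"],
      "recovery_source_excludes": ["my_vector"]
    },
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "space_type": "l2",
          "parameters": {
            "encoder": { "name": "sq" }
          }
        }
      }
    }
  }
}
```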
Thanks for the hint. It works even better this way and reduces the storage size by almost 50%, as shown in the experiment below! I also noticed that enabling scalar quantization results in a small additional storage overhead (~6%).
However, are there any implications or risks to this? Is there any functionality that would fail, other than calling the _reindex API?
Yes, as @yeonghyeonKo suggested, you can either disable _source or exclude source fields to reduce disk usage. But then you can't reindex the data. However, once this new feature is added, you will be able to reindex the data even after disabling the source. Feel free to +1 if you are interested in it. Thanks!
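For completeness, disabling _source entirely (rather than excluding individual fields) would look roughly like the sketch below, with the same caveat that you then cannot reindex from the index itself (names and dimension are placeholders):

```json
PUT /my-vector-index
{
  "settings": {
    "index": { "knn": true }
  },
  "mappings": {
    "_source": { "enabled": false },
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "space_type": "l2"
        }
      }
    }
  }
}
```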