OpenSearch 2.16.0 introduced built-in scalar quantization, which stores and retrieves quantized vectors automatically. However, I spotted the statement below in the documentation. I was expecting this feature to shrink disk space at the cost of a small decrease in recall, but according to the documentation it actually increases disk utilization! If that is the case, then what is the use of this feature? And is there a way to keep only the quantized vector and discard the original?
k-NN vector quantization - OpenSearch Documentation
“Quantization can decrease the memory footprint by a factor of 4 in exchange for some loss in recall. Additionally, quantization slightly increases disk usage because it requires storing both the raw input vectors and the quantized vectors.”
@asfoorial Thanks for your question. We need to store the raw vectors along with the quantized vectors because during a force merge, or if the distribution of the ingested data changes over time, the quantiles will change and we may need to recompute the quantiles and requantize the data (using the raw vectors).
The motive of this feature is to reduce memory utilization and quantize the data dynamically. This is unlike the Lucene byte vector feature, where users need to quantize the data into the byte range before ingesting it into the index, which becomes a bottleneck if the min/max values or the distribution of their input float data change over time.
Also, memory is expensive compared to disk. But if you are concerned about disk usage, then this feature comes with a tradeoff.
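For reference, a minimal mapping sketch for enabling the built-in Lucene scalar quantizer via the `sq` encoder, based on the documentation page linked above (the index name, field name, and dimension here are just placeholders):

```json
PUT /my-vector-index
{
  "settings": {
    "index": { "knn": true }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "space_type": "l2",
          "parameters": {
            "encoder": { "name": "sq" },
            "ef_construction": 256,
            "m": 16
          }
        }
      }
    }
  }
}
```

With this mapping you ingest ordinary float vectors, and the quantization to bytes happens inside OpenSearch at ingestion time.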
My search can combine both. But let's assume I am only using semantic search: is there a way to avoid storing the original vector and keep only the quantized version, just as is done in Faiss scalar quantization? I noticed around a 25% reduction in storage size there.
Then you should quantize the vectors into byte-sized vectors before ingesting them into the index and use the Lucene byte vector feature. This helps reduce both memory and disk usage, besides reducing latencies, at the cost of a drop in recall.
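In case it helps, here is a rough sketch of that approach (index name, field name, and dimension are placeholders). The field is declared with `data_type: byte`, and each ingested vector must already contain integers in the [-128, 127] range; the quantization scheme (for example, min/max scaling) is up to you:

```json
PUT /my-byte-index
{
  "settings": {
    "index": { "knn": true }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "data_type": "byte",
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "space_type": "l2"
        }
      }
    }
  }
}
```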
I think @asfoorial can reduce disk usage via Lucene's byte_vector, but since the Cohere API needs an external internet connection to https://api.cohere.ai/v1/embed, an offline environment cannot use this method.
@navtat
From version 2.16,
the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the Lucene byte vector, which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer quantizes input vectors in OpenSearch during ingestion.
When defining an index for vector search, you can omit the _source of the vector field using the "excludes" and "recovery_source_excludes" settings. Lucene-based engines (OpenSearch, Elasticsearch) store the original document in the _source field, which makes it easy to reindex existing indices.
If you are okay with giving up this convenience in order to reduce disk space, you can use these settings in the index mapping, as sketched below.
(WARN: Disabling the _recovery_source may lead to failures during peer-to-peer recovery.)
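Something like the following, assuming the mapping layout places both exclusion lists under _source (index and field names are just examples, and the sq encoder is optional here):

```json
PUT /my-vector-index
{
  "settings": {
    "index": { "knn": true }
  },
  "mappings": {
    "_source": {
      "excludes": ["my_vector"],
      "recovery_source_excludes": ["my_vector"]
    },
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "space_type": "l2",
          "parameters": {
            "encoder": { "name": "sq" }
          }
        }
      }
    }
  }
}
```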
Thanks for the hint. It works even better this way and reduces the storage size by almost 50%, as shown in the experiment below! I also noticed that enabling scalar quantization results in a small additional storage overhead (~6%).
However, are there any implications or risks to this? Is there any functionality that would fail, other than calling the _reindex API?
Yes, as @yeonghyeonKo suggested, you can either disable _source or exclude source fields to reduce disk usage. But then you can't reindex the data. However, once this new feature is added, you will be able to reindex the data even after disabling the source. Feel free to +1 if you are interested in it. Thanks!
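For completeness, disabling _source entirely (rather than excluding individual fields) would look roughly like the sketch below, with the same caveat that you then cannot reindex from the index itself (names and dimension are placeholders):

```json
PUT /my-vector-index
{
  "settings": {
    "index": { "knn": true }
  },
  "mappings": {
    "_source": { "enabled": false },
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "space_type": "l2"
        }
      }
    }
  }
}
```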