Any way of reducing the disk space for embedding vectors in OpenSearch 3.0?

Is there any way we can reduce the disk space of indices stored for embedding vectors?

Our normal index of 100 MB shoots up to about 1 GB when we embed a single description field.

That is 10x the disk space. Is there any way to reduce the disk usage when storing embeddings?

You can try the settings described in the documentation.

Hi @tejashu

You should enable the derived source feature. Here is the recent blog: https://opensearch.org/blog/do-more-with-less-save-up-to-3x-on-storage-with-derived-vector-source/

@vinod0x01 Removing vectors from _source limits capabilities like reindexing, updates, etc. To overcome those limitations we added the derived source feature for vectors, experimental in 2.19 and GA in 3.0. Please take a look at that feature: https://opensearch.org/blog/do-more-with-less-save-up-to-3x-on-storage-with-derived-vector-source/
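In case a concrete example helps, here is a minimal sketch of what enabling it on an index could look like. The index name, field name, and dimension are placeholders, and I am assuming the index-level setting name from the derived source documentation (index.knn.derived_source.enabled), so please verify it against the blog/docs for your version:

PUT /my-vector-index
{
  "settings": {
    "index.knn": true,
    "index.knn.derived_source.enabled": true
  },
  "mappings": {
    "properties": {
      "description_embedding": {
        "type": "knn_vector",
        "dimension": 768
      }
    }
  }
}

With derived source the vectors are no longer duplicated in the stored _source; they are derived back from the vector data structures when _source is needed, which is where the storage savings come from, while reindexing and updates keep working.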

Thanks for the blog.
@Navneet
Is it possible to store only byte vectors instead of the FP32 vectors so that the storage size is reduced further?
Does OpenSearch provide such a configuration?

Thanks in advance

Hi @Navneet, thanks for sharing, this is very helpful. I will try this in my indexing and see how the performance looks.

I am sending documents in 10k batches.
For the second batch of 10k it throws an error:

Failed to Validate Access for ModelId WLfEgZcBwC17uYVSzQPT
org.opensearch.core.concurrency.OpenSearchRejectedExecutionException: rejected execution of org.opensearch.ml.task.MLPredictTaskRunner$$Lambda/0x00007ff0bce46528@3c594a35 on OpenSearchThreadPoolExecutor[name = /opensearch_ml_predict, queue capacity = 10000, org.opensearch.common.util.concurrent.OpenSearchThreadPoolExecutor@10408ea[Running, pool size = 12, active threads = 12, queued tasks = 10000, completed tasks = 90419]]

Is there any setting to increase the queue?

@ylwu is there a setting to increase the thread pool queue size? As far as I remember there is none.

Ref: ml-commons/plugin/src/main/java/org/opensearch/ml/plugin/MachineLearningPlugin.java (opensearch-project/ml-commons on GitHub)

No, there is no setting to increase the thread pool queue size in ml-commons.

@tejashu these are all the settings ml-commons has: ML Commons cluster settings - OpenSearch Documentation

If you want any more settings to be available, please feel free to cut an issue on the ml-commons GitHub repository.
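In the meantime, the usual workaround is to slow down or shrink the batches so the opensearch_ml_predict queue can drain between requests. If it helps, here is a rough sketch for watching that queue while you index; plugin thread pools normally show up in the _cat thread pool API, but please double-check the exact pool name on your cluster:

GET _cat/thread_pool/opensearch_ml_predict?v&h=node_name,name,active,queue,rejected,completed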

Hello,

Is there a mechanism to store only byte vectors, or does OpenSearch have any API that returns byte-quantized vectors from FP32 vectors? That could still reduce the disk storage.

Thanks in advance

I am a bit confused by the question, so I will answer based on my best understanding. If you still have questions, feel free to post them.

OpenSearch does support different data types for vectors, including fp32, byte, and binary. So if your vectors already fall within those ranges, I would suggest using the data_type field directly when creating the index mappings. This will reduce the memory and disk footprint since the vectors are already stored as byte or binary.
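As a rough sketch of that first option (field name, dimension, and engine here are placeholders you would adapt to your model), a mapping that stores byte vectors directly looks like this:

"properties": {
  "description_embedding": {
    "type": "knn_vector",
    "dimension": 384,
    "data_type": "byte",
    "method": {
      "name": "hnsw",
      "space_type": "l2",
      "engine": "lucene"
    }
  }
}

The values you index into that field then need to be whole numbers in the [-128, 127] range.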

If your vectors are in fp32 but you want to quantize them to byte, binary, or fp16, OpenSearch provides different quantization techniques to do that. This also reduces the memory and disk footprint, but not as much as the first option, since we store both the fp32 vectors and the quantized vectors on disk. During search we use only the quantized vectors.
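As an illustration of the second option, Faiss 16-bit scalar quantization can be requested through the encoder in the mapping. This is only a sketch with a placeholder field name and dimension:

"properties": {
  "description_embedding": {
    "type": "knn_vector",
    "dimension": 768,
    "method": {
      "name": "hnsw",
      "space_type": "l2",
      "engine": "faiss",
      "parameters": {
        "encoder": {
          "name": "sq",
          "parameters": {
            "type": "fp16"
          }
        }
      }
    }
  }
}

Here the search structures use the quantized values while the full-precision vectors are still kept on disk, which is why the savings are smaller than storing byte vectors directly.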

Now, for quantization, if you are asking whether, as a user, you can view the values of the quantized vectors via some API, then the answer is no, you cannot do that.

I was asking since the pretrained models return and store only FP32 vectors. Is there any way we can capture these FP32 vectors, quantize them to byte/binary, and store them?
Would this cause a loss of accuracy at search time?

@tejashu if your index mapping has quantization in it, then the system will quantize the vector after the model has converted the text to an fp32 vector.

If you want to do the quantization yourself with some custom code in between, you can use the Model PostProcessor. Here is a reference: ml-commons/docs/tutorials/semantic_search/semantic_search_with_byte_quantized_vector.md (opensearch-project/ml-commons on GitHub)
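For the first approach, a typical setup is a text_embedding ingest pipeline writing into a knn_vector field whose mapping specifies the quantization (as in the encoder example above), so the fp32 output of the model is quantized at index time. This is a sketch with a placeholder pipeline name, model ID, and field names:

PUT _ingest/pipeline/description-embedding-pipeline
{
  "processors": [
    {
      "text_embedding": {
        "model_id": "<your model id>",
        "field_map": {
          "Description": "description_embedding"
        }
      }
    }
  ]
}

The tutorial linked above covers the other path, where the quantization is done in custom pre/post-processing before the vector reaches the index.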

Would this reduce the disk storage?
Or would it index both the FP32 and the byte vectors?

Also, setting _source excludes in the mappings is increasing the storage size by 2x.
Is this behavior expected, or is it a bug?

Ex:

"mappings": {
  "_source": {
    "excludes": [
      "Description*"
    ]
  },
  "properties": {

increased the storage size for 9,662 docs from 22 MB to 53.7 MB.

Could you please let me know your feedback.
Thanks in advance