Any way of reducing the disk space for embedding vectors in OpenSearch 3.0?

Is there any way we can reduce the disk space of indices stored for embedding vectors?

Our normal index of 100 MB shoots up to about 1 GB when we embed a single description field.

That is 10x the disk space. Is there any way to reduce the disk usage when storing embeddings?

You can try the settings described in the documentation.

Hi @tejashu

You should enable the derived source feature. Here is the recent blog: https://opensearch.org/blog/do-more-with-less-save-up-to-3x-on-storage-with-derived-vector-source/

@vinod0x01 Removing vectors from _source limits capabilities like reindexing, updates, etc. To overcome those limitations we added the derived source feature for vectors, experimental in 2.19 and GA in 3.0. Please take a look at that feature: https://opensearch.org/blog/do-more-with-less-save-up-to-3x-on-storage-with-derived-vector-source/
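In case a concrete example helps, here is a minimal sketch of what enabling it on an index could look like. The index name, field name, and dimension are placeholders, and I am assuming the index-level setting name from the derived source documentation (index.knn.derived_source.enabled), so please verify it against the blog/docs for your version:

PUT /my-vector-index
{
  "settings": {
    "index.knn": true,
    "index.knn.derived_source.enabled": true
  },
  "mappings": {
    "properties": {
      "description_embedding": {
        "type": "knn_vector",
        "dimension": 768
      }
    }
  }
}

With derived source the vectors are no longer duplicated in the stored _source; they are derived back from the vector data structures when _source is needed, which is where the storage savings come from, while reindexing and updates keep working.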

Thanks for the blog.
@Navneet
Is it possible to store only byte vectors instead of the FP32 vectors so that the storage size is reduced further?
Does OpenSearch provide such a configuration?

Thanks in advance

Hi @Navneet, thanks for sharing, this is very helpful. I will try this in my indexing and see how the performance looks.

I am sending documents in 10k batches.
For the second batch of 10k it throws an error:

Failed to Validate Access for ModelId WLfEgZcBwC17uYVSzQPT
org.opensearch.core.concurrency.OpenSearchRejectedExecutionException: rejected execution of org.opensearch.ml.task.MLPredictTaskRunner$$Lambda/0x00007ff0bce46528@3c594a35 on OpenSearchThreadPoolExecutor[name = /opensearch_ml_predict, queue capacity = 10000, org.opensearch.common.util.concurrent.OpenSearchThreadPoolExecutor@10408ea[Running, pool size = 12, active threads = 12, queued tasks = 10000, completed tasks = 90419]]

Is there any setting to increase the queue?

@ylwu is there a setting to increase the thread pool queue size? As far as I remember there is none.

Ref: ml-commons/plugin/src/main/java/org/opensearch/ml/plugin/MachineLearningPlugin.java (opensearch-project/ml-commons on GitHub)

No, there is no setting to increase the thread pool queue size in ml-commons.

@tejashu these are all the settings ml-commons has: ML Commons cluster settings - OpenSearch Documentation

If you want any more settings to be available, please feel free to cut an issue on the ml-commons GitHub repository.
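In the meantime, the usual workaround is to slow down or shrink the batches so the opensearch_ml_predict queue can drain between requests. If it helps, here is a rough sketch for watching that queue while you index; plugin thread pools normally show up in the _cat thread pool API, but please double-check the exact pool name on your cluster:

GET _cat/thread_pool/opensearch_ml_predict?v&h=node_name,name,active,queue,rejected,completed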

Hello,

Is there a mechanism to store only byte vectors, or does OpenSearch have any API that returns byte-quantized vectors from FP32 vectors? That could still reduce the disk storage.

Thanks in advance

I am a bit confused by the question, so I will answer based on my best understanding. If you still have questions, feel free to post them.

OpenSearch does support different data types for vectors, including fp32, byte, and binary. So if your vectors already fall within those ranges, I would suggest using the data_type field directly when creating the index mappings. This will reduce the memory and disk footprint since the vectors are already stored as byte or binary.
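As a rough sketch of that first option (field name, dimension, and engine here are placeholders you would adapt to your model), a mapping that stores byte vectors directly looks like this:

"properties": {
  "description_embedding": {
    "type": "knn_vector",
    "dimension": 384,
    "data_type": "byte",
    "method": {
      "name": "hnsw",
      "space_type": "l2",
      "engine": "lucene"
    }
  }
}

The values you index into that field then need to be whole numbers in the [-128, 127] range.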

If your vectors are in fp32 but you want to quantize them to byte, binary, or fp16, OpenSearch provides different quantization techniques to do that. This also reduces the memory and disk footprint, but not as much as the first option, since we store both the fp32 vectors and the quantized vectors on disk. During search we use only the quantized vectors.
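As an illustration of the second option, Faiss 16-bit scalar quantization can be requested through the encoder in the mapping. This is only a sketch with a placeholder field name and dimension:

"properties": {
  "description_embedding": {
    "type": "knn_vector",
    "dimension": 768,
    "method": {
      "name": "hnsw",
      "space_type": "l2",
      "engine": "faiss",
      "parameters": {
        "encoder": {
          "name": "sq",
          "parameters": {
            "type": "fp16"
          }
        }
      }
    }
  }
}

Here the search structures use the quantized values while the full-precision vectors are still kept on disk, which is why the savings are smaller than storing byte vectors directly.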

Now, for quantization, if you are asking whether, as a user, you can view the values of the quantized vectors via some API, then the answer is no, you cannot do that.

I was asking since the pretrained models return and store only FP32 vectors. Is there any way we can capture these FP32 vectors, quantize them to byte/binary, and store them?
Would this cause a loss of accuracy at search time?

@tejashu if your index mapping has quantization in it, then the system will quantize the vector after the model has converted the text to an fp32 vector.

If you want to do the quantization yourself with some custom code in between, you can use the Model PostProcessor. Here is a reference: ml-commons/docs/tutorials/semantic_search/semantic_search_with_byte_quantized_vector.md (opensearch-project/ml-commons on GitHub)
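For the first approach, a typical setup is a text_embedding ingest pipeline writing into a knn_vector field whose mapping specifies the quantization (as in the encoder example above), so the fp32 output of the model is quantized at index time. This is a sketch with a placeholder pipeline name, model ID, and field names:

PUT _ingest/pipeline/description-embedding-pipeline
{
  "processors": [
    {
      "text_embedding": {
        "model_id": "<your model id>",
        "field_map": {
          "Description": "description_embedding"
        }
      }
    }
  ]
}

The tutorial linked above covers the other path, where the quantization is done in custom pre/post-processing before the vector reaches the index.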

Would this reduce the disk storage?
Or would it index both the FP32 and the byte vectors?

Also, setting _source excludes in the mappings is increasing the storage size by 2x.
Is this behavior expected, or is it a bug?

Ex:

"mappings": {
  "_source": {
    "excludes": [
      "Description*"
    ]
  },
  "properties": {

increased the storage size for 9,662 docs from 22 MB to 53.7 MB.

Could you please let me know your feedback.
Thanks in advance