KNN Derived Source and on_disk vectors - Expectations

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): OpenSearch 3.1, local Docker image.

Describe the issue:

Creating an index with

"index.knn.derived_source.enabled": true

does not seem to yield any storage savings.

What should the expectations be for index.knn.derived_source.enabled? Maybe it doesn't work in combination with on_disk compressed vectors?

Configuration:

Two indices with identical configurations, each holding roughly 100k 1024-dimensional vectors and a few text and keyword fields.

One has derived source turned on, but no storage savings can be seen.

The k-NN field definition for both indices is:

            "cohere_v3": {
              "type": "knn_vector",
              "dimension": 1024,
              "mode": "on_disk",
              "compression_level": "32x",
              "space_type": "l2"
            }

Relevant Logs or Screenshots:

The index `replication-32x-no-derived` has index.knn.derived_source.enabled set to false.

The index `replication-32x-derived` has the setting turned on.

I'm using ElasticVue to check the sizes, but the primary shard size in bytes is the same as well.

Thanks!

Minor additions

Just ingesting the documents without indexing the vector field results in an index of 1.15 GB.

I would expect the index with derived source to end up somewhere between 1.15 GB and 1.63 GB.
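To put rough numbers on that expectation, here is a back-of-the-envelope sketch in Python. It assumes the vectors are stored as float32 (4 bytes per dimension) and uses exactly 100,000 vectors for the "roughly 100k" above; both are assumptions. It only bounds the size of the raw vector data and says nothing about how much of it sits in `_source` versus the vector files.

```python
# Rough size estimate for the vector data described in this post.
# Assumptions (not confirmed by the post): float32 values, exactly 100k vectors.
NUM_VECTORS = 100_000   # "roughly 100k"
DIMENSION = 1024        # from the knn_vector mapping
BYTES_PER_FLOAT = 4     # assumption: full-precision float32
COMPRESSION = 32        # compression_level "32x"


def raw_vector_bytes(num_vectors: int, dimension: int,
                     bytes_per_value: int = BYTES_PER_FLOAT) -> int:
    """Size of the uncompressed vectors in bytes."""
    return num_vectors * dimension * bytes_per_value


def quantized_vector_bytes(raw_bytes: int, compression: int) -> int:
    """Size of the vectors after an N-fold compression."""
    return raw_bytes // compression


raw = raw_vector_bytes(NUM_VECTORS, DIMENSION)
quantized = quantized_vector_bytes(raw, COMPRESSION)

print(f"full precision: {raw / 1e9:.2f} GB")        # full precision: 0.41 GB
print(f"32x quantized:  {quantized / 1e6:.1f} MB")  # 32x quantized:  12.8 MB
```

Under these assumptions the full-precision vectors come to about 0.41 GB, which is the same order of magnitude as the 0.48 GB gap between the two figures above.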

Screenshot 2025-11-03 at 17.47.00

Using an older version of OpenSearch, 2.17, and creating an identical index without knn-derived-source (as it didn't exist yet) gives me basically the same index size as OpenSearch 3.1 with knn-derived-source.

Furthering my suspicion that I should be seeing some difference :slight_smile:

Screenshot 2025-11-03 at 17.50.03

Hopefully someone can help me with some tools to investigate, or can tell me what to expect storage-wise :slight_smile:

@jakob_b could you verify the index settings to ensure derived source is enabled? Ideally you should see a difference.

Hello! Yes, it returns the values true for the derived source enabled index, and false for the disabled index.
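For anyone repeating this check, a minimal sketch of reading the setting out of a settings response is below. It assumes the response was fetched with `GET /<index>/_settings?flat_settings=true`, so the keys are flat dotted names. The helper `derived_source_enabled` and the `sample` dictionary are hypothetical illustrations, not output copied from the cluster in this thread.

```python
from typing import Optional

SETTING = "index.knn.derived_source.enabled"


def derived_source_enabled(settings_response: dict, index_name: str) -> Optional[bool]:
    """Return True/False if the setting is present, or None if it is absent.

    Expects the JSON body of GET /<index>/_settings?flat_settings=true,
    already parsed into a dict. Setting values come back as strings.
    """
    flat = settings_response[index_name]["settings"]
    value = flat.get(SETTING)
    if value is None:
        return None  # not set explicitly; the default depends on the version
    return str(value).lower() == "true"


# Hypothetical sample shaped like a flat settings response (illustration only).
sample = {
    "replication-32x-derived": {
        "settings": {"index.knn": "true", SETTING: "true"}
    },
    "replication-32x-no-derived": {
        "settings": {"index.knn": "true", SETTING: "false"}
    },
}

for name in sample:
    print(name, derived_source_enabled(sample, name))
```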

@jakob_b
Do you have permissions to check the size of the directory that contains the vector segments?

I'm using the no-auth configuration and docker compose to run the clusters, but that is something I can verify! Can you point me to how I can do this?

You can open a shell in the container using the -it option (for example with docker exec -it) and check the size of the segment files there.

The vector graph data is ultimately stored within the Lucene segment files in the OpenSearch data directory, typically located under /usr/share/opensearch/data/nodes/0/indices/{indexId}/{shardId} path.
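One way to compare the two indices at the file level is to total the bytes per file extension in each shard directory. The sketch below uses only the Python standard library. It assumes Python is available wherever the data directory is readable (for instance on the host, against a mounted volume), and the function names are made up for illustration. In Lucene, `.fdt` files generally hold stored fields (which include `_source`) and `.vec` files hold flat vector data, so a difference between the two indices would be expected to show up mainly in `.fdt`. Segments packed into compound `.cfs` files will hide that breakdown.

```python
import os
from collections import defaultdict


def size_by_extension(root: str) -> dict:
    """Walk `root` and return total bytes per file extension."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1] or "(none)"
            try:
                totals[ext] += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Files can disappear during a merge; skip them.
                continue
    return dict(totals)


def print_report(root: str) -> None:
    """Print extensions sorted by total size, largest first."""
    totals = size_by_extension(root)
    for ext, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{ext:>10}  {size / 1e6:10.1f} MB")


# Usage (the path follows the layout described above; fill in the real IDs):
# print_report("/usr/share/opensearch/data/nodes/0/indices/{indexId}/{shardId}")
```

Running it once per index and comparing the two reports side by side shows which file types, if any, shrink when derived source is enabled.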