Knn_vector takes too much disk space

Hi!
I’m trying to upload tens of millions of 512-dimensional vectors using the OpenSearch k-NN plugin.
The problem is that disk storage fills up too quickly.
For example, I tested two configurations: one with the “content” knn_vector field and one without. I do not configure any method, so there shouldn’t be any ANN index taking up space.
Field config looks like:

{
    "type": "knn_vector",
    "dimension": 512
}

With this field, the index takes 10+ GB per 1 million rows; without it, the size is close to zero (a few MB).
I’d expect it to consume ~2 GB per 1 million rows, since one 512-dim vector should take 512 * 4 bytes.
Could you please clarify if this is expected behavior and what could be the reason behind high disk usage?
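
The expectation above can be checked with a few lines of arithmetic. This is only a sketch: it assumes each vector component is stored as a 4-byte float, counts decimal gigabytes, and ignores any index or storage overhead. The variable names are illustrative.

```python
# Raw size of the vectors alone, with no index or storage overhead.
DIMENSION = 512
BYTES_PER_FLOAT = 4          # assumption: components are 32-bit floats
NUM_VECTORS = 1_000_000

bytes_per_vector = DIMENSION * BYTES_PER_FLOAT
total_gb = bytes_per_vector * NUM_VECTORS / 1e9   # decimal GB

print(f"{bytes_per_vector} bytes per vector")
print(f"{total_gb:.2f} GB per {NUM_VECTORS:,} vectors")
```

Under these assumptions the raw data comes to about 2 GB per million vectors, so the observed 10+ GB is roughly five times larger.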

Hi @PaulNarbe,

You can estimate memory usage with the formula from the Performance Tuning page of the Open Distro documentation:

1.1 * (4 * dimension + 8 * M) bytes per vector
M = 16 (default)

Behind the scenes, hierarchical graphs get created per segment for running approximate k-NN search.
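
For the numbers in this thread, the formula can be evaluated as follows. This is a minimal sketch, not part of the plugin: the function name `estimate_knn_bytes` is hypothetical, and the result is a memory estimate for the graphs, not a statement about on-disk size.

```python
def estimate_knn_bytes(num_vectors: int, dimension: int, m: int = 16) -> float:
    """Apply the quoted estimate: 1.1 * (4 * dimension + 8 * M) bytes per vector."""
    return 1.1 * (4 * dimension + 8 * m) * num_vectors


per_vector = estimate_knn_bytes(1, 512)
per_million_gb = estimate_knn_bytes(1_000_000, 512) / 1e9  # decimal GB

print(f"{per_vector:.1f} bytes per vector")
print(f"{per_million_gb:.2f} GB per million vectors")
```

With dimension=512 and M=16 this gives roughly 2.4 KB per vector, or about 2.4 GB per million vectors.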

Hi, @vamshin, thank you for the reply!

“Behind the scenes, hierarchical graphs get created per segment for running approximate k-NN search.”

I do not specify an ANN index in the mapping properties:

"properties": {
    "field_1": {
        "type": "keyword",
        "index": False,
        "norms": False,
        "ignore_above": 36
    },
    "field_2": {
        "type": "keyword",
        "norms": False,
        "ignore_above": 36
    },
    "vector_field": {
        "type": "knn_vector",
        "dimension": 512
    },
}

Using that calculation (in my case dimension=512, M=16), 1 million vectors should require ~2.4 GB, but, as I mentioned, in my test it’s 10+ GB.

These are the stats for my test instance: 2,573,511 docs uploaded and ~29 GB taken, which means every doc takes ~12 KB:

"primaries": {
	"docs": {
		"count": 2573511,
		"deleted": 0
	},
	"store": {
		"size_in_bytes": 31388403696,
		"reserved_in_bytes": 0
	},
	"indexing": {
		"index_total": 2573511,
		"index_time_in_millis": 1477518,
		"index_current": 0,
		"index_failed": 0,
		"delete_total": 0,
		"delete_time_in_millis": 0,
		"delete_current": 0,
		"noop_update_total": 0,
		"is_throttled": false,
		"throttle_time_in_millis": 0
	},
	"get": {
		"total": 0,
		"time_in_millis": 0,
		"exists_total": 0,
		"exists_time_in_millis": 0,
		"missing_total": 0,
		"missing_time_in_millis": 0,
		"current": 0
	},
	"search": {
		"open_contexts": 0,
		"query_total": 54,
		"query_time_in_millis": 55,
		"query_current": 0,
		"fetch_total": 21,
		"fetch_time_in_millis": 14,
		"fetch_current": 0,
		"scroll_total": 0,
		"scroll_time_in_millis": 0,
		"scroll_current": 0,
		"suggest_total": 0,
		"suggest_time_in_millis": 0,
		"suggest_current": 0
	},
	"merges": {
		"current": 0,
		"current_docs": 0,
		"current_size_in_bytes": 0,
		"total": 26,
		"total_time_in_millis": 3576609,
		"total_docs": 408553,
		"total_size_in_bytes": 4984037156,
		"total_stopped_time_in_millis": 0,
		"total_throttled_time_in_millis": 215428,
		"total_auto_throttle_in_bytes": 102951098
	},
	"refresh": {
		"total": 329,
		"total_time_in_millis": 1571172,
		"external_total": 268,
		"external_total_time_in_millis": 650299,
		"listeners": 0
	},
	"flush": {
		"total": 45,
		"periodic": 40,
		"total_time_in_millis": 8580797
	},
	"warmer": {
		"current": 0,
		"total": 263,
		"total_time_in_millis": 1
	},
	"query_cache": {
		"memory_size_in_bytes": 0,
		"total_count": 10,
		"hit_count": 0,
		"miss_count": 10,
		"cache_size": 0,
		"cache_count": 0,
		"evictions": 0
	},
	"fielddata": {
		"memory_size_in_bytes": 0,
		"evictions": 0
	},
	"completion": {
		"size_in_bytes": 0
	},
	"segments": {
		"count": 132,
		"memory_in_bytes": 160112,
		"terms_memory_in_bytes": 67584,
		"stored_fields_memory_in_bytes": 82496,
		"term_vectors_memory_in_bytes": 0,
		"norms_memory_in_bytes": 0,
		"points_memory_in_bytes": 0,
		"doc_values_memory_in_bytes": 10032,
		"index_writer_memory_in_bytes": 0,
		"version_map_memory_in_bytes": 0,
		"fixed_bit_set_memory_in_bytes": 0,
		"max_unsafe_auto_id_timestamp": -1,
		"file_sizes": {}
	},
	"translog": {
		"operations": 0,
		"size_in_bytes": 275,
		"uncommitted_operations": 0,
		"uncommitted_size_in_bytes": 275,
		"earliest_last_modified_age": 77278667
	},
	"request_cache": {
		"memory_size_in_bytes": 0,
		"evictions": 0,
		"hit_count": 0,
		"miss_count": 0
	},
	"recovery": {
		"current_as_source": 0,
		"current_as_target": 0,
		"throttle_time_in_millis": 0
	}
},
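
The per-document figure can be derived directly from the stats above and compared with the estimate from the formula quoted earlier in the thread. This is a sketch for illustration only: the two input values are copied from the "docs" and "store" sections, and the variable names are made up here.

```python
# Values copied from the "primaries" stats above.
docs_count = 2_573_511
store_size_bytes = 31_388_403_696

observed_per_doc = store_size_bytes / docs_count

# Estimate from the quoted formula with dimension=512 and M=16.
estimated_per_doc = 1.1 * (4 * 512 + 8 * 16)

ratio = observed_per_doc / estimated_per_doc

print(f"observed:  {observed_per_doc:,.0f} bytes per doc")
print(f"estimated: {estimated_per_doc:,.0f} bytes per doc")
print(f"ratio:     {ratio:.1f}x")
```

The observed size is roughly 12.2 KB per document, about five times the estimate.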

Discussion moved to GitHub: issue #507 in opensearch-project/k-NN, “[BUG] Unexpectedly high memory consumption by Lucene”.