Cluster becomes too slow when indexing knn_vector data

Hi all,

I have a 6-node cluster (2 master and 4 data) and I created an index with a nested knn_vector field. I am noticing that the whole cluster becomes too slow during indexing. Once indexing is complete, performance gradually improves.

Note that I am indexing around 36,000 documents, and each document may have 100 nested vectors. Each vector is of size 64.

Any recommendation on how to overcome such slowness?

Thanks

Hi,

This is generally expected behavior: search graphs are rebuilt during data ingestion, and that takes resources from the nodes. You can tune the segment merge and index refresh settings, such as max_segments and the refresh interval.

Merging segments should improve search latency and reduce the memory required for the graphs, but the merge itself consumes resources and will cause slowness if searches are running in parallel. The lower you set max_segments, the longer the segment merge will take.
You can also save resources through the refresh settings. The default refresh interval is 1 second, which may be too frequent. If you don’t need the ingested data to be searchable right away, you can disable refresh by setting the refresh interval to “-1” and then make a single refresh API call after all the data has been ingested.
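
As a rough illustration of the workflow above, here is a minimal Python sketch that uses only the standard library. The base URL, the index name "my-knn-index", and all helper names are made up for the example, and it assumes a cluster reachable over plain HTTP without authentication. It also assumes that "max_segments" above refers to the max_num_segments parameter of the force merge API. Treat it as a starting point, not a tested recipe for your cluster.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local cluster, no auth
INDEX = "my-knn-index"              # hypothetical index name


def refresh_settings_request(index, interval):
    """Build the request that changes refresh_interval ("-1" disables refresh)."""
    body = {"index": {"refresh_interval": interval}}
    return "PUT", f"/{index}/_settings", body


def refresh_request(index):
    """Build the request for a single explicit refresh."""
    return "POST", f"/{index}/_refresh", None


def force_merge_request(index, max_num_segments):
    """Build a force merge request; a lower target means a longer merge."""
    path = f"/{index}/_forcemerge?max_num_segments={max_num_segments}"
    return "POST", path, None


def send(method, path, body=None):
    """Send one request to the cluster and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def ingest_with_refresh_disabled(index, ingest_fn, restore_interval="1s"):
    """Disable refresh, run the caller's ingestion, then refresh once."""
    send(*refresh_settings_request(index, "-1"))
    try:
        ingest_fn()  # caller-supplied bulk indexing, not shown here
    finally:
        # Restore the interval even if ingestion fails, then refresh once.
        send(*refresh_settings_request(index, restore_interval))
        send(*refresh_request(index))
```

You would call ingest_with_refresh_disabled(INDEX, your_bulk_function) around your own indexing code, and optionally follow it with send(*force_merge_request(INDEX, 1)) at a quiet time, since the merge competes with searches for resources.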