Locate reindex bottleneck

sezuan2 · September 23, 2021, 3:16pm

Hello,

I need some help locating the reason for a slow reindexing speed. I’m reindexing one index from a 3-node cluster to another 3-node cluster in the same datacenter. The relevant changes that should affect the speed are disabling replicas and refreshing:

      "index.number_of_replicas" : "0",
      "index.refresh_interval" : "-1",
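
For readers who want to apply the same two settings from a script, here is a minimal sketch. It only builds the `PUT /<index>/_settings` request. The base URL and index name in the usage note are placeholders, and the restore values (1 replica, `1s` refresh) are assumptions based on the replica count mentioned in this thread and the usual default refresh interval.

```python
import json
import urllib.request


def bulk_load_settings():
    """Settings quoted in the post: no replicas, no periodic refresh."""
    return {"index": {"number_of_replicas": "0", "refresh_interval": "-1"}}


def restore_settings(replicas=1, refresh_interval="1s"):
    """Assumed values to put back once the reindex has finished."""
    return {
        "index": {
            "number_of_replicas": str(replicas),
            "refresh_interval": refresh_interval,
        }
    }


def settings_request(base_url, index, settings):
    """Build (but do not send) a PUT /<index>/_settings request."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the result to `urllib.request.urlopen(...)` would send it, for example `settings_request("http://localhost:9200", "my-index", bulk_load_settings())` before the reindex and the same call with `restore_settings()` afterwards.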

Neither the CPU, nor the disk I/O, nor the network is even remotely saturated. I get a fairly constant indexing rate of about 1240 documents/s.
The index itself is a bit special since, for reasons, it is heavily oversharded. There are 960 primary shards + 1 replica.
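
One way to get a rate figure like the one above is to sample the reindex task status twice and divide the progress by the elapsed time. The sketch below assumes the status dictionaries come from the tasks API (`GET _tasks/<task_id>`), whose reindex status reports `created` and `updated` counters. The helper name and the sample format are illustrative, not something the poster described using.

```python
def docs_per_second(earlier, later):
    """Compute the indexing rate between two task status samples.

    Each sample is a (unix_seconds, status_dict) pair, where status_dict
    is the "status" object of a reindex task.
    """
    t0, status0 = earlier
    t1, status1 = later
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("the later sample must be newer than the earlier one")
    # Documents written so far are either newly created or updated in place.
    done0 = status0.get("created", 0) + status0.get("updated", 0)
    done1 = status1.get("created", 0) + status1.get("updated", 0)
    return (done1 - done0) / elapsed
```

A rate that stays flat while CPU, disk, and network are idle suggests the limit is in how requests are issued (batching, parallelism) rather than in the hardware.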

How can I identify the bottleneck?

best regards,
Matthias

searchymcsearchface · September 24, 2021, 1:48pm

Anything unusual about this index itself - e.g. large docs? stored fields?

sezuan2 · September 29, 2021, 10:47am

That’s a good point! The doc size varies a lot, from about 1 KB to tens of MB.

My next approach was to split the documents into groups by size, like 0 to 10000 bytes, 10000 to 15000 bytes… This also makes it possible to run the reindexing in parallel and to set a proper batch size for each group. That was necessary because reindexing everything at once was often interrupted when the 100 MB buffer was exceeded.
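
The size-bucket approach could look roughly like the sketch below. How the document size was determined is not stated in this thread, so the code assumes a hypothetical numeric field (here called `doc_size_bytes`) that holds each document's size. The remote host URL, the index names, and the 50% safety margin are likewise assumptions. Each body is meant for `POST _reindex`, where `source.size` sets the batch size.

```python
BUFFER_LIMIT_BYTES = 100 * 1024 * 1024  # the 100 MB buffer mentioned above


def batch_size_for(max_doc_bytes, default=1000, safety=0.5):
    """Pick a batch size so a full batch of the largest docs fits the buffer.

    `safety` is an assumed margin: only that fraction of the buffer is used.
    """
    budget = int(BUFFER_LIMIT_BYTES * safety)
    return max(1, min(default, budget // max_doc_bytes))


def reindex_body(source_index, dest_index, remote_host, size_field, lo, hi):
    """Build a _reindex body for documents with lo <= size < hi bytes."""
    return {
        "source": {
            "remote": {"host": remote_host},
            "index": source_index,
            "size": batch_size_for(hi),  # batch size for this bucket
            "query": {"range": {size_field: {"gte": lo, "lt": hi}}},
        },
        "dest": {"index": dest_index},
    }


# Buckets taken from the post; further buckets would follow the same pattern.
buckets = [(0, 10_000), (10_000, 15_000)]
bodies = [
    reindex_body("old-index", "new-index", "http://old-cluster:9200",
                 "doc_size_bytes", lo, hi)
    for lo, hi in buckets
]
```

Submitting each body with `wait_for_completion=false` starts one background task per bucket, which is what lets the buckets run in parallel.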

With this approach I’ve achieved a reindexing rate of about 6K/s, which sounds reasonable.

searchymcsearchface · September 30, 2021, 1:37pm

Yeah - OpenSearch is better suited to constant ingestion of documents than to batches like the ones you originally described. It seems like you are doing a good job now, and there are lots of possible optimization strategies, but you often need to tune them to your specific document quirks.
