OpenSearch Replication & Recovery performance issue: takes very long (100GiB -> 2~3 hours)

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

  • OpenSearch: 3.2.0
  • OpenSearch Operator: 2.8.0
  • Environment: EKS 1.30
  • Instance Type: m6g
  • Volume: gp3 (IOPS and throughput provisioned at maximum)

Describe the issue:

Hi.

I'm migrating from Elasticsearch to OpenSearch and am currently testing recovery and replication speed.

We are measuring the replication speed with the following setting.

The test index is almost 100GiB.

PUT /my_embedding_index/_settings
{
  "number_of_replicas": 1
}
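
For reference, the progress of the replica copy can be followed with the standard _cat recovery API; the call below is only an illustration of how the transfer rate can be observed:

GET _cat/recovery/my_embedding_index?v&active_only=true&h=index,shard,time,type,stage,bytes_percent,translog_ops_percent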

What I've done: adjusted the recovery speed, file-chunk settings, and more via cluster settings (a sketch of the kind of settings I changed is shown below).

Changing settings such as indices.recovery.max_bytes_per_sec and cluster.routing.allocation.node_concurrent_recoveries speeds things up only at the beginning; from a certain point the transfer rate drops off sharply again.

Nothing has worked so far.
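
For concreteness, this is the kind of transient settings update I have been applying. The exact values below are only examples, not the values from my cluster:

PUT _cluster/settings
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "500mb",
    "indices.recovery.max_concurrent_file_chunks": 5,
    "cluster.routing.allocation.node_concurrent_recoveries": 4
  }
}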

The peculiarities I found are as follows (a node-side cross-check is sketched after this list):

  1. Network bandwidth is not the problem.
  2. Disk IOPS and throughput limits are not being exceeded.
  3. EBS idle time is almost zero.
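
For reference, the node-side counters behind these observations can be pulled with the standard node stats API; this call is only an illustration:

GET _nodes/stats/fs,transport?human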


The difference between Elasticsearch and OpenSearch:

[ OpenSearch : 3~4k read IOPS ]

Read IOPS get stuck at this level at some point during the copy.

[ Elasticsearch : 60k read IOPS ]

Elasticsearch drives far higher disk IOPS than OpenSearch.

Sample iostat -x output captured during the test:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme2n1           0.00     0.00 1750.00    0.00  7000.00     0.00     8.00     0.98    0.56    0.56    0.00   0.57  99.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.59    0.00    1.05    5.97    0.00   90.39

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme2n1           0.00     8.00 1746.00   22.00  6984.00   120.00     8.04     1.01    0.57    0.56    1.50   0.54  96.00

I don't know why OpenSearch drives the disk to such high utilization (96~99%) while its read IOPS stay stuck so low.
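
To narrow down where the reads are coming from, hot threads and merge/segment stats could be captured while the copy is running. The calls below are standard diagnostic APIs, included only as a sketch (the thread count and interval values are arbitrary):

GET _nodes/hot_threads?threads=5&interval=500ms

GET my_embedding_index/_stats/merge,segments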

Configuration:

The Elasticsearch and OpenSearch clusters have almost the same configuration and compute specs.
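
For comparison, the effective settings of each cluster can be dumped with the standard cluster settings API (illustrative only; just one way to diff the two configurations side by side):

GET _cluster/settings?include_defaults=true&flat_settings=true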

Relevant Logs or Screenshots: