[Aiven OpenSearch Sink Connector] Indexing with this connector is too slow for a streaming pipeline

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

OpenSearch version 3.2

Describe the issue:

I am currently working on migrating from Elasticsearch to OpenSearch. I’m using OpenSearch version 3.2 and the Aiven OpenSearch Kafka Sink Connector. In the process of completing this migration by replacing the existing ES Sink Connector with Aiven’s OpenSearch Sink Connector, I’m experiencing indexing speeds that are approximately 20 times slower than the ES Sink Connector.

I have adjusted various settings including:

  • Connector settings (bulk size, linger.ms, max.bulk.size, number of Kafka topic partitions, tasks.max, etc.)

  • OpenSearch settings (refresh interval, replica, etc.)

Despite changing these options, indexing is still approximately 15-20 times slower.

The total indexing volume is 30 million records. The existing ES Sink Connector took 5 hours and 30 minutes to process this amount, while the OpenSearch Sink Connector takes 55 hours, which is about 10 times longer.
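To put these figures in per-second terms, here is a rough back-of-the-envelope calculation. It uses only the record count and the two durations quoted above; the function and variable names are just for illustration.

```python
# Back-of-the-envelope throughput from the figures quoted in this post.
TOTAL_RECORDS = 30_000_000
ES_HOURS = 5.5    # 5 hours 30 minutes with the ES Sink Connector
OS_HOURS = 55.0   # with the Aiven OpenSearch Sink Connector


def records_per_second(total_records: int, hours: float) -> float:
    """Average indexing rate over the whole run."""
    return total_records / (hours * 3600)


es_rate = records_per_second(TOTAL_RECORDS, ES_HOURS)
os_rate = records_per_second(TOTAL_RECORDS, OS_HOURS)
slowdown = OS_HOURS / ES_HOURS

print(f"ES sink:         {es_rate:.0f} records/s")
print(f"OpenSearch sink: {os_rate:.0f} records/s")
print(f"Wall-clock slowdown: {slowdown:.1f}x")
```

On wall-clock time alone this works out to roughly 1,500 records/s versus roughly 150 records/s for the full 30 million records.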

My OpenSearch cluster has 3 data nodes, each with 8 CPU cores and 20 GiB of memory.

Configuration:
"bulk.size.bytes": "10485760",
"batch.size": "3000",
"max.buffered.records": "20000",
"read.timeout.ms": "5000",
"linger.ms": "1000",
"max.in.flight.requests": "5"
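As a sanity check on these values, the sketch below relates the observed 55-hour run to "batch.size" and "linger.ms". It is only arithmetic under stated assumptions: that every bulk request would carry a full batch of 3000 records, or alternatively that a batch is flushed once per linger interval. I have not confirmed which of these the connector actually does, and the number of tasks is ignored.

```python
# Rough arithmetic only; the two scenarios below are assumptions,
# not confirmed behaviour of the Aiven connector.
TOTAL_RECORDS = 30_000_000
BATCH_SIZE = 3000   # "batch.size" from the configuration above
LINGER_MS = 1000    # "linger.ms" from the configuration above
OS_HOURS = 55.0     # observed duration with the OpenSearch sink

elapsed_s = OS_HOURS * 3600

# Scenario A (assumption): every bulk request is a full batch.
full_batches = TOTAL_RECORDS // BATCH_SIZE
seconds_per_full_batch = elapsed_s / full_batches

# Scenario B (assumption): one flush per linger interval at the observed rate.
records_per_linger = (TOTAL_RECORDS / elapsed_s) * (LINGER_MS / 1000)

print(f"Full batches needed: {full_batches}")
print(f"Average seconds per full batch: {seconds_per_full_batch:.1f}")
print(f"Records per linger interval: {records_per_linger:.0f}")
```

Either way, the observed rate is far below what a 3000-record batch per second would give, so it may be worth checking in the connector logs how large the bulk requests really are.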

Relevant Logs or Screenshots:

Yes, we have exactly the same issue with that connector (we are on OS 2.19).

Looking at the source code, I suspect that the connector is simply not as good at managing multithreading as the one from Confluent. The Confluent connector is built on top of “org.elasticsearch.client”, whereas the Aiven connector has a lot of synchronized code in its implementation.
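To illustrate why heavy synchronization could matter, here is a toy sketch. It does not model the Aiven connector's actual code: `run`, `send_bulk`, and the 50 ms simulated request time are all hypothetical. It only shows the general effect of holding one shared lock around blocking I/O, which makes concurrent senders take turns.

```python
import threading
import time


def run(workers: int, use_shared_lock: bool, io_seconds: float = 0.05) -> float:
    """Start `workers` threads that each simulate one blocking bulk request.

    Returns the wall-clock time until all threads have finished.
    """
    lock = threading.Lock()

    def send_bulk() -> None:
        if use_shared_lock:
            # Only one thread at a time can be "on the wire".
            with lock:
                time.sleep(io_seconds)
        else:
            # Requests overlap freely.
            time.sleep(io_seconds)

    threads = [threading.Thread(target=send_bulk) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


serialized = run(workers=4, use_shared_lock=True)
concurrent = run(workers=4, use_shared_lock=False)
print(f"with shared lock:    {serialized:.3f} s")
print(f"without shared lock: {concurrent:.3f} s")
```

With the shared lock, the total time grows with the number of workers; without it, the requests overlap. Whether this is what actually limits the connector would need profiling or thread dumps to confirm.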