Reindex API Unexpected Timeouts

Hello, I am trying to reindex data from an (old) Elasticsearch 7.16.2 cluster to a (new) OpenSearch 1.2.2 cluster. I have quite a bit of data to migrate, so I've been experimenting with a small index on the old cluster.

The reindex operation only works about 50% of the time; the other 50% of the time it fails with an obscure connection timeout error.

Walkthrough Scenario:

  1. I send the following cURL request:
curl -X POST "https://<opensearch_node>:9200/_reindex/?pretty=true&wait_for_completion=true&timeout=2m" -H 'Content-Type: application/json' --data @reindex_body.json --cacert ca-certs.pem -u admin

Here are the contents of reindex_body.json:

{
  "source": {
    "remote": {
      "host": "https://<elasticsearch_node>:9200",
      "socket_timeout": "2m",
      "connect_timeout": "2m",
      "username": "<redacted>",
      "password": "<redacted>"
    },
    "index": "sw-reports-new-test"
  },
  "dest": {
    "index": "sw-reports-new-test92"
  }
}
  2. The first time it usually works, and I get this:

{
  "took" : 443,
  "timed_out" : false,
  "total" : 9,
  "updated" : 9,
  "created" : 0,
  "deleted" : 0,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}
  3. A few seconds later, I send the same request again, and after about 10 seconds I get this:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "i_o_exception",
        "reason" : "Connection timed out"
      }
    ],
    "type" : "i_o_exception",
    "reason" : "Connection timed out"
  },
  "status" : 500
}
  4. When I check the server logs, I see a very non-specific connection timeout stack trace with no real context; I can't even tell which connection timed out:

Connection timed out
        ... (several JDK-internal socket-read frames; the class names were lost when pasting) ...
        at org.apache.http.nio.reactor.ssl.SSLIOSession.receiveEncryptedData( ~[?:?]
        at org.apache.http.nio.reactor.ssl.SSLIOSession.isAppInputReady( ~[?:?]
        at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady( ~[?:?]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.readable( ~[?:?]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent( ~[?:?]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents( ~[?:?]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute( ~[?:?]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute( ~[?:?]
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$ ~[?:?]
        ... (final frame; class name lost when pasting) ...

What I’ve Tried:
I’ve tried the following things:

  • Reindexing with wait_for_completion=false, then polling the task ID. Same result: it works about 50% of the time.
  • Used multiple timeout values: I’ve set the timeout query parameter as well as the source.remote.socket_timeout and source.remote.connect_timeout values in the body. No differences observed.
  • Tried reindexing against different source/destination hosts.
  • Tried restarting all nodes in the source and destination clusters.
  • Tried using a different destination index for each reindex test.
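For reference, here is roughly how I ran the "don't wait for completion" variant from the first bullet. This is a sketch: the hosts and credentials are placeholders as above, and the task ID shown is not a real value.

```shell
# Submitting with wait_for_completion=false returns a task ID immediately
# instead of holding the HTTP connection open for the whole copy:
curl -X POST "https://<opensearch_node>:9200/_reindex?wait_for_completion=false" \
  -H 'Content-Type: application/json' --data @reindex_body.json \
  --cacert ca-certs.pem -u admin
# The response contains something like {"task": "<node_id>:<task_number>"}.

# Then poll the task until "completed" is true; any failures appear in the response:
curl "https://<opensearch_node>:9200/_tasks/<node_id>:<task_number>?pretty" \
  --cacert ca-certs.pem -u admin
```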

What I’ve Thought About Doing:

  • Using different Java versions on the old and/or new cluster. The old cluster runs Java 8, and the new cluster runs the bundled JDK (Java 15, I think).

Has anyone had experience with something like this? Any advice is much appreciated.


I have reindexed massive indices successfully with Logstash instead of the built-in _reindex API.
I've found the Logstash approach to be a bit more flexible and more responsive to backpressure from the target cluster. Basically, you set up an elasticsearch{} input and an elasticsearch{} output in a Logstash pipeline. It handles one big scroll search for you, which you can tune with a refined query and/or the size of each scroll chunk. Heck, you could even parallelize this approach by splitting the query's timeframe across different Logstash pipelines.
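For anyone curious, here is a minimal sketch of such a pipeline. All hosts, index names, and credentials below are placeholders, and the exact SSL option names vary a bit between Logstash versions:

```conf
# pipeline.conf (sketch): scroll the source index into the target cluster.
input {
  elasticsearch {
    hosts    => ["https://<elasticsearch_node>:9200"]
    index    => "<source_index>"
    query    => '{ "query": { "match_all": {} } }'
    size     => 1000        # documents per scroll page; tune for throughput
    scroll   => "5m"        # keep the scroll context alive between pages
    docinfo  => true        # expose _index/_id in [@metadata]
    user     => "<redacted>"
    password => "<redacted>"
    ssl      => true
    ca_file  => "ca-certs.pem"
  }
}
output {
  elasticsearch {
    hosts       => ["https://<opensearch_node>:9200"]
    index       => "<target_index>"
    document_id => "%{[@metadata][_id]}"   # preserve original document IDs
    user        => "admin"
    password    => "<redacted>"
    ssl         => true
    cacert      => "ca-certs.pem"
  }
}
```

Narrowing the query (e.g. to a time range) is also how you'd split the job across parallel pipelines.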

If it's really a big job and you want it done right, I also recommend throwing Kafka into the mix. Have one Logstash pipeline pull from the source cluster and output into a Kafka topic, and have another Logstash pipeline pull from that topic and write into your target cluster.
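Sketched out, the Kafka-buffered version is just two pipelines; the broker address and topic name here are made up for illustration:

```conf
# Pipeline 1 (sketch): source cluster -> Kafka topic
input {
  elasticsearch { hosts => ["https://<elasticsearch_node>:9200"] index => "<source_index>" docinfo => true }
}
output {
  kafka { bootstrap_servers => "kafka:9092" topic_id => "reindex-buffer" codec => json }
}

# Pipeline 2 (sketch): Kafka topic -> target cluster
input {
  kafka { bootstrap_servers => "kafka:9092" topics => ["reindex-buffer"] codec => json }
}
output {
  elasticsearch { hosts => ["https://<opensearch_node>:9200"] index => "<target_index>" }
}
```

The topic decouples the two sides, so the consumer pipeline can be stopped and restarted (or scaled out) without re-running the scroll against the source cluster.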
