CCR bootstrap of large clusters

gchakkalakkal1 · June 23, 2026, 3:10am

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): 3.2

Describe the issue: CCR bootstrap starts successfully and begins transferring shard segment files from the leader cluster to the follower cluster.

For some large shards, the file transfer fails with a CorruptIndexException (or a related index corruption error).
The affected follower shard transitions to a failed state.
CCR replication for the shard does not automatically recover and requires manual intervention.
In large clusters with hundreds or thousands of shards, even a small number of shard failures can prevent successful completion of the bootstrap process.

Configuration:

Relevant Logs or Screenshots:

Anthony · June 24, 2026, 11:06am

@gchakkalakkal1 This looks like it could be a known CCR bug (#1465, #1482): leader-side segment reads used a Lucene IndexInput opened with IOContext.READONCE, which on newer JDKs is backed by a thread-confined memory segment. When a large shard's transfer spans multiple chunk requests handled by different threads, accessing that input from a thread other than the one that opened it throws IllegalStateException: confined, which can surface as a corrupt or incomplete file. This was fixed upstream in PR #1520, merged in April 2025 and included in the official 3.2.0.0 plugin release.
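To make the failure mode easier to picture, here is a minimal plain-Java sketch. It does not use Lucene or the CCR plugin: `ConfinedInput` is a hypothetical stand-in for a READONCE IndexInput that only its opening thread may read, and `perRequestTransfer` shows one way to avoid cross-thread access (opening the input on the thread that serves each chunk). That is an illustrative pattern, not necessarily what PR #1520 actually does.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConfinedReadDemo {

    // HYPOTHETICAL stand-in for an IndexInput opened with IOContext.READONCE:
    // only the thread that created it may read from it.
    static final class ConfinedInput {
        private final Thread owner = Thread.currentThread();
        private final byte[] data;

        ConfinedInput(byte[] data) {
            this.data = data;
        }

        byte[] readChunk(int offset, int length) {
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException("confined");
            }
            byte[] out = new byte[length];
            System.arraycopy(data, offset, out, 0, length);
            return out;
        }
    }

    // Problematic pattern: the input is opened on the calling thread, then a
    // chunk request served by a pool thread reads from it. Returns the failure.
    static String sharedInputTransfer(byte[] file, int chunkSize) throws InterruptedException {
        ConfinedInput shared = new ConfinedInput(file);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(() -> shared.readChunk(0, chunkSize)).get();
            return "ok";
        } catch (ExecutionException e) {
            return e.getCause().toString();
        } finally {
            pool.shutdown();
        }
    }

    // Safer pattern (assumption, for illustration): each chunk task opens its own
    // input, so the reading thread is always the owning thread.
    static byte[] perRequestTransfer(byte[] file, int chunkSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<byte[]>> chunks = new ArrayList<>();
            for (int off = 0; off < file.length; off += chunkSize) {
                final int offset = off;
                final int len = Math.min(chunkSize, file.length - off);
                chunks.add(pool.submit(() -> new ConfinedInput(file).readChunk(offset, len)));
            }
            // Reassemble the chunks in request order, as a follower would.
            byte[] out = new byte[file.length];
            int pos = 0;
            for (Future<byte[]> f : chunks) {
                byte[] c = f.get();
                System.arraycopy(c, 0, out, pos, c.length);
                pos += c.length;
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] file = "pretend-segment-file-bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println(sharedInputTransfer(file, 8));
        System.out.println(Arrays.equals(perRequestTransfer(file, 8), file));
    }
}
```

Running `main` should print `java.lang.IllegalStateException: confined` for the shared-input case and `true` for the per-request case. The point is only that ownership is tied to the opening thread, so any design where chunk reads can hop threads needs a fresh (or non-confined) input per read.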

Two things that would help confirm:

Does your actual stack trace mention IllegalStateException: confined or MemorySegmentIndexInput?
Are you on the official OpenSearch distribution, or an AWS Managed Service? Managed offerings have shipped older CCR plugin builds under a given version label even after the upstream fix landed.
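For the first question, a quick string check over the follower or leader stack trace is enough. The sketch below is a hypothetical helper (not part of OpenSearch or the CCR plugin) that looks for the two markers mentioned in the reply; a match is a strong hint, but a miss does not rule the bug out if the trace was truncated or wrapped.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class ConfinedBugCheck {

    // Markers named in the reply; their presence is a hint, not proof.
    private static final List<String> MARKERS =
            List.of("IllegalStateException: confined", "MemorySegmentIndexInput");

    // Returns true if the trace contains any of the known markers.
    static boolean matchesKnownCcrBug(String stackTrace) {
        for (String marker : MARKERS) {
            if (stackTrace.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    // Reads a stack trace from stdin, e.g. a log excerpt piped into the program.
    public static void main(String[] args) throws IOException {
        String trace = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        System.out.println(matchesKnownCcrBug(trace)
                ? "Trace matches the confined-read signature"
                : "No confined-read markers found");
    }
}
```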
