Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
v 2.6.0
Describe the issue:
I'm trying to set up cross-cluster replication (CCR) in OpenSearch and am running into an issue when starting replication.
I have created the leader and follower clusters and the indices on the leader cluster. I also created the certificates and followed all the steps in the documentation. After updating the cluster settings to use 'proxy' mode, I can see the connection in the output of GET /_remote/info. However, when I start replication on the follower cluster, the request fails with a timeout error (502 Bad Gateway, "Client request timeout"). The follower cluster logs show the errors below. Replication fails as soon as it starts: the access validation runs first, and then it immediately fails with "failed to restore snapshot":
org.opensearch.repositories.RepositoryMissingException: [replication-remote-repo-new-5-connection-alias] missing
At the same time, the leader cluster logs show entries for this call, but no errors.
Does starting CCR require permission to create a snapshot repository on the leader and follower clusters? I have already granted unlimited access on both the leader and the follower.
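For reference, a proxy-mode remote connection of the kind described above is usually defined on the follower cluster with a request shaped like the one below. This is a sketch, not the exact request used here: the alias new-5-connection-alias is taken from the start request, while leader-proxy.example.com:9300 is a hypothetical placeholder for the leader's transport endpoint behind the proxy.

```json
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "new-5-connection-alias": {
          "mode": "proxy",
          "proxy_address": "leader-proxy.example.com:9300"
        }
      }
    }
  }
}

GET _remote/info
```

In the GET _remote/info response, the new-5-connection-alias entry should report "connected": true before replication is started.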
Configuration:
PUT /_plugins/_replication/follower-04/_start
{
  "leader_alias": "new-5-connection-alias",
  "leader_index": "leader-03",
  "use_roles": {
    "leader_cluster_role": "cross_cluster_replication_leader_full_access",
    "follower_cluster_role": "all_access_copy"
  }
}
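After the start request times out, the state of the follower index can be checked from the follower cluster with the CCR status API. The follower index name below is taken from the start request above. This is a sketch of the diagnostic call only; what the response contains for a failed start is not shown here.

```json
GET /_plugins/_replication/follower-04/_status
```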
Relevant Logs or Screenshots:
{
  "statusCode": 502,
  "error": "Bad Gateway",
  "message": "Client request timeout"
}
[2024-05-14T02:33:20,522][WARN ][o.o.s.RestoreService ] [opensearch-cluster-follower-master-2] [replication-remote-repo-new-5-connection-alias:replication-remote-snapshot] failed to restore snapshot
org.opensearch.repositories.RepositoryMissingException: [replication-remote-repo-new-5-connection-alias] missing
[2024-05-14T02:33:20,525][DEBUG][o.o.c.s.MasterService ] [opensearch-cluster-follower-master-2] executing cluster state update for [update task state [replication:index:follower-04]]
[2024-05-14T02:33:20,526][DEBUG][o.o.c.s.MasterService ] [opensearch-cluster-follower-master-2] took [0s] to compute cluster state update for [update task state [replication:index:follower-04]]
[2024-05-14T02:33:20,526][DEBUG][o.o.c.s.MasterService ] [opensearch-cluster-follower-master-2] cluster state updated, version [1806], source [update task state [replication:index:follower-04]]
[2024-05-14T02:33:20,600][DEBUG][o.o.c.s.MasterService ] [opensearch-cluster-follower-master-2] took [0s] to notify listeners on successful publication of cluster state (version: 1806, uuid: O_3_yAoQQhGNv2oPFNMenA) for [update task state [replication:index:follower-04]]
[2024-05-14T02:33:20,638][WARN ][o.o.p.PersistentTasksClusterService] [opensearch-cluster-follower-master-2] persistent task replication:index:follower-04 failed
at org.opensearch.replication.task.index.IndexReplicationTask$failReplication$2.invokeSuspend(IndexReplicationTask.kt:286) ~[?:?]
at org.opensearch.replication.util.CoroutinesKt$waitForTaskCondition$2$1.test(Coroutines.kt:163) ~[?:?]
at org.opensearch.replication.util.CoroutinesKt$waitForTaskCondition$2$1.test(Coroutines.kt:163) ~[?:?]
[2024-05-14T02:33:50,507][ERROR][o.o.r.a.i.TransportReplicateIndexClusterManagerNodeAction] [opensearch-cluster-follower-master-2] Failed to trigger replication for follower-04 - java.lang.IllegalStateException: Timed out when waiting for persistent task after 30s
[2024-05-14T02:33:50,509][WARN ][r.suppressed ] [opensearch-cluster-follower-master-1] path: /_plugins/_replication/follower-04/_start, params: {pretty=true, index=follower-04}