[ERROR] Can't start cross cluster replication

I have installed the cross cluster plugin on both the clusters, and my ES version is 7.10.2, opendistron version is 1.13.2.
I am follew to cross cluster connectivity as mentioned in step (cross-cluster-replication/HANDBOOK.md at main · opendistro-for-elasticsearch/cross-cluster-replication · GitHub)

When I try the "Start replication" step, I get the following error:
curl -k -u testuser:testuser -XPUT "https://${FOLLOWER}/_opendistro/_replication/follower-01/_start?pretty" -H 'Content-type: application/json' -d'{"remote_cluster": "leader-cluster", "remote_index": "leader-01"}'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "action_not_found_transport_exception",
        "reason" : "No handler for action [internal:indices/admin/opendistro/replication/index/start]"
      }
    ],
    "type" : "action_not_found_transport_exception",
    "reason" : "No handler for action [internal:indices/admin/opendistro/replication/index/start]"
  },
  "status" : 500
}

Please help me with this issue. Thanks,
Cindy

@ccr-devs Any thoughts here?

Hi Cindy,

Apologies for the delay. It looks like you are missing the cross-cluster-replication plugin. You can confirm this by running the following commands.

curl -k -u testuser:testuser -XGET "https://${FOLLOWER}/_cat/plugins"

curl -k -u testuser:testuser -XGET "https://${LEADER}/_cat/plugins"

The current CCR plugin is experimental and needs to be installed explicitly. Can you try the installation instructions if you haven't already?
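Since a missing plugin on even one node can produce the "No handler for action" error, it can help to list exactly which nodes lack it rather than eyeballing the _cat/plugins table. Below is a minimal sketch, assuming you have saved the output of _cat/nodes?h=name and _cat/plugins?h=name,component to two files. The function name nodes_missing_plugin is hypothetical, and the default component name opendistro-cross-cluster-replication is an assumption; copy the exact name from your own _cat/plugins output.

```shell
#!/usr/bin/env bash
# Sketch: print the nodes that do not report a given plugin in _cat/plugins.
#
# $1: file holding `_cat/nodes?h=name` output (one node name per line)
# $2: file holding `_cat/plugins?h=name,component` output ("node plugin" per line)
# $3: plugin component name; the default is an ASSUMPTION, check your own output
nodes_missing_plugin() {
  local nodes_file=$1 plugins_file=$2
  local plugin=${3:-opendistro-cross-cluster-replication}
  awk -v p="$plugin" '
    # First file (plugins): remember every node that loaded the plugin.
    FILENAME == ARGV[1] { if ($2 == p) have[$1] = 1; next }
    # Second file (nodes): print non-empty node names without the plugin.
    NF > 0 && !($1 in have) { print $1 }
  ' "$plugins_file" "$nodes_file"
}

# Usage against a live cluster (FOLLOWER and credentials as in this thread):
#   curl -s -k -u admin:admin "https://${FOLLOWER}/_cat/nodes?h=name" > nodes.txt
#   curl -s -k -u admin:admin "https://${FOLLOWER}/_cat/plugins?h=name,component" > plugins.txt
#   nodes_missing_plugin nodes.txt plugins.txt
```

Run it against both the leader and the follower cluster; any node it prints needs the plugin installed and a restart, because plugins are only loaded at node startup.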

Please let us know if these steps didn’t help.

Dear @krishna_ggk

When I use the internal user testuser, I get a permission error:
curl -k -u testuser:testuser -XGET https://${FOLLOWER}/_cat/plugins
{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "no permissions for [cluster:monitor/nodes/info] and User [name=testuser, backend_roles=, requestedTenant=null]"
      }
    ],
    "type" : "security_exception",
    "reason" : "no permissions for [cluster:monitor/nodes/info] and User [name=testuser, backend_roles=, requestedTenant=null]"
  },
  "status" : 403
}

So I ran the following command as admin and got this response:
curl -k -u admin:admin -XGET "https://${LEADER}/_cat/plugins?pretty"
[Screenshot: LEADER]

I found that the CCR plugin is not installed on LEADER node1 and FOLLOWER node3.
So I checked with /usr/share/elasticsearch/bin/elasticsearch-plugin, and it shows the plugin as already installed.
[Screenshot: q01-list]

Thanks for your response!
Cindy

Hi @krishna_ggk

An update on my experiment:
Using curl -k -u admin:admin -XGET "https://${LEADER}/_cat/plugins?pretty", I found that the CCR plugin is not installed on LEADER node1 and FOLLOWER node3.
After I stopped LEADER node1 and FOLLOWER node3, CCR started successfully.

I then added documents to the leader-01 index, but I don't know why follower-03 doesn't replicate the new documents from leader-01.
(The leader and follower cluster doc counts are shown below.)
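To put numbers on "new documents are not replicated", the doc counts of the two indices can be compared with the _cat/count API. This is a hedged sketch: the helper names doc_count and compare_counts are hypothetical, LEADER and FOLLOWER are assumed to hold host:port as in the commands above, and ES_AUTH (defaulting to admin:admin) is an assumed variable for the credentials. Note that _cat/count only sees refreshed documents, so a brief mismatch right after indexing is normal; a mismatch that persists is the symptom described here.

```shell
#!/usr/bin/env bash
# Sketch: compare leader vs follower doc counts using _cat/count.
# ASSUMPTIONS: LEADER/FOLLOWER hold host:port; ES_AUTH holds user:password.

# Print the doc count of index $2 on host $1 (only the count column, no header).
doc_count() {
  curl -s -k -u "${ES_AUTH:-admin:admin}" "https://$1/_cat/count/$2?h=count" \
    | tr -d '[:space:]'
}

# Report whether the follower count matches the leader count.
compare_counts() {
  local leader_count=$1 follower_count=$2
  if [ "$leader_count" = "$follower_count" ]; then
    echo "in sync ($leader_count docs)"
  else
    echo "lagging: leader=$leader_count follower=$follower_count"
  fi
}

# Usage against live clusters (index names taken from this thread):
#   compare_counts "$(doc_count "$LEADER" leader-01)" "$(doc_count "$FOLLOWER" follower-03)"
```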


I guess we call it a feature, not a bug :joy:

BTW, I tested CCR in Dev Tools on 2 clusters, each with 3 nodes. (I used the admin user, so I assume it already has all the required permissions.)

[2021-07-15T11:59:08,719][WARN ][o.e.s.InternalSnapshotsInfoService] [adt-sys-kienlt-dev-92-67] failed to retrieve shard size for [snapshot=opendistro-remote-repo-leader-cluster:opendistro-remote-snapshot/262b3eb7-92a2-3e1c-b326-4b314730ed32, index=[leader-01/Zl5NOk4RRMuIpzWDrZ5hYw], shard=[follower-01][0]]
org.elasticsearch.ElasticsearchSecurityException: No user found for indices:monitor/stats

[2021-07-15T11:59:09,242][WARN ][o.e.p.PersistentTasksClusterService] [adt-sys-kienlt-dev-92-67] persistent task replication:index:follower-01 failed
org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: replication_exception: Remote restore failed: shard could not be allocated to any of the nodes
	at com.amazon.elasticsearch.replication.task.index.IndexReplicationTask.waitForRestore(IndexReplicationTask.kt:277) ~[?:?]
	at com.amazon.elasticsearch.replication.task.index.IndexReplicationTask$waitForRestore$1.invokeSuspend(IndexReplicationTask.kt) ~[?:?]
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) ~[?:?]
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56) ~[?:?]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684) ~[elasticsearch-7.10.2.jar:7.10.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
	at java.lang.Thread.run(Thread.java:832) [?:?]
[2021-07-15T11:59:09,262][INFO ][o.e.c.m.MetadataDeleteIndexService] [adt-sys-kienlt-dev-92-67] [follower-01/lxS3ZMJkQLGAUulx6lkMag] deleting index

After the log above appears, no index shows up in the follower cluster.
Source guide: cross-cluster-replication/HANDBOOK.md at main · opendistro-for-elasticsearch/cross-cluster-replication · GitHub

Edit: nvm. Adding these lines to elasticsearch.yml fixed it :smiley:

opendistro_security.unsupported.inject_user.enabled: true
opendistro_security.nodes_dn_dynamic_config_enabled: true
node.remote_cluster_client: true
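Because elasticsearch.yml is only read at node startup, it is easy to miss a node or forget a restart after adding these lines. Here is a minimal sketch that checks one file for the three settings above. The function name check_ccr_settings is hypothetical, and the default path /etc/elasticsearch/elasticsearch.yml is an assumption (typical for a package install; a Docker image keeps it under /usr/share/elasticsearch/config/). It matches whole lines exactly, so a line with different spacing is reported as missing even if Elasticsearch would accept it.

```shell
#!/usr/bin/env bash
# Sketch: report which of the three settings from the post above are missing
# from an elasticsearch.yml file. Returns 0 if all are present, 1 otherwise.
# ASSUMPTION: default path is a typical package-install location.
check_ccr_settings() {
  local yml=${1:-/etc/elasticsearch/elasticsearch.yml} missing=0 setting
  for setting in \
    'opendistro_security.unsupported.inject_user.enabled: true' \
    'opendistro_security.nodes_dn_dynamic_config_enabled: true' \
    'node.remote_cluster_client: true'; do
    # -F fixed string, -x whole-line match (so commented-out lines don't count)
    if ! grep -Fqx -- "$setting" "$yml"; then
      echo "missing: $setting"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on each node, then restart the node so the settings take effect:
#   check_ccr_settings /usr/share/elasticsearch/config/elasticsearch.yml
```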

Yes, it doesn't replicate new data to the replica cluster.
My example:
Create the test_ccr index in the main cluster.
Start replication in the replica cluster like this:

PUT _opendistro/_replication/test_ccr/_start?pretty
{
  "remote_cluster": "leader-cluster",
  "remote_index": "test_ccr"
}

It does replicate all existing documents from the main cluster, but when I keep inserting more documents, the replica cluster doesn't change.

I tried to stop and start replication again, but it says:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Cant use same index again for replication. Either close or delete the index:test_ccr"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Cant use same index again for replication. Either close or delete the index:test_ccr"
  },
  "status" : 400
}

I tried restarting replication after closing the index, but I still get the same error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Cant use same index again for replication. Either close or delete the index:follower-test"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Cant use same index again for replication. Either close or delete the index:follower-test"
  },
  "status" : 400
}

But if I delete the index, it works fine. Can somebody help with why closing the index doesn't work?

@rivanshu I believe you are using the old opendistro version of the replication plugin.
In the OpenSearch replication plugin this has changed: you need to delete the index (closing the index will not work).
Code reference: cross-cluster-replication/TransportReplicateIndexClusterManagerNodeAction.kt at main · opensearch-project/cross-cluster-replication · GitHub

@soosinha Yes, I am using the old Open Distro version, which I assumed supports replication on closed indices, but it is not working for me. Do you have any leads on what the issue could be?

As per the code, it checks the cluster state for the presence of the index before starting replication. A closed index is still present in the cluster state, so it needs to be deleted before replication can start. The validation message (which suggests closing the index) may be incorrect, though.
So I guess you will need to delete the index if you want to use the same index name.
Note that the Open Distro CCR plugin was experimental and never had an actual release. I would recommend using the OpenSearch CCR plugin.
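Putting that advice into commands, the workaround is to delete the follower index (open or closed) and then call _start again with the same request used at the top of this thread. This is a hedged sketch: restart_replication is a hypothetical helper, it assumes replication on that index has already been stopped, FOLLOWER holds host:port, and ES_AUTH (defaulting to admin:admin) is an assumed credentials variable. Deleting the follower index discards its data; the _start call then bootstraps it again from the leader.

```shell
#!/usr/bin/env bash
# Sketch: delete the follower index and start Open Distro replication again.
# ASSUMPTIONS: replication is already stopped; FOLLOWER holds host:port;
# ES_AUTH holds user:password. WARNING: deletes the follower index's data.
restart_replication() {
  local follower_index=$1 leader_alias=$2 leader_index=$3
  local auth=${ES_AUTH:-admin:admin}
  # 1. Delete the follower index; closed indices can be deleted too, and a
  #    404 here just means the index was already gone.
  curl -s -k -u "$auth" -XDELETE "https://${FOLLOWER}/${follower_index}" > /dev/null
  # 2. Start replication with the same endpoint and body used earlier in the thread.
  curl -s -k -u "$auth" -XPUT \
    "https://${FOLLOWER}/_opendistro/_replication/${follower_index}/_start?pretty" \
    -H 'Content-Type: application/json' \
    -d "{\"remote_cluster\": \"${leader_alias}\", \"remote_index\": \"${leader_index}\"}"
}

# Usage (names from the example above):
#   restart_replication test_ccr leader-cluster test_ccr
```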

Is this new plugin compatible with the regular Elasticsearch or only OpenSearch?

It is compatible with OpenSearch only