Bi-directional replication

Hello, I am using OpenSearch 1.2.4.
I am trying to build bidirectional replication between two clusters: the first is on-prem and the second runs on EC2 in AWS. I want to auto-follow every index except the system indices and the long-term index that older indices are reindexed into. I think I can do that with:

POST /_plugins/_replication/_autofollow?pretty
{
  "leader_alias": "connection-to-on-prem",
  "name": "on-prem-to-aws",
  "pattern": "index*",
  "use_roles": {
    "leader_cluster_role": "all_access",
    "follower_cluster_role": "all_access"
  }
}

1) My first question is whether I can use a pattern like * and exclude some indices.
I want to replicate all the new indices created by rollover, but not the long-term index into which I reindex all the older indices.

2) I have many errors in my log.
Every one of them looks like this:
[WARN ][o.o.r.t.s.ShardReplicationTask] [aplogdb-node1] [stp-int-000003][0] Encountered a failure while executing in org.opensearch.replication.action.changes.GetChangesRequest@3dbe2036. Retrying in 10 seconds.
Jun 29 14:19:01 aplogdb01-aws-spc aplogdb-node1-elastic: org.opensearch.OpenSearchTimeoutException: 1m
Jun 29 14:19:01 aplogdb01-aws-spc aplogdb-node1-elastic: #011at org.opensearch.replication.util.CoroutinesKt$waitForGlobalCheckpoint$2$listener$1.accept(Coroutines.kt:113) ~[?:?]
Jun 29 14:19:01 aplogdb01-aws-spc aplogdb-node1-elastic: #011at org.opensearch.index.shard.GlobalCheckpointListeners.lambda$notifyListener$3(GlobalCheckpointListeners.java:240) ~[opensearch-1.2.4.jar:1.2.4]
Jun 29 14:19:01 aplogdb01-aws-spc aplogdb-node1-elastic: #011at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:95) ~[?:?]
Jun 29 14:19:01 aplogdb01-aws-spc aplogdb-node1-elastic: #011at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571) ~[?:?]
Jun 29 14:19:01 aplogdb01-aws-spc aplogdb-node1-elastic: #011at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:738) ~[?:?]
Jun 29 14:19:01 aplogdb01-aws-spc aplogdb-node1-elastic: #011at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678) ~[?:?]
Jun 29 14:19:01 aplogdb01-aws-spc aplogdb-node1-elastic: #011at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665) ~[?:?]

3)
Once I create an auto-follow rule and it is running, I can't stop or pause it in real time.
I tried to stop it, and that took effect only after the last index had been rolled over.
I also could not find an API to stop auto-follow itself; I only found the stop replication API operation.

I started auto-follow by:
POST /_plugins/_replication/_autofollow?pretty
{
  "leader_alias": "connection-to-on-prem",
  "name": "on-prem-to-aws",
  "pattern": "stp-a*",
  "use_roles": {
    "leader_cluster_role": "all_access",
    "follower_cluster_role": "all_access"
  }
}

and want to pause it by:
POST /_plugins/_replication/_autofollow/_pause
{}

or

POST /_plugins/_replication/stp-acc-'actual-index'/_stop
{}
but I get an error saying that the replica is running and can't be stopped.

Thank you very much.

Hi Vojtěch,

Thanks for trying out cross-cluster replication.

  1. This is not possible with the current API. As a workaround, you can let all indices replicate and then stop replication on the long-term index.

  2. This warning is logged on the leader cluster when there are no changes on the leader to replicate. You can safely ignore it. Just make sure your leader and follower checkpoints are in sync by using the status API. I have created an action item to move these messages to debug-level logging.

  3. You can't pause auto-follow, only stop it. The API you mentioned should have worked for ongoing replication jobs. The error suggests that replication is not running in the first place. Before you try to stop it, can you please use the status API to confirm that replication is running?

GET _plugins/_replication/stp-acc-'actual-index'/_status
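
As a rough illustration of the two checks mentioned above (is replication running, and are the checkpoints in sync), here is a minimal Python sketch that interprets an already-parsed _status response. It assumes the response carries a "status" string and a "syncing_details" object with "leader_checkpoint" and "follower_checkpoint", which is how I recall the status API documentation; verify the field names against your version. The function name replication_state is hypothetical.

```python
def replication_state(status_body):
    """Interpret a parsed _status response (a dict).

    Assumed fields (check against your OpenSearch version):
      status_body["status"]                                 e.g. "SYNCING"
      status_body["syncing_details"]["leader_checkpoint"]
      status_body["syncing_details"]["follower_checkpoint"]
    Returns (running, lag); lag is None if checkpoints are missing.
    """
    state = status_body.get("status", "")
    details = status_body.get("syncing_details") or {}
    # Treat these two states as "replication is in progress" (assumption).
    running = state in ("SYNCING", "BOOTSTRAPPING")
    if "leader_checkpoint" in details and "follower_checkpoint" in details:
        lag = details["leader_checkpoint"] - details["follower_checkpoint"]
    else:
        lag = None
    return running, lag


# Hand-written sample bodies, not real cluster output.
in_sync = {
    "status": "SYNCING",
    "syncing_details": {"leader_checkpoint": 19, "follower_checkpoint": 19},
}
not_running = {"status": "REPLICATION NOT IN PROGRESS"}

print(replication_state(in_sync))
print(replication_state(not_running))
```

A lag of 0 would mean the follower has caught up with the leader; a result of (False, None) would match the situation where a _stop call has nothing to stop.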

Hello @gbbafna, thanks for your support and quick response!

My hope was that I could stop the whole auto-follow, reindex into a permanent index, delete the temporary indices that had been created, and then start auto-follow again.

But I think I have now figured out a solution.

Here is my plan; maybe someone can use it as an example:
I have 2 clusters: one with permanent data storage that will hold data for more than 1 year, and an AWS backup cluster that will hold data for 14 days.
On the permanent cluster, Logstash will send application logs to several indices that roll over daily according to the Logstash config. In AWS I will turn on an auto-follow pattern for every new index. I will call them on-prem-to-aws-daily-‘date’, so the pattern will be on-prem-to-aws-daily-*. Every day a script will stop replication on yesterday's index in AWS by running:

POST /_plugins/_replication/on-prem-to-aws-daily-'yesterday-date'/_stop
{}
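
A minimal Python sketch of how such a daily script might work out which index to stop. The date format "%Y.%m.%d" is only an assumption (use whatever suffix your Logstash config actually produces), and the helper name stop_path_for_yesterday is hypothetical. The sketch only builds the request path; sending the POST with an empty JSON body is left to your HTTP client of choice.

```python
from datetime import date, timedelta


def stop_path_for_yesterday(today, prefix="on-prem-to-aws-daily-", fmt="%Y.%m.%d"):
    """Build the _stop path for yesterday's follower index.

    prefix matches the naming in the plan above; fmt is an assumed
    date suffix format and must match the real index names.
    """
    yesterday = today - timedelta(days=1)
    index_name = prefix + yesterday.strftime(fmt)
    return "/_plugins/_replication/" + index_name + "/_stop"


# Example with a fixed date so the result is reproducible.
print(stop_path_for_yesterday(date(2022, 7, 1)))
```

Taking the date as a parameter instead of calling date.today() inside the function keeps the helper easy to test and makes reruns for a missed day straightforward.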

Once replication is stopped, I can reindex the daily data, both in AWS and on-prem, into a temporary index called on-prem-to-aws-monthly. Writes will go through its write alias, and ISM will roll it over.

The same applies in the other direction, from the AWS machines to on-prem, but I will keep logs from AWS for only 14 days.

1) Can I auto-follow several indices that don't match the same pattern?

The ISM policy on-prem will roll over after one month and delete after one year; in AWS it will roll over after 7 days and delete after 14 days.

My last project was a reporting server running OpenSearch 1.2.3, so I started this one on OpenSearch 1.2.4. I would like to ask whether I can safely upgrade to OpenSearch 1.3, or even OpenSearch 2.0, in production.

I need plugins:
apesmaster-node3 opensearch-alerting 1.2.4.0
apesmaster-node3 opensearch-anomaly-detection 1.2.4.0
apesmaster-node3 opensearch-cross-cluster-replication 1.2.4.0
apesmaster-node3 opensearch-index-management 1.2.4.0
apesmaster-node3 opensearch-job-scheduler 1.2.4.0
apesmaster-node3 opensearch-security 1.2.4.0
2) Do they all work on the upgraded versions? Thank you :slight_smile:
I am asking because I have seen some issues like these:
https://github.com/opensearch-project/OpenSearch/issues/2916
https://github.com/opensearch-project/index-management/issues/33

GET /_plugins/_replication/autofollow_stats works fine and
GET _plugins/_replication/stp-acc-000008/_status works too :wink:
thanks so much!

Hi Vojtěch,

1) Can I auto-follow several indices that don't match the same pattern?

I didn't fully understand your question, but let me try to answer it as I understand it.

In AutoFollow, you give a pattern, and replication starts on every matching index. You can configure multiple AutoFollow rules, each with its own pattern, as needed.

Also, thanks for sharing your plan; it is quite interesting.
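
To illustrate the idea of several AutoFollow rules side by side, here is a small Python sketch that builds one request body per pattern, in the same shape as the _autofollow request earlier in this thread. The rule names "daily-logs" and "stp-acc" and the helper autofollow_rule are made up for the example; each body would be sent in its own POST /_plugins/_replication/_autofollow call.

```python
import json


def autofollow_rule(name, pattern, leader_alias="connection-to-on-prem", role="all_access"):
    """Build the JSON body for one auto-follow rule.

    The body layout mirrors the request shown earlier in the thread;
    the rule names used below are hypothetical.
    """
    return {
        "leader_alias": leader_alias,
        "name": name,
        "pattern": pattern,
        "use_roles": {
            "leader_cluster_role": role,
            "follower_cluster_role": role,
        },
    }


# One rule per pattern; each rule needs its own name.
rules = [
    autofollow_rule("daily-logs", "on-prem-to-aws-daily-*"),
    autofollow_rule("stp-acc", "stp-a*"),
]

for rule in rules:
    print(json.dumps(rule, indent=2))
```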

2) Are they all working on upgraded versions?

They should work fine. If there are any unexpected regressions, our teams will fix them and publish patch releases as necessary.