Bi-directional replication

vnovotny98 · June 29, 2022, 12:25pm

Hello, I am using OpenSearch 1.2.4.
I am trying to build bidirectional replication. I have two clusters: the first on-prem and the second on EC2 in AWS. I want to auto-follow every index except system indices and the long-term index into which I reindex older indices, which I think I can do with:

POST /_plugins/_replication/_autofollow?pretty
{
  "leader_alias": "connection-to-on-prem",
  "name": "on-prem-to-aws",
  "pattern": "index*",
  "use_roles": {
    "leader_cluster_role": "all_access",
    "follower_cluster_role": "all_access"
  }
}

1) My first question is whether I can use a pattern like * and exclude some indices.
I want to replicate all the new indices that I create by rollover, but not the long-term index into which I reindex all the older indices.

2) I also have many errors in my log.
They all look like this:
[WARN ][o.o.r.t.s.ShardReplicationTask] [aplogdb-node1] [stp-int-000003][0] Encountered a failure while executing in org.opensearch.replication.action.changes.GetChangesRequest@3dbe2036. Retrying in 10 seconds.
Jun 29 14:19:01 aplogdb01-aws-spc aplogdb-node1-elastic: org.opensearch.OpenSearchTimeoutException: 1m
Jun 29 14:19:01 aplogdb01-aws-spc aplogdb-node1-elastic: #011at org.opensearch.replication.util.CoroutinesKt$waitForGlobalCheckpoint$2$listener$1.accept(Coroutines.kt:113) ~[?:?]
Jun 29 14:19:01 aplogdb01-aws-spc aplogdb-node1-elastic: #011at org.opensearch.index.shard.GlobalCheckpointListeners.lambda$notifyListener$3(GlobalCheckpointListeners.java:240) ~[opensearch-1.2.4.jar:1.2.4]
Jun 29 14:19:01 aplogdb01-aws-spc aplogdb-node1-elastic: #011at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:95) ~[?:?]
Jun 29 14:19:01 aplogdb01-aws-spc aplogdb-node1-elastic: #011at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571) ~[?:?]
Jun 29 14:19:01 aplogdb01-aws-spc aplogdb-node1-elastic: #011at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:738) ~[?:?]
Jun 29 14:19:01 aplogdb01-aws-spc aplogdb-node1-elastic: #011at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678) ~[?:?]
Jun 29 14:19:01 aplogdb01-aws-spc aplogdb-node1-elastic: #011at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665) ~[?:?]

3)
Once I create an auto-follow rule and it is running, I can't stop or pause it in real time.
I tried to stop it, and that had an effect, but only after the last index was rolled over.
And when I try to stop it, it looks like there is no API to do that.

stop replication API operation.

I started auto-follow by:
POST /_plugins/_replication/_autofollow?pretty
{
  "leader_alias": "connection-to-on-prem",
  "name": "on-prem-to-aws",
  "pattern": "stp-a*",
  "use_roles": {
    "leader_cluster_role": "all_access",
    "follower_cluster_role": "all_access"
  }
}

and want to pause it by:
POST /_plugins/_replication/_autofollow/_pause
{}

or

POST /_plugins/_replication/stp-acc-‘actual-index’/_stop
{}
but I get an error saying that the replica is running and can't be stopped.

Thank you very much.

gbbafna · June 29, 2022, 1:20pm

Hi Vojtěch,

Thanks for trying out cross-cluster replication.

1) This is not possible with the current API. As a workaround, you can let all indices replicate and then stop replication on the long-term index.
2) This warning appears on the leader cluster when there are no changes on the leader to replicate. You can safely ignore it. Just make sure that your leader and follower checkpoints are in sync, using the status API. I have created an action item to turn these into debug logs.
3) You can't pause auto-follow, only stop it. The API you mentioned should have worked for ongoing replication jobs. The error suggests that replication is not running in the first place. Can you please use the status API to confirm that replication is running before you try to stop it?

GET _plugins/_replication/stp-acc-‘actual-index’/_status
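The checks described above could be scripted roughly as follows. This is a minimal sketch, not the plugin's official client: it assumes the status response contains a `status` string and a `syncing_details` object with `leader_checkpoint` and `follower_checkpoint` fields, and that an auto-follow rule is removed with a DELETE on the `_autofollow` endpoint. The helper names are hypothetical, and the HTTP call and authentication are deliberately left out.

```python
from typing import Any, Dict, Tuple

Request = Tuple[str, str, Dict[str, Any]]  # (HTTP method, path, JSON body)


def status_request(index: str) -> Tuple[str, str]:
    """Request for the replication status of one follower index."""
    return ("GET", f"/_plugins/_replication/{index}/_status")


def stop_request(index: str) -> Request:
    """Request that stops replication on one follower index."""
    return ("POST", f"/_plugins/_replication/{index}/_stop", {})


def remove_autofollow_request(leader_alias: str, name: str) -> Request:
    """Request that removes an auto-follow rule (assumed DELETE endpoint)."""
    body = {"leader_alias": leader_alias, "name": name}
    return ("DELETE", "/_plugins/_replication/_autofollow", body)


def is_replicating(status: Dict[str, Any]) -> bool:
    """True if the status response reports active replication.

    The status values below are assumptions about the response format.
    """
    return status.get("status") in {"SYNCING", "BOOTSTRAPPING"}


def checkpoints_in_sync(status: Dict[str, Any]) -> bool:
    """True if leader and follower checkpoints are present and equal."""
    details = status.get("syncing_details") or {}
    leader = details.get("leader_checkpoint")
    follower = details.get("follower_checkpoint")
    return leader is not None and leader == follower
```

A script would send `status_request(...)` first and only send `stop_request(...)` when `is_replicating(...)` returns true for the parsed response.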

vnovotny98 · June 29, 2022, 2:46pm

Hello, @gbbafna thanks for your support and quick response!

My hope was that I could stop the whole auto-follow, reindex into a permanent index, delete the temporary indices that had been created, and then start auto-follow again.

But I think I can work out a solution now.

My plan, which maybe someone can use as an example:
I have two clusters: one with permanent data storage that will hold data for more than one year, and an AWS backup cluster that will hold data for 14 days.
On the permanent cluster, Logstash will send application logs to several indices that roll over daily according to the Logstash config. In AWS I will turn on an auto-follow pattern for every new index; I will name them on-prem-to-aws-daily-‘date’, so the pattern will be on-prem-to-aws-daily-*. Every day a script will stop replication on yesterday's follower index in AWS by running:

POST /_plugins/_replication/on-prem-to-aws-daily-‘yesterday-date’/_stop
{}
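The daily script could compute yesterday's index name along these lines. This is only a sketch: the date format in the index name (`%Y.%m.%d`) is an assumption, since the thread does not say how Logstash names the daily indices, and the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Any, Dict, Tuple

INDEX_PREFIX = "on-prem-to-aws-daily-"
DATE_FORMAT = "%Y.%m.%d"  # assumption: adjust to the Logstash index naming


def yesterday_index(today: date) -> str:
    """Name of the daily follower index created the day before `today`."""
    return INDEX_PREFIX + (today - timedelta(days=1)).strftime(DATE_FORMAT)


def stop_yesterday_request(today: date) -> Tuple[str, str, Dict[str, Any]]:
    """(method, path, body) for stopping replication on yesterday's index."""
    path = f"/_plugins/_replication/{yesterday_index(today)}/_stop"
    return ("POST", path, {})
```

In a cron job, `stop_yesterday_request(date.today())` would give the request to send to the follower (AWS) cluster.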

Once replication is stopped, I can reindex the daily data, both in AWS and on-prem, into a temporary index called on-prem-to-aws-monthly, writing through its write alias, which ISM will roll over.
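The reindex step might look like the sketch below, which builds the body for a `POST /_reindex` call. The alias name `on-prem-to-aws-monthly-write` and the example daily index name are hypothetical; the thread only says that the monthly index has a write alias.

```python
import json
from typing import Any, Dict


def reindex_body(source_index: str, dest_alias: str) -> Dict[str, Any]:
    """Body for POST /_reindex, copying one daily index into the write alias."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_alias},
    }


# Hypothetical names, for illustration only.
body = reindex_body("on-prem-to-aws-daily-2022.06.30", "on-prem-to-aws-monthly-write")
payload = json.dumps(body)  # string to send as the request body
```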

The same applies from the AWS machines to on-prem, but I will keep the logs from AWS for only 14 days.

1) Can I auto-follow several indices that don't match the same pattern?

The ISM policy on-prem will be a one-month rollover and a one-year delete; in AWS it will be a 7-day rollover and a 14-day delete.
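The two policies described above could be generated from one template, for example as below. This is a sketch of an ISM policy body with a rollover state and a delete state; the state names, descriptions, and the exact age strings (`30d`, `365d`) are assumptions, and the body would be sent with `PUT _plugins/_ism/policies/<policy_id>`.

```python
from typing import Any, Dict


def ism_policy(rollover_age: str, delete_age: str, description: str) -> Dict[str, Any]:
    """ISM policy body: roll over at `rollover_age`, delete at `delete_age`."""
    return {
        "policy": {
            "description": description,
            "default_state": "hot",  # hypothetical state name
            "states": [
                {
                    "name": "hot",
                    "actions": [{"rollover": {"min_index_age": rollover_age}}],
                    "transitions": [
                        {
                            "state_name": "delete",
                            "conditions": {"min_index_age": delete_age},
                        }
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
        }
    }


# Assumed age values for "one month" and "one year".
on_prem_policy = ism_policy("30d", "365d", "on-prem: monthly rollover, yearly delete")
aws_policy = ism_policy("7d", "14d", "AWS: 7-day rollover, 14-day delete")
```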

My last project was a reporting server on OpenSearch 1.2.3, so I started this one on OpenSearch 1.2.4. I would like to ask whether I can safely upgrade to OpenSearch 1.3 or OpenSearch 2.0 in production.

I need plugins:
apesmaster-node3 opensearch-alerting 1.2.4.0
apesmaster-node3 opensearch-anomaly-detection 1.2.4.0
apesmaster-node3 opensearch-cross-cluster-replication 1.2.4.0
apesmaster-node3 opensearch-index-management 1.2.4.0
apesmaster-node3 opensearch-job-scheduler 1.2.4.0
apesmaster-node3 opensearch-security 1.2.4.0
2) Do they all work on the upgraded versions? Thank you.
I am asking because I saw some issues like these:
https://github.com/opensearch-project/OpenSearch/issues/2916
https://github.com/opensearch-project/index-management/issues/33

GET /_plugins/_replication/autofollow_stats works fine and
GET _plugins/_replication/stp-acc-000008/_status works too
thanks so much!

gbbafna · July 1, 2022, 5:13am

Hi Vojtěch ,

1) Can I auto-follow several indices that don't match the same pattern?

I didn’t fully understand your question, but let me try to answer it as I understand it.

In AutoFollow, you give a pattern, and replication starts on the matching indices. You can configure multiple AutoFollow tasks as needed.
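Configuring several rules could be sketched as follows: one request body per pattern, each sent to `POST /_plugins/_replication/_autofollow` on the follower cluster. The body fields are the ones used earlier in this thread; the rule names and the helper function are hypothetical.

```python
from typing import Any, Dict, List


def autofollow_rule(
    name: str,
    pattern: str,
    leader_alias: str = "connection-to-on-prem",
    role: str = "all_access",
) -> Dict[str, Any]:
    """Body for one auto-follow rule; each rule needs its own name."""
    return {
        "leader_alias": leader_alias,
        "name": name,
        "pattern": pattern,
        "use_roles": {
            "leader_cluster_role": role,
            "follower_cluster_role": role,
        },
    }


# Two rules with different patterns (rule names are made up for illustration).
rules: List[Dict[str, Any]] = [
    autofollow_rule("on-prem-to-aws-daily", "on-prem-to-aws-daily-*"),
    autofollow_rule("on-prem-to-aws-stp", "stp-a*"),
]
```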

Also, thanks for sharing your plan; it is quite interesting.

2) Are they all working on upgraded versions?

They should work fine. If there are any unexpected regressions, our teams will fix them and publish patch releases as necessary.
