Weird problem when writing data with replication

BlackMetalz · March 31, 2022, 11:52am

Hello everyone, I’m testing data replication with some test data.

On the leader cluster, I’m using this to insert test data.

I set up auto-follow for every index:

POST _plugins/_replication/_autofollow?pretty
{
   "leader_alias" : "leader-cluster",
   "name": "replication-from-leader",
   "pattern": "*",
   "use_roles":{
      "leader_cluster_role": "all_access",
      "follower_cluster_role": "all_access"
   }
}

My problem is this: when I use elasticsearch test data to insert data into OpenSearch on the leader cluster with the command:

python3 es_test_data.py --es-url=http://my-ip:9200 --username=admin --password=admin --index_name=my-index-0 --batch_size=1000 --count=50000

I ran it 3 times. The first and second runs seemed fine, but after the third run the document count did not increase to 150000. It remained at 100000 on the leader cluster:

green open my-index-0               cO44ybVmSc21ukqTInKLoA 1 1 100000 0  19.3mb   9.6mb

But on the follower cluster:

green open my-index-0                     k6HYqjf7Szyxg2jviLZsMw 1 1 119000 0  21.5mb  10.6mb

And this log appears on the leader cluster:

[2022-03-31T18:29:53,831][INFO ][o.o.c.m.MetadataCreateIndexService] [opensearch-c1-leader-dev] [my-index-0] creating index, cause [auto(bulk api)], templates [], shards [1]/[1]
[2022-03-31T18:29:54,157][INFO ][o.o.c.m.MetadataMappingService] [opensearch-c1-leader-dev] [my-index-0/cO44ybVmSc21ukqTInKLoA] create_mapping [test_type]
[2022-03-31T18:29:54,711][INFO ][o.o.c.r.a.AllocationService] [opensearch-c1-leader-dev] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[my-index-0][0]]]).
[2022-03-31T18:30:07,843][INFO ][o.o.c.s.IndexScopedSettings] [opensearch-c1-leader-dev] [my-index-0] updating [index.translog.generation_threshold_size] from [64mb] to [32mb]
[2022-03-31T18:30:07,908][INFO ][o.o.c.s.IndexScopedSettings] [opensearch-c1-leader-dev] [my-index-0] updating [index.translog.generation_threshold_size] from [64mb] to [32mb]
[2022-03-31T18:30:07,908][INFO ][o.o.c.s.IndexScopedSettings] [opensearch-c1-leader-dev] [my-index-0] updating [index.plugins.replication.translog.retention_lease.pruning.enabled] from [false] to [true]

After 15 minutes or more, both the leader and the follower cluster show the correct number of documents I inserted.

I have 3 nodes with the same master and data spec. Each node has a 1G heap (this is just for fast testing).

Index settings (they seem to be auto-created):

{
  "my-index-0" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "translog" : {
          "generation_threshold_size" : "32mb"
        },
        "plugins" : {
          "replication" : {
            "translog" : {
              "retention_lease" : {
                "pruning" : {
                  "enabled" : "true"
                }
              }
            }
          }
        },
        "provided_name" : "qwe1",
        "creation_date" : "1648725841414",
        "number_of_replicas" : "1",
        "uuid" : "sCZtIR4gT9St6l3y_37AYg",
        "version" : {
          "created" : "135247827"
        }
      }
    }
  }
}

I tested with another index. This time I inserted 50k documents per run and ran it 4 times.

Both the leader and the follower show this:

green open os-1                         16G6gZ_ZSoCzBz_SbpPhlg 1 1 100000 0    23mb  11.1mb

But when I check the replication status:

{
  "status" : "SYNCING",
  "reason" : "User initiated",
  "leader_alias" : "leader-cluster",
  "leader_index" : "os-1",
  "follower_index" : "os-1",
  "syncing_details" : {
    "leader_checkpoint" : 199999,
    "follower_checkpoint" : 149999,
    "seq_no" : 149999
  }
}

Hmmm, interesting. Why does the leader show 100000 documents when the leader checkpoint is 199999?
I’m not sure whether this is a bug or expected behavior. I will report back if I can confirm it is a bug.
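
For reference, the two checkpoints in the status output can be compared directly. The following is a minimal Python sketch, assuming the checkpoints are sequence numbers that start at 0 (so a checkpoint of 199999 corresponds to 200000 operations). The helper name `replication_lag` is hypothetical and not part of any OpenSearch client.

```python
import json

# Status response copied from the replication status output in this post.
STATUS_JSON = """
{
  "status" : "SYNCING",
  "reason" : "User initiated",
  "leader_alias" : "leader-cluster",
  "leader_index" : "os-1",
  "follower_index" : "os-1",
  "syncing_details" : {
    "leader_checkpoint" : 199999,
    "follower_checkpoint" : 149999,
    "seq_no" : 149999
  }
}
"""


def replication_lag(status):
    """Return how many operations the follower is behind the leader."""
    details = status["syncing_details"]
    return details["leader_checkpoint"] - details["follower_checkpoint"]


status = json.loads(STATUS_JSON)
lag = replication_lag(status)
# Assumption: sequence numbers start at 0, so checkpoint N means N + 1 operations.
leader_ops = status["syncing_details"]["leader_checkpoint"] + 1
print(f"leader operations: {leader_ops}, follower lag: {lag}")
```

With the numbers from this post, the leader has processed 200000 operations, which matches 4 runs of 50000 documents, and the follower is 50000 operations behind.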

P.S. I haven’t tested without replication.

BlackMetalz · April 4, 2022, 2:50am

@amkhar: any idea about this? Is this a feature or a “feature”?

BlackMetalz · April 13, 2022, 4:03am

20 chars

skumrik · April 20, 2022, 3:06pm

Hey BlackMetalz,
Thanks for your post.

However, this doesn’t seem to be an issue. The delay you are seeing in the document count is due to the nature of the ‘_cat/indices’ API. This API refreshes the document count only after a certain interval, which we can see in the refresh value of the /_stats API. Hence, it does not always return accurate doc counts.

We can use the count API instead to see accurate document counts.
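
A minimal sketch of that check in Python, using only the standard library. The host, index name, and credentials are the placeholders from the test command in the first post. The helper names are hypothetical, and the sketch assumes HTTP basic auth over plain HTTP, as in that command.

```python
import base64
import json
import urllib.request


def count_url(base_url, index):
    """Build the _count endpoint URL for one index."""
    return f"{base_url.rstrip('/')}/{index}/_count"


def parse_count(body):
    """Extract the document count from a _count response body."""
    return json.loads(body)["count"]


def fetch_count(base_url, index, username, password):
    """Call GET <index>/_count with HTTP basic auth and return the count."""
    request = urllib.request.Request(count_url(base_url, index))
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return parse_count(response.read())


# Example call (needs a reachable cluster, so it is left commented out):
# print(fetch_count("http://my-ip:9200", "my-index-0", "admin", "admin"))
```

Running the same call against the leader and the follower gives two numbers that can be compared directly, instead of relying on the docs.count column of ‘_cat/indices’.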

BlackMetalz · April 21, 2022, 4:42pm

Thanks for the reply, I will try again.

BlackMetalz · April 22, 2022, 4:50am

@skumrik thanks for the help. It is exactly as you said, I just tried it xD

skumrik · April 22, 2022, 5:14am

So it was a feature then
