Hi Team,
I have set up an environment with Cross-Cluster Replication (CCR) enabled, and I am seeing a mismatch in the document count of the indices between the follower and leader clusters. In fact, when I check the counts with _cat/indices, the document count on the follower is sometimes higher than on the leader. However, when I leave the clusters alone for a long time, the counts eventually settle and match. Example:
[root@root leader]# curl http://${LEADER}/_cat/indices?v
health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   log-test-2022.11.07 2yDUiNCaTQeIiLXWHIA-QQ   1   1       4156            0      2.3mb            1.2
[root@root leader]# curl http://${FOLLOWER}/_cat/indices?v
health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   log-test-2022.11.07 XF_ebkznSueHHF4x85rZNA   1   1         35            0      2.3mb            1.2
But when I fetch the total hit count with the curl commands below, I see consistent values in both the leader and follower OpenSearch clusters. All the commands were run within a few seconds of each other (2 to 3 seconds):
[root@root leader]# curl http://${LEADER}/log-test-2022.11.07/_search?pretty | jq .hits.total.value
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 23322  100 23322    0     0  63835      0 --:--:-- --:--:-- --:--:-- 63895
6636
[root@root follower]# curl http://${FOLLOWER}/log-test-2022.11.07/_search?pretty | jq .hits.total.value
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 23322  100 23322    0     0  63835      0 --:--:-- --:--:-- --:--:-- 63895
6636
Is this expected? Does it take time for the data to be reflected in both clusters when using the _cat/indices API? I also observed this mismatch within a single OpenSearch cluster, where the document counts returned by _cat/indices and by <index_name>/_search?pretty were completely different.
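For comparison, the two clusters can also be queried side by side with the _count API, which returns an exact document count. This is a minimal sketch, assuming LEADER and FOLLOWER hold host:port values as in the commands above and that jq is installed; the compare_counts helper is hypothetical and only for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compares two document counts and prints the result.
compare_counts() {
  local leader="$1" follower="$2"
  if [ "$leader" -eq "$follower" ]; then
    echo "match: $leader"
  else
    echo "mismatch: leader=$leader follower=$follower diff=$((leader - follower))"
  fi
}

# Only query the clusters when both endpoints are set (assumption: host:port values).
if [ -n "${LEADER:-}" ] && [ -n "${FOLLOWER:-}" ]; then
  INDEX="log-test-2022.11.07"
  # _count responds with {"count": N, ...}; -s suppresses curl's progress meter.
  leader_count=$(curl -s "http://${LEADER}/${INDEX}/_count" | jq .count)
  follower_count=$(curl -s "http://${FOLLOWER}/${INDEX}/_count" | jq .count)
  compare_counts "$leader_count" "$follower_count"
fi
```

Passing -s to curl also removes the progress meter lines that appear in the output above.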
- Second query: When CCR is configured, is it mandatory to list all nodes of the leader cluster (including cluster_manager, data, and ingest nodes) as seed hosts in the follower settings, or is the cluster manager (master) node alone sufficient for replication? Can the seed be a service name whose endpoints resolve to the list of OpenSearch nodes in the leader cluster, or must individual leader nodes be listed for the follower to follow the leader?
Also, is it mandatory to configure the remote_cluster_client role in node.roles on all nodes of the follower cluster (i.e., the ingest, cluster_manager, and data nodes)?
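For context, the follower-side configuration these questions refer to looks roughly like the sketch below. It assumes the standard remote-cluster seeds setting applied through the cluster settings API; the alias leader-cluster and the host names are hypothetical placeholders, not the configuration from this environment:

```shell
# Run against the follower cluster. "leader-cluster" is a hypothetical alias;
# seeds point at leader nodes' transport address (port 9300 by default), not the HTTP port.
curl -s -X PUT "http://${FOLLOWER}/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{
    "persistent": {
      "cluster": {
        "remote": {
          "leader-cluster": {
            "seeds": ["leader-node-1:9300", "leader-node-2:9300"]
          }
        }
      }
    }
  }'

# The role in question is set per node in opensearch.yml, for example:
#   node.roles: [cluster_manager, data, ingest, remote_cluster_client]
```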
TIA,
Sanjay