Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
2.15.0
Describe the issue:
When querying a remote cluster through the coordinating cluster, i.e. through `<cluster>:index-*/_search`, I’m experiencing a significant amount of added latency.
This latency only shows up when querying a large number of shards. The data is time-series and the query requests the last 15 minutes of data, so many shards are “skipped”. When I query the remote cluster directly I get a result back in about 200ms, but through the coordinating cluster it takes around 600ms.
I’m a bit confused about where this latency is introduced: the cluster is under no load, thread pools look good, resource utilization looks good, and network latency seems minimal.
When using `curl` to query the remote cluster directly from one of the coordinating cluster’s nodes, the result is returned in around 200ms, so I’m uncertain whether the extra time comes from additional steps the coordinating node performs when returning the result set.
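For anyone who wants to reproduce the comparison, here is a minimal sketch of timing the two paths side by side. The host names, the `remote1` alias, and the `@timestamp` field are placeholders I made up for illustration, not my real setup. It prints client-side wall-clock time next to the `took` value and `_shards` counts from the response, which may help show whether the extra time is spent inside the search itself or around it.

```python
import json
import time
import urllib.request


def build_search_url(base_url, index_pattern, remote_alias=None):
    """Build a _search URL; with remote_alias the index becomes '<alias>:<pattern>'."""
    base = base_url.rstrip("/")
    if remote_alias:
        # Cross-cluster form, with round-trip minimization requested explicitly.
        return f"{base}/{remote_alias}:{index_pattern}/_search?ccs_minimize_roundtrips=true"
    return f"{base}/{index_pattern}/_search"


def last_15_minutes_query(time_field="@timestamp"):
    """Range query over the last 15 minutes (time_field is an assumed field name)."""
    return {"size": 10, "query": {"range": {time_field: {"gte": "now-15m"}}}}


def timed_search(url, body, timeout=30):
    """POST the search and return wall-clock time plus the server-reported figures."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    wall_ms = (time.perf_counter() - start) * 1000
    return {
        "wall_ms": round(wall_ms, 1),
        "took_ms": payload.get("took"),   # time reported by the cluster itself
        "shards": payload.get("_shards"),  # includes total and skipped counts
    }


if __name__ == "__main__":
    body = last_15_minutes_query()
    # Hypothetical endpoints: replace with the real remote and coordinating hosts.
    direct = build_search_url("http://remote-node:9200", "index-*")
    via_ccs = build_search_url("http://coordinating-node:9200", "index-*", "remote1")
    for label, url in (("direct", direct), ("via coordinating cluster", via_ccs)):
        print(label, timed_search(url, body))
```

Running each path several times and comparing `wall_ms` against `took_ms` should give a rough idea of how much of the gap is reported by the cluster versus added outside the search.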
`minimize_round_trips` is set to true.
This extra ~400ms is not constant and seems to scale with the number of shards queried (or perhaps the number of docs). For example, if I query a very small subset of shards I get a result back in 30-50ms regardless of whether I query through the coordinating cluster or the remote cluster itself.
I’d think that since `minimize_round_trips` is set to true and I’m only querying a single cluster, the overhead of merging the results would be minimal.
Thoughts? And thanks!