I was recently working with a customer who uses the Go client to connect to OpenSearch, and they saw a timeout propagate back to their users when a node was replaced.
From my experience with the Java client, I thought, “Oh, that’s easy – the client kept a stale connection to the old node in its pool and timed out waiting for a response on the next call. Just set your socket timeout closer to your typical upper bound latency and retry on socket timeout.”
I don’t know how to do that with the Go client, though.
Does anyone with experience on opensearch-go have any guidance on setting the socket timeout to fail fast (and ideally retry on a fresh connection) when the server side of the connection goes down?
// ResponseHeaderTimeout, if non-zero, specifies the amount of
// time to wait for a server's response headers after fully
// writing the request (including its body, if any). This
// time does not include the time to read the response body.
ResponseHeaderTimeout time.Duration
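So, I think ResponseHeaderTimeout on Go's http.Transport is the setting that needs to be lowered. It defaults to zero, which means no limit, so lowering it should make a request on a stale connection fail fast instead of hanging until some longer timeout fires.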