Connection reset by peer

Hi,
we are using Opendistro for ES 0.10.0
We have Spring Boot Java applications which connect to ES using

org.elasticsearch.client.RestClientBuilder;

and

org.elasticsearch.client.RestHighLevelClient;

We have deployed Spring Boot applications and ES as PODs within our k8s 1.16.8 cluster.
Everything seems ok except that sometimes we receive this error:

    Wrapped by: java.io.IOException: Connection reset by peer
        at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:728)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:198)
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:522)
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:508)
        at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:404)

The issue arises when our Spring Boot pods have not used their connection to ES for a while (30 to 40 minutes): the connection appears to have been reset for some reason, and the next request fails with the error above.

I have googled a lot on this but found no helpful suggestions.

TCP keep-alive is enabled on ES.

Any help appreciated

Alessio

This is to be expected when the connection has been idle for a long time. TCP keep-alive is best-effort and does not provide any guarantees that the connection will remain open. The actual behavior depends a lot on your network setup, and how your proxies and load balancers work.

Your client code should retry the request when it sees this failure.
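As a rough illustration of the retry suggested above, here is a minimal sketch using only the JDK. The class `RetryOnReset`, the method `withRetry`, and the attempt/backoff parameters are hypothetical names made up for this example, not part of the Elasticsearch client. It assumes the wrapped request is safe to repeat (a search is; an index request without an explicit ID may not be).

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of any Elasticsearch library.
public class RetryOnReset {

    /**
     * Runs the given call and retries it when it fails with an IOException
     * (for example "Connection reset by peer" on a stale pooled connection).
     * Other exception types are propagated immediately.
     */
    public static <T> T withRetry(Callable<T> call, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                last = e;
                // Wait briefly before the next attempt, but not after the final one.
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis);
                }
            }
        }
        // All attempts failed with an IOException: rethrow the most recent one.
        throw last;
    }
}
```

In the application, the call that currently invokes `RestHighLevelClient.search(...)` would be wrapped in a lambda and passed to `withRetry`, for instance with 3 attempts and a 200 ms backoff. Those values are only placeholders and should be tuned to your setup.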

Actually, that was not expected, as we did not have this issue in another environment where we used AWS managed ES.

In the end we solved it by adding the following kernel parameters to the StatefulSet:

    - net.ipv4.tcp_keepalive_time=200
    - net.ipv4.tcp_keepalive_intvl=60