Encountered a retryable error (will retry with exponential backoff) {:code=>413, :url=>"https://xxxx:9200/_bulk", :content_length=>183953}

Hi Team,

I have Filebeat and Logstash installed locally on my machines.

Filebeat reads the logs in real time and sends them to Logstash. Because these files are temporary, they are removed once they have served their purpose. To keep processing in real time, I have increased the queue size to 32768 and bulk_max_size to 4096 in Filebeat.

In Logstash, I increased the batch size and the number of workers (only on 4-core machines) so that the logs are processed as fast as possible and Logstash doesn't block Filebeat from sending them.

From Logstash, the processed logs are sent to a common OpenSearch. I initially encountered the retryable error, and reducing the Logstash batch size from the default of 125 to 70 solved it. But now, since the logs have to be processed in real time, I increased the batch size to 500 and the issue is back.

In the Logstash output configuration, I added two additional parameters:
target_bulk_bytes => 10485760
http_compression => true

Is there any other way to resolve this issue?

Hi @Muthulakshmi - are there any notable errors in your log other than an indication of a retryable error? Let’s see if we can get any more information about what specifically went wrong.

Nate

Hi @nateynate ,

Connectivity issues are observed after frequent retries:

[2022-04-11T07:26:17,510][ERROR][logstash.outputs.opensearch][main][] Encountered a retryable error (will retry with exponential backoff) {:code=>413, :url=>"https://xxxx:9200/_bulk", :content_length=>184027}
[2022-04-11T07:32:46,920][ERROR][logstash.outputs.opensearch][main][] Encountered a retryable error (will retry with exponential backoff) {:code=>413, :url=>"https://xxxx:9200/_bulk", :content_length=>184027}
[2022-04-11T07:32:54,415][WARN ][logstash.outputs.opensearch][main][] Marking url as dead. Last error: [LogStash::Outputs::OpenSearch::HttpClient::Pool::HostUnreachableError] OpenSearch Unreachable: [https://xx:xxxxxx@xxxx:9200/][Manticore::SocketException] Broken pipe (Write failed) {:url=>https://xx:xxxxxx@xxxx:9200/, :error_message=>"OpenSearch Unreachable: [https://xx:xxxxxx@xxxx:9200/][Manticore::SocketException] Broken pipe (Write failed)", :error_class=>"LogStash::Outputs::OpenSearch::HttpClient::Pool::HostUnreachableError"}
[2022-04-11T07:32:54,418][ERROR][logstash.outputs.opensearch][main][] Attempted to send a bulk request but OpenSearch appears to be unreachable or down {:message=>"OpenSearch Unreachable: [https://xx:xxxxxx@xxxx:9200/][Manticore::SocketException] Broken pipe (Write failed)", :exception=>LogStash::Outputs::OpenSearch::HttpClient::Pool::HostUnreachableError, :will_retry_in_seconds=>64}
[2022-04-11T07:32:55,233][ERROR][logstash.outputs.opensearch][main][] Attempted to send a bulk request but there are no living connections in the pool (perhaps OpenSearch is unreachable or down?) {:message=>"No Available connections", :exception=>LogStash::Outputs::OpenSearch::HttpClient::Pool::NoConnectionAvailableError, :will_retry_in_seconds=>2}
[2022-04-11T07:32:56,533][WARN ][logstash.outputs.opensearch][main] Restored connection to OpenSearch instance {:url=>"https://xx:xxxxxx@xxxx:9200/"}
[2022-04-11T07:46:52,190][ERROR][logstash.outputs.opensearch][main][] Encountered a retryable error (will retry with exponential backoff) {:code=>413, :url=>"https://xx:9200/_bulk", :content_length=>184027}
[2022-04-11T07:47:14,109][ERROR][logstash.outputs.opensearch][main][] Encountered a retryable error (will retry with exponential backoff) {:code=>413, :url=>"https://xxxx:9200/_bulk", :content_length=>184027}
[2022-04-11T07:52:56,477][WARN ][logstash.outputs.opensearch][main][] Marking url as dead. Last error: [LogStash::Outputs::OpenSearch::HttpClient::Pool::HostUnreachableError] OpenSearch Unreachable: [https://xx:xxxxxx@xxxx:9200/][Manticore::ClientProtocolException] Remote host terminated the handshake {:url=>https://xx:xxxxxx@xxxx:9200/, :error_message=>"OpenSearch Unreachable: [https://xx:xxxxxx@xxxx:9200/][Manticore::ClientProtocolException] Remote host terminated the handshake", :error_class=>"LogStash::Outputs::OpenSearch::HttpClient::Pool::HostUnreachableError"}
[2022-04-11T07:52:57,143][ERROR][logstash.outputs.opensearch][main][] Attempted to send a bulk request but OpenSearch appears to be unreachable or down {:message=>"OpenSearch Unreachable: [https://xx:xxxxxx@xxxx:9200/][Manticore::ClientProtocolException] Remote host terminated the handshake", :exception=>LogStash::Outputs::OpenSearch::HttpClient::Pool::HostUnreachableError, :will_retry_in_seconds=>2}
[2022-04-11T07:53:22,658][WARN ][logstash.outputs.opensearch][main] Restored connection to OpenSearch instance {:url=>"https://xx:xxxxxx@xxxx:9200/"}
[2022-04-11T07:53:22,695][ERROR][logstash.outputs.opensearch][main][] Attempted to send a bulk request but there are no living connections in the pool (perhaps OpenSearch is unreachable or down?) {:message=>"No Available connections", :exception=>LogStash::Outputs::OpenSearch::HttpClient::Pool::NoConnectionAvailableError, :will_retry_in_seconds=>4}
[2022-04-11T07:54:33,847][ERROR][logstash.outputs.opensearch][main][] Encountered a retryable error (will retry with exponential backoff) {:code=>413, :url=>"https://xxxx:9200/_bulk", :content_length=>184027}
[2022-04-11T07:54:54,207][ERROR][logstash.outputs.opensearch][main][] Encountered a retryable error (will retry with exponential backoff) {:code=>413, :url=>"https://xxxx:9200/_bulk", :content_length=>184027}
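
For anyone comparing these entries, here is a minimal Python sketch that pulls the status code and content length out of a retryable-error line. It assumes the log format shown above; the function name `parse_retry_error` is only illustrative and is not part of Logstash.

```python
import re
from typing import Optional

# Matches the Ruby-hash style fields printed in the error line,
# e.g. {:code=>413, ..., :content_length=>184027}
RETRY_PATTERN = re.compile(r":code=>(\d+).*?:content_length=>(\d+)")


def parse_retry_error(line: str) -> Optional[dict]:
    """Return the HTTP code and content length from one log line, or None if absent."""
    match = RETRY_PATTERN.search(line)
    if match is None:
        return None
    return {"code": int(match.group(1)), "content_length": int(match.group(2))}


sample = (
    '[2022-04-11T07:26:17,510][ERROR][logstash.outputs.opensearch][main][] '
    'Encountered a retryable error (will retry with exponential backoff) '
    '{:code=>413, :url=>"https://xxxx:9200/_bulk", :content_length=>184027}'
)

print(parse_retry_error(sample))  # {'code': 413, 'content_length': 184027}
```

Every 413 entry above reports the same content length, 184027 bytes.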

@Muthulakshmi - a response code of 413 means that the payload is too large. There might be another setting that needs changing here.

What is the current value of http.max_content_length on your common OpenSearch? My current thought is that you might have super-powered your ingestion so much that the bulk requests are actually too large (or specifically, the content-length HTTP header out-paced the http.max_content_length setting in opensearch.yml).

Does tuning the max_content_length variable give you any more headway?

Nate

@nateynate Sorry for the delayed response.

I got your point. At present, the OpenSearch configuration has a 10 MB limit. But the retried requests show a much smaller payload size (around 180 KB).

Can you please help me understand the reason behind this?

@Muthulakshmi: Did you find a workaround? How are you setting max_content_length in opensearch.yml?

Hello guys - I may have a theory on the issue.
The 180 KB payload size may be the compressed size, while the payload OpenSearch actually receives after decompression may be > 10 MB.

So in that case, disabling HTTP compression might help.
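
To illustrate the idea, here is a minimal Python sketch that builds an NDJSON body shaped like a _bulk request and compares its raw size with its gzip-compressed size. The index name `logs-demo`, the event fields, and the helper `build_bulk_body` are made up for the example, and it says nothing about how OpenSearch itself enforces the limit. It only shows that repetitive log data can shrink a lot under compression.

```python
import gzip
import json


def build_bulk_body(n_events: int) -> bytes:
    """Build an NDJSON body shaped like a _bulk request (hypothetical index and fields)."""
    lines = []
    for i in range(n_events):
        # Action line followed by the document line, as in the bulk format
        lines.append(json.dumps({"index": {"_index": "logs-demo"}}))
        lines.append(json.dumps({
            "@timestamp": "2022-04-11T07:26:17.510Z",
            "level": "ERROR",
            "message": f"request {i} failed with a retryable error",
        }))
    return ("\n".join(lines) + "\n").encode("utf-8")


raw = build_bulk_body(5000)
compressed = gzip.compress(raw)

print(f"raw size:        {len(raw)} bytes")
print(f"compressed size: {len(compressed)} bytes")
print(f"ratio:           {len(raw) / len(compressed):.1f}x")
```

If the real events compress in a similar way, a content length of about 180 KB on the wire could correspond to a much larger uncompressed body.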

  1. Used the default target_bulk_bytes in logstash.yml (20MB)
  2. Disabled http.compression in opensearch.yml
  3. Defined http.max_content_length: 100mb

I have also increased the number of Logstash replicas in my k8s cluster.