Filebeat slow registry issues

Hello Team,
I have an OpenSearch cluster with 10 OpenSearch indexers running on Kubernetes.
I have Logstash in the middle to process all the logs. Filebeat is what ships the logs, but the delay in log ingestion into OpenSearch is huge.
Checking everything, I can see that Filebeat is slow at harvesting the log file.

This is my configuration:

filebeat:
  inputs:
    - type: log
      paths:
        - "/var/log/log_file.json"
      close_removed: true
      clean_inactive: 7h
      clean_removed: true
      ignore_older: 6h
output:
  logstash:
    hosts: ["logstash-oss:5000"]
    bulk_max_size: 4096
    worker: 15
    compression_level: 3
    pipelining: 2
queue:
  mem:
    flush:
      min_events: 2048
      timeout: 1s

I added the clean and flush options to try to solve the issue, but it did not work.
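For reference, this is the kind of additional input tuning I have been experimenting with on top of that. The harvester_buffer_size and scan_frequency values below are just values I tried, not recommendations:

filebeat:
  inputs:
    - type: log
      paths:
        - "/var/log/log_file.json"
      # experimental values, not verified to help in my case
      # read more bytes per harvester pass (default is 16384)
      harvester_buffer_size: 65536
      # check for new/updated files more often (default is 10s)
      scan_frequency: 5s
      close_removed: true
      clean_inactive: 7h
      clean_removed: true
      ignore_older: 6h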
When I check the Filebeat registry for all the Filebeat instances, I see this:

> for filebeat in $filebeats; do echo $filebeat; echo $(($(($(date +%s) - $(k-alias -n $ENV exec $filebeat -- bash -c "find /var/lib/filebeat/registry/filebeat/ -type f -regex '.*\/[0-9]+.json' -exec jq '.[-1].timestamp[1]' {} \;")))/60)); done
filebeat1-0
62
filebeat-0
1662036
filebeat-1
153
filebeat-10
201
filebeat-11
221
filebeat-2
1661991
filebeat-3
1662036
filebeat-4
1662036
filebeat-5
1661991
filebeat-6
177
filebeat-7
114
filebeat-8
332
filebeat-9
53

And these numbers keep increasing.

I tried:

  • Increasing the number of Logstash instances to help ingest the logs
  • Increasing the number of Opensearch nodes.
  • Restarting Filebeat services
  • Restarting Filebeat pods
  • Tweaking the Filebeat options

But nothing solved the issue. Can you help me with this?

Welcome back! Looks like your last post was around two years ago! :slight_smile:

Experimenting with the flush and refresh interval on your OpenSearch indices might help out a bit here. New documents written to an index are usually added to a new Lucene segment, and those segments are initially held in memory.

One of our partners, opster.com, has a good guide on this here: OpenSearch Flush, Translog & Refresh - A Complete Guide

Some of these permanent index options might help as well: Refresh index - OpenSearch Documentation

Let me know if any of these index settings help you out!

Hello @nateynate
Thank you for your answer, and thanks for the welcome back.
I will give it a try and let you know!

Hello @nateynate
I was reading about refresh and flush (by the way, it is better explained in this blog post), and I do not think this will help, since the refresh interval already defaults to 1s and I do not think it can go lower than that.
On the other hand, the refresh is about writing data from memory into segments, while our delay happens before the data even reaches OpenSearch, so it is hard for me to see how these settings apply to this issue.

Could it be a resource problem? Maybe I need to scale up? Is there any other setting I can play with?

Any ideas?

Hello,
For the moment we added one more node to the cluster, hoping this will help it ingest faster, but I would like to know if there are any tuning options you can suggest in order to avoid these kinds of issues.
Thank you.

I think in order to find out just where the bottleneck is, it might be helpful to do some investigation of system utilization outside of OpenSearch itself, just to make sure you're not running out of actual horsepower. Command-line utilities like top, iostat, and vmstat can give you summaries of how many threads are running, how much free memory is left, and what the memory is being used for. The goal would be to find out whether you're bound by CPU or disk I/O somewhere.

A second place to look might be fine-tuning of Filebeat itself. Is Filebeat trying to watch a lot of very small log files, or watching a few files that grow very large? I'm not intimately familiar with Filebeat, but I'm sure there are some things you can tune.
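For example, one thing that stands out to me in your config (only a guess on my part, and the numbers below are illustrative rather than tested against your setup) is that the internal memory queue might be too small to keep 15 output workers sending batches of 4096 events busy. Something along these lines could be worth experimenting with:

# filebeat.yml - size the memory queue so it can feed all the output workers
# (65536 is only an illustrative value, roughly worker * bulk_max_size with headroom)
queue:
  mem:
    events: 65536
    flush:
      min_events: 4096
      timeout: 1s

The general idea is that flush.min_events lines up with bulk_max_size so each flush can fill a complete batch, and events is large enough that the harvesters are not blocked waiting for the queue to drain.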

I found this video from a quick web search, and I'm sure there's more out there. There has to be some kind of balance you can find between things like how many documents are sent to the bulk API at once, how many workers there are, etc.
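One more thing that might help narrow it down is checking whether Filebeat itself is stuck on back-pressure from Logstash rather than on reading the files. A minimal sketch, assuming your Filebeat version supports the Beats HTTP metrics endpoint (the http.* settings below, which I believe are off by default):

# filebeat.yml - expose Filebeat's internal metrics on a local HTTP endpoint
# (5066 is the usual default port; adjust host/port as needed)
http:
  enabled: true
  host: localhost
  port: 5066

With that enabled you can pull counters from the endpoint (for example the /stats path) and compare how many events are read against how many are acknowledged by the Logstash output, which usually makes it clearer which side is lagging.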

Best of luck!
