How to send logs from Kafka to OpenSearch

So, I need to understand: if I send logs through Beats to Kafka, can Kafka then send the logs on to OpenSearch? Is that possible? If yes, how do I do that; if not, why not?

I think it’s more typical to also have something like Logstash or Fluentd between Beats/Kafka and OpenSearch. This article describes a setup for ES/Kibana, but you could easily substitute OpenSearch:

1 Like

@searchymcsearchface Thanks for the reply. Why are you suggesting Logstash between Kafka & OpenSearch? Can we eliminate it?

The reason is that Kafka can spool your messages on their way to OpenSearch. If you need to take OpenSearch offline for maintenance, Kafka can hold the logs until it is back up (amongst other reasons).
Our configuration is:
Beats and syslog devices > Logstash-Ingestors > Kafka > Logstash-Consumers > OpenSearch (OS).
For documentation, check the Logstash docs for the Kafka input and output plugins.
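For reference, a minimal version of the Logstash-Consumers leg (Kafka in, OpenSearch out) might look like the sketch below. The broker list, topic, index, and credentials are placeholders, and the output assumes the logstash-output-opensearch plugin is installed:

```
input {
  kafka {
    # Placeholder broker list and topic
    bootstrap_servers => "kafka1:9092,kafka2:9092"
    topics => ["logs"]
    group_id => "logstash-consumers"
    codec => "json"
  }
}

filter {
  # ETL/enrichment (grok, mutate, etc.) would go here
}

output {
  opensearch {
    # Placeholder endpoint, index, and credentials
    hosts => ["https://opensearch:9200"]
    index => "logs-%{+YYYY.MM.dd}"
    user => "admin"
    password => "admin"
  }
}
```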

1 Like

does this relate to this thread?

@kris looks like we are going to need help forking that like we did the clients - also seems like there are real use cases for this in the community:)

1 Like

Thanks for the reply. I am a bit confused about why exactly Logstash is required in this case. Kafka can also work as an ETL tool like Logstash, so we could completely eliminate Logstash and use only Kafka, right?

If not, what functionality does Logstash provide that Kafka cannot?

Technically you could use an existing Kafka Connect sink (I don’t know of one for OpenSearch) or write a Kafka consumer that applies the changes you require (a rough sketch of that approach is below). At Logz.io we have a logging microservice that acts as the Kafka consumer; we do all our ETL there, along with other internal logic, using our open-source library for parsing logs: Sawmill.
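To make the “write your own consumer” option concrete, here is a hypothetical sketch (not Logz.io’s actual service) of a plain Kafka consumer that applies a trivial transform and bulk-indexes into OpenSearch. The topic, brokers, endpoint, index, and credentials are all placeholder assumptions:

```python
import json

from kafka import KafkaConsumer               # pip install kafka-python
from opensearchpy import OpenSearch, helpers  # pip install opensearch-py

consumer = KafkaConsumer(
    "logs",                                   # assumed topic
    bootstrap_servers=["kafka:9092"],         # assumed brokers
    group_id="opensearch-indexer",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    enable_auto_commit=True,
)

client = OpenSearch(
    hosts=[{"host": "opensearch", "port": 9200}],  # assumed endpoint
    http_auth=("admin", "admin"),                  # assumed credentials
    use_ssl=True,
    verify_certs=False,                            # lab setup only
)

def transform(record):
    """Minimal ETL step: enrich/reshape the event as needed."""
    record["pipeline"] = "kafka-direct"
    return record

actions = []
for message in consumer:
    actions.append({"_index": "logs-stream", "_source": transform(message.value)})
    if len(actions) >= 500:                   # flush to OpenSearch in bulk batches
        helpers.bulk(client, actions)
        actions.clear()
```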

1 Like

@amitai @searchymcsearchface

I see, so you are saying we can eliminate Logstash, right? For our use case we are building a streaming data architecture, and I am worried that if we use Logstash in this architecture -
beats → kafka → logstash → opensearch - will Logstash be able to handle backpressure or not?

Also, if we eliminate Logstash from the streaming data architecture - beats → kafka → opensearch - can we send logs from Kafka to OpenSearch directly, and can we still handle backpressure and all the other functionality that Logstash provides as an ETL tool?

So basically I want to eliminate Logstash from the streaming data architecture and handle things with Kafka only, but I need a conclusion on whether I can eliminate Logstash or not, and why.

There is nothing special about ingesting with Logstash. You would need a connector/consumer for ETL. This can be done in many different ways and is highly dependent on your use case.

1 Like

@amitai, Thanks for the reply.
So, can Kafka push data directly into OpenSearch?

This I don’t know. In theory, yes, since there is a Kafka Connect sink for Elasticsearch. For OpenSearch you may need to write a new one. This is probably why most opt to use Logstash instead. The link I provided to the thread suggests a connector may be a popular request.
Are you managing Kafka on your own or using Confluent?
A Kafka connector or Logstash or whatever are always just instances that do the ETL. You will always have something there, as OpenSearch doesn’t poll data on its own :)
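For illustration, the “instance that does the ETL” in the Kafka Connect case is a worker running a sink connector defined by a small piece of configuration. Below is a sketch based on the Elasticsearch sink connector mentioned above, pointed at an OpenSearch endpoint; the URL, topic, and credentials are placeholders, and whether this sink actually accepts OpenSearch is exactly the compatibility question discussed in this thread:

```json
{
  "name": "logs-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "logs",
    "connection.url": "https://opensearch:9200",
    "connection.username": "admin",
    "connection.password": "admin",
    "key.ignore": "true",
    "schema.ignore": "true"
  }
}
```

This JSON would typically be submitted to the Kafka Connect REST API to start the sink.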

1 Like

@amitai Thanks for the quick reply.

Yes, we have not decided so far, but we will most likely lean towards managing Kafka on our own. I think it’s clear for now: we don’t have a ready-made Kafka connector to use with OpenSearch, which is why Logstash/Fluentd make things easier for us. Otherwise, there is always the option of creating our own Kafka connector for OpenSearch, like the one that exists for Elasticsearch.

great input @amitai - let’s see what can be done

@amitai @searchymcsearchface
Can you give a high-level idea of when to use Logstash and when to use Fluentd? Which is the better approach to go with, and why?

Here is a Helm chart that supports the case you are looking at and can be used for experimentation purposes: K8S Logging stack with Opensearch (featuring kafka and fluentd)
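For the kafka → fluentd → opensearch leg of such a stack, a minimal Fluentd configuration might look something like the sketch below (assuming the fluent-plugin-kafka and fluent-plugin-opensearch plugins; brokers, topic, endpoint, and index name are placeholders):

```
<source>
  # Consume from Kafka as part of a consumer group (fluent-plugin-kafka)
  @type kafka_group
  brokers kafka:9092
  topics logs
  consumer_group fluentd-consumers
  format json
</source>

<match **>
  # Write to OpenSearch (fluent-plugin-opensearch)
  @type opensearch
  host opensearch
  port 9200
  scheme https
  user admin
  password admin
  index_name logs
</match>
```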

1 Like

Thanks. If I am talking about Elastic Beats as a lightweight shipper, does Fluentd fit with it or not?

As per our architecture there are two approaches, and we want to decide on one of them.

  1. elastic beats → kafka → logstash → opensearch.
  2. elastic beats → kafka → fluentD → opensearch.

Which one is more suitable?

If you aren’t already deeply involved in beats + logstash, I would look elsewhere for a few reasons:

  1. Beats seems to be winding down to a degree and is questionable overall with OpenSearch. A) The originators of Beats, Monica and Tudor, have left Elastic to start their own (unrelated) company and there is a non-OSS technology filling the same niche within the proprietary ES stack. So, while this is conjecture on my part, I wouldn’t personally bet on a vibrant future for Beats. B) Currently, the most recent version of Beats has an explicit ES check which blocks OpenSearch. Eliminating this would require an extensive forking process that no one has the appetite for at the moment. The versioning scheme for Beats is in lock step with ES, so there is no telling what will happen or if old versions (compatible with OpenSearch) will be patched.
  2. Logstash is a useful tool and there is a path forward with it (OpenSearch output plugin). However, it’s written in Ruby (a fine language, but not always known for high performance and a bit of an outlier in this part of the stack) and it has the same idiosyncratic versioning that the rest of ES has (breaking changes can and do come in minors and it’s driven by the ES release cycle).

If I were starting out greenfield, I would pick Fluent Bit in place of Beats as it’s lightweight and a CNCF project - it’s not going anywhere. As an aggregator, think about Fluentd (CNCF, but Ruby) or Data Prepper (under the OpenSearch umbrella, written in Java).
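As a rough idea of what Fluent Bit looks like in the Beats role on a node, here is a sketch that tails a log file and ships to Kafka, leaving the aggregator (Fluentd / Data Prepper / Logstash) to consume from Kafka and write to OpenSearch. The file path, tag, brokers, and topic are placeholder assumptions:

```
[SERVICE]
    Flush        5

[INPUT]
    # Tail a local log file (roughly Filebeat's role)
    Name         tail
    Path         /var/log/app/*.log
    Tag          app.logs

[OUTPUT]
    # Ship to Kafka; the aggregator then consumes from Kafka
    # and writes to OpenSearch
    Name         kafka
    Match        app.*
    Brokers      kafka:9092
    Topics       logs
```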

6 Likes

@searchymcsearchface Thanks for sharing your point of view. That’s really helpful.

It seems Fluent Bit can work as an alternative to Elastic Beats, but I need some more understanding. Elastic Beats has several beats like Filebeat, Metricbeat, Packetbeat, Auditbeat, Heartbeat, and Functionbeat. Is Fluent Bit capable of covering that whole variety of beats or not? Is it really a complete replacement that handles all such types of beats?

I just want to know whether Fluent Bit is capable of providing all the features and functionality that Elastic Beats provides or not. What would be the major decision-making point for not using Fluent Bit and using Elastic Beats instead? If there isn’t any, then it seems that Fluent Bit is the best fit here, together with Fluentd.

Here are all the inputs for fluent bit (it’s pretty comprehensive):

I think the only thing you don’t really get out of the box with Fluent Bit that you do get with the Beats family is built-in dashboards. You can make those dashboards yourself, of course.

1 Like

@searchymcsearchface Thanks again.

Also, is it possible to get support for some third-party data sources in Fluent Bit, like it is in Elastic Beats?

I am listing a few of them here -
Data sources - Splunk, G Suite, Office 365, GuardDuty, CloudTrail, Azure, Okta, Threat Intel, etc.

How should we think about support for these types of data sources, and many others, in Fluent Bit?