Evaluating OpenSearch for centralized logging

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.18 / Data Prepper / FluentBit / Ubuntu 24

Describe the issue:
Hi! I’m looking for a low-maintenance centralized on-prem application log monitoring solution for a largish non-profit government organization. We have a few hundred heterogeneous web applications (Tomcat/Java, Apache/PHP, IIS/.NET, Django) completely on prem - our skies are fully cloud-free. The generated log data is about 5GB per day, and I need to give selective read access to the logs via web to different teams (dev team A can read the logs of apps A1, A2, A3, etc., dev team B can read B1, B2, etc.). Users come from LDAP; groups can be application-managed. Logs usually come via syslog and are mostly unstructured, but for some of them I’d like the possibility to declare some searchable fields. We don’t need fancy statistics or insights; I just need to be able to look at a specific application log, maybe filter for a date/time range, and read/export those logs. I am also interested in defining different retention policies for different applications: some apps should store logs for about a month, others for a year, etc.

I’ve been asking around for suggestions and most people mention the ELK stack - however, ELK does not support getting users from LDAP without a paid license. OpenSearch seems to fit all the requirements, but I am struggling to figure out the best setup - certainly because I’m still not grasping all the jargon and details.

I’d like some pointers about how to organize the whole thing. Right now I am setting up a proof of concept in a VM. This is what I have as of now:

  • FluentBit receives data via syslog (RFC 5424) and ships it to Data Prepper via HTTP;
  • Data Prepper has a single pipeline that ingests data via HTTP and outputs to an OpenSearch sink in a single index called “application_logs” (pipeline sketched below);
  • OpenSearch stores the log data and lets me do basic search using the syslog fields. The log message is saved unstructured in the ‘message’ field;
  • I can parse the logs with Data Prepper if I define a grok processor specifying the log message pattern;
  • OpenSearch Dashboards shows and queries the index in the default tenant.
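For reference, the Data Prepper pipeline in my proof of concept looks roughly like this (credentials and the grok pattern are just placeholders):

```yaml
# pipelines.yaml - single pipeline, everything lands in one index
application-logs-pipeline:
  source:
    http:
      port: 2021                 # FluentBit posts the syslog records here
  processor:
    - grok:
        match:
          # only applied where I have declared a pattern; most logs stay unparsed
          message: ['%{COMMONAPACHELOG}']
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        username: admin
        password: admin
        index: application_logs
        insecure: true           # PoC only, self-signed certs
```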

So the basic idea works. Now I am facing some architectural/best practice questions:

  1. If I ship everything into a single index, how do I prevent team B seeing logs from team A’s applications? Should I separate each application into a different index? Could this also help for the retention requirements?
  2. If a single application has two or three log files, say an Apache access log and a Java log, what’s the best practice? Ship them to a single index? Or should each index reflect a single “data source” (and therefore could be parsed according to the origin format)?
  3. If I define a single FluentBit instance with a syslog input and all the applications send their logs there, what’s the best way to ship the data into the indexes? Conditional routing on the syslog data, directing stuff to one of the 400 indexes?
  4. If I decide to create fewer indexes, say defining a single “apache-access-log” index and a single “java-log” index, can I still hide documents from users depending on the application they belong to? How would I implement rotation policies this way?
  5. If each application has a different log format, can I still use a single index? If I want to parse some fields from the log message it seems I have to use grok rules, but how do I dynamically choose which grok pattern to apply? Should each pattern have its own pipeline? It seems like a small mountain of configuration steps just to add a new application.
  6. How do I specify data retention policies? I have seen index policies, but they seem to apply to the entire index? What about keeping the index and rotating data?
  7. Is there an easy way (e.g. via Dashboard) to export the output of a log search to its original text format, say to send it via email to a coworker? Seems a basic feature but I couldn’t find it in the UI.
  8. Many articles apply platform configuration via POST requests and don’t use the Dashboard UI. Is this because Dashboard is meant to be a “presentation” tool only, or should I expect that common configuration options will have a corresponding Dashboards screen?

I apologize for this long post. The documentation describes well what one “can” do, but there is very little info out there about what one “should” do, and most of the guides only look at the AWS perspective (understandable, but not really applicable to the on-prem world). I confess I am a bit lost.

Thanks!

I think I can share some advice regarding your questions:

  1. If I ship everything into a single index, how do I prevent team B seeing logs from team A’s applications? The feature you are looking for is called document-level security. Should I separate each application into a different index? Could this also help for the retention requirements? If you want to keep different data for different periods of time, then splitting your logs into separate indices is mandatory in my opinion. Deleting old data from within an index is very expensive, as it basically rewrites the whole index, and there is no integrated support for automating it. With separate indices, access control also becomes simple index permissions, as in the sketch below.
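A minimal sketch via the security REST API, assuming one index per application (the index pattern and the LDAP group DN are made up):

```json
PUT _plugins/_security/api/roles/team-a-logs
{
  "index_permissions": [{
    "index_patterns": ["logs-appa*"],
    "allowed_actions": ["read"]
  }]
}

PUT _plugins/_security/api/rolesmapping/team-a-logs
{
  "backend_roles": ["cn=team-a,ou=groups,dc=example,dc=org"]
}
```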

  2. If a single application has two or three log files, say an Apache access log and a Java log, what’s the best practice? Ship them to a single index? Or should each index reflect a single “data source” (and therefore could be parsed according to the origin format)? Separating them is only necessary if you expect frequent field type conflicts between your different logs, i.e. Apache and Java writing fields with the same name but different data types, as in the example below.
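To illustrate such a conflict: if these two documents land in the same index, the first one fixes the mapping of status as a number and the second is rejected with a mapping error (field names invented for the example):

```json
{ "source": "apache", "status": 200 }
{ "source": "java",   "status": "SHUTDOWN" }
```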

  3. If I define a single FluentBit instance with a syslog input and all the applications send their logs there, what’s the best way to ship the data into the indexes? Conditional routing on the syslog data, directing stuff to one of the 400 indexes? We route by origin, so each application gets its own index (see the sketch below).
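In your stack this doesn’t have to mean 400 hand-maintained routes: if each event carries an application identifier (e.g. taken from the syslog appname), the Data Prepper OpenSearch sink can, as far as I know, build the index name dynamically from an event field, along these lines (field name assumed):

```yaml
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        # one index per application, derived from the app_name field of each event
        index: "logs-${/app_name}"
```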

  4. If I decide to create fewer indexes, say defining a single “apache-access-log” index and a single “java-log” index, can I still hide documents from users depending on the application they belong to? How would I implement rotation policies this way? As mentioned above, document-level security would be your solution (sketched below), but implementing different deletion policies for your different applications is very hard that way, and I would not recommend it.
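For completeness, a DLS rule is attached to a role as a query string. A sketch of what that would look like for shared indices - the field and index names are assumptions:

```json
PUT _plugins/_security/api/roles/team-a-shared-logs
{
  "index_permissions": [{
    "index_patterns": ["apache-access-log", "java-log"],
    "dls": "{\"term\": {\"app_name.keyword\": \"app-a1\"}}",
    "allowed_actions": ["read"]
  }]
}
```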

  5. If each application has a different log format, can I still use a single index? If I want to parse some fields from the log message it seems I have to use grok rules, but how do I dynamically choose which grok pattern to apply? Should each pattern have its own pipeline? It seems like a small mountain of configuration steps just to add a new application. I would advise against implementing grok patterns for all applications yourself. That is a lot of work, and you will be the bottleneck unless you let developers contribute their own patterns via a pull request. I would recommend that the applications switch to JSON logging, as FluentBit can parse that directly with little configuration effort (sketched below) and developers can decide on their data presentation. If you have logs that always look the same, such as your Apache logs, you can take advantage of built-in parsers (not sure if FluentBit has one for Apache, but Vector has). Otherwise route by source to your grok patterns.
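The FluentBit side of JSON logging is indeed small. A minimal sketch, assuming the JSON payload sits in the syslog message field (tag, parser name, and time format are made up):

```ini
# parsers.conf
[PARSER]
    Name         app_json
    Format       json
    Time_Key     timestamp
    Time_Format  %Y-%m-%dT%H:%M:%S

# fluent-bit.conf - re-parse the message field as JSON where it applies
[FILTER]
    Name          parser
    Match         apps.*
    Key_Name      message
    Parser        app_json
    Reserve_Data  On
```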

  6. How do I specify data retention policies? I have seen Index policies, but it seems to apply to the entire index? What about keeping the index and rotating data? As I mentioned above, frequently deleting data from within an index is not something you really want to do. It is very slow: OpenSearch does not work like a relational database, and it will rewrite the whole index, because Lucene stores data in segment files that each represent a chunk of written data and are immutable once written to disk. The original file structure of your data isn’t present in OpenSearch anyway, as all data is stored in a so-called inverted index. Try to structure your system so that you can delete whole indices (example policy below).
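Concretely, deleting whole indices is what Index State Management is good at. With date-stamped per-application indices (e.g. logs-appa1-2025.01), retention per application is one ISM policy that drops the index once it is old enough - a sketch, with made-up names and ages:

```json
PUT _plugins/_ism/policies/logs-appa-30d
{
  "policy": {
    "description": "Drop team A app logs after ~30 days",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": [
          { "state_name": "delete", "conditions": { "min_index_age": "30d" } }
        ]
      },
      {
        "name": "delete",
        "actions": [{ "delete": {} }],
        "transitions": []
      }
    ],
    "ism_template": [{ "index_patterns": ["logs-appa*"], "priority": 100 }]
  }
}
```

The ism_template part attaches the policy automatically to every new index matching the pattern, so adding an application is mostly a matter of picking an index name that matches an existing policy.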

  7. Is there an easy way (e.g. via Dashboard) to export the output of a log search to its original text format, say to send it via email to a coworker? Seems a basic feature but I couldn’t find it in the UI. There is the reporting feature in Discover (top right), but it is limited to 10,000 documents, so it is not suitable for all cases. I have written a script that takes a query and calls the scroll API to enable larger exports (sketched below), but this will not reproduce the original file, as it works with the JSON representation from OpenSearch. If possible, use the share functionality in Discover.
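Roughly, the shape of such a scroll-based export - this is a simplified sketch, not the exact script; endpoint, credentials, and field names are placeholders:

```python
import requests

OS = "https://localhost:9200"            # cluster endpoint (placeholder)
AUTH = ("export_user", "export_pass")    # placeholder credentials

query = {"query": {"match": {"app_name": "app-a1"}},
         "size": 1000,
         "sort": ["_doc"]}               # _doc is the cheapest order for scrolling

# open a scroll context and page through every hit
resp = requests.post(f"{OS}/logs-appa1/_search?scroll=2m",
                     json=query, auth=AUTH, verify=False).json()
scroll_id = resp["_scroll_id"]

with open("export.log", "w") as out:
    while resp["hits"]["hits"]:
        for hit in resp["hits"]["hits"]:
            # write only the raw log line, one document per line
            out.write(hit["_source"].get("message", "") + "\n")
        resp = requests.post(f"{OS}/_search/scroll",
                             json={"scroll": "2m", "scroll_id": scroll_id},
                             auth=AUTH, verify=False).json()
        scroll_id = resp["_scroll_id"]

# release the scroll context when done
requests.delete(f"{OS}/_search/scroll",
                json={"scroll_id": scroll_id}, auth=AUTH, verify=False)
```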

  8. Many articles apply platform configuration via POST requests and don’t use the Dashboard UI. Is this because Dashboard is meant to be a “presentation” tool only, or should I expect that common configuration options will have a corresponding Dashboards screen? No - many advanced options do not have a visual configuration panel and are only accessible via the REST API. But do take a look around the UI: the Index Management, Dashboards Management, and Security sections provide screens for a couple of essential things.

Hi @l.scorcia,

When it comes to access control to the data in your indices, you can employ document-level security (DLS) and field-level security (FLS) to control access beyond just index permissions, and field masking to protect/hide sensitive data.

Please see more here:

Best,
mj