Ingest AWS service logs (CloudTrail, VPC logs) to OpenSearch


I have a self-managed cluster. Is there documentation with a step-by-step guide on how to ship logs from an AWS service (CloudTrail logs) to a self-managed OpenSearch set up on EKS? Please help!

There are tons of articles on the internet about this, which is probably why there is no specific doc here. There are a few caveats, like “your indexer will need to be able to index to OpenSearch”, so YMMV.

The big thing here is that whatever ships the logs to OpenSearch needs to be able to read from S3 and write to OpenSearch.
AWS itself has a decent doc of how you can do this: Visualizing AWS CloudTrail Events using Kibana | AWS Cloud Operations & Migrations Blog
However, it goes fairly deep and is not OpenSearch-specific.
But the diagram there should be a decent representation of how the flow should look.
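
To make the "read from S3, write to OpenSearch" step concrete, here is a minimal Python sketch of the transformation in the middle. It relies on two things I'm confident about: CloudTrail delivers gzipped JSON files with a top-level `Records` array, and the OpenSearch `_bulk` API takes newline-delimited JSON. The function name, the default index name `cloudtrail`, and the choice of `eventID` as the document ID are my own assumptions, not something from the AWS post.

```python
import gzip
import json


def cloudtrail_to_bulk(gz_bytes: bytes, index: str = "cloudtrail") -> str:
    """Turn one gzipped CloudTrail log file into an OpenSearch _bulk body."""
    # CloudTrail log files are gzipped JSON with a top-level "Records" array.
    records = json.loads(gzip.decompress(gz_bytes)).get("Records", [])
    lines = []
    for record in records:
        # Each document is preceded by an action line. Using eventID as _id
        # (an assumption) keeps a re-processed file from duplicating events.
        action = {"index": {"_index": index}}
        if "eventID" in record:
            action["index"]["_id"] = record["eventID"]
        lines.append(json.dumps(action))
        lines.append(json.dumps(record))
    # The _bulk API expects newline-delimited JSON that ends with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

In a real shipper you would get `gz_bytes` from S3 (for example with boto3's `get_object(...)["Body"].read()`) and POST the returned string to your cluster's `/_bulk` endpoint with `Content-Type: application/x-ndjson`. Authentication and retries are left out here.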

For my use case, we ship CloudTrail events to CloudWatch. That CloudWatch log group has a Kinesis stream subscription filter, and Logstash reads from Kinesis (which requires a DynamoDB table for checkpoints), then ships the logs to one of my OpenSearch clusters.
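
If you would rather write your own consumer than run Logstash, the main thing to know is what lands in Kinesis: a CloudWatch Logs subscription filter writes gzip-compressed JSON with a `messageType` and a `logEvents` array. The sketch below is not my Logstash config, just a rough Python illustration of the decoding step. The function name and the `@timestamp_ms` field are made up for the example.

```python
import gzip
import json


def decode_subscription_record(data: bytes) -> list:
    """Decode one Kinesis record written by a CloudWatch Logs subscription filter."""
    payload = json.loads(gzip.decompress(data))
    # CONTROL_MESSAGE records only check reachability and carry no log data.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    events = []
    for log_event in payload.get("logEvents", []):
        # For a CloudTrail log group, each message is itself a JSON document.
        event = json.loads(log_event["message"])
        # Hypothetical field name: keeps the CloudWatch timestamp (epoch ms).
        event["@timestamp_ms"] = log_event["timestamp"]
        events.append(event)
    return events
```

Here `data` is the raw record payload as bytes. If you receive the records through a Lambda event instead of reading the stream directly, the payload is base64-encoded and needs `base64.b64decode` first.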

Hope that helps.

I did this a while ago, and the Lambda function is on GitHub. You can use it as is: How to forward CloudTrail (or other logs from AWS S3) to Logsene - Sematext

The only caveat is that it’s made to work with Sematext Logs (because I work for Sematext), which is - and I’m oversimplifying here - OpenSearch for Logs as a Service. As part of Sematext Logs we have a syslog receiver (an rsyslog instance that we host). So you can just go ahead and use it :slight_smile:

If you want to do a local setup with Lambda + rsyslog + OpenSearch, feel free to take that Lambda function, which takes CloudTrail logs, parses the JSON, and sends the events over TCP syslog. Then configure your own rsyslog to send logs to your own OpenSearch. rsyslog is very light, which is one of the reasons I like it. The rsyslog config would be a simplified version of the one in this recipe: Recipe: Apache Logs + rsyslog (parsing) + Elasticsearch - Sematext

It’s simpler because that recipe tails files and parses unstructured data, whereas in your case you’d only listen on TCP and forward all traffic to OpenSearch.
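
For a rough idea of what the Lambda side does, here is a minimal Python sketch of "parse the CloudTrail JSON and send it over TCP syslog". It is not the code from the GitHub repo. The RFC 5424-style header, the `local0`/`info` priority, the use of `eventSource` as the hostname field, and both function names are assumptions chosen for illustration.

```python
import gzip
import json
import socket


def cloudtrail_to_syslog(gz_bytes: bytes, app: str = "cloudtrail") -> list:
    """Format each CloudTrail record as one RFC 5424-style syslog line."""
    records = json.loads(gzip.decompress(gz_bytes)).get("Records", [])
    lines = []
    for record in records:
        # <134> = facility local0 (16 * 8) + severity info (6).
        timestamp = record.get("eventTime", "-")
        host = record.get("eventSource", "-")
        # json.dumps escapes newlines, so each event stays on a single line.
        lines.append(f"<134>1 {timestamp} {host} {app} - - - {json.dumps(record)}")
    return lines


def send_lines(lines: list, host: str, port: int) -> None:
    """Send newline-framed syslog messages over one TCP connection."""
    with socket.create_connection((host, port), timeout=10) as conn:
        for line in lines:
            conn.sendall(line.encode("utf-8") + b"\n")
```

You would call `send_lines(cloudtrail_to_syslog(gz_bytes), host, port)` with the address of your own rsyslog TCP listener. rsyslog then only has to parse the JSON payload and forward it to OpenSearch.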