[Solved] Fluentd/Opensearch - How to ingest nested types?

I have configured an index with a nested field type so that we can later run more complex queries on that field. I basically used:

PUT app.backend.app_logs
{
  "mappings": {
    "properties": {
      "related_objects": {
        "type": "nested" 
      }
    }
  }
}
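For context, the point of the nested mapping is that each object in related_objects can then be queried independently. A minimal sketch of such a query (assuming a field like briefing_offer, as seen in the logs below) would be:

```json
GET app.backend.app_logs/_search
{
  "query": {
    "nested": {
      "path": "related_objects",
      "query": {
        "term": { "related_objects.briefing_offer": 24 }
      }
    }
  }
}
```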

My problem now is that I cannot send logs from fluentd to OpenSearch anymore, as OpenSearch rejects the log messages. I assume that's because the fluentd record has a field called "related_objects", but its value is just a (JSON) string.

Example log:

fluentd | 2022-07-07 10:28:14 +0200 [warn]: #0 dump an error event: error_class=Fluent::Plugin::OpenSearchErrorHandler::OpenSearchError error="400 - Rejected by OpenSearch" location=nil tag="filter.app.backend.app_logs" time=2022-07-07 10:27:05.310000000 +0200 record={"docker.container_id"=>"c3fd494fbd9e6f08dfa1e8bdbd57bf1a94422805f090141df7a94dece52fcfad", "time"=>"2022-07-07T08:27:05.310555466Z", "stream"=>"stdout", "docker.container_started"=>"2022-07-06T13:13:32.633261163Z", "docker.container_image"=>"image_path", "docker.container_name"=>"intranet-worker", "module"=>"org-worker", "severity"=>"DEBUG", "file"=>"logger.py", "function"=>"event_user_accepted_offer", "message"=>"Found 0 other registrations to be deleted", "related_objects"=>"{\"briefing_offer\": 24}", "target_index"=>"app.backend.app_logs"}

My question is: how do I ingest the nested object from fluentd? I'm sure there is some fluentd configuration I need, but as far as I understood the documentation, there is no object data type (just array, and it's not an array).

In case it matters, I’m using the opensearch output plugin with the following configuration:

<match **>
  @type opensearch
  target_index_key target_index
  host opensearch-node1
  include_timestamp true
  port 9200
  scheme https
  user fluentd
  password xxx
  ssl_verify false
  index_name xxx
  ca_file /fluentd/etc/ssl/ca.crt.pem
</match>

Any help would be appreciated.

The solution is actually much simpler than I thought. Since the original log is already parsed and the JSON sits in a field as a string, I just need to parse that field and write the decoded object back to it:

# Parse nested data in backend logs
<filter filter.app.backend.app_logs>
  @type parser
  key_name related_objects
  hash_value_field related_objects
  reserve_data true
  reserve_time true
  <parse>
    @type json
  </parse>
</filter>
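To make the filter's effect concrete, here is a rough Python emulation (an illustrative sketch, not fluentd internals): key_name selects the field, the json parser decodes it, hash_value_field writes the decoded hash back under that key, and reserve_data keeps all the other fields on the record.

```python
import json

def apply_parser_filter(record, key_name="related_objects",
                        hash_value_field="related_objects"):
    """Sketch of the parser filter: decode a JSON-string field in place."""
    out = dict(record)  # reserve_data true: keep every original field
    out[hash_value_field] = json.loads(record[key_name])
    return out

record = {
    "message": "Found 0 other registrations to be deleted",
    "related_objects": '{"briefing_offer": 24}',
}

# After the filter, related_objects is a real object, which the
# nested mapping accepts.
print(apply_parser_filter(record)["related_objects"])
```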

That did the trick: the logs are accepted again, and I can run nested queries as expected.
