Hello @sputmayer ,
Thank you for your interest in Data Prepper. This should be possible using the grok
processor. Then, if you’d like to parse the JSON, you can use the parse_json
processor.
First, I noticed that one of your JSON lines starts with a parenthesis instead of a curly brace:
213123132 2023-08-02T23:56:00.000Z ("key2":{"key3" : "value1"}}
I’m going to assume this is a copy-paste error. If not, we can discuss other solutions.
OK. Let’s get into the solution.
If we just use grok, you can have a configuration similar to the following:
grok-pipeline:
  source:
    file:
      path: /usr/share/test.log
      record_type: event
  processor:
    - grok:
        match:
          message: ['%{INT:number:int} %{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:json}']
  sink:
    - stdout:
The key part is this grok pattern: '%{INT:number:int} %{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:json}'. It looks for an integer value, then an ISO-8601 timestamp. Finally, it takes the rest of the line and puts it into a field named json. The :int suffix converts the matched number to an integer, which is why number appears unquoted in the output below.
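To make the pattern easier to reason about, here is a rough Python approximation of what it extracts. This is a hypothetical sketch, not how Data Prepper implements grok: the INT and GREEDYDATA parts mirror the standard grok definitions, but the timestamp part is a simplified stand-in for TIMESTAMP_ISO8601, and the names GROK_LIKE and grok_like are made up for illustration.

```python
import re

# Hypothetical regex approximating the grok pattern
#   %{INT:number:int} %{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:json}
# The timestamp group is a simplified stand-in for TIMESTAMP_ISO8601,
# not the full grok definition.
GROK_LIKE = re.compile(
    r"(?P<number>[+-]?[0-9]+) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?) "
    r"(?P<json>.*)"  # GREEDYDATA: everything left on the line
)


def grok_like(message: str) -> dict:
    """Return the fields the pattern would extract, or {} if the line does not match."""
    m = GROK_LIKE.match(message)
    if m is None:
        return {}
    fields = m.groupdict()
    fields["number"] = int(fields["number"])  # mirrors the :int conversion
    return fields


line = '213123131 2023-08-02T23:56:00.000Z {"key":{"key1" : "value1"}}'
print(grok_like(line))
```

Note that the json field is still just a string at this point; nothing has been parsed yet.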
If you run this, you will get output like the following:
{"message":"213123131 2023-08-02T23:56:00.000Z {\"key\":{\"key1\" : \"value1\"}}","number":213123131,"json":"{\"key\":{\"key1\" : \"value1\"}}","timestamp":"2023-08-02T23:56:00.000Z"}
{"message":"213123132 2023-08-02T23:56:00.000Z {\"key2\":{\"key3\" : \"value1\"}}","number":213123132,"json":"{\"key2\":{\"key3\" : \"value1\"}}","timestamp":"2023-08-02T23:56:00.000Z"}
You will see that the json key holds the JSON as a string. If you’d like to parse it, you can use the parse_json processor. This next example does that, and also deletes the original message and the JSON string that was stored in json above.
grok-pipeline:
  source:
    file:
      path: /usr/share/test.log
      record_type: event
  processor:
    - grok:
        match:
          message: ['%{INT:number:int} %{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:json}']
    - parse_json:
        source: json
    - delete_entries:
        with_keys:
          - message
          - json
  sink:
    - stdout:
Running this on your input yields:
{"number":213123131,"timestamp":"2023-08-02T23:56:00.000Z","key":{"key1":"value1"}}
{"number":213123132,"timestamp":"2023-08-02T23:56:00.000Z","key2":{"key3":"value1"}}
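As a mental model of what the parse_json and delete_entries steps do to each event, here is a small Python sketch. It is an illustration under assumptions, not Data Prepper code: it assumes the JSON string decodes to an object whose top-level keys are merged into the event root (which matches the output above), and the function name parse_and_clean is hypothetical.

```python
import json


# Hypothetical model of: parse_json (source: json) followed by
# delete_entries (with_keys: message, json).
def parse_and_clean(event: dict, source: str = "json",
                    delete_keys=("message", "json")) -> dict:
    result = dict(event)  # copy so the input event is left untouched
    parsed = json.loads(result[source])  # raises json.JSONDecodeError on malformed JSON
    if isinstance(parsed, dict):
        result.update(parsed)  # merge the parsed keys into the event root
    for key in delete_keys:
        result.pop(key, None)  # skip keys that are already absent
    return result


event = {
    "message": '213123131 2023-08-02T23:56:00.000Z {"key":{"key1" : "value1"}}',
    "number": 213123131,
    "json": '{"key":{"key1" : "value1"}}',
    "timestamp": "2023-08-02T23:56:00.000Z",
}
print(json.dumps(parse_and_clean(event)))
```

In this sketch, a line like the one starting with a parenthesis would raise json.JSONDecodeError; how Data Prepper itself handles such a line is not covered here.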
You can use this as a starting point to manipulate the data as needed from here. Feel free to reach out with more questions.