Data Prepper is not producing output in JSON format and not structuring data according to the patterns passed to the grok processor

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
Docker image version of data-prepper: latest

Describe the issue:
Hello everyone, I am quite new to Data Prepper and OpenSearch.

I am using the following data flow:
Fluent Bit -> Data Prepper -> OpenSearch

Data Prepper is not giving the output in JSON format. I have used the grok processor to structure the data, but the output is not structured.

Any help would be greatly appreciated.

Configuration:

This is my pipelines.yaml file
log-pipeline:
  workers: 10
  delay: 100
  buffer:
    bounded_blocking:
      buffer_size: 1024000
      batch_size: 10000
  source:
    http:
      ssl: false
  processor:
    - grok:
        match:
          message: ['%{YEAR:year} %{MONTH:month} %{DAY:day} %{HOUR:hour}:%{MINUTE:minutes}:%{SECOND:seconds} %{DATA:Type} %{NUMBER:ID}%{DATA:message} – %{GREEDYDATA:MessageType}']
  sink:
    - opensearch:
        hosts: ["https://example.com:xxxx"]
        insecure: true
        username:
        password:
        index: my_logs

Relevant Logs or Screenshots:

This is my input data

2024 May 9 06:39:28.730 network 0.1 Packet – received message
2024 May 9 06:39:29.370 network 2.2 Packet – request message
2024 May 9 06:39:30.010 network 3.6 Packet – accept message
2024 May 9 06:39:32.090 network 0.8 Packet – attach message
2024 May 9 06:39:33.370 network 1.9 Packet – reject message

Output I received:

{"date":1.716227466308855E9,"log":"2024 MAY 9 06:39:29.050 network 0.1 Packet – received message\r"}
{"date":1.716227466308853E9,"log":"2024 MAY 9 06:39:28.730 network 2.2 Packet – request message\r"}
{"date":1.716227466308855E9,"log":"2024 MAY 9 06:39:29.370 network 3.6 Packet – accept message\r"}
{"date":1.716227466308856E9,"log":"2024 MAY 9 06:39:30.010 network 0.8 Packet – attach message\r"}
{"date":1.716227466308856E9,"log":"2024 MAY 9 06:39:32.090 network 1.9 Packet – reject message\r"}

Expected Output:

{
  "hour": "06",
  "minutes": "39",
  "seconds": "29.370",
  "year": "2024",
  "month": "May",
  "day": "9",
  "Type": "network",
  "ID": "0.1",
  "message": "packet",
  "MessageType": "received message"
}

The field you want to grok is "log", not "message", so try changing the grok processor config to:

...
  processor:
    - grok:
        match:
          log: {your-grok-pattern-here}
...

Hi oeyh,

I tried "log" instead of "message", but it didn't work; the output remains the same.

Output:
{"date":1.716227466308855E9,"log":"2024 MAY 9 06:39:29.050 network 0.1 Packet – received message\r"}
{"date":1.716227466308853E9,"log":"2024 MAY 9 06:39:28.730 network 2.2 Packet – received message\r"}
{"date":1.716227466308855E9,"log":"2024 MAY 9 06:39:29.370 network 3.6 Packet – received message\r"}
{"date":1.716227466308856E9,"log":"2024 MAY 9 06:39:30.010 network 0.8 Packet – received message\r"}
{"date":1.716227466308856E9,"log":"2024 MAY 9 06:39:32.090 network 1.9 Packet – received message\r"}

Expected output:
{
  "hour": "06",
  "minutes": "39",
  "seconds": "29.370",
  "year": "2024",
  "month": "May",
  "day": "9",
  "Type": "network",
  "ID": "0.1",
  "message": "packet",
  "MessageType": "received message"
}

Hi @kavya, looks like the grok pattern doesn’t match so the expected fields are not generated. Try adjusting the pattern. This one may work:

'%{YEAR:year} %{WORD:month} %{MONTHDAY:day} %{HOUR:hour}:%{MINUTE:minutes}:%{SECOND:seconds} %{DATA:Type} %{NUMBER:ID} %{DATA:message} - %{GREEDYDATA:MessageType}'

The changes I made:

  • MONTH → WORD: MONTH doesn't match a month name written in all caps (e.g. MAY)
  • DAY → MONTHDAY: DAY matches weekday names (Monday, Tuesday, etc.), while MONTHDAY matches the day of the month
  • Added a space between %{NUMBER:ID} and %{DATA:message}
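
Putting the two suggestions from this thread together (match on the "log" field and use the adjusted pattern), the processor section might look like the sketch below. It assumes the rest of pipelines.yaml stays as posted in the question and that the separator in the incoming lines is a plain hyphen; if the logs actually contain an en dash, the literal in the pattern would need to match that character instead. The capture name ID is written in capitals here so that it lines up with the expected output.

```yaml
log-pipeline:
  # source, buffer, and sink are assumed unchanged from the question
  processor:
    - grok:
        match:
          # The output above shows the raw line in the "log" field,
          # so the pattern is keyed on "log" rather than "message".
          log:
            - '%{YEAR:year} %{WORD:month} %{MONTHDAY:day} %{HOUR:hour}:%{MINUTE:minutes}:%{SECOND:seconds} %{DATA:Type} %{NUMBER:ID} %{DATA:message} - %{GREEDYDATA:MessageType}'
```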

See this link for pattern definitions.
