Parsing dates with Data Prepper 2.0.1

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

Data Prepper 2.0.1
Fluent-Bit 2.0.9
OpenSearch 2.6.0
OpenSearch Dashboards 2.6.0
Docker-Compose 1.29.2

Describe the issue:

Hello everyone,

I want to set up log ingestion, and I have a question (probably several questions) about how I can parse/transform a date coming from a log file so that all the timestamps end up in the same format. The goal is not to use the timestamp created by Fluent Bit or Data Prepper, but to use the timestamp created by the application and use that time for the Time Filter in OpenSearch.

Unfortunately, the dates come in different formats, so OpenSearch throws an error (see below).

Parsing in general works fine for all the other log files; only schedule.log, server.log, and client.log fail, because of the differences in their date values.

I'm basically tailing six log files with Fluent Bit, all with partly different layouts and datestamps:

schedule.log

INFO 04.02.2023 23:00:00.474 (com.namespace.test): starting task 'Clean up logs' - schedule entry 'Clean up logs' (id=5)

server.log:

INFO 17.02.2023 07:59:16.810 (com.namespace.test): Uploaded 0 media file(s).

client.log

ERROR 30.06.2022 10:56:45.859 {uID=54,pID=58} (com.namespace.test): Client error: ERROR 30.06.2022 10:56:45.666 (com.namespace.test): The test() method has thrown exception:
Username (Firstname Lastname), session: sessionID, project: 1, ip: 127.0.0.1
Version=13;JDK=17;OS=Windows 11 amd64;Date=30.06.2022 10:56:45 (I)
java.lang.IllegalStateException: JSObject is not valid or already disposed.
at com.namespace.test(SourceFile:196)
at com.namespace.test(ChromeEngine.java:488)
at com.namespace.test$e.b(SourceFile:3783)
at com.namespace.test$e.onMessageReceived(SourceFile:5755)
at com.namespace.test(SourceFile:1085)
at com.namespace.test(SourceFile:69)
at com.namespace.test(SourceFile:79)
at com.namespace.test(Executors.java:539)
at com.namespace.test(FutureTask.java:264)
at com.namespace.test(ThreadPoolExecutor.java:1136)
at com.namespace.test(ThreadPoolExecutor.java:635)
at com.namespace.test(Thread.java:833)

access.log (Tomcat)

2022-07-12T08:09:52.729+0200 45 192.168.1.0 Username 0 304 "GET /home/javascript.js HTTP/1.1" "127.0.0.1, 127.0.0.2"

catalina.log (Tomcat)

2022-12-07T16:23:35.132+0100 INFO com.namespace.testcleanupCaches: Caches stats: authReqs=0 sessions=1 introspection=0

access.log (Apache2)

2023-03-07T01:19:01.166+0100 xxx xxx 192.168.0.2 "192.168.0.3" - - 232 74 242 382 - 0 200 "GET /test.html HTTP/1.0" "-" "Referrer"

Configuration:

pipelines.yaml:

log-pipeline:
  workers: 2
  delay: "5000"
  source:
    http:
      ssl: false
      port: 2021
      health_check_service: true
      authentication:
        unauthenticated:

  processor:
    - add_entries:
        entries:
          - key: "environment"
            value: "dev"
          - key: "id"
            value: "c24"
            overwrite_if_key_exists: true

    - grok:
        patterns_directories: [ "/usr/share/data-prepper/patterns" ]
        match:
          log:
            - '%{DATESTAMP_EVENTLOG_ACCESS:time_config}%{SPACE}%{LOGLEVEL_OWN:log-level}(?<greedydata>(.|\r|\n)*)'
            - '%{LOGLEVEL_OWN:log-level}%{SPACE}%{DATESTAMP_EVENTLOG_SERVER:time_config}(?<greedydata>(.|\r|\n)*)'
            - '%{DATESTAMP_EVENTLOG_ACCESS:time_config} %{NUMBER:request.duration.ms:double} %{IP:remote.host.ip} %{USER:remote.authenticated.user} %{NUMBER:bytes.sent:int} %{NUMBER:http.status.code:int} \"(?:%{WORD:http.request.method} %{NOTSPACE:http.request.url}(?: HTTP/%{NUMBER:http.request.version})?|-)\"'
            - '%{DATESTAMP_EVENTLOG_ACCESS:time_config} %{HOSTNAME:request.uri} %{URIHOST:request.uri} %{IP:lb.ip}(?<greedydata>(.|\r|\n)*)'

    - delete_entries:
        with_keys: [ "timestamp" ]

  sink:
    - stdout:
    # (usually the opensearch sink here, but commented out while testing)

pattern.conf:

LOGLEVEL_OWN (DEBUG|INFO|WARN|ERROR)
MILLISECONDS (\d){3,7}
DATESTAMP_EVENTLOG_ACCESS %{YEAR}-%{MONTHNUM}-%{MONTHDAY}T%{HOUR}:%{MINUTE}:%{SECOND}.%{MILLISECONDS}%{ISO8601_TIMEZONE}
DATESTAMP_EVENTLOG_SERVER %{DATE_EU}%{SPACE}%{HOUR}:%{MINUTE}:%{SECOND}.%{MILLISECONDS}
USERNAME [a-zA-Z0-9._-]+
USER %{USERNAME}

fluent-bit.conf:
(example with server.log)

[INPUT]
    Name              tail
    Refresh_Interval  60
    Path              /logs/server.log
    Path_Key          logfile-origin
    Ignore_Older      1m
    multiline.parser  java
    Read_from_Head    true
    Skip_Long_Lines   Off
    Mem_Buf_Limit     25MB
    Tag               server-log
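
(The matching [OUTPUT] section isn't shown above; as a rough sketch, forwarding to the Data Prepper http source could look like this. Host is a placeholder, e.g. the docker-compose service name, and /log/ingest is the http source's default path:)

[OUTPUT]
    Name    http
    Match   server-log
    Host    data-prepper
    Port    2021
    URI     /log/ingest
    Format  json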

Relevant Logs or Screenshots:

data-prepper | 2023-03-02T10:16:31,850 [log-pipeline-sink-worker-2-thread-2] WARN org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink - Document [org.opensearch.client.opensearch.core.bulk.BulkOperation@1352c987] has failure.
data-prepper | java.lang.RuntimeException: failed to parse field [time_config] of type [date] in document with id 'QBDSoYYB0DLbZ_tRy_Nd'. Preview of field's value: '02.03.2023 11:16:21.912' caused by failed to parse date field [02.03.2023 11:16:21.912] with format [strict_date_optional_time||epoch_millis] caused by Failed to parse with all enclosed parsers
data-prepper | at org.opensearch.dataprepper.plugins.sink.opensearch.BulkRetryStrategy.handleFailures(BulkRetryStrategy.java:163) ~[opensearch-2.0.1.jar:?]
data-prepper | at org.opensearch.dataprepper.plugins.sink.opensearch.BulkRetryStrategy.handleRetry(BulkRetryStrategy.java:118) ~[opensearch-2.0.1.jar:?]
data-prepper | at org.opensearch.dataprepper.plugins.sink.opensearch.BulkRetryStrategy.execute(BulkRetryStrategy.java:71) ~[opensearch-2.0.1.jar:?]
data-prepper | at org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink.lambda$flushBatch$2(OpenSearchSink.java:206) ~[opensearch-2.0.1.jar:?]
data-prepper | at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:89) ~[micrometer-core-1.9.4.jar:1.9.4]
data-prepper | at org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink.flushBatch(OpenSearchSink.java:203) ~[opensearch-2.0.1.jar:?]
data-prepper | at org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink.doOutput(OpenSearchSink.java:177) ~[opensearch-2.0.1.jar:?]
data-prepper | at org.opensearch.dataprepper.model.sink.AbstractSink.lambda$output$0(AbstractSink.java:38) ~[data-prepper-api-2.0.1.jar:?]
data-prepper | at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:89) ~[micrometer-core-1.9.4.jar:1.9.4]
data-prepper | at org.opensearch.dataprepper.model.sink.AbstractSink.output(AbstractSink.java:38) ~[data-prepper-api-2.0.1.jar:?]
data-prepper | at org.opensearch.dataprepper.pipeline.Pipeline.lambda$publishToSinks$3(Pipeline.java:247) ~[data-prepper-core-2.0.1.jar:?]
data-prepper | at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
data-prepper | at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
data-prepper | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
data-prepper | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
data-prepper | at java.lang.Thread.run(Thread.java:833) ~[?:?]

Also, maybe you have hints or recommendations about what else I could change. I'm pretty new to grok, Fluent Bit, and OpenSearch itself, so I'm thankful for any comments!

Hi @B3n, thank you for your interest in Data Prepper!

Regarding your issue, I was able to reproduce it by following the information you provided. I think this is what's happening here:

You are making great use of the grok patterns, and your grok processor is working properly. The error actually happens after the logs have been processed (the time_config field extracted), when they are being sent to OpenSearch.

OpenSearch tries to parse the time_config field as the date type using the format strict_date_optional_time or epoch_millis. This works for your catalina.log because that date string complies with the strict_date_optional_time format, but it doesn't work for your server.log because that date string does not comply with strict_date_optional_time, hence the error message you saw. (See this doc for the date formats mentioned here.)
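
(If you want to confirm this, you can check how the field got mapped, for example in the Dashboards Dev Tools console; "my-index" below is just a placeholder for your index name:)

GET my-index/_mapping/field/time_config

If the field comes back as type date without a custom format, OpenSearch falls back to the default strict_date_optional_time||epoch_millis, which is exactly what the error message shows.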

An OpenSearch index can be configured to parse custom date string formats, but I'm not sure we can configure that in Data Prepper currently. Let me double-check and see if there's another way to help with your use case here.

Hey @oeyh,

thanks a lot for your reply!

I think that's exactly the case; server.log has the format

27.01.2023 18:07:41.833 (which is like dd.MM.yyyy HH:mm:ss.SSS)

while my other logs, e.g. catalina.log, have

2023-03-02T12:55:33.137+0100 (which is like yyyy-MM-dd'T'HH:mm:ss.SSSZ)

I'm happy with any solution, whether it's done in Data Prepper or configurable inside OpenSearch via some config parameter.

Hi @B3n, I think you can actually configure OpenSearch index mappings through Data Prepper: the OpenSearch sink has a template_file option for configuring an index template (see the reference here).
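
For example, the sink section of the pipeline could look roughly like this (hosts, credentials, index name, and the file path are placeholders for your environment):

  sink:
    - opensearch:
        hosts: [ "https://opensearch:9200" ]
        username: "admin"
        password: "admin"
        index: "application-logs"
        template_file: "/usr/share/data-prepper/templates/time-config-template.json"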

I did a simple test with this template JSON (see this doc for creating a mapping for a date field):

{
  "mappings": {
    "properties": {
      "time_config": {
        "type": "date",
        "format": "strict_date_optional_time||dd.MM.yyyy HH:mm:ss.SSS||epoch_millis"
      }
    }
  }
}
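
The idea behind the format string: strict_date_optional_time covers the ISO-style timestamps from your Tomcat and Apache logs, dd.MM.yyyy HH:mm:ss.SSS covers server.log, schedule.log, and client.log, and epoch_millis keeps plain millisecond timestamps working. The custom part uses java.time patterns, so you can also sanity-check a pattern locally before putting it into the template; a minimal sketch (the class name is just for illustration):

import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class TimeConfigFormatCheck {
    public static void main(String[] args) {
        // server.log / schedule.log / client.log style timestamps
        DateTimeFormatter serverStyle = DateTimeFormatter.ofPattern("dd.MM.yyyy HH:mm:ss.SSS");
        System.out.println(LocalDateTime.parse("02.03.2023 11:16:21.912", serverStyle));

        // catalina.log / access.log style timestamps (ISO-8601 with offset)
        DateTimeFormatter accessStyle = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");
        System.out.println(OffsetDateTime.parse("2023-03-02T12:55:33.137+0100", accessStyle));
    }
}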

And I was able to read both server.log and catalina.log samples you provided above into OpenSearch with time_config being a date field.

Hope this helps!

Hey @oeyh,

thanks a lot for this! It's working like a charm.
