Concatenate different lines of logstash output into one line

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

Logstash v 7.10.0

Describe the issue:

Hi, I am new to Logstash and am trying to read an unformatted log in which the information I need is scattered across different lines. The grok filter extracts that data just fine, but it ends up spread across multiple JSON output objects. I need all of the extracted fields in one single JSON object.

Configuration:

I have 3 different grok filters inside if statements so that I do not end up with hundreds of lines of output that are useless to me, like so:

filter {
  if ([message] =~ /^Job/) {
    grok {
      match => { "message" => ['Job \<%{NUMBER:job_id}\> is submitted to queue \<%{WORD:queue}\>\.'] }
      add_field => {
        "group" => "1"
      }
    }
  }
  else if ([message] =~ /^Running test/) {
    grok {
      match => { "message" => ['Running test %{WORD:test} on block %{WORD:Block} with seed %{WORD:Seed}'] }
      add_field => {
        "group" => "1"
      }
    }
  }
  # (third grok filter omitted here)
}

Relevant Logs or Screenshots:

Log file lines:

Job <4471520> is submitted to queue <hw_queue>.

<<Waiting for dispatch ...>>

<<Starting on server>>

LSB_JOBNAME is /path/tests/test_2m_4s/511863904/simulate

presim

Running test test_2m_4s on block BLOCK with seed 511863904

Running this command:

<command>

TOOL: xrun(64) 21.12-a071: Started on Jan 31, 2023 at 12:52:33 CST

--------------------------------------------------------------------
Name                         Type                     Size  Value   
--------------------------------------------------------------------
...

Current output looks like:

{"queue":"hw_queue","job_id":"4471520"}
{"Block":"BLOCK","test":"test_2m_4s","Seed":"511863904"}
{"Year":"2023","Month":"Jan","Day":"31","TimeZone":"CST"}

Desired output:

{"queue":"hw_queue","job_id":"4471520", "Block":"BLOCK","test":"test_2m_4s","Seed":"511863904", "Year":"2023","Month":"Jan","Day":"31","TimeZone":"CST"}

I have tried a few things, but nothing has worked so far. If anyone has suggestions for a solution, I would love to hear them in a reply. Thanks.

I would try something down this path:

input {
  file {
    path => "/var/log/test.log"
    start_position => "beginning"
    codec => multiline {
      # Add patterns here to match all the lines you want to capture.
      pattern => "(^Job)|(^Running.+)|(^TOOL)"
      what => "next"
    }
  }
}

From there, check what the message looks like once it has been merged into a single event, then use a grok filter to parse that out.
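
As a rough sketch of that grok step, assuming the codec has already merged the relevant lines into one event: by default grok stops at the first pattern that matches, so `break_on_match => false` is set here to let every pattern in the list add its fields. The two patterns are the ones from the question. The pattern for the TOOL line was not posted, so it is only marked with a comment.

```
filter {
  grok {
    # Try every pattern instead of stopping at the first one that matches.
    break_on_match => false
    # The pattern for the TOOL line (not shown in the question) would be
    # added to this list as a third entry.
    match => {
      "message" => [
        'Job \<%{NUMBER:job_id}\> is submitted to queue \<%{WORD:queue}\>\.',
        'Running test %{WORD:test} on block %{WORD:Block} with seed %{WORD:Seed}'
      ]
    }
  }
}
```

Grok patterns are not anchored unless you anchor them yourself, so each one can match its own line anywhere inside the merged multi-line message.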


Thank you for your response!

From my understanding of the multiline codec and some tests I ran with your suggestion, it looks like that will grab the lines that I need, like this one:
Job <4471520> is submitted to queue <hw_queue>.
and append the line either directly after or before it. In this case it will be the one directly after (due to what => 'next') which is:
<<Waiting for dispatch ...>>
That creates:
Job <4471520> is submitted to queue <hw_queue>.\n<<Waiting for dispatch ...>>
Rather than grabbing the next line I need and appending it like so:
Job <4471520> is submitted to queue <hw_queue>.\nRunning test test_2m_4s on block BLOCK with seed 511863904

Is there something I am doing wrong / misunderstanding?

I’m not sure, it would obviously need testing. I was only going by the docs (Multiline codec plugin, Logstash Reference [8.6]), so there may be more options there to try, such as negate, to get it how you want. Hope this helps.
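
For what it is worth, here is a sketch of the negate idea, untested against this log. With `negate => true` and `what => "previous"`, every line that does not start with `Job` is appended to the preceding `Job` line, so each job's whole log should become one event. The `auto_flush_interval` and `max_lines` values are assumptions to tune: the codec's default `max_lines` is 500, which a long simulation log could exceed.

```
input {
  file {
    path => "/var/log/test.log"
    start_position => "beginning"
    codec => multiline {
      # A new event starts at each line beginning with "Job".
      pattern => "^Job"
      # Lines that do NOT match the pattern are appended to the previous line.
      negate => true
      what => "previous"
      # Assumed value: flush the last buffered event after 5 seconds
      # without new lines, since no later "Job" line will close it.
      auto_flush_interval => 5
      # Assumed value: raise the per-event line cap above the default of 500.
      max_lines => 5000
    }
  }
}
```

The merged `message` would then hold the entire job log, so a grok filter is still needed to pull the fields out of it.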
