Concatenate different lines of logstash output into one line

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

Logstash v 7.10.0

Describe the issue:

Hi, I am new to Logstash and am trying to read an unformatted log in which the information I need is scattered across different lines. The grok filter extracts that data just fine, but it ends up spread across multiple JSON output objects. I need all of the extracted information in one single JSON object.

Configuration:

I have 3 different grok filters inside if statements, so that I do not end up with hundreds of lines of useless output, like so:

filter {
  if ([message] =~ /^Job/) {
    grok {
      match => { "message" => ['Job \<%{NUMBER:job_id}\> is submitted to queue \<%{WORD:queue}\>\.'] }
      add_field => {
        "group" => "1"
      }
    }
  }
  else if ([message] =~ /^Running test/) {
    grok {
      match => { "message" => ['Running test %{WORD:test} on block %{WORD:Block} with seed %{WORD:Seed}'] }
      add_field => {
        "group" => "1"
      }
    }
  }
  # (third grok block not shown)
}

Relevant Logs or Screenshots:

Log file lines:

Job <4471520> is submitted to queue <hw_queue>.

<<Waiting for dispatch ...>>

<<Starting on server>>

LSB_JOBNAME is /path/tests/test_2m_4s/511863904/simulate

presim

Running test test_2m_4s on block BLOCK with seed 511863904

Running this command:

<command>

TOOL: xrun(64) 21.12-a071: Started on Jan 31, 2023 at 12:52:33 CST

--------------------------------------------------------------------
Name                         Type                     Size  Value   
--------------------------------------------------------------------
...

Current output looks like:

{"queue":"hw_queue","job_id":"4471520"}
{"Block":"BLOCK","test":"test_2m_4s","Seed":"511863904"}
{"Year":"2023","Month":"Jan","Day":"31","TimeZone":"CST"}

Desired output:

{"queue":"hw_queue","job_id":"4471520", "Block":"BLOCK","test":"test_2m_4s","Seed":"511863904", "Year":"2023","Month":"Jan","Day":"31","TimeZone":"CST"}

I have tried a few things, but nothing has been working for me, any help would be appreciated.

If anyone has any suggestions for a solution to my problem, I would love to hear them in a reply, thanks.

I would try something down this path:

input {
  file {
    path => "/var/log/test.log"
    start_position => "beginning"
    codec => multiline {
      # Add patterns here for every line you want to capture.
      pattern => "(^Job)|(^Running.+)|(^TOOL)"
      what => "next"
    }
  }
}

Then see what the message looks like once it has been combined into a single event, and use a grok filter to parse that out.
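
As a rough sketch of that second step: once the codec has merged the lines into one event, a single grok filter can try every pattern against the combined message. The two patterns below are copied from the question. Setting break_on_match to false is my assumption about how to keep grok from stopping at the first pattern that matches, and the pattern for the TOOL line is left out because it was not posted.

```conf
filter {
  grok {
    # Try every pattern in the list instead of stopping at the first hit,
    # so fields from several lines of the merged message land on one event.
    # The pattern for the TOOL line would be added to the same list.
    break_on_match => false
    match => {
      "message" => [
        'Job \<%{NUMBER:job_id}\> is submitted to queue \<%{WORD:queue}\>\.',
        'Running test %{WORD:test} on block %{WORD:Block} with seed %{WORD:Seed}'
      ]
    }
  }
}
```

This only helps if all the lines of interest are already inside one message, so it is worth printing the merged event first to confirm that.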

Thank you for your response!

From my understanding of the multiline codec and some tests I ran with your suggestion, it looks like it will grab a line that I need, like this one:
Job <4471520> is submitted to queue <hw_queue>.
and append the line directly before or after it. In this case it is the one directly after (due to what => 'next'), which is:
<<Waiting for dispatch ...>>
And that creates:
Job <4471520> is submitted to queue <hw_queue>.\n<<Waiting for dispatch ...>>
rather than grabbing the next line I need and appending it, like so:
Job <4471520> is submitted to queue <hw_queue>.\nRunning test test_2m_4s on block BLOCK with seed 511863904

Is there something I am doing wrong / misunderstanding?

I’m not sure; it would need testing, obviously. I was only going by the multiline codec plugin page in the Logstash Reference [8.6], so there may be more options there to try, such as negating the pattern, to get it working how you want. Hope this helps.
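
One variation along those lines, offered as an untested sketch: instead of listing the interesting lines in the pattern, match only the line that starts a job and set negate so that every other line is appended to it. The path is the one from the earlier reply. The auto_flush_interval and max_lines values are illustrative guesses, not tuned settings.

```conf
input {
  file {
    path => "/var/log/test.log"
    start_position => "beginning"
    codec => multiline {
      # A line starting with "Job" begins a new event ...
      pattern => "^Job"
      # ... and every line that does NOT match is appended to the previous event.
      negate => true
      what => "previous"
      # Flush the last pending event after 5 seconds of silence (illustrative value).
      auto_flush_interval => 5
      # Raise the per-event line cap in case a job log is long (illustrative value).
      max_lines => 5000
    }
  }
}
```

With this, everything from one "Job <...> is submitted" line up to the next one should arrive as a single message, which a grok filter can then pick apart. Any lines that appear before the first "Job" line would form an event of their own.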