Can we read numeric data from a string in OpenSearch?

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

Describe the issue:
I am trying to read data from logs using DQL, but I am not able to read the numeric data. My goal is to calculate the total of all the numbers.

Below is the message:
{
  "traceId": "",
  "timestamp": "2024-02-27T01:48:00.984Z",
  "functionVersion": "$LATEST",
  "traceIndex": 15,
  "awsAccountName": "none",
  "serviceVersion": "latest",
  "envName": "abc",
  "envType": "abc",
  "pfcExecutionContextID": "none",
  "awsAccountNumber": "none",
  "awsRegion": "none",
  "processName": "FTA",
  "message": "Total Bytes Transfered from spoke to hub : 2317",
  "severity": "METERING"
}

I want the 2317 from "message": "Total Bytes Transfered from spoke to hub : 2317".

Is there any way to read this numeric data from the message field?

You can use the grok ingest processor to do that, something like this:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns":["%{WORD} : %{NUMBER:count}"]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "Total Bytes Transfered from spoke to hub : 2317"
      }
    }
  ]
}

The result is:

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_source": {
          "count": "2317",
          "message": "Total Bytes Transfered from spoke to hub : 2317"
        },
        "_ingest": {
          "timestamp": "2024-02-27T14:36:15.79856Z"
        }
      }
    }
  ]
}

See the documentation: Grok processor | Elasticsearch Guide [7.10] | Elastic
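
One thing to watch: in the result above, count comes back as the string "2317". Since the goal is a total, it may help to have grok cast the value while extracting it. A minimal sketch, assuming the same message format; the :int suffix is grok's type-conversion syntax, and everything else is unchanged from the request above:

```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": ["%{WORD} : %{NUMBER:count:int}"]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "Total Bytes Transfered from spoke to hub : 2317"
      }
    }
  ]
}
```

With this variant, count should come back as a number rather than a quoted string, which is what numeric aggregations need.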

Thanks for the quick response. One small question: how do we add grok in OpenSearch? Is it a plugin?

No, it's a built-in feature. You can create an ingest pipeline first, and then use it when indexing documents with the bulk API:

  1. Create an ingest pipeline
PUT _ingest/pipeline/my-pipeline
{
  "processors": [
      {
        "grok": {
          "field": "message",
          "patterns":["%{WORD} : %{NUMBER:count}"]
        }
      }
    ]
}
  2. Use that pipeline
POST _bulk?pipeline=my-pipeline

Most OpenSearch clients and ingestion tools support the pipeline parameter, so you can set it directly once the pipeline is created.
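
To close the loop on the original goal (a total of all the numbers), here is a rough sketch of indexing one document through the pipeline and then summing the extracted field. The index name my-index and the aggregation name total_bytes are placeholders I made up. It also assumes count ends up indexed as a number, for example by writing the pattern as %{NUMBER:count:int} or by mapping the field explicitly. A string value like "2317" would not be summable as-is.

```
# Index a document through the pipeline (my-index is a placeholder name)
POST _bulk?pipeline=my-pipeline
{ "index": { "_index": "my-index" } }
{ "message": "Total Bytes Transfered from spoke to hub : 2317", "severity": "METERING" }

# Sum the extracted values; assumes count is mapped as a numeric field
GET my-index/_search
{
  "size": 0,
  "aggs": {
    "total_bytes": {
      "sum": { "field": "count" }
    }
  }
}
```

Keep in mind that grok fails the document when the pattern does not match, so log lines with a different message format may need a more general pattern or some failure handling on the processor.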