Groupby in opensearch

Good afternoon, I am doing an integration of zeek with wazuh.

I have managed to create an inventory in zeek which collects the services that a single computer has and sends it to an isolated index called inventory.

The idea is that every day it sends a list of active cad with the services it has collected that day.

The final goal would be to create a view or a table that allows to make a group by of an asset, to get the services collected in all the days. In such a way that if an asset on day 1 has the ssh and http service, but on day 2 only detects ftp, the final result would be for that asset all 3 services have been found.

How could I do it?

Thanks very much in advance.

Hello @pacogomez,

I think the thing you are looking for are aggregations - in your case the Terms aggregation specifically: Terms - OpenSearch Documentation

In your case you would apply this aggregation to the field which contains the name of the service (for example service: “httpd”) and execute the query with the specified time-range. Every returned key in the terms aggregation would be a distinct service running in the queried time-frame.

BR,
Andreas

Good afternoon, I have just tried to take this path before.
The problem is that I can’t manage to group the terms correctly.

First I created this basic query in which I simply get all the ips of the computers found and it works correctly:

GET /wazuh-alerts-4.x-inventory-2024.03.20/_search
{
“size”: 0,
“aggs”: {
“group_by_category”: {
“terms”: {
“field”: “data.ip”
}
}
}
}

THE problem is that I can’t get the following query to work correctly:

GET /wazuh-alerts-4.x-conn-2024.03.18/_search
{
“size”: 0,
“aggs”: {
“ips”: {
“terms”: {
“field”: “data.ip”,
“size”: 10
},
“aggs”: {
“services”: {
“terms”: {
“field”: “data.services.keyword”,
“size”: 10
}
}
}
}
}
}

Could you guide me to locate the problem?

Thanks!

Hello @pacogomez,

Are you using a different index in your 2nd GET request on purpose - as far as I can tell you saved your data into the “wazuh-alerts-4.x-inventory-2024.03.20” index, but are searching the “wazuh-alerts-4.x-conn-2024.03.18” index in the 2nd request?

Also, does grouping only by the data.services.keyword work as expected, so would this return all services across all IPs as expected?

just a side note but it would be helpful using the “preformatted text” option when pasting get-request, logs etc. since it is easier to copy and read, like this for example:

{
    "test": "test"
}

BR,
Andreas

The indexes should be the same, it is a mistake.

but it happens to me exactly the same thing, I attach the result of the second code that I mention to you that does not make correctly the grouping:

GET /wazuh-alerts-4.x-inventory-2024.03.20/_search
{
  "size": 0,
  "aggs": {
    "group_by_category": {
      "terms": {
        "field": "data.ip",
        "size": 10
      },
      "aggs": {
        "services": {
          "terms": {
            "field": "data.port_proto.keyword",
            "size": 10
          }
        }
      }
    }
  }
}

Result:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 114,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "group_by_category": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 94,
      "buckets": [
        {
          "key": "0.0.0.0",
          "doc_count": 2,
          "services": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
          }
        },
        {
          "key": "63.245.209.91",
          "doc_count": 2,
          "services": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
          }
        },
        {
          "key": "63.245.209.105",
          "doc_count": 2,
          "services": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
          }
        },
        {
          "key": "64.4.20.169",
          "doc_count": 2,
          "services": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
          }
        },
        {
          "key": "65.54.95.64",
          "doc_count": 2,
          "services": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
          }
        },
        {
          "key": "65.54.95.198",
          "doc_count": 2,
          "services": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
          }
        },
        {
          "key": "65.54.234.75",
          "doc_count": 2,
          "services": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
          }
        },
        {
          "key": "65.55.16.121",
          "doc_count": 2,
          "services": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
          }
        },
        {
          "key": "65.55.18.18",
          "doc_count": 2,
          "services": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
          }
        },
        {
          "key": "65.55.184.155",
          "doc_count": 2,
          "services": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
          }
        }
      ]
    }
  }
}

As you can see, it does not show the grouping of services.

Thanks in advance.

I only had to use the protocol field as text and not as a keyword.

It works correctly, but now I have the doubt if this could be added to a dashboard by means of a view.

1 Like

If possible I would like to display this visualization as if it were the following discover menu:

Maybe try to create a Data Table with the following configuration:

  1. Create (if not already existing) an index pattern which covers all your inventory indices ( wazuh-alerts-4.x-inventory-* I would guess)
  2. Crate a Data Table visualization using the index pattern created before
  3. To the right in the “Bucket” menu add a “split table” using the “Terms” aggregation and the “data.ip” field
  4. Now add a “Split row” as well in the “Bucket” menu using “Terms” as an aggregation again and select the data.services Field

This should create a table per IP for the specified time-range and which services were present. Note this does not include a histogram. You could also use a “split row” in step 3 instead of a “split table” - play around with the parameter and find something that looks right for you.

Edit: I can only recommend the “kibana enhanced tables” plugin for OpenSearch Dashboards - we exclusively use those tables now due to superior UX:

BR,
Andreas

Good afternoon, I have achieved the following:

But is it not possible to get this in list format?

Thanks!

Hello @pacogomez,

did you use ports and protocol instead of the data.services? And yes it list format would work if you replaced the “split table” with a “split row”. Of course it depends on how your data is ingested, but try that. I have mentioned how this should be configured in my comment above.

BR,
Andreas