OpenSearch Bucket Aggregation - Get full message text

OpenSearch version: 2.6.0

I want to use bucket aggregations to alert on entitlement changes in SQL Server audit logs.

I have the query built and it returns the correct logs in Discover, but when I create a bucket aggregation I cannot get the "message" text included in the Extraction query response. The end goal is to include ALL of the message text in the Extraction query response so I can use it in a trigger.

Configuration:
The query I am using is looking for:
    "match_phrase": {
      "message": "statement:ALTER SERVER ROLE"
    }

I then nest my aggregations under each individual host name (host.hostname.keyword):

    "aggregations": {
      "serverNames": {
        "terms": {
          "field": "host.hostname.keyword",
          "size": 10,
          "min_doc_count": 1,
          "shard_min_doc_count": 0,
          "show_term_doc_count_error": false,
          "order": [
            { "_count": "desc" },
            { "_key": "asc" }
          ]
        },
        "aggregations": {
          "serverMessage": {
            "terms": {
              "field": "message.keyword",
              "size": 100,
              "min_doc_count": 1,
              "shard_min_doc_count": 0,
              "show_term_doc_count_error": false,
              "order": [
                { "_count": "desc" },
                { "_key": "asc" }
              ]
            }
          }
        }
      }
    }

The only thing that isn't working is getting the FULL TEXT of the message field into the Extraction query response. I have tried significant_text, but that only seems to return scattered individual terms from the log rather than the whole message.

Most templates (e.g. Logstash's) map message as a text field only, without a keyword subfield, so message.keyword may not exist in your index. That would explain why the inner aggregation returns nothing.
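One way to confirm this is to fetch the index mapping (GET <index>/_mapping) and look for a keyword entry under the field's "fields". The sketch below only walks a mapping response that has already been loaded into a Python dict; the index name "sql-audit" and the sample mapping are made up for illustration, and keyword_subfield is a hypothetical helper, not part of any client library.

```python
def keyword_subfield(mapping_response, index, field):
    """Return the mapping of <field>.keyword, or None if it is not defined."""
    node = mapping_response[index]["mappings"]
    # Dotted names such as host.hostname are stored as nested "properties".
    for part in field.split("."):
        node = node.get("properties", {}).get(part)
        if node is None:
            return None
    return node.get("fields", {}).get("keyword")


# Hypothetical mapping response: message is text only, host.hostname has a keyword subfield.
sample_mapping = {
    "sql-audit": {
        "mappings": {
            "properties": {
                "message": {"type": "text"},
                "host": {
                    "properties": {
                        "hostname": {
                            "type": "text",
                            "fields": {
                                "keyword": {"type": "keyword", "ignore_above": 256}
                            },
                        }
                    }
                },
            }
        }
    }
}

print(keyword_subfield(sample_mapping, "sql-audit", "message"))
print(keyword_subfield(sample_mapping, "sql-audit", "host.hostname"))
```

If the helper returns None for message, a terms aggregation on message.keyword has nothing to bucket on.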

You could enable it in the template and reindex, but:

  • the aggregation can be expensive, because you'd be aggregating on a field with VERY high cardinality
  • indexing will slow down and you'll store more data, since every message is indexed a second time as a keyword
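If you do go the reindex route, the mapping change might look roughly like the sketch below, written as a Python dict that serializes to the "mappings" section of a template. The ignore_above value is an assumption: the dynamic-mapping default of 256 would silently skip longer audit messages in the keyword subfield, so a larger limit is used here; adjust it to your data.

```python
import json

# Sketch of a "mappings" section that adds a keyword subfield to message.
# ignore_above=8191 is an assumed value chosen to stay under Lucene's term
# length limit for UTF-8 text; messages longer than this would still be
# skipped by the keyword subfield (they remain searchable via the text field).
message_mappings = {
    "properties": {
        "message": {
            "type": "text",
            "fields": {
                "keyword": {"type": "keyword", "ignore_above": 8191}
            },
        }
    }
}

print(json.dumps({"mappings": message_mappings}, indent=2))
```

Existing documents only pick up the new subfield after a reindex.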

Alternatively, you can use a top_hits aggregation as your inner aggregation; see "Top hits aggregation" in the Elasticsearch Guide [7.10].

I assume you don't need a specific order for this text, so you can skip the sort and just say how many messages you want per host.
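As a rough sketch of that idea, the request body below keeps the outer serverNames terms aggregation from the question and swaps the inner terms aggregation for top_hits. It is built as a Python dict so it can be printed as JSON and pasted into the monitor; the function names, the default of 3 messages per host, and the sample response are illustrative assumptions.

```python
import json


def top_hits_body(phrase, host_field="host.hostname.keyword",
                  hosts=10, messages_per_host=3):
    """Build a search body: one bucket per host, each with a few full messages."""
    return {
        "size": 0,  # only the aggregation output is needed
        "query": {"match_phrase": {"message": phrase}},
        "aggregations": {
            "serverNames": {
                "terms": {"field": host_field, "size": hosts},
                "aggregations": {
                    "serverMessage": {
                        "top_hits": {
                            "size": messages_per_host,
                            "_source": {"includes": ["message"]},
                        }
                    }
                },
            }
        },
    }


def messages_by_host(response):
    """Collect the full message text per host bucket from a search response."""
    result = {}
    for bucket in response["aggregations"]["serverNames"]["buckets"]:
        hits = bucket["serverMessage"]["hits"]["hits"]
        result[bucket["key"]] = [hit["_source"]["message"] for hit in hits]
    return result


print(json.dumps(top_hits_body("statement:ALTER SERVER ROLE"), indent=2))

# Hypothetical, trimmed-down response used only to show where the text lands.
sample_response = {
    "aggregations": {"serverNames": {"buckets": [
        {"key": "sql01", "doc_count": 1, "serverMessage": {"hits": {"hits": [
            {"_source": {"message": "statement:ALTER SERVER ROLE [sysadmin] ADD MEMBER [bob]"}}
        ]}}}
    ]}}
}
print(messages_by_host(sample_response))
```

The message text comes from _source, so this works even when message has no keyword subfield.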

Another option (as the docs above point out) is to use collapse; see "Collapse search results" in the Elasticsearch Guide [7.10].

You'd collapse on the host name and return whichever fields you want from the documents. This is more limited than top_hits (which can be nested under N levels of aggregations) but potentially faster.
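A minimal sketch of the collapse variant, again as a Python dict that prints the JSON request body. It assumes host.hostname.keyword is a keyword field (collapse needs a keyword or numeric field with doc values); the inner_hits name "messages", the helper names, and the sample response are made up for illustration.

```python
import json


def collapse_body(phrase, host_field="host.hostname.keyword", messages_per_host=3):
    """Build a search body that returns one top-level hit per host."""
    return {
        "query": {"match_phrase": {"message": phrase}},
        "_source": ["host.hostname", "message"],
        "collapse": {
            "field": host_field,
            # inner_hits is optional; it returns extra messages per host.
            "inner_hits": {"name": "messages", "size": messages_per_host},
        },
    }


def collapsed_messages(response):
    """Return the full message texts from each collapsed hit's inner_hits."""
    result = []
    for hit in response["hits"]["hits"]:
        inner = hit["inner_hits"]["messages"]["hits"]["hits"]
        result.append([h["_source"]["message"] for h in inner])
    return result


print(json.dumps(collapse_body("statement:ALTER SERVER ROLE"), indent=2))

# Hypothetical, trimmed-down response showing where the message text appears.
sample_response = {
    "hits": {"hits": [
        {
            "_source": {"message": "statement:ALTER SERVER ROLE [sysadmin] ADD MEMBER [bob]"},
            "inner_hits": {"messages": {"hits": {"hits": [
                {"_source": {"message": "statement:ALTER SERVER ROLE [sysadmin] ADD MEMBER [bob]"}}
            ]}}},
        }
    ]}
}
print(collapsed_messages(sample_response))
```

With collapse the text is found under hits rather than aggregations, so a trigger would read it from the hits section of the response.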


Collapse was exactly what I needed.

Thanks!
