Alerts and Aggregation name

Hi All!
I have created a few new alerts that use Metricbeat as the source.
I used the following parameters:
Monitor type: Per bucket monitor
Schedule: Every 3 minutes
I used host.id or host.name as the group by parameter (I tried various combinations).

The alerts successfully catch the events. However, even though I have multiple hosts sending data, the host.id/host.name/host.hostname column is always empty, as you can see in the attached screenshots.

If I check this in the alerting API (GET _plugins/_alerting/monitors/alerts), the API correctly shows the bucket key with the hostname:

        "bucket" : {
          "doc_count" : 3092,
          "avg_system_cpu_system_norm_pct" : {
            "value" : 0.4892820512820513
          },
          "key" : {
            "host.hostname" : "kube.ucs.local"
          }
        }
      }
    },

Or with multiple group by parameters:

   "bucket" : {
          "doc_count" : 518,
          "avg_system_cpu_system_norm_pct" : {
            "value" : 0.3834153846153846
          },
          "key" : {
            "host.hostname" : "WIN-CESRKKF4EO5",
            "host.id" : "d7068737-5756-445e-a12d-333eb81a7f8f"
          }
        }
      }
    },
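
In case it helps anyone compare the API output with what the table shows, here is a minimal Python sketch that turns the `key` object of one such bucket into a readable label. The helper name `bucket_key_label` is made up for illustration, and it only assumes the bucket shape shown in the responses above.

```python
def bucket_key_label(bucket: dict) -> str:
    """Render a bucket's composite key as comma-separated field=value pairs."""
    key = bucket.get("key", {})
    return ", ".join(f"{field}={value}" for field, value in key.items())


# Sample bucket copied from the second API response above.
bucket = {
    "doc_count": 518,
    "avg_system_cpu_system_norm_pct": {"value": 0.3834153846153846},
    "key": {
        "host.hostname": "WIN-CESRKKF4EO5",
        "host.id": "d7068737-5756-445e-a12d-333eb81a7f8f",
    },
}

label = bucket_key_label(bucket)
print(label)
```

If the label comes out non-empty here while the column in the UI stays empty, the data itself carries the group by values and the problem is more likely on the display side.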

Am I doing something wrong, or is it a bug in the UI?

Hi @szultan,

From the attached image and sample responses, it looks like there may be a field called host.name and a field called host.hostname (host.hostname being the desired field to alert on in this case). Judging from the column name in the attached image, the table seems to be trying to display the values associated with the host.name field, which isn’t present in the provided API responses. When defining the group by parameters in the Query section of the Create Monitor UI, do you see both host.name and host.hostname as options in the dropdown menu?

Hi @AWSHurneyt!

Both host.hostname and host.name are present in every document in that index.
When I pull the alert via the API, I can even see it in the response. (The attached example is from an alert where only host.hostname was added as a group by parameter for the bucket.)
That is why I thought this was a bug in the UI.

I don’t currently have access to the cluster, but I will check when I can and confirm whether both fields appear in the dropdown menu. (Off the top of my head, they were listed there.)

Hi @szultan,

Sounds good! My thinking is that if you select the host.hostname option from the dropdown menu instead of host.name, the Alerts table should display a column populated with the host.hostname values.

Is it still possible with version 2.11.1 to add a custom column to the alerts table?

Hi @hm21,

There isn’t currently an option to add a custom column to that table. The table does update with the group by fields that are selected, though.

If adding custom columns would be helpful, please create a feature request in our GitHub repository. There may be other folks who could benefit from a similar addition, which would help us get the enhancement prioritized.

Thanks for the fast reply.

I tried that out. I selected the index opensearch_dashboards_sample_data_logs and chose “event.dataset” as the group by field, and then tried all the other options in the dropdown as well, but it does not work. The field is not added to the table as shown in the image above. I am running OS 2.11.1 in containers.

@hm21

On the monitor details page, there’s an “export as json” button. Would you mind pasting the monitor json here?

{
   "name": "Test-Monitor",
   "type": "monitor",
   "monitor_type": "bucket_level_monitor",
   "enabled": true,
   "schedule": {
      "period": {
         "unit": "MINUTES",
         "interval": 1
      }
   },
   "inputs": [
      {
         "search": {
            "indices": [
               "opensearch_dashboards_sample_data_logs"
            ],
            "query": {
               "size": 0,
               "aggregations": {
                  "composite_agg": {
                     "composite": {
                        "sources": []
                     },
                     "aggs": {}
                  }
               },
               "query": {
                  "bool": {
                     "filter": [
                        {
                           "range": {
                              "timestamp": {
                                 "gte": "{{period_end}}||-1h",
                                 "lte": "{{period_end}}",
                                 "format": "epoch_millis"
                              }
                           }
                        }
                     ]
                  }
               }
            }
         }
      }
   ],
   "triggers": [
      {
         "bucket_level_trigger": {
            "id": "bdwZqo0BEN1NLVLr5Ksn",
            "name": "Test-Trigger",
            "severity": "1",
            "condition": {
               "buckets_path": {
                  "_count": "_count"
               },
               "parent_bucket_path": "composite_agg",
               "script": {
                  "source": "params._count > 1",
                  "lang": "painless"
               },
               "gap_policy": "skip"
            },
            "actions": []
         }
      }
   ],
   "ui_metadata": {
      "schedule": {
         "timezone": null,
         "frequency": "interval",
         "period": {
            "unit": "MINUTES",
            "interval": 1
         },
         "daily": 0,
         "weekly": {
            "tue": false,
            "wed": false,
            "thur": false,
            "sat": false,
            "fri": false,
            "mon": false,
            "sun": false
         },
         "monthly": {
            "type": "day",
            "day": 1
         },
         "cronExpression": "0 */1 * * *"
      },
      "monitor_type": "bucket_level_monitor",
      "search": {
         "searchType": "graph",
         "timeField": "timestamp",
         "aggregations": [],
         "cleanedGroupBy": [],
         "bucketValue": 1,
         "bucketUnitOfTime": "h",
         "filters": []
      }
   }
}

I’m using the visual editor to set the group by parameter, but it does not seem to be persisted. Whenever I click the edit button and scroll down to the group by section, I need to reassign the field.
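
A quick way to confirm this from the exported JSON, rather than from the UI, is to look at the composite aggregation's `sources` list. The following is only a sketch: `composite_source_names` is a hypothetical helper, and the monitor dict is a trimmed copy of the export above.

```python
def composite_source_names(monitor: dict) -> list:
    """Collect the names of all composite-aggregation sources in a monitor."""
    names = []
    for monitor_input in monitor.get("inputs", []):
        query = monitor_input.get("search", {}).get("query", {})
        for agg in query.get("aggregations", {}).values():
            for source in agg.get("composite", {}).get("sources", []):
                names.extend(source.keys())
    return names


# Trimmed copy of the exported monitor above: "sources" is empty.
exported_monitor = {
    "name": "Test-Monitor",
    "monitor_type": "bucket_level_monitor",
    "inputs": [
        {
            "search": {
                "indices": ["opensearch_dashboards_sample_data_logs"],
                "query": {
                    "size": 0,
                    "aggregations": {
                        "composite_agg": {"composite": {"sources": []}, "aggs": {}}
                    },
                },
            }
        }
    ],
}

names = composite_source_names(exported_monitor)
if not names:
    print("No group by fields were saved with this monitor.")
```

For the export above the list is empty, which matches the empty `cleanedGroupBy` in `ui_metadata` and the missing column in the table.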

@AWSHurneyt I just tried it with OS version 2.7.0 in a container, and it works there.

I also tried version 2.10.0, which failed as well. I did not try 2.8.0, 2.9.0, or 2.11.0.

EDIT: It works in version 2.8.0, but not in version 2.9.0.

2nd EDIT: @AWSHurneyt do you encounter the same problem, or is it just me?

3rd EDIT: Found an existing issue for this: “[BUG] Editing a Per Bucket Monitor loses the Group By expressions” (Issue #858 in opensearch-project/alerting-dashboards-plugin on GitHub).

4th EDIT: @AWSHurneyt does it work for you with 2.11.1?

5th EDIT: @AWSHurneyt it seems you have already fixed the groupBy issue, but the fix was not included in the 2.11.1 release. Was it left out on purpose?

@hm21 Sorry for the delayed response!

I just tested using a 2.11.1 Docker image, and you’re correct; the bug is still in that version. The fix is present on our 2.11 branch currently (link), so it will be included in patch 2.11.2.

I did test this on a v2.12 cluster (which was just released), and the column does show up in the table on the monitor details page. In addition, the group by selections do not reset in the UI when editing a monitor.

Hi, I’m testing the same thing on v2.12.

The config seems fine. The grouping by host.name is there.

In the results, though, there is no indication of which host.name an alert belongs to:

Interestingly, the preview works as expected:

{
   "name": "CPU Monitoring",
   "type": "monitor",
   "monitor_type": "bucket_level_monitor",
   "enabled": true,
   "schedule": {
      "period": {
         "unit": "MINUTES",
         "interval": 5
      }
   },
   "inputs": [
      {
         "search": {
            "indices": [
               "metricbeat-*"
            ],
            "query": {
               "size": 0,
               "aggregations": {
                  "composite_agg": {
                     "composite": {
                        "sources": [
                           {
                              "host.name": {
                                 "terms": {
                                    "field": "host.name"
                                 }
                              }
                           }
                        ]
                     },
                     "aggs": {
                        "max_system_cpu_total_norm_pct": {
                           "max": {
                              "field": "system.cpu.total.norm.pct"
                           }
                        }
                     }
                  }
               },
               "query": {
                  "bool": {
                     "filter": [
                        {
                           "range": {
                              "@timestamp": {
                                 "gte": "{{period_end}}||-10m",
                                 "lte": "{{period_end}}",
                                 "format": "epoch_millis"
                              }
                           }
                        },
                        {
                           "range": {
                              "system.cpu.total.norm.pct": {
                                 "gt": 0.2
                              }
                           }
                        }
                     ]
                  }
               }
            }
         }
      }
   ],
   "triggers": [
      {
         "bucket_level_trigger": {
            "id": "tG_b440BRRTTBmxaKrm2",
            "name": "CPU Critical",
            "severity": "1",
            "condition": {
               "buckets_path": {
                  "max_system_cpu_total_norm_pct": "max_system_cpu_total_norm_pct"
               },
               "parent_bucket_path": "composite_agg",
               "script": {
                  "source": "params.max_system_cpu_total_norm_pct > 0.3",
                  "lang": "painless"
               },
               "gap_policy": "skip"
            },
            "actions": [
               {
                  "id": "notification892102",
                  "name": "CPU Critical  SMS",
                  "destination_id": "Um7S440BRRTTBmxagL5Z",
                  "message_template": {
                     "source": "Monitor {{ctx.monitor.name}} just entered alert status. Please investigate the issue.\n  - Trigger: {{ctx.trigger.name}}\n  - Severity: {{ctx.trigger.severity}}\n  - Period start: {{ctx.periodStart}}\n  - Period end: {{ctx.periodEnd}}\n\n  - Deduped Alerts:\n  {{#ctx.dedupedAlerts}}\n    * {{id}} : {{bucket_keys}}\n  {{/ctx.dedupedAlerts}}\n\n  - New Alerts:\n  {{#ctx.newAlerts}}\n    * {{id}} : {{bucket_keys}}\n  {{/ctx.newAlerts}}\n\n  - Completed Alerts:\n  {{#ctx.completedAlerts}}\n    * {{id}} : {{bucket_keys}}\n  {{/ctx.completedAlerts}}",
                     "lang": "mustache"
                  },
                  "throttle_enabled": false,
                  "subject_template": {
                     "source": "Alerting Notification action",
                     "lang": "mustache"
                  },
                  "action_execution_policy": {
                     "action_execution_scope": {
                        "per_alert": {
                           "actionable_alerts": [
                              "DEDUPED",
                              "NEW"
                           ]
                        }
                     }
                  }
               }
            ]
         }
      },
      {
         "bucket_level_trigger": {
            "id": "dm_c440BRRTTBmxaet1X",
            "name": "CPU Major",
            "severity": "2",
            "condition": {
               "buckets_path": {
                  "max_system_cpu_total_norm_pct": "max_system_cpu_total_norm_pct"
               },
               "parent_bucket_path": "composite_agg",
               "script": {
                  "source": "params.max_system_cpu_total_norm_pct > 0.2",
                  "lang": "painless"
               },
               "gap_policy": "skip"
            },
            "actions": [
               {
                  "id": "notification511331",
                  "name": "CPU Major SMS",
                  "destination_id": "Um7S440BRRTTBmxagL5Z",
                  "message_template": {
                     "source": "Monitor {{ctx.monitor.name}} just entered alert status. Please investigate the issue.\n  - Trigger: {{ctx.trigger.name}}\n  - Severity: {{ctx.trigger.severity}}\n  - Period start: {{ctx.periodStart}}\n  - Period end: {{ctx.periodEnd}}\n\n  - Deduped Alerts:\n  {{#ctx.dedupedAlerts}}\n    * {{id}} : {{bucket_keys}}\n  {{/ctx.dedupedAlerts}}\n\n  - New Alerts:\n  {{#ctx.newAlerts}}\n    * {{id}} : {{bucket_keys}}\n  {{/ctx.newAlerts}}\n\n  - Completed Alerts:\n  {{#ctx.completedAlerts}}\n    * {{id}} : {{bucket_keys}}\n  {{/ctx.completedAlerts}}",
                     "lang": "mustache"
                  },
                  "throttle_enabled": false,
                  "subject_template": {
                     "source": "Alerting Notification action",
                     "lang": "mustache"
                  },
                  "action_execution_policy": {
                     "action_execution_scope": {
                        "per_alert": {
                           "actionable_alerts": [
                              "DEDUPED",
                              "NEW"
                           ]
                        }
                     }
                  }
               }
            ]
         }
      }
   ],
   "ui_metadata": {
      "schedule": {
         "timezone": null,
         "frequency": "interval",
         "period": {
            "unit": "MINUTES",
            "interval": 5
         },
         "daily": 0,
         "weekly": {
            "tue": false,
            "wed": false,
            "thur": false,
            "sat": false,
            "fri": false,
            "mon": false,
            "sun": false
         },
         "monthly": {
            "type": "day",
            "day": 1
         },
         "cronExpression": "0 */1 * * *"
      },
      "monitor_type": "bucket_level_monitor",
      "search": {
         "searchType": "graph",
         "timeField": "@timestamp",
         "aggregations": [
            {
               "aggregationType": "max",
               "fieldName": "system.cpu.total.norm.pct"
            }
         ],
         "groupBy": [
            "host.name"
         ],
         "bucketValue": 10,
         "bucketUnitOfTime": "m",
         "filters": [
            {
               "fieldName": [
                  {
                     "label": "system.cpu.total.norm.pct",
                     "type": "number"
                  }
               ],
               "fieldValue": 0.2,
               "operator": "is_greater"
            }
         ]
      }
   }
}
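
To double-check that this saved monitor is internally consistent, the group by fields in `ui_metadata` can be compared with the composite sources in the query. This is only a sketch: `composite_sources` is a made-up helper, and it assumes the visual editor writes one terms source per group by field, named after the field, which is what the JSON above shows.

```python
def composite_sources(group_by: list) -> list:
    """Build one terms source per group by field, named after the field."""
    return [{field: {"terms": {"field": field}}} for field in group_by]


# Values copied from the monitor JSON above.
ui_group_by = ["host.name"]  # ui_metadata.search.groupBy
query_sources = [{"host.name": {"terms": {"field": "host.name"}}}]  # composite sources

consistent = composite_sources(ui_group_by) == query_sources
print(consistent)
```

For this monitor the two agree, so the group by field was persisted both in the query and in the UI metadata.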

@Jaro Do you have an example of the documents that get ingested to that index (scrubbed of any sensitive info)?

Hi, thanks for coming back!

There is nothing sensitive in the documents. It’s a generic metricbeat set of metrics.

What I need is an alert grouped by a certain field/term, in this case host.name.
BTW, I also tried other fields, such as host.hostname or just system_id. The column is there, but it is always empty.

One idea: is it possible that the column is empty because the grouping is case-sensitive, or is the leading capital letter only a matter of display?

Two documents for your reference.

{
  "_index": "metricbeat-a_u-r_m-stat-2024.09_cls5.0",
  "_id": "Je2N7o0BoUI6je75bhY3",
  "_version": 1,
  "_score": null,
  "_source": {
    "log_source": "metricbeat generic",
    "access_level": "user",
    "system_id": "999-123",
    "ecs": {},
    "system": {
      "cpu": {
        "idle": {
          "norm": {
            "pct": 0.995
          },
          "pct": 3.9802
        },
        "cores": 4,
        "total": {
          "norm": {
            "pct": 0.005
          },
          "pct": 0.0198
        },
        "system": {
          "norm": {
            "pct": 0.0021
          },
          "pct": 0.0083
        },
        "user": {
          "norm": {
            "pct": 0.0029
          },
          "pct": 0.0115
        }
      }
    },
    "@timestamp": "2024-02-28T07:11:04.381Z",
    "agent": {
      "name": "WIN06",
      "type": "metricbeat"
    },
    "@version": "1",
    "host": {
      "name": "WIN06",
      "ip": "fe80::9c1b:ade7:a895:70a8",
      "hostname": "win06",
      "cpu": {
        "usage": 0.005
      },
      "os": {
        "type": "windows"
      }
    },
    "event": {
      "dataset": "system.cpu",
      "module": "system"
    },
    "service": {
      "type": "system"
    },
    "metricset": {
      "name": "cpu",
      "period": 30000
    }
  },
  "fields": {
    "@timestamp": [
      "2024-02-28T07:11:04.381Z"
    ]
  },
  "sort": [
    1709104264381
  ]
}
{
  "_index": "metricbeat-a_u-r_m-stat-2024.09_cls5.0",
  "_id": "7e2N7o0BoUI6je75WwtJ",
  "_version": 1,
  "_score": null,
  "_source": {
    "log_source": "metricbeat generic",
    "access_level": "user",
    "system_id": "999-123",
    "ecs": {},
    "system": {
      "cpu": {
        "cores": 8,
        "total": {
          "norm": {
            "pct": 0.0011
          },
          "pct": 0.0084
        },
        "irq": {
          "norm": {
            "pct": 0
          },
          "pct": 0
        },
        "system": {
          "norm": {
            "pct": 0.0005
          },
          "pct": 0.004
        },
        "iowait": {
          "norm": {
            "pct": 0
          },
          "pct": 0.0003
        },
        "nice": {
          "norm": {
            "pct": 0
          },
          "pct": 0
        },
        "user": {
          "norm": {
            "pct": 0.0005
          },
          "pct": 0.004
        },
        "idle": {
          "norm": {
            "pct": 0.9989
          },
          "pct": 7.9913
        },
        "softirq": {
          "norm": {
            "pct": 0
          },
          "pct": 0.0003
        },
        "steal": {
          "norm": {
            "pct": 0
          },
          "pct": 0
        }
      }
    },
    "@timestamp": "2024-02-28T07:10:59.988Z",
    "agent": {
      "name": "mtm02.refta3.local",
      "type": "metricbeat"
    },
    "@version": "1",
    "event": {
      "dataset": "system.cpu",
      "duration": 91405,
      "module": "system"
    },
    "service": {
      "type": "system"
    },
    "host": {
      "name": "mtm02.refta3.local",
      "ip": "192.168.131.52",
      "cpu": {
        "usage": 0.0011
      },
      "hostname": "mtm02.refta3.local",
      "os": {
        "type": "linux"
      }
    },
    "metricset": {
      "name": "cpu",
      "period": 30000
    }
  },
  "fields": {
    "@timestamp": [
      "2024-02-28T07:10:59.988Z"
    ]
  },
  "sort": [
    1709104259988
  ]
}
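
On the case-sensitivity question, here is a small Python sketch that groups trimmed copies of the two documents above by a dotted field name and takes the maximum CPU value per group, roughly what the monitor's query does. It is not the plugin's code. `get_path` and `max_by_group` are hypothetical helpers, the monitor's `gt 0.2` filter is left out, and exact, case-sensitive matching is an assumption that holds only if the fields are mapped as plain keyword fields without a normalizer (the mapping is not shown in this thread).

```python
def get_path(doc: dict, dotted: str):
    """Resolve a dotted field name such as 'host.name' in a nested dict."""
    value = doc
    for part in dotted.split("."):
        value = value[part]
    return value


def max_by_group(docs: list, group_field: str, metric_field: str) -> dict:
    """Group documents by exact field value and keep the max metric per group."""
    result = {}
    for doc in docs:
        key = get_path(doc, group_field)
        metric = get_path(doc, metric_field)
        result[key] = max(metric, result.get(key, metric))
    return result


# Trimmed copies of the two sample documents above.
docs = [
    {"host": {"name": "WIN06", "hostname": "win06"},
     "system": {"cpu": {"total": {"norm": {"pct": 0.005}}}}},
    {"host": {"name": "mtm02.refta3.local", "hostname": "mtm02.refta3.local"},
     "system": {"cpu": {"total": {"norm": {"pct": 0.0011}}}}},
]

by_name = max_by_group(docs, "host.name", "system.cpu.total.norm.pct")
by_hostname = max_by_group(docs, "host.hostname", "system.cpu.total.norm.pct")
print(by_name)
print(by_hostname)
```

Under that assumption, WIN06 and win06 are simply different keys, but every bucket in this sketch still carries its key, so case alone would not produce an empty column here.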

Hi, just tested on 2.13. The issue is still there.