More than 500 buckets in Alert by buckets

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

OpenSearch 2.8 / Red Hat 8.8

Describe the issue:

Hello

I have discovered a strange behaviour in the alerting module of OpenSearch 2.8. We have a “per bucket” monitor that checks the status of an item on every monitored host of our system. Each host can have several different items. We want one alert per item and per host, which is why we chose the “per bucket” monitor.
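
For readers who want to picture the setup, here is a minimal sketch of the kind of search body a “per bucket” monitor could use to get one bucket per host/item pair, built with a composite aggregation. The field names (`host`, `item`, `status`, `@timestamp`), the aggregation name `host_item` and the 5-minute window are hypothetical placeholders, not the query actually used in the monitor described above, and the fields are assumed to be mapped as keywords.

```python
import json


def build_host_item_query(window="now-5m"):
    """Return a search body with one composite bucket per host/item pair.

    All field names and the time window are hypothetical placeholders.
    """
    return {
        "size": 0,  # only the aggregation buckets matter, not the hits
        "query": {
            "bool": {
                "filter": [
                    # keep only recent documents (assumed timestamp field)
                    {"range": {"@timestamp": {"gte": window}}},
                    # keep only items reported as failing (assumed value)
                    {"term": {"status": "not ok"}},
                ]
            }
        },
        "aggs": {
            "host_item": {
                "composite": {
                    # two sources: each distinct (host, item) pair is a bucket
                    "sources": [
                        {"host": {"terms": {"field": "host"}}},
                        {"item": {"terms": {"field": "item"}}},
                    ]
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_host_item_query(), indent=2))
```

With a query of this shape, 51 hosts with 10 failing items each would give 510 distinct buckets in a single run.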

When the number of host/item pairs (buckets, if I’m correct) goes over 500 in a single execution (the monitor runs every minute), the number of active alerts keeps increasing indefinitely.

When there are fewer than 500 pairs, the number of active alerts is exactly the number of pairs (which is correct and what we want), and it doesn’t change until the number of pairs changes.

So, under 500 pairs (buckets) the behaviour is correct, but over 500 buckets the alerting module misbehaves and keeps increasing the number of alerts without explanation (and without error messages).

Is there a parameter somewhere that sets this limit of 500 buckets?

I tried to find something in the advanced settings of the Stack Management module, without success. I also changed max_compilations_rate (to 1000/1m) and script.cache.max_size (to 1000), but that doesn’t change the limit of 500 buckets.

Does anyone have a solution for this?

Hello

I’ve done a few more tests and noticed some more things:

I send a bulk request of 10 documents, each with the monitored item status set to “not ok”. I send this bulk every 10 seconds, for a given number of hosts. That means I have 10 items on every host.
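
To make this test reproducible, a bulk body like the one described can be sketched as follows. This is only an illustration: the index name `item-status-test`, the field names and the `host-001` / `item-01` naming scheme are hypothetical, not the real data. The body follows the usual `_bulk` NDJSON layout (one action line, then one document line, with a final newline).

```python
import json
from datetime import datetime, timezone


def build_bulk_body(hostname, n_items=10, index="item-status-test"):
    """Build an NDJSON _bulk body with n_items failing items for one host.

    Index name and field names are hypothetical placeholders.
    """
    now = datetime.now(timezone.utc).isoformat()
    lines = []
    for i in range(1, n_items + 1):
        # action line: index the next document into the test index
        lines.append(json.dumps({"index": {"_index": index}}))
        # document line: one item of this host, reported as "not ok"
        lines.append(json.dumps({
            "@timestamp": now,
            "host": hostname,
            "item": f"item-{i:02d}",
            "status": "not ok",
        }))
    # the bulk API expects the body to end with a newline
    return "\n".join(lines) + "\n"


def build_bulk_for_hosts(n_hosts, n_items=10):
    """Concatenate the bulk bodies of n_hosts hosts (host-001, host-002, ...)."""
    return "".join(
        build_bulk_body(f"host-{h:03d}", n_items) for h in range(1, n_hosts + 1)
    )
```

Calling `build_bulk_for_hosts(6)` yields 60 documents, that is 60 distinct host/item pairs; sending such a body every 10 seconds keeps those pairs inside the monitored time window.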

If I do it for 1 hostname, I have 10 buckets; the monitor triggers the alerts and I get 10 messages every 60 minutes (throttling is set to 60 minutes). The number of active alerts is 10 and stays at 10 for as long as the documents are “active” (I mean, inside the time window monitored by the monitor). When the documents are no longer in this window (docs too old), the alerts are completed, and everything works as expected.

If I do it for 5 hostnames, I have 50 buckets, and everything works as expected (50 messages every 60 minutes, one bucket per message, number of active alerts is 50…)

If I do it for 6 hostnames, I have 60 buckets, and only one message is sent, containing the list of the 60 hostname/item pairs. This message is sent every minute (the monitor runs every minute) even though throttling is set to 60 minutes, so we get a flood of messages. The number of active alerts is 60 and stays stable until the documents get too old; then it goes to 0, as expected.

If I do it for 50 hostnames (500 buckets), I get the same behaviour as for 6 hostnames.

If I do it for 51 hostnames (510 buckets), I get 1 message with the list of 500 (or 510?) hostname/item pairs. The number of active alerts is 510, but it grows by 10 every minute (at each execution of the monitor): 510, then 520 the minute after, then 530, 540… When the documents get too old, the active alerts decrease slowly, more slowly than expected.
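
The test results above can be summarised with a small sketch. Note the assumption: the tests only show that the behaviour changes somewhere between 50 and 60 buckets and again somewhere between 500 and 510 buckets, so the cut-offs of exactly 50 and 500 used below are a guess, not a confirmed setting of the alerting plugin.

```python
ITEMS_PER_HOST = 10  # each test host sends 10 items


def bucket_count(n_hosts, items_per_host=ITEMS_PER_HOST):
    """Number of host/item pairs (buckets) produced by the test."""
    return n_hosts * items_per_host


def observed_regime(buckets):
    """Behaviour seen in the tests; cut-offs 50 and 500 are assumed."""
    if buckets <= 50:
        return "one message per bucket, throttling respected"
    if buckets <= 500:
        return "one combined message per run, throttling ignored"
    return "combined message, active alerts grow on every run"


if __name__ == "__main__":
    # the five host counts that were actually tested
    for n_hosts in (1, 5, 6, 50, 51):
        buckets = bucket_count(n_hosts)
        print(f"{n_hosts:>3} hosts -> {buckets:>3} buckets: {observed_regime(buckets)}")
```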

I think it is an issue in the alerting plugin of OpenSearch 2.8 (or maybe 2 separate issues). Is this a known behaviour of the alerting plugin? Does anyone know if this bug has already been reported? Is it fixed in another version of OpenSearch?

Is there a parameter somewhere to change the limits of 50 buckets and 500 buckets?

If you need more information, do not hesitate to contact me.

Hello, has anyone already noticed this behaviour?

Do I have to file a ticket for this issue?