Hello, I am using Amazon Elasticsearch Service version 6.5.
This is my first time using alerting, so please excuse the newbie question if I am missing something obvious.
I have set up a new monitor.
The extraction query is:
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "from": "{{period_start}}||-16m",
              "to": "{{period_start}}||-1m",
              "include_lower": true,
              "include_upper": true,
              "boost": 1
            }
          }
        },
        {
          "simple_query_string": {
            "query": "5*|ERROR",
            "fields": [
              "level^1.0",
              "response_code^1.0"
            ],
            "flags": -1,
            "default_operator": "or",
            "analyze_wildcard": false,
            "auto_generate_synonyms_phrase_query": true,
            "fuzzy_prefix_length": 0,
            "fuzzy_max_expansions": 50,
            "fuzzy_transpositions": true,
            "boost": 1
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  }
}
Basically, the TL;DR of what I am searching for in my Kibana logs is any log entry that has response_code:5* OR level:ERROR. The query runs at an interval of 15 minutes and queries a shifting window from now-16m to now-1m (a 15-minute shifting window).
Side note: I picked up the ||-5m syntax from the documentation and I am not sure how it works. I was not able to find clear documentation on it.
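For anyone else puzzled by the ||-5m syntax: in Elasticsearch date math, || separates an anchor date from the offsets applied to it, so "{{period_start}}||-16m" means "period_start minus 16 minutes". Here is a rough Python sketch of that arithmetic. It assumes the anchor is rendered as epoch milliseconds and only handles the s, m, h and d units without rounding; resolve_date_math is a made-up helper name, not part of any library.

```python
import re

# Milliseconds per unit for the fixed-length units handled in this sketch.
UNIT_MS = {"s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def resolve_date_math(expr: str) -> int:
    """Resolve '<epoch_ms>||<offsets>' (e.g. '1600000000000||-16m') to epoch ms.

    Hypothetical helper: assumes an epoch-millisecond anchor and ignores
    rounding (/d) and calendar units (M, y), which real date math supports.
    """
    anchor, separator, math = expr.partition("||")
    result = int(anchor)
    if not separator:
        return result  # no date math part, just the anchor
    if not re.fullmatch(r"(?:[+-]\d+[smhd])*", math):
        raise ValueError(f"unsupported date math: {math!r}")
    for sign, amount, unit in re.findall(r"([+-])(\d+)([smhd])", math):
        delta = int(amount) * UNIT_MS[unit]
        result += delta if sign == "+" else -delta
    return result


# Example: the window used in the extraction query above.
period_start = 1_600_000_000_000  # assumed value that {{period_start}} renders to
window_from = resolve_date_math(f"{period_start}||-16m")
window_to = resolve_date_math(f"{period_start}||-1m")
```

With these two offsets the window always spans 15 minutes, but it is anchored to period_start rather than to the moment the query runs.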
I have set up a trigger with the condition ctx.results[0].hits.total > 0
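To make the trigger condition concrete, here is a small Python sketch that evaluates the same expression against a mocked query result. The ctx dictionary is invented for illustration and only mimics the shape of a 6.x search response, where hits.total is a plain integer.

```python
def trigger_fires(ctx: dict) -> bool:
    """Mirror of the trigger condition ctx.results[0].hits.total > 0."""
    return ctx["results"][0]["hits"]["total"] > 0


# Mocked contexts (hypothetical data, not real monitor output).
ctx_with_errors = {"results": [{"hits": {"total": 7, "hits": []}}]}
ctx_without_errors = {"results": [{"hits": {"total": 0, "hits": []}}]}

fires = trigger_fires(ctx_with_errors)
stays_quiet = not trigger_fires(ctx_without_errors)
```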
The message I have configured is:
Monitor {{ctx.monitor.name}} just entered alert status. Please investigate the issue.
- Trigger: {{ctx.trigger.name}}
- Severity: {{ctx.trigger.severity}}
- Period start: {{ctx.periodStart}}
- Period end: {{ctx.periodEnd}}
- Total Error Count: {{ctx.results.0.hits.total}}
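Note that the message template addresses the first result as ctx.results.0.hits.total (Mustache dotted path), while the trigger condition uses ctx.results[0].hits.total. The sketch below is a deliberately tiny Python stand-in for Mustache variable substitution, just to show how such a dotted path is resolved. The render function and the sample context values are made up, and real Mustache does much more (sections, escaping, and so on).

```python
import re


def render(template: str, context: dict) -> str:
    """Replace {{a.b.0.c}} placeholders by walking the context along the path."""

    def lookup(match: re.Match) -> str:
        value = context
        for part in match.group(1).strip().split("."):
            # A numeric path segment indexes into a list, otherwise a dict key.
            value = value[int(part)] if isinstance(value, list) else value[part]
        return str(value)

    return re.sub(r"\{\{(.*?)\}\}", lookup, template)


# Hypothetical context with the same field names used in the message above.
sample_context = {
    "ctx": {
        "monitor": {"name": "5xx-or-error"},
        "periodStart": "2019-01-01T10:00:00Z",
        "periodEnd": "2019-01-01T10:15:00Z",
        "results": [{"hits": {"total": 3}}],
    }
}

line = render("- Total Error Count: {{ctx.results.0.hits.total}}", sample_context)
```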
In the message preview, the period start is shown as now minus 15 minutes and the period end as now.
When I “send a test message”, the timestamps also match what is shown in the preview.
However, when the automated runs happen, the timestamps are different:
- period start is now
- period end is now + 15 min
This leaves me with two problems:
- There seems to be an inconsistency in how periodStart and periodEnd are computed for a manual run versus an automated run. I verified this by comparing the count in the alert with the count in Kibana. The count matches my query window (now-16m to now-1m).
- Not being able to get timestamps that reflect the actual query window (now-16m to now-1m) is unhelpful.
If you need more information, please let me know. This is a brain dump of all my context.
Thanks.