ALERTING for Snapshot Failure

Hello, I would like to set up an alert that notifies me when a snapshot fails. However, I quickly got stuck because I cannot work out which type of monitor to use (per query monitor or per cluster metrics monitor).

I need to retrieve the last 4 snapshots, and if any of them are not in the SUCCESS state, I should receive an email notification.

Here is my configuration:

{
  name: "daily_backup",
  type: "monitor",
  enabled: true,
  schedule: {
    period: {
      interval: 1,
      unit: "MINUTES"
    }
  },
  inputs: [{
    search: {
      indices: ["test-logs-*"],
      query: {
        size: 0,
        aggregations: {},
        query: {
          snapshot.status: 'FAILED'
        }
      }
    }
  }],
  triggers: [
    {
      name : "[Trigger for 'daily_backup' snapshot failure]",
      severity : "1",
      condition : {
        script : {
          source : "ctx.results[0].hits.total.value > 0",
          lang : "painless"
        }
      },
      actions : [
        {
          name : "Email",
          destination_id : "mail-to-team",
          message_template : {
            source : "The creation of snapshot 'daily_backup' failed.",
          },
          throttle_enabled : true,
          throttle : {
            value : 30,
            unit : "MINUTES"
          },
          subject_template : {
            source : "[SNAPSHOT-FAILURE]",
          }
        }
      ]
    }
  ]
}

Thanks for your answer
Best,

@Akinator To monitor snapshot failures in OpenSearch, you should use a per query monitor.
Cluster metrics monitors are designed to alert on metrics such as CPU utilization, JVM memory pressure, or disk usage.

To check the last 4 snapshots, a few changes need to be made to the alert configuration.

Here is the updated alert configuration for monitoring snapshots and alerting if any of the last 4 snapshots are not in the SUCCESS state.


{
  "name": "daily_backup",
  "type": "monitor",
  "enabled": true,
  "schedule": {
    "period": {
      "interval": 1,
      "unit": "MINUTES"
    }
  },
  "inputs": [
    {
      "search": {
        "indices": ["test-logs-*"],
        "query": {
          "size": 4,
          "query": {
            "bool": {
              "must_not": [
                {
                  "term": {
                    "snapshot.status.keyword": {
                      "value": "SUCCESS"
                    }
                  }
                }
              ]
            }
          },
          "sort": [
            {
              "start_time": {
                "order": "desc"
              }
            }
          ]
        }
      }
    }
  ],
  "triggers": [
    {
      "name": "[Trigger for 'daily_backup' snapshot failure]",
      "severity": "1",
      "condition": {
        "script": {
          "source": "return ctx.results[0].hits.total.value > 0",
          "lang": "painless"
        }
      },
      "actions": [
        {
          "name": "Email",
          "destination_id": "mail-to-team",
          "message_template": {
            "source": "The creation of snapshot 'daily_backup' failed."
          },
          "subject_template": {
            "source": "[SNAPSHOT-FAILURE]"
          },
          "throttle_enabled": true,
          "throttle": {
            "value": 30,
            "unit": "MINUTES"
          }
        }
      ]
    }
  ]
}

The configuration should work as is, assuming the snapshot results are indexed as documents in test-logs-* and the field names (snapshot.status, start_time) match your setup.
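
One thing worth double-checking: hits.total.value counts every document that matches the query, not just the 4 returned by "size", so the trigger above fires as long as any non-SUCCESS document exists in the index. Here is a minimal Python sketch of the difference between that and a strict "last 4 snapshots" check. The sample documents and the flat "status" / "start_time" keys are made up for illustration and are not taken from your index.

```python
def fires_on_any_match(docs):
    # Mirrors `hits.total.value > 0`: counts every non-SUCCESS document,
    # no matter how old it is or what `size` is set to.
    return sum(1 for d in docs if d.get("status") != "SUCCESS") > 0


def fires_on_last_n(docs, n=4):
    # Stated requirement: look only at the n most recent snapshots.
    latest = sorted(docs, key=lambda d: d["start_time"], reverse=True)[:n]
    return any(d.get("status") != "SUCCESS" for d in latest)


# Hypothetical sample: one old failure followed by four successful runs.
docs = [
    {"start_time": "2024-05-01T02:00:00Z", "status": "FAILED"},
    {"start_time": "2024-05-02T02:00:00Z", "status": "SUCCESS"},
    {"start_time": "2024-05-03T02:00:00Z", "status": "SUCCESS"},
    {"start_time": "2024-05-04T02:00:00Z", "status": "SUCCESS"},
    {"start_time": "2024-05-05T02:00:00Z", "status": "SUCCESS"},
]

print(fires_on_any_match(docs))  # True: the old failure still matches
print(fires_on_last_n(docs))     # False: the last 4 snapshots all succeeded
```

If you need the strict behaviour, one option is probably to fetch the 4 most recent documents without the status filter and check their status inside the trigger script, or to restrict the query to a recent time range.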

Thank you very much @aravindan07 for your response. Have a great day!