Snapshot management policy snapshotting fails without latest_execution.info field

crs · April 4, 2024, 4:56pm

Snapshots started failing 10 days ago, without any logs to help. Nothing has changed in the SM policy configuration.

{
  "policies": [
    {
      "name": "snapshot-abc",
      "creation": {
        "current_state": "CREATION_START",
        "trigger": {
          "time": 1712253600000
        },
        "latest_execution": {
          "status": "FAILED",
          "start_time": 1712239399209,
          "end_time": null
        }
      },
      "deletion": {
        "current_state": "DELETION_START",
        "trigger": {
          "time": 1712286000000
        },
        "latest_execution": {
          "status": "FAILED",
          "start_time": 1712199799200,
          "end_time": null
        }
      },
      "policy_seq_no": 63243851,
      "policy_primary_term": 243,
      "enabled": true
    }
  ]
}
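For anyone triaging a similar response, here is a minimal Python sketch that lists the failed workflows, converts the epoch-millisecond timestamps to readable UTC times, and flags whether `latest_execution.info` is present. It assumes the JSON above has already been loaded into a dict; the helper names `ms_to_iso` and `summarize_failures` are made up for illustration and are not part of any OpenSearch client.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_iso(ms):
    """Convert epoch milliseconds to an ISO-8601 UTC string (None stays None)."""
    if ms is None:
        return None
    # timedelta arithmetic avoids float rounding on the millisecond part
    return (EPOCH + timedelta(milliseconds=ms)).isoformat()


def summarize_failures(response):
    """Return one row per FAILED creation/deletion workflow in the response."""
    rows = []
    for policy in response.get("policies", []):
        for workflow in ("creation", "deletion"):
            execution = (policy.get(workflow) or {}).get("latest_execution") or {}
            if execution.get("status") != "FAILED":
                continue
            rows.append({
                "policy": policy.get("name"),
                "workflow": workflow,
                "started": ms_to_iso(execution.get("start_time")),
                "ended": ms_to_iso(execution.get("end_time")),
                # True only if the response carries an explanation for the failure
                "has_info": "info" in execution,
            })
    return rows
```

Calling `summarize_failures(response)` on the output above would return two rows (creation and deletion), both with `ended` set to `None` and `has_info` set to `False`, which matches the symptom in the title.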

Any ideas? My only guess is that the ISM config is not being triggered, based on this log line:

[2024-04-04T09:51:10,198][INFO ][o.o.j.s.JobSweeper       ] [es-main-master-1] Error while sweeping shard [.opendistro-ism-config][0], error message: all shards failed

But the shard status is green…

Any help would be appreciated!

crs · April 10, 2024, 9:38am

I solved the error the hard way by first deleting the index .opendistro-ism-config and then reconfiguring the SM policy.

/!\ This action removes every ISM policy you have configured so far on your OpenSearch cluster (every action triggered by the JobScheduler).

My intuition is that the JobScheduler couldn’t validate the configurations stored in the index .opendistro-ism-config. I guess I could have cleaned it up manually instead of deleting everything. The issue is clearly linked to a corrupted document within the index, which triggers the JobScheduler error.

On the other hand, I still don’t understand what caused the index to be corrupted.

crs · June 3, 2024, 2:11pm

I had the exact same issue on my preprod cluster and found the root cause of the issue.

TL;DR: The user that created the Snapshot Management Policy no longer existed, which caused the issue.

The error was indeed caused by a corrupted document in the .opendistro-ism-config index, more specifically by the document describing the Snapshot Management Policy that I configured previously, say MY_SMP.

To find this document, search the index by policy name:

GET .opendistro-ism-config/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {
          "sm_policy.name": "MY_SMP"
        }}
      ]
    }
  }
}

The returned document contained a user field that referenced an old user I had previously deleted.

To solve the error, delete the document:

DELETE /.opendistro-ism-config/_doc/<MY_SMP_DOC_NAME>?timeout=5m

Then recreate the Snapshot Management Policy with a user that still exists.
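To check several policies at once instead of one by one, the same idea can be sketched in Python over a search response from .opendistro-ism-config. This is only a sketch under assumptions: it guesses that the user name lives at `_source.sm_policy.user.name` (the post only says the document has a user field), it expects the set of existing users to be supplied by the caller, and `find_orphaned_policies` is a hypothetical helper name.

```python
def find_orphaned_policies(search_response, existing_users):
    """Return (doc_id, policy_name, user) for SM policies owned by unknown users.

    search_response: parsed JSON body of a _search on .opendistro-ism-config
    existing_users:  collection of user names that still exist on the cluster
    """
    orphaned = []
    for hit in search_response.get("hits", {}).get("hits", []):
        policy = (hit.get("_source") or {}).get("sm_policy") or {}
        # Assumed field path: sm_policy.user.name
        user = (policy.get("user") or {}).get("name")
        if user is not None and user not in existing_users:
            orphaned.append((hit.get("_id"), policy.get("name"), user))
    return orphaned
```

Each returned document ID is a candidate for the DELETE call above, after checking by hand that the document really is the stale policy.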

system · August 2, 2024, 2:11pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
