Snapshot Deletion Triggered S3 Rate Limiting and Subsequent Snapshot Failures

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): OpenSearch 2.19.0

Describe the issue:
We’re experiencing a recurring issue with OpenSearch snapshots getting stuck in the IN_PROGRESS state.

Context:

  • We use OpenSearch ISM (Snapshot Management) to create daily snapshots in an S3-backed repository.
  • The snapshot policy uses "*" to include all indices and runs once per day.
  • Snapshot deletion is hitting the S3 rate limit, causing the process to get stuck.

Configuration:

  "policies": [
    {
      "_id": "daily-policy-1-sm-policy",
      "_seq_no": 42620428,
      "_primary_term": 90,
      "sm_policy": {
        "name": "daily-policy-1",
        "description": "Daily snapshot policy at 1 AM PST",
        "schema_version": 21,
        "creation": {
          "schedule": {
            "cron": {
              "expression": "0 1 * * *",
              "timezone": "America/Los_Angeles"
            }
          },
          "time_limit": "1h"
        },
        "deletion": {
          "schedule": {
            "cron": {
              "expression": "0 0 * * *",
              "timezone": "America/Los_Angeles"
            }
          },
          "condition": {
            "min_count": 7,
            "max_count": 30
          }
        },
        "snapshot_config": {
          "indices": [
            "*"
          ],
          "ignore_unavailable": true,
          "include_global_state": false,
          "name": "daily-{now/d}",
          "repository": "daily_snapshot_1",
          "partial": false
        },
        "schedule": {
          "interval": {
            "start_time": 1745875413734,
            "period": 1,
            "unit": "Minutes"
          }
        },
        "enabled": true,
        "last_updated_time": 1751925866497,
        "enabled_time": 1751925866497
      }
    }
  ]
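
While a run is stuck in IN_PROGRESS, the state can be inspected via the Snapshot Management explain API and the snapshot status API. A minimal sketch of how we check this; the endpoint and credentials are placeholders for our environment, not part of the policy above:

  # Diagnostic sketch: inspect the SM policy state machine and any
  # in-progress snapshots. Endpoint and credentials are placeholders.
  import requests

  HOST = "https://localhost:9200"   # placeholder cluster endpoint
  AUTH = ("admin", "admin")         # placeholder basic-auth credentials

  # SM explain API: current creation/deletion state of the policy.
  r = requests.get(f"{HOST}/_plugins/_sm/policies/daily-policy-1/_explain",
                   auth=AUTH, verify=False)
  print(r.json())

  # Snapshots currently running against the repository.
  r = requests.get(f"{HOST}/_snapshot/daily_snapshot_1/_status",
                   auth=AUTH, verify=False)
  print(r.json())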

Relevant Logs or Screenshots:

deleting snapshots [daily-policy-1-2025-05-15t08:00:40-8vcvxmzn] from repository [daily_snapshot_1]
[2025-07-03T07:01:22,386][WARN ][o.o.r.b.BlobStoreRepository] [os-fileingest-master-3.prod.mw.int] [daily_snapshot_1] Exception during single stale index delete
java.lang.RuntimeException: java.util.concurrent.CompletionException: software.amazon.awssdk.services.s3.model.S3Exception: Please reduce your request rate. (Service: S3, Status Code: 503,
        at org.opensearch.repositories.s3.S3BlobContainer.getFutureValue(S3BlobContainer.java:400) ~[?:?]
        at org.opensearch.repositories.s3.S3BlobContainer.delete(S3BlobContainer.java:380) ~[?:?]
        at org.opensearch.repositories.blobstore.BlobStoreRepository.deleteContainer(BlobStoreRepository.java:2280) ~[opensearch-2.19.0.jar:2.19.0]
        at org.opensearch.repositories.blobstore.BlobStoreRepository.lambda$executeOneStaleIndexDelete$45(BlobStoreRepository.java:2245) [opensearch-2.19.0.jar:2.19.0]
        at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) [opensearch-2.19.0.jar:2.19.0]
        at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) [opensearch-2.19.0.jar:2.19.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1014) [opensearch-2.19.0.jar:2.19.0]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.19.0.jar:2.19.0]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]

@Mai since S3 rate limiting is applied per key prefix (partition), have you tried splitting the snapshots into smaller lists of indices and creating separate partitions and ISM policies to manage them?
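
For example, a minimal sketch of what the split could look like, assuming the repository-s3 plugin; the bucket, repository names, endpoint, and credentials are all hypothetical. The point is that each repository gets its own base_path and therefore its own S3 key prefix:

  # Sketch: register two S3 repositories with distinct base_path values so
  # their delete/put traffic lands on different S3 key prefixes (partitions).
  # Bucket, repository names, endpoint, and credentials are hypothetical.
  import requests

  HOST = "https://localhost:9200"
  AUTH = ("admin", "admin")

  for repo, prefix in [("daily_snapshot_a", "snapshots/group-a"),
                       ("daily_snapshot_b", "snapshots/group-b")]:
      body = {
          "type": "s3",
          "settings": {
              "bucket": "my-snapshot-bucket",  # hypothetical bucket
              "base_path": prefix,             # one prefix per repository
          },
      }
      requests.put(f"{HOST}/_snapshot/{repo}", json=body,
                   auth=AUTH, verify=False).raise_for_status()

Each repository would then get its own snapshot policy covering a subset of the indices, so no single prefix has to absorb the whole delete burst.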

Also, snapshotting with "*" captures a lot of system indices that you will probably never attempt to restore. A better approach is to select the indices you actually want to snapshot (using wildcards to capture groups of indices where necessary, e.g. products-*).
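
A sketch of a policy restricted to explicit patterns; the patterns, policy name, repository, endpoint, and credentials here are illustrative only, not taken from your cluster:

  # Sketch: create an SM policy that snapshots selected index patterns
  # instead of "*". Patterns, policy name, and endpoint are illustrative.
  import requests

  HOST = "https://localhost:9200"
  AUTH = ("admin", "admin")

  policy = {
      "description": "Daily snapshot of application indices only",
      "creation": {
          "schedule": {"cron": {"expression": "0 1 * * *",
                                "timezone": "America/Los_Angeles"}}
      },
      "snapshot_config": {
          "indices": ["products-*", "orders-*"],  # explicit app indices
          "ignore_unavailable": True,
          "include_global_state": False,
          "repository": "daily_snapshot_a",       # hypothetical repository
      },
  }
  requests.post(f"{HOST}/_plugins/_sm/policies/daily-apps", json=policy,
                auth=AUTH, verify=False).raise_for_status()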

Yes, we are looking into that option as well. At present our indices don't follow a pattern that would let us group them.
Since the delete operation is blocking the subsequent snapshots, do you think we should remove the deletion section from the policy and have an external script delete the older snapshots instead? The script would hit the rate limit error as well, but we could add retries.
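
Roughly what we have in mind, as a sketch only; the endpoint, credentials, and retention count are assumptions (keep the newest seven to match min_count), and any 5xx from the delete call is treated as retryable:

  # Sketch of an external cleanup: keep the newest KEEP snapshots, delete
  # the rest one at a time, and back off when the delete fails with a 5xx
  # (e.g. S3 rate limiting). Endpoint, credentials, KEEP are assumptions.
  import time
  import requests

  HOST = "https://localhost:9200"
  AUTH = ("admin", "admin")
  REPO = "daily_snapshot_1"
  KEEP = 7   # matches the policy's min_count

  snaps = requests.get(f"{HOST}/_snapshot/{REPO}/_all",
                       auth=AUTH, verify=False).json()["snapshots"]
  snaps.sort(key=lambda s: s["start_time_in_millis"])  # oldest first

  for snap in snaps[:-KEEP]:
      for attempt in range(5):
          r = requests.delete(f"{HOST}/_snapshot/{REPO}/{snap['snapshot']}",
                              auth=AUTH, verify=False)
          if r.status_code < 500:
              r.raise_for_status()   # surface non-retryable errors
              break
          time.sleep(2 ** attempt)   # exponential backoff on 5xx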

@Mai this would work, but you would be applying a "patch", which could still have issues if the underlying problem with the snapshots is not resolved.