Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): OpenSearch 2.19.0
Describe the issue:
We’re experiencing a recurring issue with OpenSearch snapshots getting stuck in the IN_PROGRESS state.
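For anyone trying to reproduce or monitor this, here is a minimal sketch of how stuck snapshots could be flagged from the output of `GET _snapshot/daily_snapshot_1/_all`. It assumes the response has the usual `snapshots` list with `snapshot`, `state`, and `start_time_in_millis` fields. The function name, the sample data, and the one-hour threshold (borrowed from the policy's `time_limit`) are illustrative assumptions, not part of our setup.

```python
# Sketch: flag snapshots that have been IN_PROGRESS longer than a threshold.
# The sample response below is hypothetical and only mimics the shape of a
# GET _snapshot/<repository>/_all response.

ONE_HOUR_MS = 60 * 60 * 1000  # assumption: mirrors the policy's "time_limit": "1h"


def find_stuck_snapshots(response, now_ms, max_age_ms):
    """Return names of snapshots still IN_PROGRESS for longer than max_age_ms."""
    stuck = []
    for snap in response.get("snapshots", []):
        if snap.get("state") != "IN_PROGRESS":
            continue
        started = snap.get("start_time_in_millis")
        # Skip entries without a start time rather than guessing their age.
        if started is not None and now_ms - started > max_age_ms:
            stuck.append(snap["snapshot"])
    return stuck


# Hypothetical data for illustration only.
sample_response = {
    "snapshots": [
        {"snapshot": "snap-a", "state": "SUCCESS", "start_time_in_millis": 0},
        {"snapshot": "snap-b", "state": "IN_PROGRESS", "start_time_in_millis": 1_000_000},
        {"snapshot": "snap-c", "state": "IN_PROGRESS", "start_time_in_millis": 9_000_000},
    ]
}

if __name__ == "__main__":
    print(find_stuck_snapshots(sample_response, now_ms=10_000_000, max_age_ms=ONE_HOUR_MS))
```

In practice `response` would be the parsed JSON body of the snapshot API call and `now_ms` the current epoch time in milliseconds.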
Context:
- We use OpenSearch ISM to create daily snapshots to an S3-backed repository.
- The snapshot policy uses "*" to include all indices and runs once per day.
- Snapshot deletion is hitting the S3 rate limit, causing the process to get stuck.
Configuration:
"policies": [
  {
    "_id": "daily-policy-1-sm-policy",
    "_seq_no": 42620428,
    "_primary_term": 90,
    "sm_policy": {
      "name": "daily-policy-1",
      "description": "Daily snapshot policy at 1 AM PST",
      "schema_version": 21,
      "creation": {
        "schedule": {
          "cron": {
            "expression": "0 1 * * *",
            "timezone": "America/Los_Angeles"
          }
        },
        "time_limit": "1h"
      },
      "deletion": {
        "schedule": {
          "cron": {
            "expression": "0 0 * * *",
            "timezone": "America/Los_Angeles"
          }
        },
        "condition": {
          "min_count": 7,
          "max_count": 30
        }
      },
      "snapshot_config": {
        "indices": [
          "*"
        ],
        "ignore_unavailable": true,
        "include_global_state": false,
        "name": "daily-{now/d}",
        "repository": "daily_snapshot_1",
        "partial": false
      },
      "schedule": {
        "interval": {
          "start_time": 1745875413734,
          "period": 1,
          "unit": "Minutes"
        }
      },
      "enabled": true,
      "last_updated_time": 1751925866497,
      "enabled_time": 1751925866497
    }
  },
Relevant Logs or Screenshots:
deleting snapshots [daily-policy-1-2025-05-15t08:00:40-8vcvxmzn] from repository [daily_snapshot_1]
[2025-07-03T07:01:22,386][WARN ][o.o.r.b.BlobStoreRepository] [os-fileingest-master-3.prod.mw.int] [daily_snapshot_1] Exception during single stale index delete
java.lang.RuntimeException: java.util.concurrent.CompletionException: software.amazon.awssdk.services.s3.model.S3Exception: Please reduce your request rate. (Service: S3, Status Code: 503,
    at org.opensearch.repositories.s3.S3BlobContainer.getFutureValue(S3BlobContainer.java:400) ~[?:?]
    at org.opensearch.repositories.s3.S3BlobContainer.delete(S3BlobContainer.java:380) ~[?:?]
    at org.opensearch.repositories.blobstore.BlobStoreRepository.deleteContainer(BlobStoreRepository.java:2280) ~[opensearch-2.19.0.jar:2.19.0]
    at org.opensearch.repositories.blobstore.BlobStoreRepository.lambda$executeOneStaleIndexDelete$45(BlobStoreRepository.java:2245) [opensearch-2.19.0.jar:2.19.0]
    at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) [opensearch-2.19.0.jar:2.19.0]
    at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) [opensearch-2.19.0.jar:2.19.0]
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1014) [opensearch-2.19.0.jar:2.19.0]
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.19.0.jar:2.19.0]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
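
To get a feel for how often the deletion is being throttled, a rough sketch like the following could count the S3 503 responses in a log file. It only matches the two strings visible in the excerpt above ("Please reduce your request rate" and "Status Code: 503"); the function name and the sample lines are illustrative assumptions.

```python
# Sketch: count S3 throttling errors in OpenSearch log lines.
# A line counts as a throttle event when it contains both markers seen in
# the log excerpt above.

THROTTLE_MARKERS = ("Please reduce your request rate", "Status Code: 503")


def count_throttle_events(lines):
    """Return how many log lines contain every throttling marker."""
    return sum(1 for line in lines if all(m in line for m in THROTTLE_MARKERS))


# Sample lines shortened from the excerpt above, for illustration only.
sample_lines = [
    "[2025-07-03T07:01:22,386][WARN ][o.o.r.b.BlobStoreRepository] "
    "[daily_snapshot_1] Exception during single stale index delete",
    "java.lang.RuntimeException: java.util.concurrent.CompletionException: "
    "software.amazon.awssdk.services.s3.model.S3Exception: "
    "Please reduce your request rate. (Service: S3, Status Code: 503,",
    "    at org.opensearch.repositories.s3.S3BlobContainer.delete(S3BlobContainer.java:380)",
]

if __name__ == "__main__":
    print(count_throttle_events(sample_lines))
```

For a real log, the lines can be streamed from an open file handle instead of the in-memory list.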