What I would like to do is keep a relatively modest amount of data on my OpenSearch cluster, while keeping a longer window of data retrievable via S3-backed snapshots. Let's say I want 15 days of data on the cluster, but 60 days available if needed via snapshots. I am having trouble reading the documentation: I feel dumb saying this, but the 'delete' action description sounds vague, though it seems to indicate that the index itself gets deleted, and the 'snapshot' action seems to create a snapshot. What is unclear is how to have a delete-snapshot event in the lifecycle. To restate, I am trying to recreate this:
active data → after 1 day data is snapshotted (copied to S3) → after 15 days, index is deleted in OpenSearch, but still available for restoration via snapshot → after 60 days, the storage in S3 is cleaned and removed entirely.
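To make the first two steps concrete, here is a rough sketch of the ISM policy I think gets me partway there, assuming I'm reading the `snapshot` and `delete` actions and the `min_index_age` transition conditions correctly (the repository name `my-s3-repo` and the index pattern are placeholders for my setup):

```json
{
  "policy": {
    "description": "Snapshot after 1 day, delete index after 15 days",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": [
          { "state_name": "snapshot", "conditions": { "min_index_age": "1d" } }
        ]
      },
      {
        "name": "snapshot",
        "actions": [
          { "snapshot": { "repository": "my-s3-repo", "snapshot": "{{ctx.index}}" } }
        ],
        "transitions": [
          { "state_name": "delete", "conditions": { "min_index_age": "15d" } }
        ]
      },
      {
        "name": "delete",
        "actions": [
          { "delete": {} }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": ["logstash-*"]
    }
  }
}
```

Even if that sketch is right, the missing piece is still the 60-day cleanup: ISM transitions act on indices, and I don't see a state or action that deletes the snapshot itself.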
For the purposes of this example, I am using indices that have a date appended to the index name (the Logstash default). I don't particularly care whether data is snapshotted after exactly 1 day. The data isn't so critical that I need by-the-hour snapshots, especially if that makes getting rid of the old ones harder.
It doesn't seem like I can use S3 lifecycle policies on the bucket itself to purge old data, as this note in the documentation seems to warn against it:
If you need to delete a snapshot, be sure to use the Elasticsearch API rather than navigating to the storage location and purging files. Incremental snapshots from a cluster often share a lot of the same data; when you use the API, Elasticsearch only removes data that no other snapshot is using.
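So presumably the 60-day cleanup has to go through that API rather than an S3 lifecycle rule. If I were doing it by hand, I believe it would be a call like the following (repository and snapshot names are made-up examples matching my date-suffixed indices):

```
DELETE _snapshot/my-s3-repo/logstash-2024.01.01
```

What I can't find is a way to express that as a scheduled lifecycle event, i.e. "run this delete for any snapshot older than 60 days."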
Thanks for any suggestions you have!