What I would like to do is keep a relatively modest amount of data on my OpenSearch cluster, but still have a longer set of data retrievable via S3-backed snapshots. Let’s say I want 15 days of data on my OpenSearch cluster, but 60 days available if needed as part of snapshots. I am having trouble with the documentation. I feel dumb saying it, but ‘delete’ sounds vague in the action description: it seems to mean that the index itself gets deleted, while the ‘snapshot’ action seems to create a snapshot. It is unclear how I can have a delete-snapshot event in the lifecycle. Again, I am trying to recreate this:
active data → after 1 day data is snapshotted (copied to S3) → after 15 days, index is deleted in OpenSearch, but still available for restoration via snapshot → after 60 days, the storage in S3 is cleaned and removed entirely.
For the purposes of this example, I am using indices that have a date appended to the index name (the logstash ‘default’). I don’t particularly care whether data is snapshotted after exactly 1 day. The data isn’t so critical that I need by-the-hour snapshots, especially if that makes getting rid of the old ones harder.
It doesn’t seem like I can use S3 lifecycle policies on the bucket itself to purge old data as this note seems to warn against it:
If you need to delete a snapshot, be sure to use the Elasticsearch API rather than navigating to the storage location and purging files. Incremental snapshots from a cluster often share a lot of the same data; when you use the API, Elasticsearch only removes data that no other snapshot is using.
You definitely wouldn’t be able to do the full lifecycle workflow you want as of now. ISM doesn’t currently support managing snapshots beyond taking an individual snapshot of an index at some point in the index lifecycle.
So, of your example, you can do:
active data → after 1 day data is snapshotted (copied to S3) → after 15 days, index is deleted in OpenSearch, but still available for restoration via snapshot
Once you delete the index, though, ISM cleans up any internal metadata/jobs for that index, as it is gone from the cluster. That is why ISM cannot come back later to remove the snapshot.
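As a rough sketch of the part that is supported, the policy body could be built like this. The state names, the repository name `my-s3-repo`, the snapshot name and the `logstash-*` pattern are placeholders I made up for illustration, and the S3 repository is assumed to be registered already. Check the field names against the ISM documentation for your version before relying on it.

```python
import json


def build_ism_policy(repository, snapshot_after="1d", delete_after="15d",
                     index_pattern="logstash-*"):
    """Build an ISM policy body: snapshot the index, later delete it.

    All names used here (states, snapshot name, pattern) are illustrative
    assumptions, not values taken from this thread.
    """
    return {
        "policy": {
            "description": "Snapshot after 1 day, delete index after 15 days",
            "default_state": "hot",
            "states": [
                {
                    # Index is live; wait until it is old enough to snapshot.
                    "name": "hot",
                    "actions": [],
                    "transitions": [{
                        "state_name": "snapshotted",
                        "conditions": {"min_index_age": snapshot_after},
                    }],
                },
                {
                    # Take one snapshot of this index into the repository.
                    "name": "snapshotted",
                    "actions": [{
                        "snapshot": {
                            "repository": repository,
                            "snapshot": "ism-snapshot",  # placeholder name
                        }
                    }],
                    "transitions": [{
                        "state_name": "deleted",
                        "conditions": {"min_index_age": delete_after},
                    }],
                },
                {
                    # Removes the index from the cluster only; the snapshot
                    # in the repository is left untouched.
                    "name": "deleted",
                    "actions": [{"delete": {}}],
                    "transitions": [],
                },
            ],
            # Attach the policy automatically to newly created matching indices.
            "ism_template": {"index_patterns": [index_pattern], "priority": 100},
        }
    }


# Print the body that would be sent with PUT _plugins/_ism/policies/<policy_id>.
print(json.dumps(build_ism_policy("my-s3-repo"), indent=2))
```

Nothing in this policy handles the 60-day cleanup of the snapshots themselves; that still has to happen outside ISM.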
It seems like you would benefit from us implementing something similar to Elastic’s SLM to manage your snapshots. I believe there is an enhancement request for this; feel free to +1 it and add any information to the issue so we can continue to track requests for it.
Thanks for the quick reply. I was able to find a page on the Amazon site for this topic: Amazon OpenSearch Service. It seems like a not-so-great substitute for the ISM ticket you referenced. BTW, the Curator link that Amazon references on that page seems old and unmaintained, since its changelog ends in 2018 and it lists support for Elasticsearch 6.5 in the future tense. This seems like the better GitHub project link, I think: Releases · elastic/curator · GitHub. However, it has a pretty long list of Python dependencies, and there appears to be no python3-curator package in Red Hat or EPEL. For those looking at this later, that makes it painful to use if you want automatic updates (at least for security issues).
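For anyone who wants to avoid the Curator dependencies, a small standard-library script run from cron might be enough for the 60-day cleanup. This is only a sketch under assumptions: the function names are mine, the base URL and repository name are whatever your cluster uses, and it sends unauthenticated requests, so a secured cluster or Amazon OpenSearch Service would need authentication or request signing added. It deletes through the snapshot API rather than touching the bucket, in line with the note quoted above.

```python
import json
import time
import urllib.parse
import urllib.request

RETENTION_DAYS = 60
MS_PER_DAY = 24 * 60 * 60 * 1000


def expired_snapshots(snapshots, now_ms, retention_days=RETENTION_DAYS):
    """Return names of finished snapshots that started before the cutoff."""
    cutoff_ms = now_ms - retention_days * MS_PER_DAY
    return [
        s["snapshot"]
        for s in snapshots
        # Leave running snapshots alone, however old they look.
        if s.get("state") != "IN_PROGRESS"
        and s["start_time_in_millis"] < cutoff_ms
    ]


def purge_old_snapshots(base_url, repository, retention_days=RETENTION_DAYS):
    """List snapshots in a repository and delete the expired ones via the API.

    Hypothetical helper: no authentication, retries, or error handling.
    """
    repo = urllib.parse.quote(repository, safe="")
    with urllib.request.urlopen(f"{base_url}/_snapshot/{repo}/_all") as resp:
        snapshots = json.load(resp)["snapshots"]

    names = expired_snapshots(snapshots, int(time.time() * 1000), retention_days)
    for name in names:
        snap = urllib.parse.quote(name, safe="")
        req = urllib.request.Request(
            f"{base_url}/_snapshot/{repo}/{snap}", method="DELETE"
        )
        # The cluster removes only data that no other snapshot still uses.
        urllib.request.urlopen(req).close()
    return names
```

A daily cron entry calling something like `purge_old_snapshots("http://localhost:9200", "my-s3-repo")` would do it, with both arguments being placeholders for your own endpoint and repository.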