Consistent data for backups and snapshot disk space

Hi!

I’m running an application in Kubernetes, and that application depends on OpenSearch. So now I have to learn how to maintain OpenSearch! :smile: Right now, I am trying to wrap my head around how to properly back up OpenSearch.

I found that it’s possible to configure snapshots in OpenSearch. As I understand it, if I configure snapshots to run regularly and write them to disk, the snapshot data will be consistent when I back it up with an external tool (for example Velero).
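
For concreteness, this is roughly the snapshot setup I have in mind. It's only a minimal sketch that builds the two REST requests without sending them. The host `http://localhost:9200`, the repository name `my-repo`, the path `/mnt/snapshots` and the snapshot name are placeholders I made up, and I've left out authentication and TLS. As far as I can tell, the location also has to be listed under `path.repo` in `opensearch.yml`.

```python
import json
from urllib.request import Request


def register_fs_repo(base_url, repo, location):
    """Build the PUT request that registers a shared-filesystem ("fs") snapshot repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return Request(
        f"{base_url}/_snapshot/{repo}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def take_snapshot(base_url, repo, name):
    """Build the PUT request that starts a snapshot with the given name in the repository."""
    return Request(f"{base_url}/_snapshot/{repo}/{name}", method="PUT")


# Placeholder values, not taken from a real cluster.
repo_req = register_fs_repo("http://localhost:9200", "my-repo", "/mnt/snapshots")
snap_req = take_snapshot("http://localhost:9200", "my-repo", "snapshot-1")
# Sending them would be: urllib.request.urlopen(repo_req), then urlopen(snap_req).
```

Velero (or whatever external tool I end up with) would then only need to back up the volume mounted at the repository location.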

Snapshots are probably a good thing to use in any case, but just for my own understanding I want to know: if I back up OpenSearch’s PVC data directly, without taking any snapshots, do I risk ending up with inconsistent data?

A related question I have is regarding the incremental nature of snapshots. It says in the docs:

Snapshots store only incremental changes since the last snapshot. Thus, while taking an initial snapshot may be a heavy operation, subsequent snapshots have minimal overhead.

I understand that the initial snapshot may be relatively slow. But does this also mean that the initial snapshot takes up roughly as much disk space as the live data? So if I expect my live data to be 100 GiB on disk and the snapshots are stored on the same disk, I would need to provision more than 200 GiB of disk to accommodate them. Right?
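
To spell out the arithmetic behind that guess, here is a small sketch. It rests entirely on my own assumptions: that the first snapshot is about the same size as the live data, that the snapshots live on the same disk, and that I want some headroom for the incremental snapshots that follow. The function name and the 20% headroom figure are made up for illustration.

```python
def required_disk_gib(live_gib, snapshot_ratio=1.0, headroom=0.2):
    """Estimate the disk to provision for live data plus snapshots.

    snapshot_ratio: assumed snapshot repository size relative to live data
                    (1.0 means the first snapshot is as large as the live data).
    headroom:       assumed extra fraction for incremental snapshots and growth.
    """
    snapshot_gib = live_gib * snapshot_ratio
    return (live_gib + snapshot_gib) * (1 + headroom)


# With no headroom, 100 GiB of live data already needs 200 GiB in total.
baseline = required_disk_gib(100, headroom=0.0)
# With the assumed 20% headroom on top of that.
with_headroom = required_disk_gib(100)
```

If the first snapshot turns out to be much smaller than the live data, `snapshot_ratio` is the number I'd have to correct.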

Cheers!
