Consistent data for backups and snapshot disk space

Hi!

I’m running an application in Kubernetes, and that application depends on OpenSearch. So now I have to learn how to maintain OpenSearch! :smile: Right now, I am trying to wrap my head around how to properly back up OpenSearch.

I found that it’s possible to configure snapshots in OpenSearch. As I understand it, if I configure snapshots to run regularly and save them to disk, the snapshot data will be consistent when I back it up using some external solution (for example Velero).
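To make this concrete, here is a minimal sketch of the two snapshot API calls I have in mind: registering a shared file system (`fs`) repository and then taking a snapshot into it. The endpoint `http://localhost:9200`, the repository name `my-repo`, the snapshot name `snapshot-1` and the path `/mnt/snapshots` are placeholders I made up, and as far as I understand the location also has to be listed under `path.repo` in `opensearch.yml`. The code only builds the requests and does not send them.

```python
import json
import urllib.request

# Assumption: placeholder endpoint, not my real cluster address.
BASE_URL = "http://localhost:9200"


def register_fs_repo_request(repo, location):
    """Build a PUT /_snapshot/<repo> request that registers an 'fs' repository."""
    body = json.dumps({"type": "fs", "settings": {"location": location}})
    return urllib.request.Request(
        f"{BASE_URL}/_snapshot/{repo}",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def create_snapshot_request(repo, snapshot):
    """Build a PUT /_snapshot/<repo>/<snapshot> request that takes a snapshot."""
    return urllib.request.Request(
        f"{BASE_URL}/_snapshot/{repo}/{snapshot}",
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical names; printing only, nothing is sent to a cluster.
    requests = [
        register_fs_repo_request("my-repo", "/mnt/snapshots"),
        create_snapshot_request("my-repo", "snapshot-1"),
    ]
    for req in requests:
        print(req.get_method(), req.full_url)
```

Passing one of these to `urllib.request.urlopen` would actually send it. The idea would then be to let Velero back up the volume mounted at the repository location rather than the live data directory.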

Snapshots are probably a good thing to use in any case, but just for my own understanding I want to know: if I back up OpenSearch’s PVC data without any snapshots, does that risk data inconsistency?

A related question I have is regarding the incremental nature of snapshots. It says in the docs:

> Snapshots store only incremental changes since the last snapshot. Thus, while taking an initial snapshot may be a heavy operation, subsequent snapshots have minimal overhead.

I understand that the initial snapshot may be relatively slow. But does this also mean that the initial snapshot takes up roughly as much disk space as the live data? In other words, if I expect my live data to be 100 GiB on disk and I store snapshots on the same volume, would I need to provision more than 200 GiB of disk to accommodate them?
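Here is the back-of-the-envelope arithmetic behind that number, as a rough sketch. It rests on my own assumptions, which may well be wrong: that the first snapshot is about the same size as the live data (`first_snapshot_ratio=1.0`), that snapshots share a disk with the live data, and that I want some spare room on top (`headroom`, a figure I picked arbitrarily).

```python
def required_disk_gib(live_gib, first_snapshot_ratio=1.0, headroom=0.2):
    """Estimate total disk needed for live data plus the first full snapshot.

    All parameters are assumptions for illustration, not OpenSearch defaults:
    - first_snapshot_ratio: size of the first snapshot relative to live data
    - headroom: extra fraction reserved for later incremental snapshots
    """
    snapshot_gib = live_gib * first_snapshot_ratio
    return (live_gib + snapshot_gib) * (1 + headroom)


if __name__ == "__main__":
    # 100 GiB live data, full-size first snapshot, no headroom: the 200 GiB floor.
    print(required_disk_gib(100, headroom=0.0))
    # Same, with 20% spare room for incremental snapshots.
    print(required_disk_gib(100))
```

If the first snapshot turns out to be smaller than the live data, the ratio drops and so does the total, which is really what I am asking about.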

Cheers!