Does OpenSearch have any recommendations for data recovery strategies after a partial data loss? For example, if only a subset of the shards in an index becomes unassigned on a cluster due to a node drop, is there a way to recover just those shards? My understanding is that if you use a snapshot to restore an index in a cluster, the entire index is recreated from the snapshot, which doesn’t seem like the best approach for a partial data loss scenario.
On the index in question, do those shards have a replica?
By chance did you enable Shard allocation awareness? This is found here
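For reference, here is a minimal sketch of what enabling shard allocation awareness could look like as a cluster settings body. It assumes each node already carries a custom attribute in `opensearch.yml` (for example `node.attr.zone: zone-a`); the attribute name `zone`, the zone names, and the helper `awareness_settings` are all made up for illustration. The script only builds the JSON you would send with `PUT _cluster/settings`, it does not contact a cluster.

```python
import json


def awareness_settings(attribute="zone", forced_values=None):
    """Build a body for PUT _cluster/settings that turns on allocation awareness.

    `attribute` must match a node.attr.<name> set on the nodes (assumed here).
    """
    settings = {"cluster.routing.allocation.awareness.attributes": attribute}
    if forced_values:
        # Forced awareness: list every expected value so that replicas are not
        # all piled into the surviving zones when one zone goes away.
        key = f"cluster.routing.allocation.awareness.force.{attribute}.values"
        settings[key] = ",".join(forced_values)
    return {"persistent": settings}


# Hypothetical zone names for a three-zone cluster.
body = awareness_settings("zone", ["zone-a", "zone-b", "zone-c"])
print(json.dumps(body, indent=2))
```

With awareness on, the cluster tries to keep copies of the same shard in different zones, so it helps against losing one zone but not against losing every node that holds a copy.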
Hi @Gsmitt thanks for the reply.
We have 2 replicas for each shard (so we have 3 total copies) spread across three different availability zones. However, nodes can fail occasionally, and if the nodes storing all copies of a shard happen to fail simultaneously, we could lose the data stored on that shard.
So we want to see if there is a way to either use a snapshot to restore only the failed shard, or if there is some other way to partially restore data.
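To make the "failed shard" case concrete, here is a rough sketch of how the affected shards could be picked out of the output of `GET _cat/shards?format=json`. The helper `unassigned_by_index` and the sample rows (index `logs-1`) are hypothetical; the field names `index`, `shard`, `prirep` and `state` are the standard columns of that API. Nothing here talks to a cluster, it only processes rows you have already fetched.

```python
from collections import defaultdict


def unassigned_by_index(cat_shards_rows):
    """Group unassigned shard numbers per index, split into primaries and replicas."""
    lost = defaultdict(lambda: {"primaries": [], "replicas": []})
    for row in cat_shards_rows:
        if row.get("state") != "UNASSIGNED":
            continue
        # prirep is "p" for a primary copy and "r" for a replica copy.
        kind = "primaries" if row.get("prirep") == "p" else "replicas"
        lost[row["index"]][kind].append(int(row["shard"]))
    return {
        index: {kind: sorted(numbers) for kind, numbers in kinds.items()}
        for index, kinds in lost.items()
    }


# Made-up rows in the shape returned by GET _cat/shards?format=json.
sample = [
    {"index": "logs-1", "shard": "0", "prirep": "p", "state": "STARTED"},
    {"index": "logs-1", "shard": "0", "prirep": "r", "state": "UNASSIGNED"},
    {"index": "logs-1", "shard": "1", "prirep": "p", "state": "STARTED"},
    {"index": "logs-1", "shard": "2", "prirep": "p", "state": "UNASSIGNED"},
    {"index": "logs-1", "shard": "2", "prirep": "r", "state": "UNASSIGNED"},
    {"index": "logs-1", "shard": "2", "prirep": "r", "state": "UNASSIGNED"},
]
report = unassigned_by_index(sample)
print(report)
```

In this sample, shard 0 has only lost a replica, which the cluster can normally rebuild from the live primary, while shard 2 has no assigned copy at all; that second case is the one we are asking about.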
As for a failed shard, shard allocation awareness should take care of that. Let’s say you have one failed shard out of 4: I don't think the snapshot will recover just that one shard; it will replace the index set. I'm not 100% sure, but I haven't seen that done. Like I said, having shard allocation awareness enabled should resolve that issue. From my understanding, that's why you would have multiple shards and replicas, for an issue like that. If a shard failed on an index set, I have seen an index rotate and the shard get replaced and assigned to a node.
The best way I know to be 100% sure is to test this out in a lab setup.
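If it helps with the lab test, below is a small sketch of a restore request that brings an index back from a snapshot under a new name, so the damaged index can stay in place for comparison. The repository, snapshot and index names (`lab-repo`, `snap-1`, `my-index`) and the helper `restore_request` are placeholders I made up; the path and body fields follow the snapshot restore API. The code only builds the request, it does not send it.

```python
import json


def restore_request(repository, snapshot, index, suffix="_restored"):
    """Return (path, body) for POST /_snapshot/<repository>/<snapshot>/_restore."""
    path = f"/_snapshot/{repository}/{snapshot}/_restore"
    body = {
        "indices": index,
        # Leave cluster-wide state alone; only index data is wanted here.
        "include_global_state": False,
        # Restore under a new name so the existing index is not touched.
        "rename_pattern": "(.+)",
        "rename_replacement": f"$1{suffix}",
    }
    return path, body


# Placeholder names for a lab cluster.
path, body = restore_request("lab-repo", "snap-1", "my-index")
print("POST", path)
print(json.dumps(body, indent=2))
```

Note that this still restores the whole index, just under another name. Whether a restore can be limited to individual shards is exactly the thing to confirm in the lab.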