Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
- Debian 11
- 3x OpenSearch nodes (2.5.0, each running in a Docker container)
- snapshot repo via NFS mount into each OpenSearch Docker container
Describe the issue:
I need to restore an index from our snapshot repository. We use daily-dated indices, so restoring an old index (one that no longer exists in the cluster) should not require renaming or deleting an existing index.
My restore curl command looks as follows:
curl -X POST -k -H 'Content-Type: application/json' 'https://admin:redacted@localhost:9200/_snapshot/seppl-snap-repo/20230214_044701/_restore' -d'
{
"indices": "redacted-2022-09-07",
"partial": true,
"ignore_unavailable": true,
"include_global_state": false
}'
I only added the last two parameters on my most recent attempt, but they did not change anything. The restore still fails with an error like this:
"error":{"root_cause":[{"type":"snapshot_missing_exception","reason":"[seppl-snap-repo:20230214_044701/d64rfMCbTweXtBqRUrOZyA] is missing"}],"type":"snapshot_missing_exception","reason":"[seppl-snap-repo:20230214_044701/d64rfMCbTweXtBqRUrOZyA] is missing","caused_by":{"type":"no_such_file_exception","reason":"/backup/opensearch/indices/rucdkTYiSH2Y8n_8i_rbJQ/meta-Yru_noQB1TFRlOR8LLLM.dat"}},"status":404}
When checking the snapshot repo, I can see that the mentioned file meta-Yru_noQB1TFRlOR8LLLM.dat really is missing in /backup/opensearch/indices/rucdkTYiSH2Y8n_8i_rbJQ/. There are a lot of files in that directory, but all of them are named snap-*.dat.
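For anyone who wants to repeat that directory check, here is a minimal sketch. The function name `check_index_dir` is made up for illustration. It only looks at the top level of the given directory and assumes the script runs somewhere the NFS mount is visible (on the host or inside one of the containers).

```shell
#!/bin/bash
# Hypothetical helper: report whether an expected metadata file exists in a
# snapshot-repo index directory, and count the meta-*.dat and snap-*.dat files
# found directly in that directory (subdirectories are not searched).
# Usage: check_index_dir <directory> <expected-meta-file-name>
check_index_dir() {
  local dir="$1" expected="$2"
  local meta_count snap_count
  meta_count=$(find "$dir" -maxdepth 1 -type f -name 'meta-*.dat' | wc -l)
  snap_count=$(find "$dir" -maxdepth 1 -type f -name 'snap-*.dat' | wc -l)
  if [ -f "$dir/$expected" ]; then
    echo "present: $expected"
  else
    echo "missing: $expected"
  fi
  # $(( )) strips the padding some wc implementations add
  echo "meta files: $((meta_count)) snap files: $((snap_count))"
}
```

With the path and file name from the error message, the call would be `check_index_dir /backup/opensearch/indices/rucdkTYiSH2Y8n_8i_rbJQ meta-Yru_noQB1TFRlOR8LLLM.dat`.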
If I list the indices in snapshot 20230214_044701, I can see that the index I am trying to restore is contained in that snapshot:
curl -X GET -k 'https://admin:redacted@localhost:9200/_snapshot/seppl-snap-repo/20230214_044701' 2>/dev/null|jq|grep redacted-2022-09-07
"redacted-2022-09-07"
Configuration:
The snapshots are created every hour by the following bash script, which runs on the first OpenSearch node in the cluster (always the same instance):
#!/bin/bash
SNAP_NAME=$(date +'%Y%m%d_%H%M%S')
curl -k -X PUT "https://admin:redacted@localhost:9200/_snapshot/seppl-snap-repo/$SNAP_NAME?wait_for_completion=true" -H 'Content-Type: application/json' >/dev/null 2>&1
exit 0
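Since the script above sends all curl output to /dev/null and always exits 0, a snapshot that did not finish cleanly would go unnoticed. Below is a minimal sketch of a variant that keeps the result. It is not what is currently running. The function names `snapshot_state` and `take_snapshot` are made up for illustration, and the sketch assumes the create-snapshot response contains a "state" field (such as SUCCESS or PARTIAL) when wait_for_completion=true is used.

```shell
#!/bin/bash
# Hypothetical helper: read a create-snapshot JSON response on stdin and print
# the value of a "state" field if one is present (prints nothing otherwise).
snapshot_state() {
  sed -nE 's/.*"state" *: *"([A-Z_]+)".*/\1/p' | head -n 1
}

# Hypothetical wrapper around the PUT call from the original script.
# It keeps the response body instead of discarding it.
take_snapshot() {
  local snap_name response state
  snap_name=$(date +'%Y%m%d_%H%M%S')
  response=$(curl -k -s -X PUT \
    "https://admin:redacted@localhost:9200/_snapshot/seppl-snap-repo/${snap_name}?wait_for_completion=true" \
    -H 'Content-Type: application/json')
  state=$(printf '%s' "$response" | snapshot_state)
  if [ "$state" = "SUCCESS" ]; then
    echo "snapshot ${snap_name}: SUCCESS"
  else
    # Anything else (PARTIAL, FAILED, an error body, no response) is reported
    echo "snapshot ${snap_name}: state='${state}' response=${response}" >&2
    return 1
  fi
}

# To use this as the hourly job, call the wrapper at the end of the script:
# take_snapshot
```

Redirecting stderr of the cron job to a log file would then leave a record of every snapshot that did not report SUCCESS.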
Relevant Logs or Screenshots:
There are literally no additional log lines in the container's log when a restore is called and fails.