Searchable Remote Snapshots Hardware Guidance

jong · May 31, 2023, 3:52pm

Is there any guidance or help for what additional hardware is required for searchable remote snapshots in OpenSearch 2.7?

Specifically:

How large does a search node need to be?
Does a cluster specifically need a search node for remote snapshots to work?
How many search nodes are required?
How much data can be supported in this manner?

It would be useful to have any sort of guidance and info in relation to shards / data size / number of snapshots etc.

Gsmitt · May 31, 2023, 10:20pm

Hey @jong

Thanks for the post. I didn't realize there was a remote snapshot feature; it looks like it came out on 5/23. Judging from this blog, it's per index set: https://opensearch.org/blog/searchable-snapshots/

As for

Depends on your environment, meaning how many logs are ingested per day, how long you want to keep those snapshots, and how many of them you want.

I have a spare node with a 500GB drive, which works for me.
I also read this documentation about Known limitations.

As for any snapshots, I would use this documentation for guidance.

jong · June 1, 2023, 8:38am

Thanks for the links. So really I’m looking for guidance in terms of number of snapshots / data size. Say I have 250 snapshots, each 50GB in size (or indeed x snapshots, each y GB in size), what hardware do I need?

The guidance for normal shards I believe is 20 shards per GB of heap memory. Is there any guidance for (remote) snapshots? I suspect that this guidance doesn’t exist yet as this is a new feature, I just don’t know where to start here.

How many GB of heap do people have vs. how many/how large snapshots have people managed to get working? Is there theoretically no limit to the number/size of searchable snapshots that can be queried in this manner, with performance being the only bottleneck?

Gsmitt · June 1, 2023, 11:14pm

Hey @jong

That works out to 250 * 50 / 1024 = 12.20703125 TB of storage, not counting the OS, etc. That's a lot.
I keep my shards around 20-30 GB; this depends on the index rotation strategy you're going to use.
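
A rough Python sketch of that arithmetic, in case you want to plug in your own x snapshots of y GB. The function names are made up for illustration, and the shard count simply divides the total by a target shard size; it is not official sizing guidance for searchable snapshots.

```python
import math


def snapshot_footprint_tb(snapshot_count, snapshot_size_gb):
    """Total snapshot data in TB (1 TB = 1024 GB), excluding OS and other overhead."""
    return snapshot_count * snapshot_size_gb / 1024


def shard_count(total_gb, shard_size_gb):
    """Approximate number of shards if every shard is kept near shard_size_gb."""
    return math.ceil(total_gb / shard_size_gb)


# Figures from this thread: 250 snapshots of 50 GB each.
total_gb = 250 * 50
print(f"Total: {snapshot_footprint_tb(250, 50):.2f} TB")  # Total: 12.21 TB

# Shard counts at the 20 GB and 30 GB ends of the range mentioned above.
for size_gb in (20, 30):
    print(f"{size_gb} GB shards: about {shard_count(total_gb, size_gb)}")
```

At 20 GB per shard that is about 625 shards, and at 30 GB about 417, so the shard size you pick changes the count quite a bit.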

Not sure, but I do know you can over-shard your instance, which I've seen most do. Each environment is different: some want an index set per day, others set it per document value, some want to retain 60 days with a weekly backup, and some want hourly backups, AKA snapshots.

You need to come up with some sort of plan for the number of devices, then get a sum of what you may need. For example, I had 150 nodes ingesting about 5 GB a day using Syslog UDP, and in another DMZ I had 24 nodes pushing 35 GB a day using GELF UDP. If I were you, I'd perhaps set up a test instance, e.g. in Docker, throw some logs on it, and calculate how many logs per day you're getting for starters. This will make the calculation much easier.
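
Once you have a daily ingest figure, the sum can be sketched like this in Python. The helper name is hypothetical, combining my two ingest rates with a 60-day retention is only an illustration, and replicas and compression are not accounted for.

```python
def retained_storage_gb(daily_ingest_gb, retention_days):
    """Raw data kept on disk: total daily ingest multiplied by days retained."""
    return sum(daily_ingest_gb) * retention_days


# Daily ingest per source in GB, taken from the example above.
sources = {"syslog_udp": 5, "gelf_udp": 35}

total_gb = retained_storage_gb(sources.values(), retention_days=60)
print(f"{total_gb} GB, or {total_gb / 1024:.2f} TB")  # 2400 GB, or 2.34 TB
```

Swap in your own measured rates and retention period to get a starting number for sizing.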
