Best approach to retrieve large set of data for download use case

gokulnithya03 · July 16, 2025, 9:51am

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch version - 2.11

Describe the issue:
We are creating an API to retrieve more than 10k records from OpenSearch. The maximum might be around 500k records, and we are not worried about real-time data.

What would be the better approach to retrieve the data: search_after on its own, or the PIT API combined with search_after, taking memory, CPU, and storage into account?

Also, what would happen in case of a single node or shard failure?
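To make the comparison concrete, here is a rough sketch of the paging loop we have in mind. It is only an illustration, not tested against our cluster. The `search` callable is a hypothetical stand-in for whatever client call sends the request body to `_search`, and `record_id` is an assumed unique, sortable field used as the sort key. Passing a `pit_id` gives the PIT variant (every page reads the same frozen view); leaving it as `None` gives plain search_after, where documents indexed or deleted between pages can be skipped or repeated.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical stand-in for the client call that sends a body to _search.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_hits(
    search: SearchFn,
    pit_id: Optional[str] = None,
    page_size: int = 1000,
    sort_field: str = "record_id",  # assumed unique, sortable field
) -> Iterator[Dict[str, Any]]:
    """Yield every hit, one page at a time, using search_after."""
    body: Dict[str, Any] = {
        "size": page_size,
        "query": {"match_all": {}},
        # A unique sort key is needed so the cursor never lands between ties.
        "sort": [{sort_field: "asc"}],
    }
    if pit_id is not None:
        # PIT variant: the keep_alive is renewed on every page request.
        body["pit"] = {"id": pit_id, "keep_alive": "5m"}

    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The sort values of the last hit become the cursor for the next page.
        body["search_after"] = hits[-1]["sort"]
```

In the PIT variant the PIT would be created before the loop and deleted afterwards, so that the segments it pins are released as soon as the download finishes. Because the loop is a generator, the API could stream each page to the client instead of holding all 500k records in memory.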

Configuration:

Data nodes
Instance type: r6g.2xlarge.search
Number of data nodes: 3
Storage type: EBS
EBS volume type: General Purpose (SSD) - gp3
EBS volume size (GiB): 200
Provisioned IOPS: 6000
Provisioned throughput (MiB/s): 250

Master nodes
Instance type: m6g.large.search
Number of master nodes: 3

Relevant Logs or Screenshots:
