Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
2.15.0
Describe the issue:
I have a Spark job that indexes about 10 TB of data (~2 billion docs) into OpenSearch, and it fails with the following error:
OpenSearchHadoopRemoteException: i_o_exception: Failed to upload metadata__9223372036854775806__9223372036854775425__9223370314459207742__-1452333006__1;org.opensearch.hadoop.rest.OpenSearchHadoopRemoteException: i_o_exception: Unable to upload object [translog/....] using a single upload;org.opensearch.hadoop.rest.OpenSearchHadoopRemoteException: s3_exception: s3_exception: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: XXXXXXXXXXX)
I’ve noticed that it uploads a lot of small (50-200 KB) translog files, and I was wondering which settings I need to tune to overcome this error.
I’ve tried increasing the translog buffer interval, but it didn’t help much.
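For context, the interval was applied dynamically through the cluster settings API, along these lines (a sketch only; host and auth omitted, and the value is the one shown in the configuration below):

PUT _cluster/settings
{
  "persistent": {
    "cluster.remote_store.translog.buffer_interval": "10s"
  }
}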
Configuration:
"cluster.routing.allocation.balance.prefer_primary": true,
"segrep.pressure.enabled": true,
"cluster.remote_store.translog.buffer_interval": "10s",
Relevant Logs or Screenshots: