Data Prepper performance

harshit_patel · March 2, 2023, 9:08am

Hi everyone,

My ingestion rate is 250 GB per day from a single source, and I want to support 100 such sources. What buffer_size and batch_size would be most appropriate for Data Prepper? Alternatively, is Data Prepper able to process such a high volume of data effectively?
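
To put that volume in perspective, here is a rough conversion of the stated numbers (250 GB/day per source, 100 sources) into a sustained average rate. This is a back-of-the-envelope sketch that assumes decimal units (1 GB = 10^9 bytes) and load spread evenly over the day. Real traffic will peak above this average.

```python
# Convert the daily volume from the question into an average rate.
# ASSUMPTIONS: decimal units (1 GB = 10**9 bytes) and load spread evenly
# over 24 hours; real traffic peaks will exceed this average.

SECONDS_PER_DAY = 24 * 60 * 60


def aggregate_rate_bytes_per_sec(gb_per_day_per_source: float, sources: int) -> float:
    """Average ingest rate across all sources, in bytes per second."""
    return gb_per_day_per_source * 10**9 * sources / SECONDS_PER_DAY


if __name__ == "__main__":
    per_source = aggregate_rate_bytes_per_sec(250, 1)
    total = aggregate_rate_bytes_per_sec(250, 100)
    print(f"one source:  {per_source / 1e6:.1f} MB/s")
    print(f"100 sources: {total / 1e6:.1f} MB/s")
```

This works out to roughly 2.9 MB/s per source and about 289 MB/s in aggregate, before accounting for peaks.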

I am thinking of setting buffer_size = 1500000 and batch_size = 31250. Is there any downside to keeping such a high buffer size?
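
One way to judge whether a 1,500,000-record buffer is "too high" is to estimate how much memory it pins when full. The sketch below does that arithmetic for the proposed values. AVG_EVENT_BYTES is a hypothetical placeholder that you would replace with a size measured from your own logs, and the result is a lower bound that covers raw payload only, not per-event object overhead on the JVM heap.

```python
# Rough sizing for buffer_size = 1,500,000 and batch_size = 31,250.
# ASSUMPTION: AVG_EVENT_BYTES is a hypothetical placeholder; measure it
# from real data. The estimate counts raw payload only; per-event object
# overhead on the JVM heap comes on top of it.

AVG_EVENT_BYTES = 1_000      # hypothetical average event size in bytes
BUFFER_SIZE = 1_500_000      # records the buffer can hold (from the question)
BATCH_SIZE = 31_250          # max records handed over per read (from the question)


def full_buffer_payload_bytes(buffer_size: int, avg_event_bytes: int) -> int:
    """Lower bound on memory held by a completely full buffer."""
    return buffer_size * avg_event_bytes


def batches_per_full_buffer(buffer_size: int, batch_size: int) -> float:
    """Number of maximum-size reads needed to drain a full buffer."""
    return buffer_size / batch_size


if __name__ == "__main__":
    payload = full_buffer_payload_bytes(BUFFER_SIZE, AVG_EVENT_BYTES)
    print(f"full buffer payload: {payload / 1e9:.1f} GB (before object overhead)")
    print(f"batches to drain:    {batches_per_full_buffer(BUFFER_SIZE, BATCH_SIZE):.0f}")
```

With the placeholder 1 KB event size, a full buffer holds about 1.5 GB of payload and drains in 48 maximum-size batches. If each source runs in its own pipeline with its own buffer, this figure would apply per pipeline.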

@graytaylor0

graytaylor0 · March 6, 2023, 3:37pm

If you have the memory available, setting buffer size and batch size to large values will improve performance, and there is no downside. Here is an example of performance testing that was run for log ingestion (data-prepper/latest_performance_test_results.md at main · opensearch-project/data-prepper · GitHub). Note that the buffer size was 200,000 in that test, but 1.5M is also fine, depending on how much memory is available.
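
The memory trade-off above follows from what the two settings control. The toy model below is a hypothetical, simplified stand-in and not Data Prepper's actual buffer implementation: buffer_size caps how many records are held at once, and batch_size caps how many records a single read hands over. The burst and buffer sizes are arbitrary and scaled down 100x from the 200,000 vs 1,500,000 figures in this thread. It shows a larger buffer absorbing a burst that a smaller one would have to push back on, at the cost of holding more records in memory.

```python
import queue
from typing import Any, List, Tuple


class ToyBoundedBuffer:
    """Hypothetical, simplified model of a bounded record buffer.

    NOT Data Prepper's code: it only mimics the two knobs discussed here.
    buffer_size caps how many records can be held; batch_size caps how many
    records one read returns.
    """

    def __init__(self, buffer_size: int, batch_size: int) -> None:
        if buffer_size <= 0 or batch_size <= 0:
            raise ValueError("buffer_size and batch_size must be positive")
        self._queue: "queue.Queue[Any]" = queue.Queue(maxsize=buffer_size)
        self.batch_size = batch_size

    def write(self, record: Any) -> bool:
        """Try to enqueue one record; False means the buffer is full.

        A real source would have to wait or back off at this point
        (backpressure). This single-threaded toy just reports it.
        """
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            return False

    def read(self) -> List[Any]:
        """Drain up to batch_size records in FIFO order."""
        batch: List[Any] = []
        while len(batch) < self.batch_size:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch


def simulate_burst(buffer_size: int, batch_size: int, burst: int) -> Tuple[int, int]:
    """Push `burst` records with no reader running, then do one read.

    Returns (records rejected because the buffer was full, size of first batch).
    """
    buf = ToyBoundedBuffer(buffer_size, batch_size)
    rejected = sum(not buf.write(i) for i in range(burst))
    return rejected, len(buf.read())


if __name__ == "__main__":
    # Scaled down 100x from the 200,000 vs 1,500,000 buffer sizes in the thread.
    for size in (2_000, 15_000):
        rejected, first = simulate_burst(size, batch_size=500, burst=3_000)
        print(f"buffer_size={size}: rejected={rejected}, first batch={first}")
```

In this toy run, the 2,000-record buffer turns away 1,000 records of a 3,000-record burst, while the 15,000-record buffer accepts all of them; in both cases a single read returns at most 500 records.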

system · May 5, 2023, 3:38pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
