Data Prepper performance

Hi everyone,

My ingestion rate is 250 GB per day from one source, and I want to support 100 such sources. What buffer_size and batch_size would be most appropriate for Data Prepper? Alternatively, is Data Prepper able to process such a high volume of data effectively at all?

I am thinking of setting buffer_size = 1500000 and batch_size = 31250. Is there any downside to keeping the buffer size that high?
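
For context, here is a rough back-of-the-envelope sketch of what those numbers mean as a sustained rate. The average event size of 1,000 bytes is purely an assumption for illustration (the real figure depends on the log format), the function name is made up for this post, and the calculation treats all 100 sources as landing on a single buffer, which would not be the case if the load is spread over several instances.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def aggregate_rate(gb_per_day_per_source, sources, avg_event_bytes):
    """Return (bytes_per_second, events_per_second) for the combined load.

    Uses decimal gigabytes (1 GB = 10^9 bytes).
    """
    bytes_per_day = gb_per_day_per_source * 1_000_000_000 * sources
    bytes_per_second = bytes_per_day / SECONDS_PER_DAY
    events_per_second = bytes_per_second / avg_event_bytes
    return bytes_per_second, events_per_second


# avg_event_bytes=1000 is an assumed value, not a measured one.
bytes_per_s, events_per_s = aggregate_rate(250, 100, avg_event_bytes=1000)

buffer_size = 1_500_000  # events the buffer can hold
batch_size = 31_250      # events handed to a worker per read

print(f"aggregate rate: {bytes_per_s / 1e6:.1f} MB/s")
print(f"event rate:     {events_per_s:,.0f} events/s")
print(f"a full buffer covers about {buffer_size / events_per_s:.1f} s of traffic")
print(f"a full buffer is {buffer_size // batch_size} batches")
```

Under these assumptions the combined load is roughly 289 MB/s, and a buffer of 1,500,000 events holds exactly 48 batches of 31,250.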


If you have the memory available, setting buffer size and batch size to large values will give better performance and has no downside. Here is an example of performance testing that was run for log ingestion (data-prepper/ at main · opensearch-project/data-prepper · GitHub). Note that the buffer size was 200,000 in that test, but 1.5M is fine too, depending on the amount of memory available.
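
To get a feel for "depending on the amount of memory available", a very rough estimate of what a full buffer could occupy might look like the sketch below. Both the 1,000-byte average event size and the 2x in-memory overhead factor are assumptions chosen for illustration, not measured Data Prepper figures, and the function name is hypothetical; measuring heap usage under real traffic is the only reliable way to size this.

```python
def estimated_buffer_bytes(buffer_size, avg_event_bytes, overhead_factor=2.0):
    """Rough upper bound on memory held by a completely full buffer.

    overhead_factor is an assumed multiplier for in-memory object overhead;
    it is not a documented Data Prepper value.
    """
    return buffer_size * avg_event_bytes * overhead_factor


# Compare the buffer size from the performance test with the proposed one.
for size in (200_000, 1_500_000):
    gb = estimated_buffer_bytes(size, avg_event_bytes=1000) / 1e9
    print(f"buffer_size={size:>9,}: about {gb:.1f} GB when full")
```

With these assumed inputs, the 200,000-event buffer comes to about 0.4 GB when full and the 1,500,000-event buffer to about 3 GB, so the heap would need to be sized with that headroom in mind.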
