My ingestion rate is 250 GB per day from one source, and I want to support 100 such sources. What buffer_size and batch_size would be most appropriate for data-prepper? And can data-prepper effectively process such a high volume of data at all?
I am thinking of setting buffer_size = 1500000 and batch_size = 31250. Is there any downside to using such a large buffer size?
If you have the memory available, setting buffer_size and batch_size to large values will give better performance and has no downside. Here is an example of performance testing that was run for log ingestion (data-prepper/latest_performance_test_results.md at main · opensearch-project/data-prepper · GitHub). Note that the buffer size was 200,000 in that test, but 1.5M is fine too, depending on how much memory you have available.
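To get a rough feel for what "depending on the amount of memory available" means here, the numbers from the question can be plugged into a quick back-of-the-envelope calculation. This is only a sketch: the average event size of 1,000 bytes is an assumption (it is not given in the thread), the function name `sizing_estimate` is made up for illustration, and the estimate counts raw payload bytes only, so the real heap usage of the buffered events will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sizing_estimate(gb_per_day_per_source, sources, buffer_size, batch_size,
                    avg_event_bytes):
    """Rough throughput and buffer sizing figures (hypothetical helper).

    avg_event_bytes is an ASSUMPTION supplied by the caller; the thread
    does not state the event size. GB is treated as 10**9 bytes.
    """
    total_bytes_per_day = gb_per_day_per_source * 1_000_000_000 * sources
    bytes_per_second = total_bytes_per_day / SECONDS_PER_DAY
    events_per_second = bytes_per_second / avg_event_bytes
    return {
        # Aggregate ingest rate across all sources.
        "bytes_per_second": bytes_per_second,
        "events_per_second": events_per_second,
        # Raw payload held by one completely full buffer (no object overhead).
        "buffer_payload_bytes": buffer_size * avg_event_bytes,
        # How long one full buffer could absorb the whole stream if nothing
        # were draining it and a single instance took all the traffic.
        "seconds_of_traffic_buffered": buffer_size / events_per_second,
        # Number of batches needed to drain a full buffer.
        "batches_per_full_buffer": buffer_size / batch_size,
    }


# Values from the question; the 1,000-byte event size is assumed.
estimate = sizing_estimate(
    gb_per_day_per_source=250,
    sources=100,
    buffer_size=1_500_000,
    batch_size=31_250,
    avg_event_bytes=1_000,
)

for name, value in estimate.items():
    print(f"{name}: {value:,.2f}")
```

Under these assumptions, 100 sources at 250 GB per day come to roughly 289 MB per second in aggregate, and a full buffer of 1.5M events holds about 1.5 GB of raw payload. If your events are larger or smaller than the assumed 1,000 bytes, or the load is spread across several data-prepper instances, rerun the numbers with your own values before settling on a buffer size.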