I’m looking for some suggestions on problems I’m seeing throughout my environment. We have a fairly large amount of data coming in, and we index the vast majority of it into a single daily index.
Size-wise, we are looking at:
Daily Index A: 240GB
Daily Index B: 30GB
These are store sizes including replication/sharding.
These are running in an environment with 5 data nodes at a 16GB heap each (16GB x 5), on 5 machines with 32GB of RAM each.
In addition to this, I have several client nodes acting as ingesters. I’ve run these with both a 6GB heap, and a 4GB heap with more client load balancing.
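For context, the heap settings look roughly like this (a sketch only, assuming the standard jvm.options / ES_JAVA_OPTS mechanisms; the client pods get the equivalent heap flags via their container spec):

```
# jvm.options on the data nodes (16GB heap, leaving the other 16GB for the OS/page cache)
-Xms16g
-Xmx16g

# client nodes, via the container environment (sketch)
# ES_JAVA_OPTS="-Xms6g -Xmx6g"   (or -Xms4g -Xmx4g in the smaller configuration)
```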
Regardless of what I do, my client nodes are constantly OOMing. This means we’re effectively losing data, and I get a number of rejections from the fluentd instances funneling the data into the clients.
The error looks like this:
[2019-08-16T10:59:22,352][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [elasticsearch-opendistro-es-client-858fc4c7d5-2dwz7] fatal error in thread [Thread-6], exiting
java.lang.OutOfMemoryError: Java heap space
Index A has around 200 fields in its mapping at the moment.
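For reference, a quick way to sanity-check that field count is something like this (a sketch; the index name is assumed, and the jq expression simply counts mapping entries that declare a type):

```
curl -s 'http://localhost:9200/index-a-2019.08.16/_mapping' \
  | jq '[.. | objects | select(has("type"))] | length'
```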
Does anyone have any benchmarks or suggestions for ingest/coordinating nodes, and what they would expect for a given workload?
Would it be better to move ingestion onto the data nodes? (rough node-role sketch at the end of this post)
Or am I just way off in my own benchmarking, and we simply need to give these nodes more memory?
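For the second question, the change I have in mind is roughly this (a sketch of elasticsearch.yml role settings, assuming the node.master / node.data / node.ingest flags apply to this version; not something I’ve validated yet):

```
# elasticsearch.yml on the data nodes if they also take over ingestion (sketch)
node.master: false
node.data: true
node.ingest: true
```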