This eventually leads to an OOM exception, even though there's plenty of field data cache that could be dropped.
The cluster consists of 3 nodes with one index and ~1000 primary shards + 1 replica. The heap is configured to use 90 GB of RAM. This works out to about 7.4 shards per 1 GB of heap.
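In case it matters, this is roughly how I'm computing that number (a minimal sketch; it assumes the cluster answers on localhost:9200 without credentials, adjust host/auth for the security plugin):

```python
import requests

ES = "http://localhost:9200"  # assumption: local endpoint, no auth

# Total shard count (primaries + replicas) across the cluster
shards = requests.get(f"{ES}/_cat/shards", params={"format": "json"}).json()
total_shards = len(shards)

# Maximum heap summed over all nodes
nodes = requests.get(f"{ES}/_nodes/stats/jvm").json()["nodes"]
total_heap_gb = sum(n["jvm"]["mem"]["heap_max_in_bytes"] for n in nodes.values()) / 1024 ** 3

print(f"{total_shards} shards / {total_heap_gb:.0f} GB heap "
      f"= {total_shards / total_heap_gb:.1f} shards per GB of heap")
```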
Gotcha. Just Open Distro security. I would definitely look at adding the Performance Analyzer and using PerfTop; it was pretty much built to help diagnose this type of issue.
While I'm still struggling to get the Performance Analyzer activated, here's an observation:
Heap usage correlates with the number of segments (or documents, but I guess it's segments). The fullest cluster has more than 80,000 segments; I'm not sure whether that is a lot.
When I close and reopen the index, a lot of memory is freed, more than the query cache and the field data memory account for.
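For the record, I'm reading the segment count and the heap the segments account for from the node stats like this (sketch; localhost and no auth are assumptions):

```python
import requests

ES = "http://localhost:9200"  # assumption: local endpoint, no auth

stats = requests.get(f"{ES}/_nodes/stats/indices/segments").json()["nodes"]
for node in stats.values():
    seg = node["indices"]["segments"]
    print(f"{node['name']}: {seg['count']} segments, "
          f"segment memory {seg['memory_in_bytes'] / 1024 ** 2:.0f} MB, "
          f"terms {seg['terms_memory_in_bytes'] / 1024 ** 2:.0f} MB, "
          f"doc values {seg['doc_values_memory_in_bytes'] / 1024 ** 2:.0f} MB")
```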
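The close/open cycle itself is just two REST calls (my-index is a placeholder for the real index name):

```python
import requests

ES = "http://localhost:9200"   # assumption: local endpoint, no auth
INDEX = "my-index"             # placeholder for the real index name

requests.post(f"{ES}/{INDEX}/_close")  # releases the in-memory structures of the index
requests.post(f"{ES}/{INDEX}/_open")   # reopens the index; memory fills up again over time
```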
Are you running the PerfTop NodeAnalysis dashboard? I think that has heap usage insight.
Re-reading this thread, I'm wondering if you have too much heap. I'm not an expert in this area, tbh, but I dug up this article about having too much heap (greater than 32 GB).
I guess I have no choice but to use more than 32 GB if 32 GB is not sufficient. I understand that crossing the threshold around 32 GB disables compressed object pointers; however, using considerably more memory should compensate for that. I would also happily accept long garbage collection pauses over nodes dying of OOM.
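Whether compressed oops are actually disabled can be verified through the nodes info API; a small sketch, under the same localhost/no-auth assumption:

```python
import requests

ES = "http://localhost:9200"  # assumption: local endpoint, no auth

nodes = requests.get(f"{ES}/_nodes/jvm").json()["nodes"]
for node in nodes.values():
    print(node["name"], "compressed oops:",
          node["jvm"]["using_compressed_ordinary_object_pointers"])
```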
It turned out that limiting the field data cache mitigates the problem. According to the manual, this cache grows unbounded until the circuit breaker saves the day, and it has to be cleared manually.
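For anyone finding this later: the limit is the static indices.fielddata.cache.size setting in elasticsearch.yml (for example 20%, which needs a node restart), and clearing the cache manually is a separate API call, sketched here under the same local-endpoint assumption:

```python
import requests

ES = "http://localhost:9200"  # assumption: local endpoint, no auth

# Drop only the field data cache; other caches have their own flags
resp = requests.post(f"{ES}/_all/_cache/clear", params={"fielddata": "true"})
print(resp.json())
```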
While this might work for certain indices, it doesn't seem to work for mine. The initial question still remains: it's completely unclear to me why Elasticsearch requires more than 100 GB of heap to operate. Even when I add up all the caches and all the memory the segments require, I don't get a value that comes anywhere close to the heap that is actually needed.
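To make that concrete, this is roughly how I'm adding things up per node (sketch; same endpoint assumptions as above), in case I'm overlooking an obvious consumer:

```python
import requests

ES = "http://localhost:9200"  # assumption: local endpoint, no auth

nodes = requests.get(f"{ES}/_nodes/stats/indices,jvm").json()["nodes"]
for node in nodes.values():
    idx = node["indices"]
    accounted = (idx["segments"]["memory_in_bytes"]
                 + idx["segments"]["index_writer_memory_in_bytes"]
                 + idx["segments"]["version_map_memory_in_bytes"]
                 + idx["fielddata"]["memory_size_in_bytes"]
                 + idx["query_cache"]["memory_size_in_bytes"]
                 + idx["request_cache"]["memory_size_in_bytes"])
    used = node["jvm"]["mem"]["heap_used_in_bytes"]
    print(f"{node['name']}: accounted {accounted / 1024 ** 3:.1f} GB "
          f"vs heap used {used / 1024 ** 3:.1f} GB")
```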
Is it possible to see the memory requirements of read and write requests? Since bulk requests cannot be used for $reasons, maybe individual requests are driving up the heap usage?
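One place where the heap cost of in-flight read and write requests should show up is the circuit breaker stats; a sketch (same assumptions as above) that dumps them per node:

```python
import requests

ES = "http://localhost:9200"  # assumption: local endpoint, no auth

nodes = requests.get(f"{ES}/_nodes/stats/breaker").json()["nodes"]
for node in nodes.values():
    for name, breaker in node["breakers"].items():
        print(f"{node['name']} {name}: estimated "
              f"{breaker['estimated_size_in_bytes'] / 1024 ** 2:.0f} MB "
              f"(limit {breaker['limit_size']}), tripped {breaker['tripped']} times")
```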