OpenSearch Upgrade 2.15 -> 3.4 using almost double the amount of memory

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): 2.15, 3.4

Describe the issue: Running the same performance tests, OpenSearch is now using almost double the amount of memory. This is causing us to hit circuit breakers we never hit on 2.15. Setting the real-memory flag (use_real_memory) to false seems to be a partial workaround, but we understand the potential pitfalls of that and would like to keep the flag at true if at all possible.
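For clarity, the workaround referred to here is this one-line opensearch.yml setting (shown only to make the flag explicit; the default is true, which we would prefer to keep):

```yaml
# opensearch.yml
# Switches the parent circuit breaker from measuring actual heap usage
# back to estimating reserved memory -- workaround only.
indices.breaker.total.use_real_memory: false
```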

We have discovered that aggregation queries spike memory usage considerably. When we test without any aggregation queries, we get far more queries through before hitting the circuit breaker.
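For illustration, a representative aggregation body from our test mix (sent to the _search endpoint) looks roughly like this; the index and field names are simplified placeholders, not our actual schema. Bucket aggregations of this shape are what drive the spikes:

```json
{
  "size": 0,
  "aggs": {
    "by_status": {
      "terms": { "field": "status", "size": 50 }
    }
  }
}
```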

Configuration: Docker containers, with a simple console app running 12 threads querying OpenSearch with a variety of queries.

200k docs

1G heap, which is the default for the Docker container (I understand that is low; however, the point is that 2.15 handles this load without circuit breaking, and 3.x’s whole pitch is performance and better use of space and memory…)
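For context on how the heap is set: in our Docker setup the heap comes from the container environment via OPENSEARCH_JAVA_OPTS. A minimal compose fragment (service name and image tag are placeholders) looks like:

```yaml
services:
  opensearch:
    image: opensearchproject/opensearch:3.4.0
    environment:
      # Pin min and max heap explicitly; raise both together to test larger heaps.
      - OPENSEARCH_JAVA_OPTS=-Xms1g -Xmx1g
```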

Relevant Logs or Screenshots:

Hi @RyanCav, I understand your point, but a 1G heap size is low for a production cluster.
What’s the reason behind this setting?

Due to the number of clients we are supporting we try to save on memory in every way we can, and reiterating from the original post, running at 1G was sufficient when on version 2.15.

We also ran into the same problems on 3.4 when running at 2G heap, so even with double the amount of memory previously needed, we are running into said issues.

So, I guess our question is: why did the low-end memory requirements jump so much between 2.15 and 3.4, and is there any way (a setting or otherwise) to get around this circuit breaker issue other than the workaround mentioned in the original post?

@RyanCav How did you deploy your OpenSearch clusters? Do you have more than one cluster? Are they all virtual machines on the same hypervisor?

  1. There are a variety of ways we deploy, but generally we just make some minor modifications to the yml, JVM options, and the service and env bat files, drop the package onto a Windows server, and start the service.
  2. Usually yes, about 8 per server
  3. No VMs currently deployed to users, though we are moving in that direction.

@pablo it looks as though I was able to get it working without use_real_memory = false by applying some aggressive GC tuning and reducing the fielddata cache and query cache sizes.

GC configs:

-XX:GCTimeRatio=3
-XX:G1PeriodicGCInterval=300000

indices flags:

indices.fielddata.cache.size: 10%
indices.queries.cache.size: 5%