I moved from ELK 7.x about a month ago, so I’m still learning OpenSearch. So far, though, I am seeing slower indexing performance. I am using Logstash with the OpenSearch output plugin to push logs to OpenSearch.
I see the indexing fluctuate from 1600 to 3200 EPS consistently. ELK EPS was quite a bit higher and more consistent with the same underlying setup. We create three indexes per day with only one of them being quite large. Two are around 100MB per day and the other is anywhere from 40GB to 70GB per day.
I’ve read what I can find and so far am not seeing how to raise the EPS. I’m using the same index template as I did with ELK. Refresh interval is set to 5 minutes, with one shard and no replicas.
I am running in Docker via Docker Swarm; Docker Compose did not perform very well. There is just one OpenSearch v1.3.0 node and a Dashboards container. It is running on an AWS memory-optimized instance, as was ELK.
We observed similar performance degradation while upgrading from 1.2.4 to 1.3. This OpenSearch cluster is deployed via Helm chart in a k8s environment. We’ll try +UseG1GC.
How do I enable G1GC for a Helm chart managed cluster?
We observed the performance degradation with OpenSearch 1.3 as well. @mrweber Your bug report helps a lot. The OpenSearch cluster we have is deployed with Helm Chart and running on top of k8s (managed by Rancher).
Applying the G1GC collector is a bit tricky with the Helm chart. Adding it through the config map may or may not work, so we updated the Helm chart so that it supplies config/jvm.options. I’m not a Java person, so it took me a while to apply the change. One key point is the number at the front of each line: it specifies which JDK versions the line applies to. The default for G1GC is 14. OpenSearch 1.3 uses JDK 11, so you need to change the version range on the G1GC lines; otherwise it’ll fall back to SerialGC, which has very poor performance.
## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly
## G1GC Configuration
# NOTE: G1 GC is only supported on JDK version 10 or later
# to use G1GC, uncomment the next two lines and update the version on the
# following three lines to your version of the JDK
10-13:-XX:-UseConcMarkSweepGC
10-13:-XX:-UseCMSInitiatingOccupancyOnly
10-14:-XX:+UseG1GC
10-14:-XX:G1ReservePercent=25
10-14:-XX:InitiatingHeapOccupancyPercent=30
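To make the version-prefix point concrete, here is a minimal Python sketch of how such prefixes select lines for a given JDK. It assumes the usual jvm.options conventions (`N:` for exactly N, `N-:` for N and later, `N-M:` for an inclusive range, no prefix for all versions). The function name `applicable_options` is hypothetical and this is not OpenSearch's actual parser, only an illustration of the matching rule.

```python
import re

# Optional JDK version prefix: "N:", "N-:" or "N-M:", followed by a JVM flag.
_PREFIX = re.compile(r"^(\d+)(?:(-)(\d+)?)?:(-.*)$")


def applicable_options(lines, jdk_major):
    """Return the JVM flags from jvm.options-style lines that apply to jdk_major."""
    selected = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("-"):
            selected.append(line)  # no prefix: applies to every JDK
            continue
        m = _PREFIX.match(line)
        if m is None:
            continue  # unrecognized line; this sketch simply skips it
        low = int(m.group(1))
        if m.group(2) is None:
            high = low  # "N:" means exactly N
        elif m.group(3) is None:
            high = None  # "N-:" means N and later
        else:
            high = int(m.group(3))  # "N-M:" is an inclusive range
        if jdk_major >= low and (high is None or jdk_major <= high):
            selected.append(m.group(4))
    return selected


# The G1GC-related lines as edited in this thread.
THREAD_LINES = [
    "## G1GC Configuration",
    "10-13:-XX:-UseConcMarkSweepGC",
    "10-13:-XX:-UseCMSInitiatingOccupancyOnly",
    "10-14:-XX:+UseG1GC",
    "10-14:-XX:G1ReservePercent=25",
    "10-14:-XX:InitiatingHeapOccupancyPercent=30",
]

# With JDK 11, a line restricted to 14 and later is skipped,
# while the edited 10-14 range is picked up.
print("-XX:+UseG1GC" in applicable_options(["14-:-XX:+UseG1GC"], 11))
print("-XX:+UseG1GC" in applicable_options(THREAD_LINES, 11))
```

Running a check like this against your own jvm.options with the JDK major version you actually ship is a quick way to see which GC flags would be in effect.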
As you can see from the OpenSearch exporter, the indexing rate is much higher and the GC latency and times are way lower. This change, along with increasing refresh_interval, brings performance back to the level of OpenSearch 1.2.4, or maybe even better, since we had refresh_interval at 5s before. It’s 60s now.
Thanks a lot @reta and @hugok .
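For anyone wanting to try the refresh_interval change mentioned above, the following is a small Python sketch that builds the update-settings request. The base URL `http://localhost:9200` and the index name `logs-example` are placeholders, the helper name `build_refresh_request` is hypothetical, and authentication/TLS are left out; adjust all of these for your cluster.

```python
import json
import urllib.request


def build_refresh_request(base_url, index, interval):
    """Build (but do not send) a PUT <index>/_settings request setting refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder endpoint and index name; nothing is sent at this point.
req = build_refresh_request("http://localhost:9200", "logs-example", "60s")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To apply it against a reachable cluster (add auth/TLS as needed):
# urllib.request.urlopen(req)
```

If the indices are created daily from a template, as in the original post, setting the value in the index template is probably the more durable option, since this request only affects indices that already exist.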
After running the updated Helm chart, the configuration was updated as below:
cd config
[opensearch@logs-corporativos-data-3 config]$ cat jvm.options
## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly
## G1GC Configuration
# NOTE: G1 GC is only supported on JDK version 10 or later
# to use G1GC, uncomment the next two lines and update the version on the
# following three lines to your version of the JDK
10-13:-XX:-UseConcMarkSweepGC
10-13:-XX:-UseCMSInitiatingOccupancyOnly
10-14:-XX:+UseG1GC
10-14:-XX:G1ReservePercent=25
10-14:-XX:InitiatingHeapOccupancyPercent=30
Hi there - I know some time has passed since the last comment on this thread.
I would also be interested to know, in general how indexing performance might be measured and how I could identify a lag/buildup in documents and possibly alert on it.
I’d also be interested to know where that Grafana dashboard came from. I don’t recognise it from the OpenSearch Prometheus exporter.