OpenSearch ingestion is slow and timeouts are occurring very frequently

**Versions:**
OpenSearch version: 2.11.0
Server OS: Ubuntu 22.04

Issue:
I have a 3-node OpenSearch cluster; each node is deployed on a separate virtual machine with 12 vCPUs and 20 GB RAM.
The problem is that ingestion is very slow and timeouts occur again and again. There is nothing useful in the logs. When I try to ingest more data in bulk, a random node disconnects from the cluster. How can I overcome this issue and make indexing faster? I am targeting 5,000 events per second (EPS).

Hi @Iqra_shafiq, when it times out, do you see any errors in your logs (heap size or similar)?
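
A quick way to rule out heap pressure is the cat nodes API; something like the following (the column list is just a suggestion, any of the standard heap/ram fields work):

GET _cat/nodes?v&h=name,heap.percent,heap.max,ram.percent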

Best,
mj

Hi, have you tried reducing the bulk size?
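
Just to illustrate what I mean by bulk size: a _bulk request is NDJSON action/document pairs, so fewer pairs per request means a smaller bulk. A minimal sketch (the index name and fields below are only illustrative):

POST _bulk
{ "index": { "_index": "my-index" } }
{ "message": "example event 1" }
{ "index": { "_index": "my-index" } }
{ "message": "example event 2" }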

No, no errors have been spotted.

I have a bulk size of 10,000, but various pipelines are sending data to the cluster. Also, one of my nodes randomly keeps disconnecting and then reconnecting automatically after some time.

What is your stack? Are you using Logstash?

Like Mantas and sdas018, I’m curious what the error logs show. Was there any sign of a CircuitBreakingException?
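
The breaker counters can also be checked directly via the node stats API, e.g.:

GET _nodes/stats/breaker

A non-zero "tripped" count for the parent or request breaker would point in that direction.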

No, I am not using Logstash. I am using Filebeat and Kafka.

No, there is no CircuitBreakingException.

[2024-12-11T06:17:21,534][WARN ][o.o.m.j.JvmGcMonitorService] [n3] [gc][young][1699595][37052] duration [3.1s], collections [1]/[3.7s], total [3.1s]/[4.4h], memory [8.8gb]->[8gb]/[20gb], all_pools {[young] [880mb]->[0b]/[0b]}{[old] [7.8gb]->[7.9gb]/[20gb]}{[survivor] [106.2mb]->[85.7mb]/[0b]}
[2024-12-11T06:17:21,535][WARN ][o.o.m.j.JvmGcMonitorService] [n3] [gc][1699595] overhead, spent [3.1s] collecting in the last [3.7s]
[2024-12-11T06:17:21,542][INFO ][o.o.a.MonitorRunnerService] [n3] Executing scheduled monitor - id: nqBUkZABavtMuA8SVbrF, type: QUERY_LEVEL_MONITOR, periodStart: 2024-12-11T06:16:20.827Z, periodEnd: 2024-12-11T06:17:20.827Z, dryrun: false, executionId: nqBUkZABavtMuA8SVbrF_2024-12-11T06:17:21.542950714_6a1f2ca6-5bdc-4f7b-8050-09891cd4ac02
[2024-12-11T06:17:21,543][INFO ][o.o.i.s.IndexShard       ] [n3] [opensearch-ad-plugin-result-duplicate_id_detec][1] primary-replica resync completed with 0 operations
[2024-12-11T06:17:32,541][WARN ][o.o.c.s.ClusterApplierService] [n3] cluster state applier task [ApplyCommitRequest{term=1506, version=395968, sourceNode={n1}{mmmpyOL0TsCm2kDyGXFO8Q}{3dE9FmVTRiybqSD7ZwrDjg}{192.168.0.82}{192.168.0.82:9300}{dimr}{shard_indexing_pressure_enabled=true}}] took [30s] which is above the warn threshold of [30s]: [running task [ApplyCommitRequest{term=1506, version=395968, sourceNode={n1}{mmmpyOL0TsCm2kDyGXFO8Q}{3dE9FmVTRiybqSD7ZwrDjg}{192.168.0.82}{192.168.0.82:9300}{dimr}{shard_indexing_pressure_enabled=true}}]] took [0ms], [connecting to new nodes] took [0ms], [applying settings] took [0ms], [running applier [org.opensearch.repositories.RepositoriesService@31f975a3]] took [0ms], [running applier [org.opensearch.indices.cluster.IndicesClusterStateService@42d319f1]] took [28892ms], [running applier [org.opensearch.script.ScriptService@24f716c9]] took [0ms], [running applier [org.opensearch.snapshots.RestoreService@32c05185]] took [0ms], [running applier [org.opensearch.ingest.IngestService@49fee6bc]] took [0ms], [running applier [org.opensearch.search.pipeline.SearchPipelineService@5392d09c]] took [0ms], [running applier [org.opensearch.action.ingest.IngestActionForwarder@35f13b2a]] took [0ms], [running applier [org.opensearch.action.admin.cluster.repositories.cleanup.TransportCleanupRepositoryAction$$Lambda$4427/0x00007f17f0d33320@7baf268a]] took [0ms], [running applier [org.opensearch.tasks.TaskManager@3efc0f53]] took [0ms], [running applier [org.opensearch.snapshots.SnapshotsService@631b9516]] took [0ms], [notifying listener [org.opensearch.cluster.InternalClusterInfoService@5d162c39]] took [0ms], [notifying listener [org.opensearch.snapshots.InternalSnapshotsInfoService@34e365f]] took [0ms], [notifying listener [org.opensearch.security.configuration.ClusterInfoHolder@3cc2b005]] took [0ms], [notifying listener [org.opensearch.jobscheduler.sweeper.JobSweeper@1d3d13c3]] took [0ms], [notifying listener [org.opensearch.ad.indices.AnomalyDetectionIndices@2a4365f4]] took [0ms], [notifying listener [org.opensearch.ad.cluster.ADClusterEventListener@5f95c874]] took [650ms], [notifying listener [org.opensearch.ad.cluster.ClusterManagerEventListener@6d58457b]] took [0ms], [notifying listener [org.opensearch.alerting.alerts.AlertIndices@a645ba]] took [0ms], [notifying listener [org.opensearch.alerting.core.JobSweeper@64d16f3c]] took [0ms], [notifying listener [org.opensearch.alerting.util.destinationmigration.DestinationMigrationCoordinator@6f257e49]] took [0ms], [notifying listener [org.opensearch.indexmanagement.indexstatemanagement.IndexStateManagementHistory@550d337]] took [0ms], [notifying listener [org.opensearch.indexmanagement.indexstatemanagement.ManagedIndexCoordinator@1970f0a9]] took [0ms], [notifying listener [org.opensearch.indexmanagement.indexstatemanagement.PluginVersionSweepCoordinator@6b8e4de2]] took [0ms], [notifying listener [org.opensearch.ml.cluster.MLCommonsClusterEventListener@7adc936e]] took [0ms], [notifying listener [org.opensearch.ml.cluster.MLCommonsClusterManagerEventListener@29c1710f]] took [0ms], [notifying listener [org.opensearch.sql.legacy.esdomain.LocalClusterState$$Lambda$2526/0x00007f17f0a24848@72dac7b4]] took [0ms], [notifying listener [org.opensearch.cluster.metadata.SystemIndexMetadataUpgradeService@5496c4e5]] took [0ms], [notifying listener [org.opensearch.cluster.metadata.TemplateUpgradeService@8857bf7]] took [0ms], [notifying listener [org.opensearch.node.ResponseCollectorService@28502a7d]] took [0ms], [notifying listener 
[org.opensearch.snapshots.SnapshotShardsService@c13afa4]] took [0ms], [notifying listener [org.opensearch.persistent.PersistentTasksClusterService@6ec0e32f]] took [0ms], [notifying listener [org.opensearch.cluster.routing.DelayedAllocationService@4db967c3]] took [0ms], [notifying listener [org.opensearch.indices.store.IndicesStore@6413bfcb]] took [1ms], [notifying listener [org.opensearch.persistent.PersistentTasksNodeService@2fc0c72c]] took [0ms], [notifying listener [org.opensearch.search.asynchronous.management.AsynchronousSearchManagementService@3679006d]] took [0ms], [notifying listener [org.opensearch.securityanalytics.indexmanagment.DetectorIndexManagementService@4dbe1307]] took [0ms], [notifying listener [org.opensearch.geospatial.ip2geo.listener.Ip2GeoListener@3e681dbf]] took [0ms], [notifying listener [org.opensearch.gateway.GatewayService@17b854f9]] took [0ms], [notifying listener [org.opensearch.indices.recovery.PeerRecoverySourceService@69bc540b]] took [0ms], [notifying listener [org.opensearch.indices.replication.SegmentReplicationSourceService@54274410]] took [335ms]
[2024-12-11T06:17:45,225][WARN ][o.o.m.j.JvmGcMonitorService] [n3] [gc][young][1699615][37054] duration [2.5s], collections [1]/[3s], total [2.5s]/[4.4h], memory [8.7gb]->[8.1gb]/[20gb], all_pools {[young] [688mb]->[32mb]/[0b]}{[old] [7.9gb]->[8gb]/[20gb]}{[survivor] [123.1mb]->[58.2mb]/[0b]}
[2024-12-11T06:17:45,225][WARN ][o.o.m.j.JvmGcMonitorService] [n3] [gc][1699615] overhead, spent [2.5s] collecting in the last [3s]
[2024-12-11T06:17:45,230][INFO ][o.o.j.s.JobScheduler     ] [n3] Will delay 146350 miliseconds for next execution of job edr-archives-2024.07.10
[2024-12-11T06:17:45,230][INFO ][o.o.j.s.JobScheduler     ] [n3] Will delay 116387 miliseconds for next execution of job edr-alerts-2024.07.27
[2024-12-11T06:17:45,635][INFO ][o.o.j.s.JobScheduler     ] [n3] Will delay 85311 miliseconds for next execution of job edr-alerts-2024.07.12
[2024-12-11T06:17:51,415][WARN ][o.o.m.j.JvmGcMonitorService] [n3] [gc][young][1699619][37056] duration [1.7s], collections [1]/[3s], total [1.7s]/[4.4h], memory [8.9gb]->[8.1gb]/[20gb], all_pools {[young] [832mb]->[0b]/[0b]}{[old] [8gb]->[8.1gb]/[20gb]}{[survivor] [82.3mb]->[68.3mb]/[0b]}
[2024-12-11T06:17:51,415][WARN ][o.o.m.j.JvmGcMonitorService] [n3] [gc][1699619] overhead, spent [1.7s] collecting in the last [3s]
[2024-12-11T06:17:51,418][INFO ][o.o.j.s.JobScheduler     ] [n3] Will delay 77158 miliseconds for next execution of job edr-alerts-2024.08.21

My node disconnected again. I searched the logs, and the entries above are what I found.

@Iqra_shafiq,

I would like to know the config of your cluster. Try tweaking your settings, for example allocating about 50% of your total memory to the JVM heap.

E.g., if you have 10 GB of memory, allocate 4 or 5 GB to the JVM heap and see how performance changes.
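
A minimal sketch for your 20 GB nodes, assuming you set the heap in config/jvm.options on each node and keep min and max equal:

# config/jvm.options -- roughly 50% of the 20 GB RAM per node
-Xms10g
-Xmx10g

Each node needs a restart for the heap change to take effect.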

Also follow the steps below:
- Use a Logstash pipeline, which can make your ingestion faster by batching documents (its batch size is tunable).
- Use dedicated ingest nodes in your config (see the sketch after this list).
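
A minimal sketch of the dedicated ingest node idea, assuming you add (or repurpose) a node and set its roles in opensearch.yml (the node name below is only illustrative):

# opensearch.yml on the dedicated ingest node
node.name: ingest-1
node.roles: [ ingest ]

Clients can then send their bulk traffic to that node, taking pipeline processing and request coordination off the data nodes.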

@Iqra_shafiq, what do your monitors look like? Would you mind sharing them (if so, please make sure no sensitive information is leaked)?


POST _plugins/_alerting/monitors/_search?pretty=true
{
  "query": {
    "match_all": {}
  }
}

Best,
mj