OpenSearch ingestion is slow and timeouts are occurring very frequently

**Versions:**
OpenSearch version: 2.11.0
Server OS: Ubuntu 22.04

Issue:
I have a 3-node OpenSearch cluster; each node is deployed on a separate virtual machine with 12 vCPUs and 20 GB RAM.
The problem is that ingestion is very slow and timeouts occur again and again. There is nothing useful in the logs. When I try to ingest more data in bulk, a random node disconnects from the cluster. How can I overcome this issue and make indexing faster? I am targeting 5,000 events per second (EPS).

Hi @Iqra_shafiq, when it times out, do you see any errors in your logs (heap size or similar)?
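
A quick way to rule out heap pressure is the cat nodes API; something like the following (the column list is just a suggestion, any of the standard heap/ram fields work):

GET _cat/nodes?v&h=name,heap.percent,heap.max,ram.percent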

Best,
mj

Hi, have you tried reducing the bulk size?
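
Just to illustrate what I mean by bulk size: a _bulk request is NDJSON action/document pairs, so fewer pairs per request means a smaller bulk. A minimal sketch (the index name and fields below are only illustrative):

POST _bulk
{ "index": { "_index": "my-index" } }
{ "message": "example event 1" }
{ "index": { "_index": "my-index" } }
{ "message": "example event 2" }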

No, no errors have been spotted.

I have a bulk size of 10,000, but various pipelines are sending data to the cluster. Also, one of my nodes randomly keeps disconnecting and then reconnecting automatically after some time.

What is your stack? Are you using Logstash?

Like Mantas and sdas018, I’m curious what the error logs show. Was there any sign of a CircuitBreakingException?
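
The breaker counters can also be checked directly via the node stats API, e.g.:

GET _nodes/stats/breaker

A non-zero "tripped" count for the parent or request breaker would point in that direction.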

No, I am not using Logstash. I am using Filebeat and Kafka.

No, there is no CircuitBreakingException.

[2024-12-11T06:17:21,534][WARN ][o.o.m.j.JvmGcMonitorService] [n3] [gc][young][1699595][37052] duration [3.1s], collections [1]/[3.7s], total [3.1s]/[4.4h], memory [8.8gb]->[8gb]/[20gb], all_pools {[young] [880mb]->[0b]/[0b]}{[old] [7.8gb]->[7.9gb]/[20gb]}{[survivor] [106.2mb]->[85.7mb]/[0b]}
[2024-12-11T06:17:21,535][WARN ][o.o.m.j.JvmGcMonitorService] [n3] [gc][1699595] overhead, spent [3.1s] collecting in the last [3.7s]
[2024-12-11T06:17:21,542][INFO ][o.o.a.MonitorRunnerService] [n3] Executing scheduled monitor - id: nqBUkZABavtMuA8SVbrF, type: QUERY_LEVEL_MONITOR, periodStart: 2024-12-11T06:16:20.827Z, periodEnd: 2024-12-11T06:17:20.827Z, dryrun: false, executionId: nqBUkZABavtMuA8SVbrF_2024-12-11T06:17:21.542950714_6a1f2ca6-5bdc-4f7b-8050-09891cd4ac02
[2024-12-11T06:17:21,543][INFO ][o.o.i.s.IndexShard       ] [n3] [opensearch-ad-plugin-result-duplicate_id_detec][1] primary-replica resync completed with 0 operations
[2024-12-11T06:17:32,541][WARN ][o.o.c.s.ClusterApplierService] [n3] cluster state applier task [ApplyCommitRequest{term=1506, version=395968, sourceNode={n1}{mmmpyOL0TsCm2kDyGXFO8Q}{3dE9FmVTRiybqSD7ZwrDjg}{192.168.0.82}{192.168.0.82:9300}{dimr}{shard_indexing_pressure_enabled=true}}] took [30s] which is above the warn threshold of [30s]: [running task [ApplyCommitRequest{term=1506, version=395968, sourceNode={n1}{mmmpyOL0TsCm2kDyGXFO8Q}{3dE9FmVTRiybqSD7ZwrDjg}{192.168.0.82}{192.168.0.82:9300}{dimr}{shard_indexing_pressure_enabled=true}}]] took [0ms], [connecting to new nodes] took [0ms], [applying settings] took [0ms], [running applier [org.opensearch.repositories.RepositoriesService@31f975a3]] took [0ms], [running applier [org.opensearch.indices.cluster.IndicesClusterStateService@42d319f1]] took [28892ms], [running applier [org.opensearch.script.ScriptService@24f716c9]] took [0ms], [running applier [org.opensearch.snapshots.RestoreService@32c05185]] took [0ms], [running applier [org.opensearch.ingest.IngestService@49fee6bc]] took [0ms], [running applier [org.opensearch.search.pipeline.SearchPipelineService@5392d09c]] took [0ms], [running applier [org.opensearch.action.ingest.IngestActionForwarder@35f13b2a]] took [0ms], [running applier [org.opensearch.action.admin.cluster.repositories.cleanup.TransportCleanupRepositoryAction$$Lambda$4427/0x00007f17f0d33320@7baf268a]] took [0ms], [running applier [org.opensearch.tasks.TaskManager@3efc0f53]] took [0ms], [running applier [org.opensearch.snapshots.SnapshotsService@631b9516]] took [0ms], [notifying listener [org.opensearch.cluster.InternalClusterInfoService@5d162c39]] took [0ms], [notifying listener [org.opensearch.snapshots.InternalSnapshotsInfoService@34e365f]] took [0ms], [notifying listener [org.opensearch.security.configuration.ClusterInfoHolder@3cc2b005]] took [0ms], [notifying listener [org.opensearch.jobscheduler.sweeper.JobSweeper@1d3d13c3]] took [0ms], [notifying listener [org.opensearch.ad.indices.AnomalyDetectionIndices@2a4365f4]] took [0ms], [notifying listener [org.opensearch.ad.cluster.ADClusterEventListener@5f95c874]] took [650ms], [notifying listener [org.opensearch.ad.cluster.ClusterManagerEventListener@6d58457b]] took [0ms], [notifying listener [org.opensearch.alerting.alerts.AlertIndices@a645ba]] took [0ms], [notifying listener [org.opensearch.alerting.core.JobSweeper@64d16f3c]] took [0ms], [notifying listener [org.opensearch.alerting.util.destinationmigration.DestinationMigrationCoordinator@6f257e49]] took [0ms], [notifying listener [org.opensearch.indexmanagement.indexstatemanagement.IndexStateManagementHistory@550d337]] took [0ms], [notifying listener [org.opensearch.indexmanagement.indexstatemanagement.ManagedIndexCoordinator@1970f0a9]] took [0ms], [notifying listener [org.opensearch.indexmanagement.indexstatemanagement.PluginVersionSweepCoordinator@6b8e4de2]] took [0ms], [notifying listener [org.opensearch.ml.cluster.MLCommonsClusterEventListener@7adc936e]] took [0ms], [notifying listener [org.opensearch.ml.cluster.MLCommonsClusterManagerEventListener@29c1710f]] took [0ms], [notifying listener [org.opensearch.sql.legacy.esdomain.LocalClusterState$$Lambda$2526/0x00007f17f0a24848@72dac7b4]] took [0ms], [notifying listener [org.opensearch.cluster.metadata.SystemIndexMetadataUpgradeService@5496c4e5]] took [0ms], [notifying listener [org.opensearch.cluster.metadata.TemplateUpgradeService@8857bf7]] took [0ms], [notifying listener [org.opensearch.node.ResponseCollectorService@28502a7d]] took [0ms], [notifying listener 
[org.opensearch.snapshots.SnapshotShardsService@c13afa4]] took [0ms], [notifying listener [org.opensearch.persistent.PersistentTasksClusterService@6ec0e32f]] took [0ms], [notifying listener [org.opensearch.cluster.routing.DelayedAllocationService@4db967c3]] took [0ms], [notifying listener [org.opensearch.indices.store.IndicesStore@6413bfcb]] took [1ms], [notifying listener [org.opensearch.persistent.PersistentTasksNodeService@2fc0c72c]] took [0ms], [notifying listener [org.opensearch.search.asynchronous.management.AsynchronousSearchManagementService@3679006d]] took [0ms], [notifying listener [org.opensearch.securityanalytics.indexmanagment.DetectorIndexManagementService@4dbe1307]] took [0ms], [notifying listener [org.opensearch.geospatial.ip2geo.listener.Ip2GeoListener@3e681dbf]] took [0ms], [notifying listener [org.opensearch.gateway.GatewayService@17b854f9]] took [0ms], [notifying listener [org.opensearch.indices.recovery.PeerRecoverySourceService@69bc540b]] took [0ms], [notifying listener [org.opensearch.indices.replication.SegmentReplicationSourceService@54274410]] took [335ms]
[2024-12-11T06:17:45,225][WARN ][o.o.m.j.JvmGcMonitorService] [n3] [gc][young][1699615][37054] duration [2.5s], collections [1]/[3s], total [2.5s]/[4.4h], memory [8.7gb]->[8.1gb]/[20gb], all_pools {[young] [688mb]->[32mb]/[0b]}{[old] [7.9gb]->[8gb]/[20gb]}{[survivor] [123.1mb]->[58.2mb]/[0b]}
[2024-12-11T06:17:45,225][WARN ][o.o.m.j.JvmGcMonitorService] [n3] [gc][1699615] overhead, spent [2.5s] collecting in the last [3s]
[2024-12-11T06:17:45,230][INFO ][o.o.j.s.JobScheduler     ] [n3] Will delay 146350 miliseconds for next execution of job edr-archives-2024.07.10
[2024-12-11T06:17:45,230][INFO ][o.o.j.s.JobScheduler     ] [n3] Will delay 116387 miliseconds for next execution of job edr-alerts-2024.07.27
[2024-12-11T06:17:45,635][INFO ][o.o.j.s.JobScheduler     ] [n3] Will delay 85311 miliseconds for next execution of job edr-alerts-2024.07.12
[2024-12-11T06:17:51,415][WARN ][o.o.m.j.JvmGcMonitorService] [n3] [gc][young][1699619][37056] duration [1.7s], collections [1]/[3s], total [1.7s]/[4.4h], memory [8.9gb]->[8.1gb]/[20gb], all_pools {[young] [832mb]->[0b]/[0b]}{[old] [8gb]->[8.1gb]/[20gb]}{[survivor] [82.3mb]->[68.3mb]/[0b]}
[2024-12-11T06:17:51,415][WARN ][o.o.m.j.JvmGcMonitorService] [n3] [gc][1699619] overhead, spent [1.7s] collecting in the last [3s]
[2024-12-11T06:17:51,418][INFO ][o.o.j.s.JobScheduler     ] [n3] Will delay 77158 miliseconds for next execution of job edr-alerts-2024.08.21

My node disconnected again. I searched the logs, and the entries above are what I found.

@Iqra_shafiq,

I would like to know the config of your cluster. Try tweaking your settings, for example allocating about 50% of your total memory to the JVM heap.

E.g., if you have 10 GB of memory, allocate 4 or 5 GB to the JVM heap and see how performance changes.
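
A minimal sketch for your 20 GB nodes, assuming you set the heap in config/jvm.options on each node and keep min and max equal:

# config/jvm.options -- roughly 50% of the 20 GB RAM per node
-Xms10g
-Xmx10g

Each node needs a restart for the heap change to take effect.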

Also follow the steps below:
- Use a Logstash pipeline, which can make your ingestion faster by batching documents (its batch size is tunable).
- Use dedicated ingest nodes in your config (see the sketch after this list).
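
A minimal sketch of the dedicated ingest node idea, assuming you add (or repurpose) a node and set its roles in opensearch.yml (the node name below is only illustrative):

# opensearch.yml on the dedicated ingest node
node.name: ingest-1
node.roles: [ ingest ]

Clients can then send their bulk traffic to that node, taking pipeline processing and request coordination off the data nodes.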

@Iqra_shafiq, what do your monitors look like? Would you mind sharing them (if so, please make sure no sensitive information is leaked)?


POST _plugins/_alerting/monitors/_search?pretty=true
{
  "query": {
    "match_all": {}
  }
}

Best,
mj