### Describe the bug
After upgrading our OpenSearch cluster from 2.18.0 to 3.1.0, we observe a significant increase in disk read operations while write throughput/IOPS remain roughly the same. The workload is ingest-only (searches disabled). On 2.18.0, read IOPS stayed around 200–400 per node; on 3.1.0, under the same conditions, read IOPS jump to 400–3000+ per node.
This looks like a regression in the indexing path or segment lifecycle that triggers substantially more background reads during ingestion (most likely from merge threads).
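To check the merge-thread hypothesis, a minimal diagnostic sketch: poll cumulative merge counters per node and diff them between polls. The endpoint and field names come from the OpenSearch nodes-stats API; the host and the idea of polling for deltas are assumptions for illustration.

```python
# Hypothetical diagnostic for the merge-thread hypothesis above. Host is an
# assumption; the endpoint and field names come from the OpenSearch
# nodes-stats API (_nodes/stats/indices/merges).
import json
import urllib.request

def merge_stats_url(host: str = "localhost:9200") -> str:
    # Cumulative merge counters per node since process start.
    return f"http://{host}/_nodes/stats/indices/merges"

def total_merged_bytes(stats: dict) -> int:
    # Sum merged bytes across all nodes; the delta between two polls
    # approximates background merge volume over the interval.
    return sum(
        node["indices"]["merges"]["total_size_in_bytes"]
        for node in stats["nodes"].values()
    )

# Against a live node:
#   with urllib.request.urlopen(merge_stats_url()) as resp:
#       print(total_merged_bytes(json.load(resp)))
```

If the merged-bytes delta tracks the excess read throughput, merges are the likely source of the extra reads.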
**Requests for guidance**
- Are there changes in 3.1.x that would increase background reads during indexing (e.g., segment lifecycle, merge behavior, replication strategy interactions, compaction/refresh defaults)?
- Are there recommended settings in 3.1.x to restore 2.18-like read profiles for ingest-only use cases?
### Related component
Indexing:Performance
### To Reproduce
- Create a 3-node cluster on t4g.2xlarge with gp3 volumes (500 GB, 125 MB/s, 3000 IOPS) using image public.ecr.aws/opensearchproject/opensearch:3.1.0.
- Apply the cluster settings listed below under Additional Details.
- Create indices via the index template shown below.
- Ingest time-series logs at ~1.6k docs/sec per node; disable searches entirely (ingest-only).
- Retain only 2 days of indices, total size ~160 GB, reaching 1971 shards (650+ per node).
- Observe per-node disk metrics: read throughput 20–40 MB/s, read IOPS 400–3000+, while writes remain 15–60 MB/s, 200–600 IOPS.
- Repeat the same steps with OpenSearch 2.18.0 and note that read IOPS stay around 200–400 per node.
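The per-node disk metrics above can be sampled with any disk monitor; as a sketch, the cumulative counters in `/proc/diskstats` can be diffed over an interval to get read/write IOPS. The device name (`nvme1n1` for the gp3 volume) and the 10-second interval are assumptions.

```python
# Sketch of deriving per-device read/write IOPS from /proc/diskstats:
# field 4 (index 3) is reads completed, field 8 (index 7) is writes
# completed, both cumulative since boot.
def read_write_counts(diskstats_text: str, device: str) -> tuple[int, int]:
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) > 7 and fields[2] == device:
            return int(fields[3]), int(fields[7])
    raise ValueError(f"device {device!r} not found")

def iops(before: tuple[int, int], after: tuple[int, int],
         interval_s: float) -> tuple[float, float]:
    # (read IOPS, write IOPS) over the sampling interval.
    return ((after[0] - before[0]) / interval_s,
            (after[1] - before[1]) / interval_s)

# Usage (assumed device name): read /proc/diskstats twice, 10 s apart,
# and pass both snapshots through read_write_counts() and iops().
```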
### Expected behavior
Read IOPS during ingest-only workload should be comparable to 2.18.0 (approximately 200–400 read IOPS per node), given identical hardware, shard layout, and ingestion rate.
**Actual behavior**
On 3.1.0, read IOPS increase to 400–3000+ per node under the same workload and configuration, while write IOPS remain similar to 2.18.0.
**Impact**
Higher disk reads lead to increased storage load and cost risk, potential saturation of gp3 baseline, and reduced indexing headroom.
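A back-of-envelope check of the saturation risk, using the figures reported above: gp3 volumes enforce a single provisioned IOPS budget shared by reads and writes, so the extra reads eat directly into indexing headroom.

```python
# gp3 provisioning from the setup above; reads and writes share one budget.
PROVISIONED_IOPS = 3000

def iops_headroom(read_iops: int, write_iops: int,
                  provisioned: int = PROVISIONED_IOPS) -> int:
    # Remaining IOPS before the volume is throttled.
    return provisioned - (read_iops + write_iops)

# 2.18.0 worst case: ~400 read + ~600 write -> ~2000 IOPS spare.
# 3.1.0 peaks:      ~3000 read + ~600 write -> over budget.
```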
### Additional Details
**Environment**
Test cluster setup:
**OpenSearch version:** 3.1.0 (Docker image public.ecr.aws/opensearchproject/opensearch:3.1.0)
**Previous version (baseline):** 2.18.0
**Cluster size:** 3 data nodes (also cluster-manager/ingest)
**Instance type:** t4g.2xlarge (8 vCPU, 32 GB RAM) on AWS
**Storage:** gp3 500 GB per node, 125 MB/s baseline throughput, 3000 IOPS
**JVM opts:** -Xms16g -Xmx16g -XX:MaxGCPauseMillis=400
**Workload:** time-series log ingestion only (@timestamp field), searches disabled
**Data volume:** indices retained 2 days, total size ~160 GB
**Shards:** 1971 total (~650+ per node)
**Observed metrics (per node)**
**Indexing rate:** ~1.6k docs/sec
**Write throughput:** 15–60 MB/s, 200–600 write IOPS
**Read throughput (problem):** 20–40 MB/s, 400–3000+ peak read IOPS on 3.1.0
(On 2.18.0: typically 200–400 read IOPS)
**Cluster settings:**
```
OPENSEARCH_JAVA_OPTS = "-Xms16g -Xmx16g -XX:MaxGCPauseMillis=400"
node.attr.temp = "hot"
node.roles = "cluster_manager,data,ingest,remote_cluster_client"
plugins.security.ssl.http.enabled = "false"
plugins.security.system_indices.enabled = "false"
plugins.security.ssl.http.clientauth_mode = "NONE"
plugins.security.protected_indices.enabled = "false"
indices.recovery.max_bytes_per_sec = "60mb"
cluster.routing.rebalance.enable = "all"
cluster.routing.allocation.allow_rebalance = "indices_primaries_active"
cluster.routing.allocation.disk.threshold_enabled = "false"
cluster.routing.allocation.node_initial_primaries_recoveries = "2"
cluster.routing.allocation.node_concurrent_recoveries = "16"
cluster.max_shards_per_node = "3000"
cluster.routing.allocation.balance.prefer_primary = "true"
cluster.indices.replication.strategy = "SEGMENT"
```
**Index template:**
```
{
  "replication": { "type": "SEGMENT" },
  "allocation": { "max_retries": "300" },
  "mapping": {
    "total_fields": { "limit": "2000" },
    "depth": { "limit": "20" },
    "ignore_malformed": "true"
  },
  "refresh_interval": "120s",
  "translog": {
    "flush_threshold_size": "1024mb",
    "sync_interval": "120s",
    "durability": "async"
  },
  "unassigned": {
    "node_left": { "delayed_timeout": "5m" }
  },
  "number_of_replicas": "1",
  "merge_on_flush": {
    "enabled": "false",
    "max_full_flush_merge_wait_time": "30s",
    "policy": "default"
  },
  "codec": "default",
  "routing": {
    "allocation": {
      "require": { "temp": "hot" },
      "total_shards_per_node": "3"
    }
  },
  "number_of_shards": "3",
  "use_compound_file": "false",
  "merge": {
    "scheduler": { "max_thread_count": "1" },
    "policy.max_merge_at_once": "10",
    "policy": "log_byte_size"
  }
}
```
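For reproduction, the flat `index.*` settings above map under `template.settings.index` in a composable index template (`PUT _index_template/<name>`). A minimal sketch of building that payload; the template name and index pattern (`logs-*`) are assumptions:

```python
# Sketch: wrap the flat index settings shown above into the body of a
# composable index template (PUT _index_template/<name>). The index
# pattern is an assumption for this ingest-only log workload.
import json

def build_index_template(index_settings: dict,
                         pattern: str = "logs-*") -> dict:
    return {
        "index_patterns": [pattern],
        "template": {"settings": {"index": index_settings}},
    }

# json.dumps(build_index_template({...settings above...})) is the request
# body for PUT _index_template/logs-template.
```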