Versions (relevant - OpenSearch/Dashboard/Server OS/Browser)
- Operator: 2.8.0
- OpenSearch: 3.2.0
- Kubernetes Environment
- EKS 1.30
- AWS
Describe the issue
Hi. I'm running an OpenSearch cluster with the OpenSearch Operator.
There are two issues when changing the node pool replicas.
[Issue #1] When increasing the node pool replicas, the existing pods also restart.
(Test : data nodepool replicas 3 → 6)
When the number of OpenSearch node pool replicas changes, a new ControllerRevision is also created.
❯ k get pods | grep data
opensearch-data-0 1/1 Running 0 16m
opensearch-data-1 1/1 Running 0 14m
opensearch-data-2 1/1 Running 0 5m25s
opensearch-data-3 0/1 Init:0/2 0 65s
opensearch-data-4 0/1 Init:0/2 0 65s
opensearch-data-5 0/1 Init:0/2 0 65s
kubectl get controllerrevision | egrep -e 'statefulset.apps/opensearch-data|NAME'
NAME CONTROLLER REVISION AGE
opensearch-data-58bc78989d statefulset.apps/opensearch-data 28 88s
opensearch-data-84788b946b statefulset.apps/opensearch-data 27 20m
After that, the existing pods are also restarted.
kubectl get pods -L controller-revision-hash | egrep -e 'data|NAME'
NAME READY STATUS RESTARTS AGE CONTROLLER-REVISION-HASH
# Existing pods (why are these restarting?)
opensearch-data-0 1/1 Running 0 5m31s opensearch-data-5dd76687b6
opensearch-data-1 1/1 Running 0 3m43s opensearch-data-5dd76687b6
opensearch-data-2 1/1 Running 0 118s opensearch-data-5dd76687b6
# New pods (starting because the controller revision hash changed)
opensearch-data-3 0/1 Init:0/2 0 4s opensearch-data-5dd76687b6
opensearch-data-4 1/1 Running 0 8m39s opensearch-data-58bc78989d
opensearch-data-5 1/1 Running 0 8m39s opensearch-data-58bc78989d
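One generic check (not operator-specific, and assuming the same opensearch-prod namespace used below) is to compare the StatefulSet's current and update revisions; if they differ, the StatefulSet controller will roll the existing pods onto the new revision as well:
# If currentRevision and updateRevision differ, the existing pods are
# rolled to catch up with the new revision.
kubectl get statefulset opensearch-data -n opensearch-prod \
  -o jsonpath='{.status.currentRevision}{"\n"}{.status.updateRevision}{"\n"}'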
If anyone knows, please help me. Is this normal behavior?
If not, is there a way to prevent it, or how can I troubleshoot it?
kubectl get controllerrevision opensearch-data-84788b946b -n opensearch-prod -o yaml > old.yaml
kubectl get controllerrevision opensearch-data-5dd76687b6 -n opensearch-prod -o yaml > new.yaml
diff -u old.yaml new.yaml
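The full YAML diff is noisy, so a tighter comparison (a sketch, assuming jq is available; for StatefulSets the ControllerRevision data normally embeds the pod template under spec.template) is to diff only the stored pod templates:
# Extract just the pod template from each ControllerRevision and diff those.
kubectl get controllerrevision opensearch-data-84788b946b -n opensearch-prod -o json \
  | jq '.data.spec.template' > old-template.json
kubectl get controllerrevision opensearch-data-5dd76687b6 -n opensearch-prod -o json \
  | jq '.data.spec.template' > new-template.json
diff -u old-template.json new-template.json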
[Issue #2] When reducing the OpenSearch node pool replicas, the SmartScaler does not work properly.
(Test : data nodepool replicas 6 → 3)
# kubectl get opensearchclusters opensearch -o jsonpath="{.status}"
{
"availableNodes": 10,
"componentsStatus": [
{
"component": "Restarter",
"status": "InProgress"
},
{
"component": "Scaler",
"description": "data",
"status": "Excluded"
}
],
"health": "yellow",
"initialized": true,
"phase": "RUNNING",
"version": "3.2.0"
}
# kubectl get pods
opensearch-data-0 1/1 Running 0 2m8s
opensearch-data-1 1/1 Running 0 13m
opensearch-data-2 1/1 Running 0 11m
# kubectl get opensearchclusters opensearch -o jsonpath="{.status}"
{
"availableNodes": 9,
"componentsStatus": [
{
"component": "Restarter",
"status": "InProgress"
}
],
"health": "red",
"initialized": true,
"phase": "RUNNING",
"version": "3.2.0"
}
Since SmartScaler is enabled, I expected it to migrate the shards off the data nodes being removed before scaling them down, but this does not seem to be happening.
[DEV Tools]
# GET _cat/shards?v=true&h=index,shard,prirep,state,node,unassigned.reason&s=state
index shard prirep state node unassigned.reason
security-auditlog-2025.09.08 0 p UNASSIGNED NODE_LEFT
security-auditlog-2025.09.08 0 r UNASSIGNED NODE_LEFT
.kibana_1 0 p UNASSIGNED NODE_LEFT
.kibana_1 0 r UNASSIGNED NODE_LEFT
security-auditlog-2025.08.30 0 p UNASSIGNED NODE_LEFT
security-auditlog-2025.08.30 0 r UNASSIGNED NODE_LEFT
top_queries-2025.09.02-00378 0 p STARTED opensearch-data-1
top_queries-2025.09.02-00378 0 r STARTED opensearch-data-2
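For any one of the UNASSIGNED shards, the allocation explain API gives the detailed reason and per-node decisions, which usually shows whether an allocation exclusion or a missing node is the cause (index name taken from the output above):
# GET _cluster/allocation/explain
{
  "index": ".kibana_1",
  "shard": 0,
  "primary": true
}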
# GET _cluster/settings
"transient": {
"cluster": {
"routing": {
"allocation": {
"enable": "all"
}
}
}
}
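As far as I understand, the operator drains a node by adding it to cluster.routing.allocation.exclude._name (the Scaler "Excluded" status above suggests this), so it is worth checking whether that setting was ever applied. As a sketch, the exclusion can also be set manually before reducing replicas, assuming the OpenSearch node names match the pod names:
# Check whether a node-name exclusion is (still) set:
# GET _cluster/settings?flat_settings=true

# Manually drain the nodes about to be removed, wait for relocation
# to finish, then reduce the replicas:
# PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "opensearch-data-3,opensearch-data-4,opensearch-data-5"
  }
}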
Is this the intended behavior, or could there be something misconfigured on my side?
Configuration
- The reason I set drainDataNodes: false is that the PVC volumes already exist, and I wanted to prevent shard relocation during a simple restart (a patch to toggle this for testing is sketched after the spec below).
spec:
..
..
confMgmt:
smartScaler: true
general:
..
drainDataNodes: false
..
..
nodePools:
- additionalConfig:
plugins.security.audit.config.enable_rest: "false"
plugins.security.audit.config.enable_transport: "false"
plugins.security.enable_snapshot_restore_privilege: "false"
plugins.security.ssl_cert_reload_enabled: "true"
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
opster.io/opensearch-nodepool: data
topologyKey: kubernetes.io/hostname
annotations:
ad.datadoghq.com/opensearch.checks: |
{
"elastic": {
"init_config": {},
"instances": [
{
"tls_verify": false,
"url": "https://%%host%%:9200",
"username": "ENC[k8s_secret@opensearch-prod/admin-credentials-secret/username]",
"password": "ENC[k8s_secret@opensearch-prod/admin-credentials-secret/password]",
"index_stats": "true",
"pshard_stats": "true",
"cat_allocation_stats": "true",
"pending_task_stats": "true"
}
]
}
}
component: data
diskSize: 1000Gi
env:
- name: DISABLE_INSTALL_DEMO_CONFIG
value: "true"
nodeSelector:
karpenter.sh/nodepool: opensearch-nodepool
pdb:
enable: true
maxUnavailable: 1
persistence:
pvc:
accessModes:
- ReadWriteOnce
storageClass: ebs-gp3
replicas: 3
resources:
limits:
memory: 20Gi
requests:
cpu: 3000m
memory: 20Gi
roles:
- data
- ingest
tolerations:
- effect: NoSchedule
key: karpenter.sh/nodepool
operator: Equal
value: opensearch-nodepool
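To compare the scale-down behavior with and without draining, a quick way to toggle drainDataNodes on the live object (a sketch, assuming the cluster CR is named opensearch in the opensearch-prod namespace, as in the commands above) is a merge patch:
# Temporarily enable data-node draining, then retry the scale-down.
kubectl patch opensearchclusters opensearch -n opensearch-prod \
  --type merge -p '{"spec":{"general":{"drainDataNodes":true}}}'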
Relevant Logs or Screenshots:
Thanks!
