Deepa
February 25, 2025, 4:12am
1
Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
Current version of OpenSearch nodes and Dashboards: 2.14.0
New version required for OpenSearch nodes and Dashboards: 2.19.0
Current version of OpenSearch Operator: 2.5.1
New version required for OpenSearch Operator: 2.7.0
Describe the issue:
I am trying to upgrade the OpenSearch cluster nodes and Dashboards to 2.19.0, and the OpenSearch Operator to 2.7.0.
The cluster is built using the Helm chart on a GKE cluster.
As per the current configuration, we build a Docker image for each of the components above.
The issue I am facing: while upgrading the version, there is downtime for the OpenSearch Operator (as I am doing the operator upgrade first).
After that, the node-by-node restart described in the documentation is not possible for me, because in my current configuration the Docker image is set in the general section, which applies to all nodes.
So the moment I change to the new image version, the change is synced by the ArgoCD application and all nodes, including the cluster-manager nodes, are pointed at the new image.
The OpenSearch Dashboards configuration is also included in the same values.yaml, so Dashboards become unavailable until all the nodes have been restarted and picked up the new version.
As a result there is a significant downtime window during this upgrade. The sketch below shows the shared fields involved.
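For reference, this is a minimal sketch of the shared keys from the values.yaml attached in the next post (repository paths elided as in the original); a single version/image pair under general is inherited by every node pool, and Dashboards sit in the same file:

opensearchCluster:
  general:
    version: 2.14.0            # one cluster-wide version; changing it rolls every node pool
    image: <path to hub repo>  # one image reference shared by all node pools
  dashboards:
    version: 2.14.0            # Dashboards are configured in the same values.yaml
    image: <path to hub repo>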
Configuration:
Configuration attached (see reply below).
Relevant Logs or Screenshots:
Deepa
February 25, 2025, 4:14am
2
OpenSearch Configuration
==========================================================================
opensearchCluster:
  enabled: true
  clusterName: opensearch
  confMgmt:
    smartScaler: true
  general:
    #########################################################################################################
    # GENERAL
    #--------------------------------------------------------------------------------------------------------
    # The general section defines the Docker image
    # and common settings applied to all nodes
    #--------------------------------------------------------------------------------------------------------
    httpPort: "9200"
    version: 2.14.0
    image: <path to hub repo>
    imagePullPolicy: Always
    imagePullSecrets: []
    serviceName: "opensearch"
    drainDataNodes: true
    setVMMaxMapCount: false
  dashboards:
    enable: true
    version: 2.14.0
    image: <path to hub repo>
    imagePullPolicy: Always
    imagePullSecrets: []
    replicas: 2
  nodePools:
    ### master node
    - component: master
      labels:
        lv_status: "BUILD"
        lv_application: "OPENSEARCH"
        lv_sla: "GOLD"
        lv_contact: ""
      replicas: 4
      pdb:
        enable: true
        minAvailable: 2
      diskSize: "30Gi"
      jvm: -Xmx1600M -Xms1600M
      roles:
        - "cluster_manager"
      resources:
        requests:
          memory: "3Gi"
          cpu: "800m"
        limits:
          memory: "3Gi"
          cpu: "800m"
      persistence:
        pvc:
          storageClass: standard-rwo
          accessModes:
            - ReadWriteOnce
    ### hot node
    - component: hot
      labels:
        lv_status: "BUILD"
        lv_application: "OPENSEARCH"
        lv_sla: "GOLD"
        lv_contact: ""
      replicas: 4
      pdb:
        enable: true
        minAvailable: 2
      diskSize: "600Gi"
      jvm: -Xmx4000M -Xms4000M
      nodeSelector:
      resources:
        requests:
          memory: "8Gi"
          cpu: 3000m
        limits:
          memory: "8Gi"
          cpu: 3000m
      roles:
        - "data"
        - "ingest"
      persistence:
        pvc:
          storageClass: premium-rwo
          accessModes:
            - ReadWriteOnce
    ### warm node
    - component: digital-warm
      labels:
        lv_status: "BUILD"
        lv_application: "OPENSEARCH"
        lv_sla: "GOLD"
        lv_contact: ""
      replicas: 4
      pdb:
        enable: true
        minAvailable: 2
      diskSize: "800Gi"
      jvm: -Xmx4000M -Xms4000M
      nodeSelector:
      resources:
        requests:
          memory: "8Gi"
          cpu: "2300m"
        limits:
          memory: "8Gi"
          cpu: "2300m"
      roles:
        - "data"
      persistence:
        pvc:
          storageClass: standard-rwo
          accessModes:
            - ReadWriteOnce
    ### cold node
    - component: digital-cold
      labels:
        lv_status: "BUILD"
        lv_application: "OPENSEARCH"
        lv_sla: "GOLD"
        lv_contact: ""
      replicas: 4
      pdb:
        enable: true
        minAvailable: 2
      diskSize: "800Gi"
      jvm: -Xmx4000M -Xms4000M
      nodeSelector:
      resources:
        requests:
          memory: "8Gi"
          cpu: "1600m"
        limits:
          memory: "8Gi"
          cpu: "1600m"
      roles:
        - "data"
      persistence:
        pvc:
          storageClass: standard-rwo
          accessModes:
            - ReadWriteOnce
### security settings/certs/serviceaccount settings follow.
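The upgrade itself is just this bump against the shared general and dashboards sections above; a minimal sketch of the intended change (once ArgoCD syncs it, the operator restarts all node pools at once):

opensearchCluster:
  general:
    version: 2.19.0            # was 2.14.0
    image: <path to hub repo>  # must point at the matching 2.19.0 build
  dashboards:
    version: 2.19.0            # was 2.14.0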
=======================================================================
OpenSearch Operator Configuration
nameOverride: ""
fullnameOverride: ""
nodeSelector: {}
tolerations: []
securityContext:
  #--------------------------------------------------------------------------------------------------------
  # Force running as a non-root account and skip the init container
  #--------------------------------------------------------------------------------------------------------
  runAsNonRoot: true
  runAsUser: 1000
  #fsGroup: 1000
manager:
  #--------------------------------------------------------------------------------------------------------
  # The pprof endpoint helps with diagnosing memory leak issues
  #--------------------------------------------------------------------------------------------------------
  pprofEndpointsEnabled: true
  securityContext:
    #--------------------------------------------------------------------------------------------------------
    # Force running as a non-root account and skip the init container
    #--------------------------------------------------------------------------------------------------------
    runAsNonRoot: true
    runAsUser: 1000
    allowPrivilegeEscalation: false
  extraEnv:
    - name: SKIP_INIT_CONTAINER
      value: "true"
  resources:
    limits:
      cpu: 200m
      memory: 500Mi
    requests:
      cpu: 100m
      memory: 350Mi
  livenessProbe:
    failureThreshold: 10
    httpGet:
      path: /healthz
      port: 8081
    periodSeconds: 30
    successThreshold: 1
    timeoutSeconds: 3
    initialDelaySeconds: 120
  readinessProbe:
    failureThreshold: 10
    httpGet:
      path: /readyz
      port: 8081
    periodSeconds: 30
    successThreshold: 1
    timeoutSeconds: 3
    initialDelaySeconds: 120
  # ---------------------------------------------------------------------------------------------------------------------
  # Needed to recover cluster-manager pods; without this setting they wait forever
  # with a "waiting for cluster to be available..." error message
  # ---------------------------------------------------------------------------------------------------------------------
  parallelRecoveryEnabled: true
  image:
    #--------------------------------------------------------------------------------------------------------
    # OpenSearch Operator image is pulled through Harbor
    #--------------------------------------------------------------------------------------------------------
    repository: <path to hub repo>
    #tag: "2.5.1"
    ### Upgrading version from 2.5.1 to 2.7.0
    tag: "2.7.0"
    pullPolicy: "Always"
  ## Optional array of imagePullSecrets containing private registry credentials
  imagePullSecrets: []
  # - name: secretName
  dnsBase: cluster.local
  #--------------------------------------------------------------------------------------------------------
  # Log level of the operator. Possible values: debug, info, warn, error.
  # As the pods run in GKE, logs go to the GKE log repository, which has a cost,
  # so the log level is set to warn to save cost.
  #--------------------------------------------------------------------------------------------------------
  loglevel: warn
  #--------------------------------------------------------------------------------------------------------
  # If a watchNamespace is specified, the manager's cache is restricted to
  # objects in that namespace. The default is to watch all namespaces.
  #--------------------------------------------------------------------------------------------------------
  watchNamespace: observability
# Install the Custom Resource Definitions with Helm
installCRDs: true
serviceAccount:
  #######################################################################################
  <ServiceAccount configurations>
  #######################################################################################
kubeRbacProxy:
  enable: false
  securityContext:
    allowPrivilegeEscalation: false
  resources:
    limits:
      cpu: 50m
      memory: 50Mi
    requests:
      cpu: 25m
      memory: 25Mi
  livenessProbe:
    failureThreshold: 10
    httpGet:
      path: /healthz
      port: 10443
      scheme: HTTPS
    periodSeconds: 30
    successThreshold: 1
    timeoutSeconds: 3
    initialDelaySeconds: 120
  readinessProbe:
    failureThreshold: 10
    httpGet:
      path: /healthz
      port: 10443
      scheme: HTTPS
    periodSeconds: 30
    successThreshold: 1
    timeoutSeconds: 30
    initialDelaySeconds: 120
  image:
    #--------------------------------------------------------------------------------------------------------
    # Using the image uploaded to our registry
    #--------------------------------------------------------------------------------------------------------
    repository: "<path>kube-rbac-proxy"
    tag: "v0.15.0"