How to specify nodeSelector in an OpenSearch StatefulSet

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

OpenSearch version: 2.13.0
Platform: hosted k8s [1.28.8]

Describe the issue:

I have a single-node OpenSearch cluster running in k8s.
I want to run the OpenSearch pod on one particular node out of the 3 nodes.
The deployment type is StatefulSet.
I am using the default k8s scheduler.

I am trying to add a nodeSelector to my StatefulSet YAML in various ways, but it keeps failing.

I am referring to the following YAML, but it is a bit confusing how and where to use the nodeSelector [line numbers 99 & 100]: helm-charts/charts/opensearch/templates/statefulset.yaml at main · opensearch-project/helm-charts · GitHub

Configuration:

  • Stateful set deployment

Relevant Logs or Screenshots:

  • unknown field "spec.nodeSelector"
  • Similar errors appear when I run kubectl apply -f statefulset.yaml

@abhyankar Your link refers to the OpenSearch Helm charts. I assume that your cluster is deployed with the official Helm charts. If that's true, then I don't understand why you bother to modify statefulset.yml.
You can provide the nodeSelector value using the values.yml file (line 396, .Values.nodeSelector).
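
For reference, a minimal sketch of that values.yml entry (the label key and value below are placeholders; use a label that actually exists on your target node):

nodeSelector:
  kubernetes.io/hostname: my-target-node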

Hello @pablo, thanks for your reply.

I am not using the Helm chart, but I am reading its YAML as a reference for using nodeSelector.

I took the content from line 396 of values.yml and tried to put it under spec in my statefulset.yml, but when I deploy it I get an error like the one I mentioned above. Thanks.

@abhyankar Could you share your statefulset.yml manifest?

Sure.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: gmsp-es-logging
  namespace: gcs-logging-poc
  labels:
    app: gmsp-es-logging
spec:
  replicas: 3
  selector:
    matchLabels:
      app: gmsp-es-logging
  template:
    metadata:
      name: gmsp-es-logging
      labels:
        app: gmsp-es-logging
      annotations:
        prometheus.io/path: /_prometheus/metrics
        prometheus.io/port: '9200'
        prometheus.io/scrape: 'true'
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - pool-cloud-staging-1-5qz5abcdef
                - pool-cloud-staging-1-lcrmabcdef
                - pool-cloud-staging-1-y4noabcdef
                  
      volumes:
        - name: opensearchkeystore
          secret:
            secretName: opensearchkeystore
            defaultMode: 420
      initContainers:
        - name: configure-sysctl
          image: opensearchproject/opensearch:2.13.0
          command:
            - /bin/sh
          args:
            - '-c'
            - echo 262144 > /proc/sys/vm/max_map_count
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
            runAsUser: 0
      containers:
        - name: elasticsearch
          image: opensearchproject/opensearch:2.13.0
          ports:
            - name: http
              containerPort: 9200
              protocol: TCP
            - name: transport
              containerPort: 9300
              protocol: TCP
          env:
            - name: node.name
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: cluster.name
              value: gmsp-es-logging
            - name: discovery.type
              value: single-node
            - name: network.host
              value: 0.0.0.0
            - name: ES_JAVA_OPTS
              value: '-Xmx4g -Xms6g -Dlog4j2.formatMsgNoLookups=true'
            - name: plugins.security.disabled
              value: 'true'
            - name: OPENSEARCH_INITIAL_ADMIN_PASSWORD
              value: myStrongPassword123@456
          resources:
            limits:
              cpu: '1'
              memory: 2Gi
            requests:
              cpu: 100m
              memory: 1Gi
          volumeMounts:
            - name: gmsp-es-logging
              mountPath: /usr/share/opensearch/data
            - name: opensearchkeystore
              mountPath: /usr/share/opensearch/config/opensearch.keystore
              subPath: opensearch.keystore
          readinessProbe:
            exec:
              command:
                - sh
                - '-c'
                - >
                  #!/usr/bin/env bash -e

                  # If the node is starting up wait for the cluster to be ready (request params: "wait_for_status=green&timeout=1s" )

                  # Once it has started only check that the node itself is responding

                  START_FILE=/tmp/.es_start_file


                  # Disable nss cache to avoid filling dentry cache when calling curl

                  # This is required with Elasticsearch Docker using nss < 3.52

                  export NSS_SDB_USE_CACHE=no


                  http () {
                    local path="${1}"
                    local args="${2}"
                    set -- -XGET -s

                    if [ "$args" != "" ]; then
                      set -- "$@" $args
                    fi

                    if [ -n "${ELASTIC_USERNAME}" ] && [ -n "${ELASTIC_PASSWORD}" ]; then
                      set -- "$@" -u "${ELASTIC_USERNAME}:${ELASTIC_PASSWORD}"
                    fi

                    curl --output /dev/null -k "$@" "http://127.0.0.1:9200${path}"
                  }


                  if [ -f "${START_FILE}" ]; then
                    echo 'Elasticsearch is already running, lets check the node is healthy'
                    HTTP_CODE=$(http "/" "-w %{http_code}")
                    RC=$?
                    if [[ ${RC} -ne 0 ]]; then
                      echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with RC ${RC}"
                      exit ${RC}
                    fi
                    # ready if HTTP code 200, 503 is tolerable if ES version is 6.x
                    if [[ ${HTTP_CODE} == "200" ]]; then
                      exit 0
                    elif [[ ${HTTP_CODE} == "503" && "6" == "6" ]]; then
                      exit 0
                    else
                      echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with HTTP code ${HTTP_CODE}"
                      exit 1
                    fi

                  else
                    echo 'Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )'
                    if http "/_cluster/health?wait_for_status=green&timeout=1s" "--fail" ; then
                      touch ${START_FILE}
                      exit 0
                    else
                      echo 'Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )'
                      exit 1
                    fi
                  fi
            initialDelaySeconds: 10
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 3
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            capabilities:
              drop:
                - ALL
            runAsUser: 1000
            runAsNonRoot: true
      restartPolicy: Always
      serviceAccountName: gmsp-es-logging
      serviceAccount: gmsp-es-logging
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
  volumeClaimTemplates:
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: gmsp-es-logging
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 50Gi
  serviceName: gmsp-es-logging-headless

service.yaml =

apiVersion: v1
kind: Service
metadata:
  name: gmsp-es-logging-headless
  namespace: gcs-logging-poc
  labels:
    app: gmsp-es-logging
spec:
  ports:
    - name: http
      protocol: TCP
      port: 9200
      targetPort: 9200
    - name: transport
      protocol: TCP
      port: 9300
      targetPort: 9300
  selector:
    app: gmsp-es-logging
  clusterIP: None

serviceaccount.yaml =

apiVersion: v1
kind: ServiceAccount
metadata:
  name: gmsp-es-logging
  namespace: gcs-logging-poc
  labels:
    app: gmsp-es-logging
secrets:
  - name: gmsp-es-master-token-1k2ab

If I apply this manifest, the 3 pods get scheduled on only 2 of the 3 nodes.
I do not have any taints, as this is a non-production environment.
The environment is hosted, which means none of the nodes in the list is a master node.
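
For reference, taints and the actual pod placement can be checked with, for example (namespace taken from the manifest above):

kubectl describe nodes | grep -i taint
kubectl get pods -o wide -n gcs-logging-poc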

I tried adding a node affinity combination as follows:

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os 
                operator: In
                values:
                - linux

But the behavior is the same:
one pod is still not able to get scheduled on that one specific node.

Same result for pod affinity.

I tried running a sample nginx StatefulSet, and its pods get spread across all 3 nodes.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-statefulset
  labels:
    app: example
spec:
  serviceName: example
  replicas: 3
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os 
                operator: In
                values:
                - linux
      containers:
      - name: example-container
        image: nginx:latest
        ports:
        - containerPort: 80
  volumeClaimTemplates:
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: example
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 1Gi

@abhyankar Just to be clear: this issue is not related to the OpenSearch software itself. You're having an issue with Kubernetes pod placement.
However, based on your error, you've placed nodeSelector in the incorrect spec. You must place it under .spec.template.spec.
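
For example, a minimal sketch showing only the relevant fields (the hostname value is just one of the node names you listed; any label present on the target node works):

apiVersion: apps/v1
kind: StatefulSet
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: pool-cloud-staging-1-5qz5abcdef
      containers:
        - name: elasticsearch
          image: opensearchproject/opensearch:2.13.0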

Thanks.

I think they are placed under spec.template.spec:

[screenshot of the manifest]

Also,

If I try to use podAntiAffinity, I get the following errors, which are strange in nature:

The podAntiAffinity config:

      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/instance
                operator: In
                values:
                - gmsp-es-logging
            topologyKey: kubernetes.io/hostname

Error:

Readiness probe failed: sh: line 2: local: can only be used in a function
sh: line 3: local: can only be used in a function
sh: -c: line 15: syntax error near unexpected token `}'
sh: -c: line 15: `}'

@abhyankar This time the error is about the readinessProbe command syntax. Check the indentation and the command syntax.
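
One thing worth checking: the probe script is embedded with a folded block scalar (>), which reflows lines, so the shell may not see the script line by line as written. For multi-line scripts a literal block scalar (|) preserves every newline. A simplified sketch of the same probe, with the endpoints and start file taken from your manifest:

          readinessProbe:
            exec:
              command:
                - sh
                - -c
                - |
                  #!/usr/bin/env bash -e
                  # Wait for the cluster on first start, afterwards only check that the node responds
                  START_FILE=/tmp/.es_start_file
                  if [ -f "${START_FILE}" ]; then
                    curl --fail -s -o /dev/null -k http://127.0.0.1:9200/
                  else
                    curl --fail -s -o /dev/null -k "http://127.0.0.1:9200/_cluster/health?wait_for_status=green&timeout=1s" && touch "${START_FILE}"
                  fi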