K8s StatefulSet, readinessProbe and security bootstrap

Hi everyone,

My issue is quite similar to this one.

I have a bunch of k8s manifests bootstrapping a basic Elasticsearch cluster:

  • 1 statefulset / 3 master pods
  • 1 statefulset / 2 data pods
  • Kibana and Elasticsearch ingest node deployments

All of the ES master/data pods have an init container that installs the opendistro_security plugin 1.8.0.0 on the Elasticsearch OSS 7.7.0 container.
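
For reference, a trimmed-down sketch of that init container, assuming the plugins directory is handed over to the main elasticsearch container through a shared emptyDir volume (the plugin URL, volume name, and mount path below are placeholders, not my exact manifests):

initContainers:
  - name: install-security-plugin
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:7.7.0
    command:
      - /bin/sh
      - -c
      - |
        # Install the security plugin, then copy the whole plugins dir to the
        # emptyDir volume that the main elasticsearch container mounts at
        # /usr/share/elasticsearch/plugins
        bin/elasticsearch-plugin install --batch "${SECURITY_PLUGIN_URL}"
        cp -a /usr/share/elasticsearch/plugins/. /shared-plugins/
    env:
      - name: SECURITY_PLUGIN_URL
        value: "https://example.com/opendistro_security-1.8.0.0.zip"  # placeholder URL
    volumeMounts:
      - name: plugins
        mountPath: /shared-plugins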

My elasticsearch.yml looks like this:

opendistro_security.disabled: false

opendistro_security.ssl.transport.pemkey_filepath: tls/tls.key
opendistro_security.ssl.transport.pemcert_filepath: tls/tls.crt
opendistro_security.ssl.transport.pemtrustedcas_filepath: tls/ca.crt

opendistro_security.ssl.transport.enforce_hostname_verification: false
opendistro_security.ssl.transport.resolve_hostname: false

opendistro_security.allow_default_init_securityindex: true

opendistro_security.nodes_dn:
  - "CN=elasticsearch-data,OU=elasticsearch+OU=production,O=home,C=FR"
  - "CN=elasticsearch-master,OU=elasticsearch+OU=production,O=home,C=FR"
  - "CN=*"
opendistro_security.authcz.admin_dn:
  - "CN=elasticsearch-admin,OU=elasticsearch+OU=production,O=home,C=FR"

I’m using the following readinessProbe for the elasticsearch container:

readinessProbe:
  httpGet:
    path: /_cluster/health?local=true
    port: 9200
  initialDelaySeconds: 3
  periodSeconds: 3

My problem is that /_cluster/health?local=true keeps returning a 500 error (“OpenDistro security not initialized”), preventing the Elasticsearch pods from becoming ready and K8s from starting the other StatefulSet members of the ES cluster.

My only solution so far is to temporarily remove the readiness probe at cluster bootstrap time and set it again once security has been initialized.
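
In my setup security gets initialized either through allow_default_init_securityindex or by running securityadmin.sh manually once all the nodes have joined; the manual variant looks roughly like this (the admin cert/key file names are placeholders matching the admin_dn above, and the key must be in PKCS#8 format):

kubectl exec elasticsearch-master-0 -- \
  /usr/share/elasticsearch/plugins/opendistro_security/tools/securityadmin.sh \
    -cd /usr/share/elasticsearch/plugins/opendistro_security/securityconfig \
    -icl -nhnv \
    -cacert /usr/share/elasticsearch/config/tls/ca.crt \
    -cert /usr/share/elasticsearch/config/tls/admin.crt \
    -key /usr/share/elasticsearch/config/tls/admin.key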

Couldn’t we have the security plugin enable only node-to-node encryption, without requiring security to be fully initialized?
Any hints on bootstrapping the cluster while keeping the readinessProbe (which is necessary to apply upgrades during the ES cluster lifecycle)?

Thanks!


@jeanfabrice I’m facing the same issue. Temporarily removing the probe and re-injecting it once security is initialized works, but it fails during an update or evacuation: the pods never become ready and get stuck at 0/3 with readiness failures. So I now have to actively remove the probe whenever #ready < #minnodes.

Did you already solve it by other means?

Thx!

Well, I did, using the following readiness pattern (I don’t remember whom to credit for this):

readinessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - |
        # Build the basic auth option only if probe credentials are provided
        if [ -n "${K8SPROBE_USERNAME}" ] && [ -n "${K8SPROBE_PASSWORD}" ]; then
          BASIC_AUTH="-u ${K8SPROBE_USERNAME}:${K8SPROBE_PASSWORD}"
        else
          BASIC_AUTH=""
        fi

        HTTP_CODE=$(curl --output /dev/null -s -k ${BASIC_AUTH} "http://127.0.0.1:9200/_cluster/health?local=true" -w '%{http_code}')
        RC=$?
        if [ "${RC}" -ne 0 ]; then
          echo "Probe failed with RC ${RC}"
          exit "${RC}"
        fi
        # 200: node is healthy. 503: security index not initialized yet;
        # treat it as ready so the remaining StatefulSet members can start.
        if [ "${HTTP_CODE}" = "200" ] || [ "${HTTP_CODE}" = "503" ]; then
          exit 0
        else
          echo "Probe failed with HTTP code ${HTTP_CODE}"
          exit 1
        fi
  initialDelaySeconds: 5
  periodSeconds: 3

This is not a perfect solution, but it’s enough for my current needs.
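
Note that the script assumes K8SPROBE_USERNAME and K8SPROBE_PASSWORD are set in the container environment; I inject them from a Secret along these lines (the Secret name and keys are placeholders):

env:
  - name: K8SPROBE_USERNAME
    valueFrom:
      secretKeyRef:
        name: es-probe-credentials
        key: username
  - name: K8SPROBE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: es-probe-credentials
        key: password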


@jeanfabrice Thank you for sharing.

For now, I simply check the readiness of the HTTP port with a tcpSocket probe. I don’t want K8s to take nodes out of service; I just need the probes to work with PodDisruptionBudgets during scale-downs or updates.

readinessProbe:
  failureThreshold: 10
  tcpSocket:
    port: 9200
  initialDelaySeconds: 10
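
For completeness, the kind of PodDisruptionBudget I pair this with (the name, label selector, and minAvailable value here are specific to my setup):

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: elasticsearch-master-pdb
spec:
  # Keep at least 2 of the 3 master pods up during voluntary disruptions
  minAvailable: 2
  selector:
    matchLabels:
      app: elasticsearch-master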

I shall also try out your approach!