Pods crash with demo certs on GKE

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): 2.18.0

Describe the issue: The OpenSearch pods were coming up fine until yesterday. Now they are suddenly crashing, stating that root-ca.pem cannot be located or may not have the right permissions. This is on GKE.

Configuration:

Relevant Logs or Screenshots:

@Nagpraveen Did you use official OpenSearch charts to deploy the cluster?
Is the root-ca.pem present in the pods or secret?

Hello @pablo, yes, I used the latest chart version, 2.31.0, with demo certs to deploy the OpenSearch cluster on GCP. It had been running fine for the past week, but since yesterday the pods have been crashing with the root-ca.pem error.
Apart from the GKE node version upgrade done by Google's standard maintenance, nothing changed, at least from the OpenSearch deployment perspective.
The certs should be present. Just to make sure, I have uninstalled and reinstalled the Helm release multiple times since yesterday, but I am still seeing the same issue.

@Nagpraveen By “reinstalled” you mean destroying the cluster and deploying a new one?
Do you know what causes the restarts? A missing cert shouldn’t cause that, as it is read only once during the OpenSearch service start.

Reinstalling the cluster itself: I uninstalled the Helm release, the nodes, and the PVCs, and did a fresh helm install.
Still no luck.

@Nagpraveen Could you share your current values.yml file?

Sure, here’s my values file.

@Nagpraveen Would it be possible for you to send the content in text instead of picture?
I could test it on my side then.

@Nagpraveen Could you also share the output of the following command?

kubectl exec -it <pod_name> -- ls -l config

Sure, will do. Meanwhile, I found something in the log during pod startup. Could this be a potential reason why the cert isn’t written? Just a hint: the line marked in blue says opensearch.yml seems to be already configured for security. Quit

@Nagpraveen I don’t think it is. The security plugin checks whether opensearch.yml has all the mandatory security configuration set.

name: opensearch
replicas: 3
image:
  repository: opensearchproject/opensearch
  tag: "2.18.0"
nodeSelector:
  color: "blue"
tolerations:
  - key: "opensearch"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
resources:
  requests:
    cpu: 13
    memory: 50Gi
  limits:
    cpu: 13
    memory: 50Gi
opensearchJavaOpts: "-Xms30g -Xmx30g -XX:G1ReservePercent=10 -XX:InitiatingHeapOccupancyPercent=60 -XX:MaxGCPauseMillis=150 -XX:ConcGCThreads=14 -XX:ParallelGCThreads=24"
extraEnvs:
  - name: OPENSEARCH_INITIAL_ADMIN_PASSWORD
    value: "*****************" #provide your custom password here
persistence:
  enabled: true
  size: 1000Gi
  storageClass: "pd-ssd"

# Service configurations
service:
  type: LoadBalancer
  annotations:
    cloud.google.com/load-balancer-type: "Internal"

Attached values file @pablo

@Nagpraveen If you used only this values.yml file to deploy the cluster then official helm charts would fail to start. Did you modify templates?

Please share output of

kubectl exec -it <pod_name> -- ls -l config

The OpenSearch installation has worked perfectly fine with this values YAML since last week, so I wouldn't think I am missing anything critical here. I did not need to modify any templates.

kubectl exec -it <pod_name> -- ls -l config wouldn't run because it cannot connect to the opensearch container:

kubectl -n apollo exec -it opensearch-cluster-master-0 -- ls -l config

Defaulted container "opensearch" out of: opensearch, fsgroup-volume (init), configfile (init)
error: Internal error occurred: unable to upgrade connection: container not found ("opensearch")
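
While the container is in CrashLoopBackOff, `kubectl exec` has nothing to attach to, which is what the error above indicates. A minimal sketch for reading the output of the last crashed attempt instead, using the namespace and pod name from this thread. The `KUBECTL` variable and the `crash_logs` function name are illustrative and not part of any chart:

```shell
#!/bin/sh
# KUBECTL can be overridden (for example with "echo") to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# Print the logs of the previous (crashed) instance of the opensearch container.
# $1 = namespace, $2 = pod name
crash_logs() {
  "$KUBECTL" logs -n "$1" "$2" -c opensearch --previous --tail=200
}

# Usage: crash_logs apollo opensearch-cluster-master-0
```

The `--previous` flag asks for the terminated instance rather than the one currently waiting to restart, so the root-ca.pem error should appear there if it is what ends the process.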

For OpenSearch Dashboards, I have these config files:

-rw-r--r-- 1 opensearch-dashboards opensearch-dashboards 216 Oct 31 23:42 node.options
-rw-r--r-- 1 opensearch-dashboards opensearch-dashboards 1151 Jan 15 20:00 opensearch.example.org.cert
-rw-r--r-- 1 opensearch-dashboards opensearch-dashboards 1675 Jan 15 20:00 opensearch.example.org.key
-rw-r--r-- 1 opensearch-dashboards opensearch-dashboards 9927 Jan 15 20:00 opensearch_dashboards.yml

Dashboards has been up and running since last week.

@Nagpraveen By default, OpenSearch images contain node, admin, and root certificates, located inside the OpenSearch config folder.
These certs were created in February last year and are all valid for 10 years.
If for some reason they are missing, then maybe the config folder was overwritten by a volume mount.

root-ca.pem

I’ve used charts from OpenSearch Git. I copied over your configuration and tried to run it. It failed because of the missing securityConfig section.

How do you run your helm install? Maybe you have more than one values file.

Run the following command against the OpenSearch pod and check if there is extra mounting for the config folder or certificates.

kubectl describe pod <opensearch_pod>

This is the output for my OpenSearch pod.

Containers:
  opensearch:
    Container ID:   containerd://a321d3ed25036c39dc21f9b16c30ee5e565f9257535020e1f2e271b08403f65e
    Image:          opensearchproject/opensearch:2.18.0
    Image ID:       docker.io/opensearchproject/opensearch@sha256:7f6fa1efee8f39e94ca30a0a31f95d866c8a99f25a86e6cf6691142d1eab4f9e
    Ports:          9200/TCP, 9300/TCP, 9600/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 20 Jan 2025 17:47:16 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      1
      memory:   100Mi
    Readiness:  tcp-socket :9200 delay=0s timeout=3s period=5s #success=1 #failure=3
    Startup:    tcp-socket :9200 delay=5s timeout=3s period=10s #success=1 #failure=30
    Environment:
      node.name:                          opensearch-cluster-master-0 (v1:metadata.name)
      cluster.initial_master_nodes:       opensearch-cluster-master-0,opensearch-cluster-master-1,opensearch-cluster-master-2,
      discovery.seed_hosts:               opensearch-cluster-master-headless
      cluster.name:                       opensearch-cluster
      network.host:                       0.0.0.0
      OPENSEARCH_JAVA_OPTS:               -Xmx512M -Xms512M
      node.roles:                         master,ingest,data,remote_cluster_client,
      OPENSEARCH_INITIAL_ADMIN_PASSWORD:  Eliatra123
    Mounts:
      /usr/share/opensearch/config/opensearch.yml from config-emptydir (rw,path="opensearch.yml")
      /usr/share/opensearch/data from opensearch-cluster-master (rw)
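
To spot a mount that touches the config folder quickly, the mount lines of the describe output can be filtered. A small sketch, assuming the default config path `/usr/share/opensearch/config` shown above; `config_mounts` is a hypothetical helper name:

```shell
#!/bin/sh
# Read `kubectl describe pod` output on stdin and keep only mount lines
# whose target is the OpenSearch config folder or a path beneath it.
config_mounts() {
  grep -E '^[[:space:]]+/usr/share/opensearch/config(/[^[:space:]]*)?[[:space:]]+from[[:space:]]'
}

# Usage: kubectl describe pod <opensearch_pod> | config_mounts
```

A line that mounts the folder itself would mean the volume hides the files shipped in the image, whereas a single-file mount such as the `path="opensearch.yml"` entry above leaves the other files in place.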

@pablo : Greetings,

a) The security section isn't provided because I am using demo certs, and this used to work fine, as I mentioned earlier. As per the docs, I have overridden only the values.yaml file with the values required to bring the nodes up.

b) I am passing only one values.yaml file to the default OpenSearch charts via the command below:

helm install "os-main-blue" opensearch/opensearch -f "os-main-blue.yaml" --namespace="custom"

c) Below is the describe output for one of my crashing OpenSearch pods (I don't see any extra mounts on this pod):

Containers:
  opensearch:
    Container ID:   containerd://b4e1dec204bface3c3062a0ab48100fdf6bc35c20724798be2ce409863a6f414
    Image:          opensearchproject/opensearch:2.18.0
    Image ID:       docker.io/opensearchproject/opensearch@sha256:7f6fa1efee8f39e94ca30a0a31f95d866c8a99f25a86e6cf6691142d1eab4f9e
    Ports:          9200/TCP, 9300/TCP, 9600/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 21 Jan 2025 08:21:08 +0530
      Finished:     Tue, 21 Jan 2025 08:21:18 +0530
    Ready:          False
    Restart Count:  143
    Limits:
      cpu:     13
      memory:  50Gi
    Requests:
      cpu:      13
      memory:   50Gi
    Readiness:  tcp-socket :9200 delay=0s timeout=3s period=5s #success=1 #failure=3
    Startup:    tcp-socket :9200 delay=5s timeout=3s period=10s #success=1 #failure=30
    Environment:
      node.name:                          opensearch-cluster-master-0 (v1:metadata.name)
      cluster.initial_master_nodes:       opensearch-cluster-master-0,opensearch-cluster-master-1,opensearch-cluster-master-2,
      discovery.seed_hosts:               opensearch-cluster-master-headless
      cluster.name:                       opensearch-cluster
      network.host:                       0.0.0.0
      OPENSEARCH_JAVA_OPTS:               -Xms30g -Xmx30g -XX:G1ReservePercent=10 -XX:InitiatingHeapOccupancyPercent=60 -XX:MaxGCPauseMillis=150 -XX:ConcGCThreads=14 -XX:ParallelGCThreads=24
      node.roles:                         master,ingest,data,remote_cluster_client,
      OPENSEARCH_INITIAL_ADMIN_PASSWORD:  **************
    Mounts:
      /usr/share/opensearch/config/opensearch.yml from config-emptydir (rw,path="opensearch.yml")
      /usr/share/opensearch/data from opensearch-cluster-master (rw)
Conditions:

@pablo : Greetings,

Wanted to share a follow-up.

I reinstalled OpenSearch on a different node pool in the same GKE cluster, with the same values.yaml. This time the OpenSearch service started fine, and when I checked the logs, it looks like the root-ca.pem cert was accessible.

It is strange, though, that the certs were not found, were corrupted, or had altered permissions on the earlier attempts with the previous node pool.

  1. The Helm release name was different this time, but the values.yaml was the same as above, with no changes.

I think I know what the bug was: the installer was not running the shell script that sets the admin password and installs the certs. Note that one of the screenshots in my earlier messages showed: "opensearch.yml seems to be already configured for security. Quit."

But this time OpenSearch ran the shell script to set the admin password and install the demo certs, as shown below:


@pablo : One last question: can you please help me understand on what basis the demo script assumed opensearch.yml was already set up and didn’t proceed, and how to avoid this?

Also, if there is a sample for setting up custom certs, could you please share it?
Let’s say I have .key and .cert files instead of .pem; can I use them for the key and cert? And is root-ca always necessary?
Thanks!

@Nagpraveen I did further testing and I found that install_demo_configuration.sh stops its execution when it finds a security plugin configuration in opensearch.yml. This will also prevent the recreation of demo certs in /usr/share/opensearch/config folder.

I wasn’t aware of that.

I had to delete all the security plugin configuration in opensearch.yml, and then install_demo_configuration.sh completed successfully.
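
The guard described above can be approximated with a plain text search. This is only a sketch: it assumes the demo installer treats any `plugins.security` key in opensearch.yml as an existing security configuration, which matches the behaviour reported in this thread but is not taken from the script's source. `has_security_config` is a hypothetical helper name.

```shell
#!/bin/sh
# Return 0 if the given opensearch.yml contains an uncommented
# plugins.security setting, 1 otherwise.
# Assumption: this approximates the condition that makes the demo installer
# report that the file is already configured for security and quit.
has_security_config() {
  grep -Eq '^[[:space:]]*plugins\.security' "$1"
}

# Usage inside a running pod (path is the image default):
#   has_security_config /usr/share/opensearch/config/opensearch.yml \
#     && echo "demo install would probably be skipped"
```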

I wouldn’t call it a bug, as this script should be used only once, to set up the demo configuration. The securityadmin.sh script should be used to manage and update the security plugin afterwards.

However, I couldn’t reproduce that by recreating the OpenSearch pod or forcing the OpenSearch pod to restart by restarting the Kubernetes host.
Not sure how you ended up with the same opensearch.yml as it is held in the emptydir volume and not preserved.

    Mounts:
      /usr/share/opensearch/config/opensearch.yml from config-emptydir (rw,path="opensearch.yml")

Deleting pods or scaling down and up the statefulset of the OpenSearch cluster should fix your issue.
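
A sketch of the two options mentioned above. One relevant Kubernetes detail: an emptyDir volume survives container restarts and is removed only when the pod itself is deleted, so a container that keeps restarting inside the same pod keeps seeing the same opensearch.yml. That is one plausible way to end up in the state described, though it was not confirmed in this thread. The StatefulSet name is inferred from the pod name `opensearch-cluster-master-0`; the `KUBECTL` variable and the function names are illustrative.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example with "echo") to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# Option 1: delete one pod; the StatefulSet controller recreates it
# with a fresh emptyDir volume.
# $1 = namespace, $2 = pod name
recreate_pod() {
  "$KUBECTL" delete pod -n "$1" "$2"
}

# Option 2: set the replica count of the StatefulSet.
# $1 = namespace, $2 = StatefulSet name, $3 = replica count
scale_statefulset() {
  "$KUBECTL" scale statefulset -n "$1" "$2" --replicas="$3"
}

# Usage:
#   scale_statefulset apollo opensearch-cluster-master 0
#   (wait until the pods are gone, e.g. watch `kubectl get pods -n apollo`)
#   scale_statefulset apollo opensearch-cluster-master 3
```

Scaling back up only after the old pods have disappeared avoids the two changes being collapsed into a no-op.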

Regarding your second question:

  1. root-ca.pem is always necessary, as it is used to validate your node certificates.
  2. Yes, you can use .key and .crt files instead.
  3. To present custom certificates, you need to set the following in the values.yml file:
  • disable the demo configuration (Kubernetes env values must be strings, hence the quotes)
extraEnvs:
  - name: DISABLE_INSTALL_DEMO_CONFIG
    value: "true"
  • add the configuration with custom certificates in config.opensearch.yml
  • create a secret with the custom certificates
  • mount the secret with the custom certificates
secretMounts:
  - name: opensearch-certs
    secretName: opensearch-certs
    path: /usr/share/opensearch/config
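
The steps above might look roughly like the following sketch. The file names `root-ca.pem`, `node.crt` and `node.key` are placeholders, the secret name `opensearch-certs` is taken from the `secretMounts` example, and `is_pem`, `create_cert_secret` and `print_tls_settings` are hypothetical helper names. The `plugins.security.ssl.*` keys are standard security plugin settings, but the selection here is minimal and not a complete configuration (admin and node DNs, for instance, are left out). The sketch assumes that the file extension is irrelevant as long as the content is PEM-encoded, so it checks for a PEM header before creating the secret.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example with "echo") to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# Return 0 if the file looks PEM-encoded (contains a "-----BEGIN ..." header).
is_pem() {
  grep -q -- '-----BEGIN ' "$1"
}

# Create the secret that secretMounts refers to.
# $1 = namespace, $2 = CA cert, $3 = node cert, $4 = node key
create_cert_secret() {
  for f in "$2" "$3" "$4"; do
    is_pem "$f" || { echo "not PEM-encoded: $f" >&2; return 1; }
  done
  "$KUBECTL" create secret generic opensearch-certs -n "$1" \
    --from-file=root-ca.pem="$2" \
    --from-file=node.crt="$3" \
    --from-file=node.key="$4"
}

# Print a minimal TLS fragment for config.opensearch.yml.
# Paths are relative to the OpenSearch config folder.
print_tls_settings() {
  cat <<'EOF'
plugins.security.ssl.transport.pemcert_filepath: node.crt
plugins.security.ssl.transport.pemkey_filepath: node.key
plugins.security.ssl.transport.pemtrustedcas_filepath: root-ca.pem
plugins.security.ssl.http.enabled: true
plugins.security.ssl.http.pemcert_filepath: node.crt
plugins.security.ssl.http.pemkey_filepath: node.key
plugins.security.ssl.http.pemtrustedcas_filepath: root-ca.pem
EOF
}

# Usage: create_cert_secret custom ./root-ca.pem ./node.crt ./node.key
```

Since a volume mounted over a folder hides the files shipped in the image, it may be worth checking the chart's values.yaml for how `secretMounts` can target a subfolder or individual files, and adjusting the paths in the fragment to match.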