Failed to obtain node locks, tried [[/opt/opensearch/data]] with lock id [0]

YaswaniT · January 31, 2024, 11:23am

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
1.2.4

Describe the issue:
I have installed OpenSearch in a Kubernetes cluster using our own Helm chart.
I have 2 data pods, 3 master pods, and 1 ingest pod.
I force-deleted all the pods using “kubectl delete pods -n sample --all --force --grace-period=0”. After this, a data pod is still not ready even after 20 minutes.
Instead, it fails with the lock id error below.
Can anyone help me find where this lock is stored, and is there any way to remove it without causing errors?

Relevant Logs or Screenshots:
“org.opensearch.bootstrap.StartupException: java.lang.IllegalStateException: failed to obtain node locks, tried [[/opt/opensearch/data]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?”

gaobinlong · February 1, 2024, 2:07pm

It seems that multiple nodes are using the same mounted volume; please check that configuration.
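One way to act on this suggestion is to list, for every node, the volume mounted at its data path and look for any volume claimed by more than one node. The sketch below assumes you have already collected that mapping by hand (for example from the pod specs or PVC bindings); the function name `find_shared_volumes` and the pod and volume names are hypothetical.

```python
from collections import defaultdict


def find_shared_volumes(node_volumes: dict[str, str]) -> dict[str, list[str]]:
    """Return volumes that back the data path of more than one node.

    node_volumes maps a node (pod) name to an identifier of the volume
    mounted at its data path, e.g. a PersistentVolume name.
    """
    users: dict[str, list[str]] = defaultdict(list)
    for node, volume in node_volumes.items():
        users[volume].append(node)
    # Only volumes used by two or more nodes can cause a node-lock conflict
    return {vol: sorted(nodes) for vol, nodes in users.items() if len(nodes) > 1}


if __name__ == "__main__":
    # Hypothetical mapping collected by hand from the pod specs
    mapping = {
        "opensearch-data-0": "pv-data-0",
        "opensearch-data-1": "pv-data-0",  # misconfigured: same volume as data-0
        "opensearch-master-0": "pv-master-0",
    }
    print(find_shared_volumes(mapping))
```

An empty result means every node has its own volume, so a shared mount is unlikely to be the cause.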

pablo · February 1, 2024, 2:46pm

@YaswaniT Did you delete the PVC after deleting the pods?

YaswaniT · February 2, 2024, 5:48am

No. I can't delete the PVC because it holds a few indices. As part of the reboot, I had to force-delete the pods. Is there any other way to remove the lock?

pablo · February 2, 2024, 3:52pm

@YaswaniT Forcing the deletion of the pod and not allowing the service to stop gracefully could cause the reported issue. If you look at the data in the related PVC, you’ll find a node.lock file in <PV_data_folder>/nodes/0/.
This file is removed when the service stops gracefully. Because you forced the deletion, the file remained.
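A minimal sketch of this cleanup, assuming the volume is mounted somewhere you can run Python (for example a temporary debug pod) and that the OpenSearch pod using it is fully stopped, so no running node still holds the lock. It builds the path <data_path>/nodes/<lock_id>/node.lock from the values in the error message (here /opt/opensearch/data and lock id 0) and only deletes the file when `dry_run=False`. The function names and the `dry_run` flag are hypothetical helpers, not part of any OpenSearch tool.

```python
from pathlib import Path
from typing import Optional


def stale_node_lock(data_path: str, lock_id: int = 0) -> Optional[Path]:
    """Return the node.lock path for the given lock id if the file exists."""
    lock_file = Path(data_path) / "nodes" / str(lock_id) / "node.lock"
    return lock_file if lock_file.is_file() else None


def remove_node_lock(data_path: str, lock_id: int = 0, dry_run: bool = True) -> Optional[Path]:
    """Delete a leftover node.lock; only call this while OpenSearch is stopped.

    With dry_run=True (the default) the file is reported but not deleted.
    Returns the path that was (or would be) removed, or None if it is absent.
    """
    lock_file = stale_node_lock(data_path, lock_id)
    if lock_file is not None and not dry_run:
        lock_file.unlink()
    return lock_file


if __name__ == "__main__":
    # Values from the error: tried [[/opt/opensearch/data]] with lock id [0]
    found = remove_node_lock("/opt/opensearch/data", lock_id=0, dry_run=True)
    print(f"would remove {found}" if found else "no node.lock found")
```

Defaulting to a dry run makes the destructive step explicit; deleting the lock while a node is still running could let two processes use the same data directory.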

Do you have snapshots/backups of those indices?

I’ve reproduced your issue. When I removed the node.lock file, the OpenSearch pod started again.

pablo · February 2, 2024, 3:54pm

@YaswaniT Why do you force the deletion of the pods? Graceful deletion also works and prevents node.lock issues.

YaswaniT · February 6, 2024, 9:36am

Hi @pablo ,
We have a test case that force-deletes the pods to check that everything recovers without errors. Since this lock issue is triggered by resilience scenarios (force deletion, a node going down, or a power outage), we are trying to find a solution.
So if I remove the node.lock file from “<PV_data_folder>/nodes/0/”, will OpenSearch work?

YaswaniT · February 6, 2024, 10:34am

Hi @pablo, I removed node.lock and restarted the failed pod manually, as it did not restart on its own. The issue no longer occurs.
Thanks for your help.

Related topics

Topic | Category | Replies | Views | Activity
Unassigned shards after killed containers (blackout) | OpenSearch troubleshoot | 0 | 562 | November 24, 2023
Problem with automatically deleting the opensearch index | OpenSearch troubleshoot | 3 | 369 | October 6, 2023
Open Search Data Insert issue | OpenSearch troubleshoot | 4 | 462 | July 6, 2023
Opensearch pod failling with CrashLoopBack Error | DevOps | 7 | 1079 | January 5, 2022
Restarting Opensearch Corrupts Cluster | OpenSearch discuss, troubleshoot | 1 | 1739 | January 25, 2023
