Cannot create pre-baked docker image of OpenSearch 2.12+

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

2.12+

Describe the issue:

We’ve recently upgraded to OpenSearch 2.12 from 2.11.

Previously we were able to build a Docker image that started OpenSearch, populated the index, and saved the result as a new image. We could then run that pre-populated image in our tests.

Since upgrading to 2.12, we get the following Lucene error when starting the pre-populated image:

org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an external force at 2024-05-27T22:02:00.14795819Z, (lock=NativeFSLock(path=/usr/share/opensearch/data/nodes/0/node.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2024-05-27T22:01:23.629914316Z))
        at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:179)
        at org.opensearch.env.NodeEnvironment.assertEnvIsLocked(NodeEnvironment.java:1149)
        at org.opensearch.env.NodeEnvironment.nodeDataPaths(NodeEnvironment.java:900)
        at org.opensearch.env.NodeEnvironment.assertCanWrite(NodeEnvironment.java:1373)
        at org.opensearch.env.NodeEnvironment.<init>(NodeEnvironment.java:376)
        at org.opensearch.env.NodeEnvironment.<init>(NodeEnvironment.java:301)
        at org.opensearch.node.Node.<init>(Node.java:535)
        at org.opensearch.node.Node.<init>(Node.java:417)
        at org.opensearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:242)
        at org.opensearch.bootstrap.Bootstrap.setup(Bootstrap.java:242)
        at org.opensearch.bootstrap.Bootstrap.init(Bootstrap.java:404)
        at org.opensearch.bootstrap.OpenSearch.init(OpenSearch.java:181)
        at org.opensearch.bootstrap.OpenSearch.execute(OpenSearch.java:172)
        at org.opensearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:104)
        at org.opensearch.cli.Command.mainWithoutErrorHandling(Command.java:138)
        at org.opensearch.cli.Command.main(Command.java:101)
        at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:138)
        at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:104)
For complete error details, refer to the log at /usr/share/opensearch/logs/opensearch-cluster.log

It seems that the lock file node.lock was created during the initial Docker build, when OpenSearch was started to populate the index. When the image is run a second time, the lock check fails because the lock's recorded creation time no longer matches the file on disk.

Our Dockerfile to create the pre-baked image looks like this:

FROM opensearchproject/opensearch:latest

RUN opensearch -p pid_file -E discovery.type=single-node -E http.port=9201 -d && \
    # script which populates index && \
    cd /usr/share/opensearch && \
    kill `cat pid_file` && \
    rm -f /usr/share/opensearch/data/nodes/0/node.lock


Once this is built, simply running it as follows fails:

docker run -it my-prebaked-os-image:2.12-SNAPSHOT

I’ve tried removing the script which populates the index, but I get the same error (i.e. simply building a new Docker image in which OpenSearch has been started once prevents it from being run again in a later test).

This used to work fine in OpenSearch 2.11. I assume it may be related to the new version of Lucene - but I see the file lock check code in Lucene is many years old:

ore/src/java/org/apache/lucene/store/NativeFSLockFactory.java#L180

Is there a way I can cleanly shut down OpenSearch and have it restart without hitting this error? I’ve tried killing the process and manually removing the lock file, but the error remains.

Configuration:

Relevant Logs or Screenshots:

I’ve found that if I also add this line to the Dockerfile:

    rm -f /usr/share/opensearch/data/nodes/0/_state/write.lock

Then this allows it to start up. However, I’d still prefer a way to perform a clean shutdown where OpenSearch cleans these files up itself.
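
One thing worth noting about the Dockerfile above: kill only sends SIGTERM and returns immediately, so the rm (and the end of the RUN step) can run while OpenSearch is still shutting down. Below is a minimal sketch of a helper that waits for the process to exit before anything else happens. The function name stop_and_wait and the 60-second default timeout are my own assumptions, and I have not confirmed that this alone avoids the lock error on 2.12.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenSearch): send SIGTERM to a PID and
# block until the process has actually exited.
# Returns 0 if the process is gone within the timeout, 1 otherwise.
stop_and_wait() {
    pid="$1"
    timeout="${2:-60}"   # seconds to wait; 60 is an arbitrary default

    # If the signal cannot be delivered, treat the process as already gone.
    kill "$pid" 2>/dev/null || return 0

    elapsed=0
    # kill -0 sends no signal; it only checks whether the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

In the RUN step this would replace the bare kill, for example stop_and_wait "$(cat pid_file)" 120, followed by the rm commands only if it returns 0.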
