Docker-compose startup failures on first-time starts with v1.2.4

I’ll try to explain this simply: we have a 3-node cluster that we build and start via docker-compose, with custom admin certs and opensearch.yml files.

In the docker-compose file, each node maps its own config directory like this:

./opensearch/node1:/usr/share/opensearch/config
./opensearch/node2:/usr/share/opensearch/config
./opensearch/node3:/usr/share/opensearch/config

This has worked reliably on every version until now, but starting instances for the first time with the 1.2.4 image randomly fails with the error below (I’ve put the trace at the end so you don’t have to read through it).

When I say “randomly”, the failure occurs on 0, 1, 2, or all 3 nodes.

Basically what happens when things go wrong is that on start, the security plugin sees there’s a custom security configuration and exits, like you’d expect, that is:

Detected OpenSearch Security Version: 1.2.4.0
/usr/share/opensearch/config/opensearch.yml seems to be already configured for Security. Quit.
Enabling OpenSearch Security Plugin

But then the OpenSearchSecurityPlugin throws an exception, and it goes on to run the install_demo_configuration.sh script which overwrites the custom opensearch.yml and certificate files in the config directory.

And if any of them fail to come up with the custom config/certs, of course, the cluster fails to start.
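One way to confirm which node's config directory was rewritten during a start is to fingerprint each mapped directory before and after bringing the cluster up. This is only a sketch: `config_fingerprint` is a hypothetical helper name, and it assumes GNU coreutils (`sha256sum`) on the host and that the current user can read the files.

```shell
#!/bin/sh
# Hypothetical helper: print one SHA-256 digest summarizing every file under a directory.
# Assumes GNU coreutils (sha256sum) is installed on the host.
config_fingerprint() {
    (
        cd "$1" || exit 1
        # Hash each file, sort by path for a stable order, then hash the combined list.
        find . -type f -exec sha256sum {} + | sort -k 2 | sha256sum | cut -d ' ' -f 1
    )
}

# Example usage (paths follow the mappings above):
# before=$(config_fingerprint ./opensearch/node2)
# docker-compose up -d
# after=$(config_fingerprint ./opensearch/node2)
# [ "$before" = "$after" ] || echo "node2 config changed during startup"
```

Taking a copy of each directory before the first start would also make it easy to restore the custom files if they do get replaced.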

It’s fairly random which nodes fail, and how many - but it happens on at least one node probably 8 out of 10 times.

We never saw this with previous builds, but 1.2.4 fails pretty reliably (like I said, about 8 out of 10 times on fresh startups.)

Any idea what’s gone wrong here?

Thanks,
Rick

/usr/share/opensearch/config/opensearch.yml seems to be already configured for Security. Quit.
Enabling OpenSearch Security Plugin

[2022-02-10T22:49:33,829][ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [opensearch-node2] uncaught exception in thread [main]
org.opensearch.bootstrap.StartupException: java.lang.IllegalStateException: failed to load plugin class [org.opensearch.security.OpenSearchSecurityPlugin]
at org.opensearch.bootstrap.OpenSearch.init(OpenSearch.java:182) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.bootstrap.OpenSearch.execute(OpenSearch.java:169) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:100) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.cli.Command.mainWithoutErrorHandling(Command.java:138) ~[opensearch-cli-1.2.4.jar:1.2.4]
at org.opensearch.cli.Command.main(Command.java:101) ~[opensearch-cli-1.2.4.jar:1.2.4]
at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:135) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:101) ~[opensearch-1.2.4.jar:1.2.4]
Caused by: java.lang.IllegalStateException: failed to load plugin class [org.opensearch.security.OpenSearchSecurityPlugin]
at org.opensearch.plugins.PluginsService.loadPlugin(PluginsService.java:790) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.plugins.PluginsService.loadBundle(PluginsService.java:726) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.plugins.PluginsService.loadBundles(PluginsService.java:528) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.plugins.PluginsService.<init>(PluginsService.java:194) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.node.Node.<init>(Node.java:396) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.node.Node.<init>(Node.java:319) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:242) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.bootstrap.Bootstrap.init(Bootstrap.java:412) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.bootstrap.OpenSearch.init(OpenSearch.java:178) ~[opensearch-1.2.4.jar:1.2.4]
... 6 more
Caused by: java.lang.reflect.InvocationTargetException
at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:64) ~[?:?]
at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500) ~[?:?]
at java.lang.reflect.Constructor.newInstance(Constructor.java:481) ~[?:?]
at org.opensearch.plugins.PluginsService.loadPlugin(PluginsService.java:781) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.plugins.PluginsService.loadBundle(PluginsService.java:726) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.plugins.PluginsService.loadBundles(PluginsService.java:528) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.plugins.PluginsService.<init>(PluginsService.java:194) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.node.Node.<init>(Node.java:396) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.node.Node.<init>(Node.java:319) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:242) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.bootstrap.Bootstrap.init(Bootstrap.java:412) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.bootstrap.OpenSearch.init(OpenSearch.java:178) ~[opensearch-1.2.4.jar:1.2.4]
... 6 more
Caused by: org.opensearch.OpenSearchException: plugins.security.ssl.transport.keystore_filepath or plugins.security.ssl.transport.server.pemcert_filepath and plugins.security.ssl.transport.client.pemcert_filepath must be set if transport ssl is requested.
at org.opensearch.security.ssl.DefaultSecurityKeyStore.initTransportSSLConfig(DefaultSecurityKeyStore.java:422) ~[?:?]
at org.opensearch.security.ssl.DefaultSecurityKeyStore.initSSLConfig(DefaultSecurityKeyStore.java:258) ~[?:?]
at org.opensearch.security.ssl.DefaultSecurityKeyStore.<init>(DefaultSecurityKeyStore.java:179) ~[?:?]
at org.opensearch.security.ssl.OpenSearchSecuritySSLPlugin.<init>(OpenSearchSecuritySSLPlugin.java:218) ~[?:?]
at org.opensearch.security.OpenSearchSecurityPlugin.<init>(OpenSearchSecurityPlugin.java:252) ~[?:?]
at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:64) ~[?:?]
at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500) ~[?:?]
at java.lang.reflect.Constructor.newInstance(Constructor.java:481) ~[?:?]
at org.opensearch.plugins.PluginsService.loadPlugin(PluginsService.java:781) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.plugins.PluginsService.loadBundle(PluginsService.java:726) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.plugins.PluginsService.loadBundles(PluginsService.java:528) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.plugins.PluginsService.<init>(PluginsService.java:194) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.node.Node.<init>(Node.java:396) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.node.Node.<init>(Node.java:319) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:242) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.bootstrap.Bootstrap.init(Bootstrap.java:412) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.bootstrap.OpenSearch.init(OpenSearch.java:178) ~[opensearch-1.2.4.jar:1.2.4]
... 6 more
Killing performance analyzer process 38
OpenSearch exited with code 1
Performance analyzer exited with code 143
Enabling execution of install_demo_configuration.sh for OpenSearch Security Plugin
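
The innermost "Caused by" names the transport TLS settings the plugin could not find. As a rough diagnostic, the check below tests whether a given opensearch.yml declares one of them as a flat key. It is a sketch only: `has_transport_tls_setting` is a hypothetical name, the `pemcert_filepath` alternative is assumed from the security plugin's usual configuration rather than taken from the trace, and nested YAML forms of the same settings would not be detected.

```shell
#!/bin/sh
# Hypothetical check: succeed if the file declares a transport certificate or keystore
# setting as a flat top-level key. Nested YAML forms are not detected by this sketch.
has_transport_tls_setting() {
    grep -Eq '^plugins\.security\.ssl\.transport\.(keystore_filepath|pemcert_filepath|server\.pemcert_filepath)[[:space:]]*:' "$1"
}

# Example usage:
# has_transport_tls_setting ./opensearch/node2/opensearch.yml \
#     || echo "node2: no transport TLS certificate setting found"
```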

Seems related to the security plugin more than the docker compose file. I’m going to move this to the security category.

@rick98 Could you share your docker-compose.yml file?

Hi @pablo,

This is the file for 1.2.4; the same file works with 1.2.3. I recently added depends_on, thinking that bringing the nodes up in sequence might make the behavior more predictable, but it doesn’t.

Thanks,
Rick

version: '3'
services:
  opensearch-node1:
    restart: always
    image: opensearchproject/opensearch:${OPENSEARCH_ES:-1.2.4}
    container_name: opensearch-node1
    logging:
      options:
        max-size: "30m"
        max-file: "2"
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - discovery.seed_hosts=opensearch-node1,opensearch-node2, opensearch-node3
      - cluster.initial_master_nodes=opensearch-node1,opensearch-node2,opensearch-node3
      - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
      - indices.query.bool.max_clause_count=4096
      - path.repo=/var/lib/dbsat/snapshots
      - "OPENSEARCH_JAVA_OPTS=-Xms1G -Xmx1G" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536 # maximum number of open files for the OpenSearch user, set to at least 65536 on modern systems
        hard: 65536
    volumes:
      - ${DATA_PLANE:-/var/lib}/dbsat/node1:/usr/share/opensearch/data
      - ${DATA_PLANE:-/var/lib}/dbsat/snapshots:/var/lib/dbsat/snapshots
      - ${CONTROL_PLANE:-.}/opensearch/node1:/usr/share/opensearch/config
      - ${CONTROL_PLANE:-.}/scripts:/usr/share/opensearch/scripts
      - ${CONTROL_PLANE:-.}/securityconfig:/usr/share/opensearch/plugins/opensearch-security/securityconfig
    ports:
      - 9200:9200
      - 9600:9600 # required for Performance Analyzer
    networks:
      - opensearch-net

  opensearch-node2:
    restart: always
    depends_on:
      - opensearch-node1
    image: opensearchproject/opensearch:${OPENSEARCH_ES:-1.2.4}
    container_name: opensearch-node2
    logging:
      options:
        max-size: "30m"
        max-file: "2"
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2, opensearch-node3
      - cluster.initial_master_nodes=opensearch-node1,opensearch-node2,opensearch-node3
      - bootstrap.memory_lock=true
      - indices.query.bool.max_clause_count=4096
      - path.repo=/var/lib/dbsat/snapshots
      - "OPENSEARCH_JAVA_OPTS=-Xms1G -Xmx1G"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - ${DATA_PLANE:-/var/lib}/dbsat/node2:/usr/share/opensearch/data
      - ${DATA_PLANE:-/var/lib}/dbsat/snapshots:/var/lib/dbsat/snapshots
      - ${CONTROL_PLANE:-.}/opensearch/node2:/usr/share/opensearch/config
      - ${CONTROL_PLANE:-.}/scripts:/usr/share/opensearch/scripts
      - ${CONTROL_PLANE:-.}/securityconfig:/usr/share/opensearch/plugins/opensearch-security/securityconfig
    networks:
      - opensearch-net

  opensearch-node3:
    restart: always
    depends_on:
      - opensearch-node1
      - opensearch-node2
    image: opensearchproject/opensearch:${OPENSEARCH_ES:-1.2.4}
    container_name: opensearch-node3
    logging:
      options:
        max-size: "30m"
        max-file: "2"
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node3
      - discovery.seed_hosts=opensearch-node1,opensearch-node2,opensearch-node3
      - cluster.initial_master_nodes=opensearch-node1,opensearch-node2,opensearch-node3
      - bootstrap.memory_lock=true
      - indices.query.bool.max_clause_count=4096
      - path.repo=/var/lib/dbsat/snapshots
      - "OPENSEARCH_JAVA_OPTS=-Xms1G -Xmx1G"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - ${DATA_PLANE:-/var/lib}/dbsat/node3:/usr/share/opensearch/data
      - ${DATA_PLANE:-/var/lib}/dbsat/snapshots:/var/lib/dbsat/snapshots
      - ${CONTROL_PLANE:-.}/opensearch/node3:/usr/share/opensearch/config
      - ${CONTROL_PLANE:-.}/scripts:/usr/share/opensearch/scripts
      - ${CONTROL_PLANE:-.}/securityconfig:/usr/share/opensearch/plugins/opensearch-security/securityconfig
    networks:
      - opensearch-net

@rick98 What was the last OpenSearch version that worked for you?

@rick98 Could you also run ls -l against the config folder in the container and in the mapped folder and share the output?
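
For reference, the two listings could be gathered with something like the following. This is a sketch, not a required procedure: the function names are hypothetical, the container name and paths are taken from the compose file above, and `-n` is added so that owners show up as numeric IDs that can be compared between host and container.

```shell
#!/bin/sh
# Hypothetical helpers; the container name and paths follow the compose file in this thread.

# List a host directory with numeric UID/GID (-n) so ownership can be compared by ID.
list_host_config() {
    ls -ln "$1"
}

# List the config directory inside a running container (requires Docker and a running node).
list_container_config() {
    docker exec "$1" ls -ln /usr/share/opensearch/config
}

# Example usage:
# list_host_config ./opensearch/node2
# list_container_config opensearch-node2
```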

Each version from 1.1 through 1.2.3 worked; 1.2.4 is where we first saw this behavior.

Here’s an instance where node1 came up correctly and node2 didn’t. On first start, node1’s mapped directory still contains the correct custom certificates and configs from December, while node2’s directory contains the “demo” config, because the startup failed and our custom config was overwritten.

So on node1 the security plugin saw that security had already been configured and exited, but on node2 the plugin blew up and install_demo_configuration.sh was run again.

Node1 (filesystem)

-rw-------. 1 opc opc 1285 Nov 5 17:51 dbsat-root-ca.pem
-rw-------. 1 opc opc 1708 Dec 6 17:14 esnode-key.pem
-rw-------. 1 opc opc 1184 Dec 6 17:14 esnode.pem
-rw-r-----. 1 opc opc 2503 Nov 1 20:13 jvm.options
drwxr-x---. 2 opc opc 6 Oct 4 21:29 jvm.options.d
-rw-------. 1 opc opc 1704 Dec 6 17:14 kirk-key.pem
-rw-------. 1 opc opc 1172 Dec 6 17:14 kirk.pem
-rw-r--r--. 1 opc opc 285 Nov 1 20:13 log4j2.properties
-rw-r-----. 1 opc opc 196 Nov 1 20:13 opensearch.keystore
drwxr-x---. 2 opc opc 27 Oct 4 21:46 opensearch-notebooks
drwxr-x---. 2 opc opc 35 Oct 4 21:46 opensearch-reports-scheduler
-rw-r--r--. 1 opc opc 2310 Feb 10 22:49 opensearch.yml
-rw-------. 1 opc opc 1285 Dec 6 17:14 root-ca.pem

Node 2 (filesystem)
-rw-------. 1 opc opc 1704 Feb 10 22:49 esnode-key.pem
-rw-------. 1 opc opc 1720 Feb 10 22:49 esnode.pem
-rw-rw----. 1 opc opc 2503 Oct 5 21:15 jvm.options
drwxr-x---. 2 opc opc 6 Oct 4 21:29 jvm.options.d
-rw-------. 1 opc opc 1704 Feb 10 22:49 kirk-key.pem
-rw-------. 1 opc opc 1610 Feb 10 22:49 kirk.pem
-rw-r--r--. 1 opc opc 285 Oct 5 21:13 log4j2.properties
-rw-rw----. 1 opc opc 196 Oct 20 17:49 opensearch.keystore
drwxr-x---. 2 opc opc 27 Oct 4 21:46 opensearch-notebooks
drwxr-x---. 2 opc opc 35 Oct 4 21:46 opensearch-reports-scheduler
-rw-r--r--. 1 opc opc 1613 Feb 10 22:49 opensearch.yml
-rw-------. 1 opc opc 1444 Feb 10 22:49 root-ca.pem

Node1 inside the container:

-rw-------. 1 opensearch opensearch 1285 Nov 5 17:51 dbsat-root-ca.pem
-rw-------. 1 opensearch opensearch 1708 Dec 6 17:14 esnode-key.pem
-rw-------. 1 opensearch opensearch 1184 Dec 6 17:14 esnode.pem
-rw-r-----. 1 opensearch opensearch 2503 Nov 1 20:13 jvm.options
drwxr-x---. 2 opensearch opensearch 6 Oct 4 21:29 jvm.options.d
-rw-------. 1 opensearch opensearch 1704 Dec 6 17:14 kirk-key.pem
-rw-------. 1 opensearch opensearch 1172 Dec 6 17:14 kirk.pem
-rw-r--r--. 1 opensearch opensearch 285 Nov 1 20:13 log4j2.properties
drwxr-x---. 2 opensearch opensearch 27 Oct 4 21:46 opensearch-notebooks
drwxr-x---. 2 opensearch opensearch 35 Oct 4 21:46 opensearch-reports-scheduler
-rw-r-----. 1 opensearch opensearch 196 Nov 1 20:13 opensearch.keystore
-rw-r--r--. 1 opensearch opensearch 2310 Feb 10 22:49 opensearch.yml
-rw-------. 1 opensearch opensearch 1285 Dec 6 17:14 root-ca.pem

Node2 inside the container:

-rw-r--r--. 1 opensearch opensearch 11358 Jan 14 03:35 LICENSE.txt
-rw-r--r--. 1 opensearch opensearch 215355 Jan 14 03:42 NOTICE.txt
-rw-r--r--. 1 opensearch opensearch 1761 Jan 14 03:35 README.md
drwxr-xr-x. 2 opensearch opensearch 4096 Jan 14 03:58 bin
drwxr-xr-x. 5 opensearch opensearch 279 Jan 13 22:58 config
drwxrwxr-x. 3 opensearch opensearch 146 Nov 5 20:33 data
drwxr-xr-x. 9 opensearch opensearch 107 Jan 14 03:43 jdk
drwxr-xr-x. 3 opensearch opensearch 4096 Jan 14 03:43 lib
drwxr-xr-x. 1 opensearch opensearch 103 Feb 10 22:49 logs
-rw-r--r--. 1 opensearch opensearch 4414 Jan 14 03:59 manifest.yml
drwxr-xr-x. 19 opensearch opensearch 4096 Jan 14 03:43 modules
-rwxr-xr-x. 1 opensearch opensearch 4518 Jan 18 18:00 opensearch-docker-entrypoint.sh
-rwxr-xr-x. 1 opensearch opensearch 2171 Jan 18 18:00 opensearch-onetime-setup.sh
-rwxr-xr-x. 1 opensearch opensearch 2445 Jan 14 03:58 opensearch-tar-install.sh
drwxr-xr-x. 6 opensearch opensearch 59 Jan 14 03:58 performance-analyzer-rca
drwxr-xr-x. 1 opensearch opensearch 33 Jan 14 03:59 plugins
drwxrwxr-x. 2 opensearch opensearch 249 Feb 7 17:31 scripts
-rwxrwxr-x. 1 opensearch opensearch 315 Feb 10 22:49 securityadmin_demo.sh

Rick

@rick98 Do you see any file permission issues during the container’s startup?
What is the ID of the opc user and opc group on the host?
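
If it helps, the numeric IDs can be read with a small helper like this one. It is a sketch that assumes GNU `stat` (Linux); `owner_ids` is a hypothetical name, and the container name in the usage comments comes from the compose file above.

```shell
#!/bin/sh
# Print "uid:gid" for a path. Assumes GNU stat (Linux); BSD/macOS stat uses different flags.
owner_ids() {
    stat -c '%u:%g' "$1"
}

# Example usage:
# owner_ids ./opensearch/node1                  # numeric owner of the mapped host folder
# id opc                                        # IDs of the opc user on the host
# docker exec opensearch-node1 id opensearch    # IDs of the user inside the container
```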

No, the only error message seems to be the uncaught exception above.

opc is 1000:1000,

group is:
opc:x:1000:
docker:x:992:opc

@rick98 I reproduced the reported errors.
In your scenario, the opc user and group have IDs 1000:1000, which match the opensearch user inside the container. That is why the container doesn’t complain about file privileges.

However, you are mapping the whole folder rather than individual files. It seems that previous versions accepted a mapped host folder whose user and group IDs don’t correspond to the opensearch user and group inside the OpenSearch container.

You have two options here.

  1. Map individual files instead of the folder.

For example:

- ./config/opensearch/opensearch.yml:/usr/share/opensearch/config/opensearch.yml
- ./config/opensearch/esnode-key.pem:/usr/share/opensearch/config/esnode-key.pem
- ./config/opensearch/esnode.pem:/usr/share/opensearch/config/esnode.pem
- ./config/opensearch/kirk-key.pem:/usr/share/opensearch/config/kirk-key.pem
- ./config/opensearch/kirk.pem:/usr/share/opensearch/config/kirk.pem
- ./config/opensearch/root-ca.pem:/usr/share/opensearch/config/root-ca.pem
  2. Change owner and group of the folder to opc:opc on the host.
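
Option 2 could be applied with a helper along these lines. Treat it as a sketch: `set_config_owner` is a hypothetical name, the 1000:1000 default is simply the IDs reported for opc in this thread, and changing ownership to a different user normally needs root (for example via sudo).

```shell
#!/bin/sh
# Hypothetical helper: recursively set the numeric owner and group on a config folder.
# Defaults to 1000:1000, the IDs reported for opc in this thread.
set_config_owner() {
    dir=$1
    uid=${2:-1000}
    gid=${3:-1000}
    chown -R "$uid:$gid" "$dir"
}

# Example usage (run as root or with sudo if the files belong to another user):
# set_config_owner ./opensearch/node1
# set_config_owner ./opensearch/node2
# set_config_owner ./opensearch/node3
```

Numeric IDs are used instead of names because the container only sees the IDs, not the host's user names.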

@pablo - I have tried mapping the individual files, and so far it seems to work, thanks!

I’ll probably just keep that approach.

But out of curiosity, I don’t quite understand what you mean by alternative 2: opc:opc (1000:1000) owns the config folder on the host system, and opensearch:opensearch (1000:1000) owns the “config” directory in the container.

So, what would you have me change?

Thanks again!
Rick

@rick98
Option 2 is for the case where you’d still like to map the full folder rather than single files.

In that case the folder ${CONTROL_PLANE:-.}/opensearch/node1 must be owned by the opc user and group, since they have ID 1000 in /etc/passwd and /etc/group.

OpenSearch verifies permissions based on the numeric ID, so inside the container the mapped folder will appear as owned by the opensearch user and group.

It worked in my lab on any version of OpenSearch including 1.2.4.