Hostname verification failure with keystore/truststore

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
opensearch-2.11.1

  • Linux - RedHat
  • ubi8/ubi-minimal

Describe the issue:

I came across this issue here (Hostname verification failure · Issue #3997 · opensearch-project/security · GitHub), but am unsure whether it is related or not.

I set up an OpenSearch cluster in OCP, through a StatefulSet, using kustomize. The security plugin has been enabled and has been configured to use keystore and truststore certificates. With a single node I have no issues deploying the cluster and I can reach it via the external Route. Status is green. All fine. As soon as I add an additional node to the StatefulSet, and adjust the config accordingly, the first node/pod will start successfully. However the second node can’t be created because it can’t connect to the first:

[opensearch-1] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: No subject alternative DNS name matching opensearch-0.opensearch found.

I then remove the StatefulSet and re-apply, but this time I can’t get node-0 to run:

[opensearch-0] failed to resolve host [opensearch-0.opensearch]
java.net.UnknownHostException: opensearch-0.opensearch: Name or service not known

Only after I re-deploy with an explicit single-node configuration can node-0 be created inside the multi-node cluster; but I still can’t get the second node to run, and I am stuck with the same problem.
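For what it’s worth, the `UnknownHostException` on a fresh deploy can be a Kubernetes startup-ordering effect rather than a TLS problem: with a headless Service, a StatefulSet pod’s per-pod DNS record (e.g. `opensearch-0.opensearch`) is normally only published once the pod reports ready, so a node that is still starting cannot resolve its peers. A minimal sketch of a headless Service that publishes addresses before readiness — the names, labels, and ports are assumptions and must match your StatefulSet:

```yaml
# Sketch only: headless Service for the StatefulSet (names are illustrative).
apiVersion: v1
kind: Service
metadata:
  name: opensearch                 # must match spec.serviceName in the StatefulSet
spec:
  clusterIP: None                  # headless: gives each pod a stable DNS record
  publishNotReadyAddresses: true   # publish pod DNS records before readiness passes
  selector:
    app: opensearch
  ports:
    - name: transport
      port: 9300
```

With `publishNotReadyAddresses: true`, `opensearch-0.opensearch` resolves as soon as the pod exists, which avoids the chicken-and-egg between DNS registration and cluster bootstrap.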

Configuration:
cluster.name: opensearch-cluster

network.host: 0.0.0.0
network.bind_host: 0.0.0.0
http.port: 9200

node.roles: [ coordinating, master, data, ingest ]

discovery.seed_hosts: [ "opensearch-0.opensearch", "opensearch-1.opensearch" ]
cluster.initial_cluster_manager_nodes: [ "opensearch-0", "opensearch-1" ]

plugins.security.disabled: false
plugins.security.allow_default_init_securityindex: true
plugins.security.system_indices.enabled: true

plugins.security.ssl.transport.keystore_filepath: certificates/tls.pfx
plugins.security.ssl.transport.truststore_filepath: certificates/truststore

Relevant Logs or Screenshots:

Hi @Carla,

Could you add: plugins.security.ssl.transport.enforce_hostname_verification: false to your configuration (opensearch.yml) and test it again?

Best,
mj

One more thing you could try:

change from:

discovery.seed_hosts: [ "opensearch-0.opensearch", "opensearch-1.opensearch" ]
cluster.initial_cluster_manager_nodes: [ "opensearch-0", "opensearch-1" ]

to:

discovery.seed_hosts: [ "opensearch-0", "opensearch-1" ]
cluster.initial_cluster_manager_nodes: [ "opensearch-0", "opensearch-1" ]

best,
mj

I have tried all these previously.

My assumption is that the SANs are the issue, as the error message seems to suggest. To address this, I have since updated my config like so:

cluster.name: opensearch-cluster

network.host: 0.0.0.0
network.bind_host: 0.0.0.0
http.port: 9200

transport.tcp.port: 9300

node.roles:
  - coordinating
  - master
  - data
  - ingest

discovery.seed_hosts:
  - opensearch-0.opensearch.xxx.svc.cluster.local
  - opensearch-1.opensearch.xxx.svc.cluster.local

cluster.initial_cluster_manager_nodes:
  - opensearch-0
  - opensearch-1

#discovery.type: single-node

plugins.security.disabled: false
plugins.security.allow_default_init_securityindex: true
plugins.security.system_indices.enabled: true
plugins.security.ssl.transport.enforce_hostname_verification: false

plugins.security.ssl.transport.keystore_filepath: certificates/tls.pfx
plugins.security.ssl.transport.truststore_filepath: certificates/truststore

plugins.security.nodes_dn:
  - "CN=opensearch-0.opensearch.xxx.svc.cluster.local, OU=xxxx, O=xx, C=DE"
  - "CN=opensearch-1.opensearch.xxx.svc.cluster.local, OU=xxxx, O=xx, C=DE"
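One way to check that `plugins.security.nodes_dn` matches what the plugin actually sees is to read the subject DN straight out of the PKCS#12 keystore with openssl. A sketch under assumptions — the demo file names, password, and subject values below are stand-ins for the real `certificates/tls.pfx`:

```shell
# Illustrative only: create a throwaway key + self-signed cert and bundle it
# into a PKCS#12 keystore, standing in for the real certificates/tls.pfx.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -keyout demo-key.pem -out demo-cert.pem \
    -subj "/C=DE/O=ACME/OU=it/CN=opensearch-0.opensearch.demo.svc.cluster.local"
openssl pkcs12 -export -inkey demo-key.pem -in demo-cert.pem \
    -out demo-tls.pfx -passout pass:changeit

# Print the certificate's subject DN in RFC 2253 order -- this is the exact
# string a plugins.security.nodes_dn entry has to contain for this node.
openssl pkcs12 -in demo-tls.pfx -nokeys -clcerts -passin pass:changeit \
  | openssl x509 -noout -subject -nameopt RFC2253
```

Note that RFC 2253 renders the RDNs in reverse order and comma-separated with no spaces (`CN=...,OU=...,O=...,C=DE`), which differs from both DN spellings tried above.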

The certificate has also been updated with the SANs of the individual nodes:

  san_fqdn:
    # certs for external routes
    - opensearch-xxx.apps.xxxxxx
    # certs for internal routes
    - opensearch.xxx.cluster.local
    - opensearch-0.opensearch.xxx.svc.cluster.local
    - opensearch-1.opensearch.xxx.svc.cluster.local
    - opensearch-2.opensearch.xxx.svc.cluster.local

Now I get a different error:

[2024-02-19T15:25:55,512][WARN ][o.o.d.HandshakingTransportAddressConnector] [opensearch-1] handshake failed for [connectToRemoteMasterNode[10.130.7.190:9300]]
org.opensearch.transport.RemoteTransportException: [opensearch-0][10.130.7.190:9300][internal:transport/handshake]
Caused by: org.opensearch.OpenSearchException: Transport client authentication no longer supported.
	at org.opensearch.security.ssl.util.ExceptionUtils.createTransportClientNoLongerSupportedException(ExceptionUtils.java:68) ~[?:?]
...

From other posts, this seems to refer to an error in the configuration of the plugins.security.nodes_dn parameter. I updated this and tried various options, but I remain stuck.

NOTE: nslookup DOES resolve opensearch-0 successfully from opensearch-1. The first node (opensearch-0) starts fine; the second one keeps hitting this issue.

sh-4.4$ nslookup 10.130.7.190
190.7.130.10.in-addr.arpa       name = opensearch-0.opensearch.xxx.svc.cluster.local.
190.7.130.10.in-addr.arpa       name = 10-130-7-190.xxxxx.xxx.svc.cluster.local.
190.7.130.10.in-addr.arpa       name = 10-130-7-190.opensearch-runtime.xxx.svc.cluster.local.

I had already tried all of these options last week; discovery.seed_hosts expects a resolvable address. Since the error relates to the inability to resolve the SAN, I have adjusted the config as follows:

cluster.name: opensearch-cluster

network.host: 0.0.0.0
network.bind_host: 0.0.0.0
http.port: 9200

transport.tcp.port: 9300

node.roles:
  - coordinating
  - master
  - data
  - ingest

discovery.seed_hosts:
  - opensearch-0.opensearch.xxx.svc.cluster.local
  - opensearch-1.opensearch.xxx.svc.cluster.local

cluster.initial_cluster_manager_nodes:
  - opensearch-0
  - opensearch-1

plugins.security.disabled: false
plugins.security.allow_default_init_securityindex: true
plugins.security.system_indices.enabled: true
plugins.security.ssl.transport.enforce_hostname_verification: false

plugins.security.ssl.transport.keystore_filepath: certificates/tls.pfx
plugins.security.ssl.transport.truststore_filepath: certificates/truststore

plugins.security.nodes_dn:
  - "C=DE,O=XXX,OU=xxx,CN=opensearch-0.opensearch.xxx.svc.cluster.local"
  - "C=DE,O=XXX,OU=xxx,CN=opensearch-1.opensearch.xxx.svc.cluster.local"

And I have added the nodes’ SANs to the certificate’s deployment configuration:

san_fqdn:
  # certs for external routes
  - opensearch-xxx.apps.dt.ocp.tc.corp
  # certs for internal routes
  - opensearch.xxx.svc.cluster.local
  - opensearch-0.opensearch.xxx.svc.cluster.local
  - opensearch-1.opensearch.xxx.svc.cluster.local

The first node starts normally (opensearch-0), but the second one does not, and I receive the following error message:

[WARN ][o.o.d.HandshakingTransportAddressConnector] [opensearch-1] handshake failed for [connectToRemoteMasterNode[10.128.19.156:9300]]
org.opensearch.transport.RemoteTransportException: [opensearch-0][10.128.19.156:9300][internal:transport/handshake]
Caused by: org.opensearch.OpenSearchException: Transport client authentication no longer supported.

According to various other posts on this forum, this seems to relate to the plugins.security.nodes_dn configuration.

I can, however, resolve opensearch-0 from opensearch-1 with nslookup/dig, using the IP address stated in the error message:

sh-4.4$ nslookup 10.128.19.156
156.19.128.10.in-addr.arpa      name = opensearch-0.opensearch.xxx.svc.cluster.local.
156.19.128.10.in-addr.arpa      name = 10-128-19-156.xx.xxx.svc.cluster.local.
156.19.128.10.in-addr.arpa      name = 10-128-19-156.opensearch-runtime.xxx.cluster.local.

Hi @Carla

Could you please test with the TCP port pointing to 9200?

Best,
Ma

I have found the solution. The OpenSearch documentation was not clear about exactly what information needs to be passed to the plugins.security.nodes_dn field, or where to get it from. This OpenDistro page describes how to extract the relevant data. Once I did that and updated the field (in my case I only needed the one value, because I have just the one certificate), all nodes ran problem-free.
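For anyone landing here later: the extraction boils down to printing the certificate’s subject DN in RFC 2253 order, which is the string format the security plugin compares `nodes_dn` entries against. A sketch on a throwaway PEM certificate — the subject values are placeholders for your own:

```shell
# Illustrative only: a throwaway self-signed cert standing in for a real
# node certificate (subject values are placeholders).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -keyout node-key.pem -out node-cert.pem \
    -subj "/C=DE/O=ACME/OU=it/CN=opensearch-0.opensearch.demo.svc.cluster.local"

# Print the subject DN in RFC 2253 order -- this exact string (reversed RDN
# order, comma-separated, no spaces) is what plugins.security.nodes_dn
# entries are matched against.
openssl x509 -in node-cert.pem -noout -subject -nameopt RFC2253
```

Copying the printed DN verbatim into opensearch.yml avoids guessing at RDN order and spacing, which is what the “Transport client authentication no longer supported” error was hiding here.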