OpenSearch Dashboards - Getting connection error with OpenSearch ("[ConnectionError]: socket hang up") after enabling SSL

NOTE:
Certain sensitive information has been replaced with the ***** pattern.

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
opensearch-dashboards: 2.9.0
opensearch: 2.9.0
Kubernetes server version: 1.28

Describe the issue:
We have enabled SSL for both the HTTP and transport layers in OpenSearch.
(Reference for the same: "Transport client authentication no longer supported." error while implementing third party CA cert for transport layer)

Now we are trying to set up Dashboards, and it is throwing the error below in the dashboards pod logs:
{"type":"log","@timestamp":"2024-06-07T19:23:25Z","tags":["error","opensearch","data"],"pid":7,"message":"[ConnectionError]: socket hang up"}

To enable SSL communication between OpenSearch Dashboards and OpenSearch, I have set the parameters below in opensearch_dashboards.yml:

opensearch.hosts: ["https://*******:9200"]
opensearch.ssl.verificationMode: certificate
opensearch.ssl.certificateAuthorities: ["/usr/share/opensearch/config/********.crt", "/usr/share/opensearch/config/********.crt"]

In the above config, I have provided both the root certificate and the primary certificate of OpenSearch in the certificateAuthorities list.
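
(For reference, this is roughly how such a CA bundle can be assembled and sanity-checked with openssl before pointing certificateAuthorities at it; a rough sketch, the file names are placeholders:)

# Concatenate the intermediate/issuing CA and the root CA into one PEM bundle
cat intermediate-ca.crt root-ca.crt > /usr/share/opensearch/config/combined.crt

# Confirm the node certificate actually chains up to that bundle
openssl verify -CAfile /usr/share/opensearch/config/combined.crt opensearch-node.crt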

Setting these parameters is not helping.
I have also tried opensearch.ssl.verificationMode: none and still get the same error.
Setting logging.verbose: true in the dashboards yml prints nothing helpful in the logs either.

If I disable security entirely (plugins.security.disabled: true) or set plugins.security.ssl.http.enabled: false, the dashboard is able to talk to OpenSearch and there is no issue.

Am I missing something here on the server side? Please help.

Configuration:

opensearch_dashboards.yml: |
  server.name: opensearch-dashboard
  server.host: "0.0.0.0"
  opensearch.hosts: ["https://*************:9200"]
  opensearch.ssl.verificationMode: certificate
  opensearch.username: ${KIBANA_USER}
  opensearch.password: ${KIBANA_PASS}
  opensearch.requestHeadersWhitelist: [authorization, securitytenant]
  server.basePath: "/opensearch"
  server.rewriteBasePath: "true"
  opensearch.ssl.certificateAuthorities: ["/usr/share/opensearch/config/combined.crt", "/usr/share/opensearch/config/********.crt"]
  opensearch_security.multitenancy.enabled: true
  opensearch_security.multitenancy.tenants.preferred: ["Private", "Global"]
  opensearch_security.readonly_mode.roles: ["kibana_read_only"]
  # Use this setting if you are running opensearch-dashboards without https
  opensearch_security.cookie.secure: false
  logging.verbose: true

Relevant Logs or Screenshots:

{"type":"log","@timestamp":"2024-06-07T19:38:37Z","tags":["error","opensearch","data"],"pid":7,"message":"[ConnectionError]: socket hang up"}
{"type":"log","@timestamp":"2024-06-07T19:38:40Z","tags":["error","opensearch","data"],"pid":7,"message":"[ConnectionError]: socket hang up"}
{"type":"log","@timestamp":"2024-06-07T19:38:42Z","tags":["error","opensearch","data"],"pid":7,"message":"[ConnectionError]: socket hang up"}
{"type":"log","@timestamp":"2024-06-07T19:38:45Z","tags":["error","opensearch","data"],"pid":7,"message":"[ConnectionError]: socket hang up"}
{"type":"log","@timestamp":"2024-06-07T19:38:47Z","tags":["error","opensearch","data"],"pid":7,"message":"[ConnectionError]: socket hang up"}

I am also able to reach the OpenSearch URL from inside the dashboards pod, so reachability is not a problem here.

bash-5.1$ curl -vguadmin:admin https://****************.com:9200/_cluster/health?pretty --cacert /usr/share/opensearch/config/*******.crt
*   Trying 15.240.0.101:9200...
* Connected to ****************.com (15.240.0.101) port 9200 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /usr/share/opensearch/config/********.crt
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS header, Finished (20):
* TLSv1.2 (IN), TLS header, Unknown (23):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.2 (OUT), TLS header, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS header, Unknown (23):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS header, Unknown (23):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: C=**; ST=**; L=******; O=*******; CN=*******
*  start date: Jan 11 00:00:00 2024 GMT
*  expire date: Feb 10 23:59:59 2025 GMT
*  subjectAltName: host "****************.com" matched cert's "*********"
*  issuer: C=US; O=DigiCert Inc; CN=DigiCert TLS RSA SHA256 2020 CA1
*  SSL certificate verify ok.
* Server auth using Basic with user 'admin'
* TLSv1.2 (OUT), TLS header, Unknown (23):
> GET /_cluster/health?pretty HTTP/1.1
> Host: ****************.com:9200
> Authorization: Basic *********
> User-Agent: curl/7.76.1
> Accept: */*
>
* TLSv1.2 (IN), TLS header, Unknown (23):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.2 (IN), TLS header, Unknown (23):
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 544
<
{
  "cluster_name" : "opensearch2x-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 8,
  "number_of_data_nodes" : 5,
  "discovered_master" : true,
  "discovered_cluster_manager" : true,
  "active_primary_shards" : 12,
  "active_shards" : 28,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
* Connection #0 to host ****************.com left intact

Moved the issue to the Security category. Requesting some help on this topic, as we are not able to make progress.

In the master pod logs, I can see only this entry (here the localAddress is the master pod ClusterIP and the remoteAddress is the dashboards pod ClusterIP):

[2024-06-12T10:25:51,275][WARN ][o.o.h.AbstractHttpServerTransport] [platform-opensearch-master-0] caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/*.*.*.189:9200, remoteAddress=/*.*.*.2:56970}
io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 474554202f5f6e6f6465733f66696c7465725f706174683d6e6f6465732e2a2e76657273696f6e2532436e6f6465732e2a2e687474702e7075626c6973685f616464726573732532436e6f6465732e2a2e697020485454502f312e310d0a617574686f72697a6174696f6e3a2042617369632061326c695957356863325679646d56794f6d7470596d467559584e6c636e5a6c63673d3d0d0a757365722d6167656e743a206f70656e7365617263682d6a732f312e312e3020286c696e757820352e31352e302d313035322d617a7572652d7836343b204e6f64652e6a73207631362e32302e30290d0a782d6f70656e7365617263682d70726f647563742d6f726967696e3a206f70656e7365617263682d64617368626f617264730d0a486f73743a20706c6174666f726d2d6f70656e7365617263682d6d61737465723a393230300d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a0d0a
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:499) ~[netty-codec-4.1.94.Final.jar:4.1.94.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) ~[netty-codec-4.1.94.Final.jar:4.1.94.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.94.Final.jar:4.1.94.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.94.Final.jar:4.1.94.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.94.Final.jar:4.1.94.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.94.Final.jar:4.1.94.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) [netty-transport-4.1.94.Final.jar:4.1.94.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.94.Final.jar:4.1.94.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.94.Final.jar:4.1.94.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.94.Final.jar:4.1.94.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.94.Final.jar:4.1.94.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.94.Final.jar:4.1.94.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.94.Final.jar:4.1.94.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.94.Final.jar:4.1.94.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.94.Final.jar:4.1.94.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.94.Final.jar:4.1.94.Final]
	at java.lang.Thread.run(Thread.java:833) [?:?]

If I decode the hex byte string from the exception message, this is what I get:

GET /_nodes?filter_path=nodes.*.version%2Cnodes.*.http.publish_address%2Cnodes.*.ip HTTP/1.1
authorization: Basic a2liYW5hc2VydmVyOmtpYmFuYXNlcnZlcg==
user-agent: opensearch-js/1.1.0 (linux 5.15.0-1052-azure-x64; Node.js v16.20.0)
x-opensearch-product-origin: opensearch-dashboards
Host: platform-opensearch-master:9200
Connection: keep-alive
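
(For reference, the hex payload in the NotSslRecordException is just the raw plaintext request; it can be turned back into text with something like the following, assuming xxd is available:)

# Take the hex string from the "not an SSL/TLS record" message and reverse the hex dump
echo '474554202f5f6e6f6465733f66696c7465725f70617468...' | xxd -r -p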

It appears that the call from Dashboards is going out as plain HTTP instead of HTTPS, even though HTTPS is configured in the dashboards yaml.

Below is the current opensearch_dashboards.yml:

  opensearch_dashboards.yml: |
    server.ssl.enabled: true
    server.ssl.certificate: /usr/share/opensearch/config/***********.crt
    server.ssl.key: /usr/share/opensearch/config/***********.key
    #server.ssl.clientAuthentication: required
    #server.ssl.certificateAuthorities: /usr/share/opensearch/config/******.crt
    server.name: opensearch2x-dashboard
    server.host: "0.0.0.0"
    opensearch.hosts: ["http://***********.com:9200"]
    opensearch.ssl.verificationMode: full
    opensearch.username: ${KIBANA_USER}
    opensearch.password: ${KIBANA_PASS}
    opensearch.requestHeadersWhitelist: [authorization, securitytenant]
    opensearch.ssl.certificateAuthorities: ["/usr/share/opensearch/config/**********.crt", "/usr/share/opensearch/config/***************.crt"]
    opensearch.ssl.alwaysPresentCertificate: true
    opensearch.ssl.certificate: /usr/share/opensearch/config/************.crt
    opensearch.ssl.key: /usr/share/opensearch/config/**************.key
    server.basePath: "/opensearch"
    server.rewriteBasePath: "true"
    opensearch_security.multitenancy.enabled: true
    opensearch_security.multitenancy.tenants.preferred: ["Private", "Global"]
    opensearch_security.readonly_mode.roles: ["kibana_read_only"]
    logging.verbose: true
    # Use this setting if you are running opensearch-dashboards without https
    opensearch_security.cookie.secure: true

Hi @V2nD,

Just to confirm: do you have TLS enabled on your OpenSearch nodes as per your description, or is it off as per your opensearch_dashboards.yml?

Thanks,
mj

@Mantas ,

TLS is enabled on our OpenSearch side. My bad, I had posted the wrong yaml, as I was trying to test with SSL disabled. Apologies for that.

This is the current opensearch_dashboards.yml:

 opensearch_dashboards.yml: |
    server.ssl.enabled: true
    server.ssl.certificate: /usr/share/opensearch/config/*******.crt
    server.ssl.key: /usr/share/opensearch/config/*******.key
    #server.ssl.clientAuthentication: required
    #server.ssl.certificateAuthorities: /usr/share/opensearch/config/*******.crt
    server.name: opensearch2x-dashboard
    server.host: "0.0.0.0"
    opensearch.hosts: ["https://*******.com:9200"]
    opensearch.ssl.verificationMode: full
    opensearch.username: ${KIBANA_USER}
    opensearch.password: ${KIBANA_PASS}
    opensearch.requestHeadersWhitelist: [authorization, securitytenant]
    opensearch.ssl.certificateAuthorities: ["/usr/share/opensearch/config/combined.crt", "/usr/share/opensearch/config/*******.crt"]
    opensearch.ssl.alwaysPresentCertificate: true
    opensearch.ssl.certificate: /usr/share/opensearch/config/*******.crt
    opensearch.ssl.key: /usr/share/opensearch/config/*******.key
    server.basePath: "/opensearch"
    server.rewriteBasePath: "true"
    opensearch_security.multitenancy.enabled: true
    opensearch_security.multitenancy.tenants.preferred: ["Private", "Global"]
    opensearch_security.readonly_mode.roles: ["kibana_read_only"]
    logging.verbose: true

This is the opensearch.yml:

 opensearch.yml: |
    cluster.name: opensearch2x-cluster
    cluster.remote.initial_connect_timeout: 180s
    http.max_header_size: 40kb
    path.repo: ["/nfs/opensearch/"]
    # Bind to all interfaces because we don't know what IP address Docker will assign to us.
    #network.host: "_global:ipv6_"
    #network.bind_host: "[::]"
    network.host: "0.0.0.0"
      #network.bind_host: "[::]"
    compatibility.override_main_response_version: true
    node.roles: "master,remote_cluster_client,"
    http.max_content_length: "200mb"
    # minimum_master_nodes need to be explicitly set when bound on a public IP
    # set to 1 to allow single node clusters
    # discovery.zen.minimum_master_nodes: 1

    # Setting network.host to a non-loopback address enables the annoying bootstrap checks. "Single-node" mode disables them again.
    # discovery.type: single-node

    # Start OpenSearch Security Demo Configuration
    # WARNING: revise all the lines below before you go into production
    #plugins.security.disabled: true
    plugins:
      security:
        ssl:
          transport:
            pemcert_filepath: **********.crt
            pemkey_filepath: **********.key
            pemtrustedcas_filepath: **********.crt
            enforce_hostname_verification: false
          http:
            enabled: true
            pemcert_filepath: **********.crt
            pemkey_filepath: **********.key
            pemtrustedcas_filepath: **********.crt
            #    clientauth_mode: OPTIONAL
        allow_unsafe_democertificates: false
        allow_default_init_securityindex: true
        nodes_dn:
          - CN=***********,O=**********,L=**********,ST=**********,C=**********
        authcz:
          admin_dn:
            - CN=kirk,OU=client,O=client,L=test,C=de
        enable_snapshot_restore_privilege: true
        check_snapshot_restore_write_privileges: true
        restapi:
          roles_enabled: ["all_access", "security_rest_api_access"]
        system_indices:
          enabled: true
          indices:
            [
              ".opendistro-alerting-config",
              ".opendistro-alerting-alert*",
              ".opendistro-anomaly-results*",
              ".opendistro-anomaly-detector*",
              ".opendistro-anomaly-checkpoints",
              ".opendistro-anomaly-detection-state",
              ".opendistro-reports-*",
              ".opendistro-notifications-*",
              ".opendistro-notebooks",
              ".opendistro-asynchronous-search-response*",
            ]
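
(As a quick server-side sanity check, the HTTP TLS layer of a node can also be verified directly with openssl, assuming it is available in the pod; the host and CA file below are placeholders:)

openssl s_client -connect <opensearch-host>:9200 -CAfile /usr/share/opensearch/config/<root-ca>.crt </dev/null
# A working setup should print the server certificate chain and "Verify return code: 0 (ok)"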

Still seeing the same socket hang up error:

{"type":"log","@timestamp":"2024-06-12T11:57:57Z","tags":["error","opensearch","data"],"pid":7,"message":"[ConnectionError]: socket hang up"}
{"type":"log","@timestamp":"2024-06-12T11:57:59Z","tags":["debug","metrics"],"pid":7,"message":"Refreshing metrics"}
{"type":"log","@timestamp":"2024-06-12T11:58:00Z","tags":["error","opensearch","data"],"pid":7,"message":"[ConnectionError]: socket hang up"}
{"type":"log","@timestamp":"2024-06-12T11:58:02Z","tags":["error","opensearch","data"],"pid":7,"message":"[ConnectionError]: socket hang up"}
{"type":"log","@timestamp":"2024-06-12T11:58:04Z","tags":["debug","metrics"],"pid":7,"message":"Refreshing metrics"}
{"type":"log","@timestamp":"2024-06-12T11:58:05Z","tags":["error","opensearch","data"],"pid":7,"message":"[ConnectionError]: socket hang up"}
{"type":"log","@timestamp":"2024-06-12T11:58:07Z","tags":["error","opensearch","data"],"pid":7,"message":"[ConnectionError]: socket hang up"}

The same URL responds if I exec into the dashboards pod and curl it.
(Due to company policy, I have hidden the site name and some other sensitive information.)

k exec -it platform-opensearch-dashboards-699885944d-xb6jr -n ************* -- bash
bash-5.1$ curl -vguadmin:admin https://*************.com:9200/_cluster/health?pretty --cacert /usr/share/opensearch/config/*************.crt
*   Trying *.*.*.*:9200...
* Connected to *************.com (*.*.*.*) port 9200 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /usr/share/opensearch/config/*************.crt
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS header, Finished (20):
* TLSv1.2 (IN), TLS header, Unknown (23):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.2 (OUT), TLS header, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS header, Unknown (23):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS header, Unknown (23):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: C=*******; ST=*******; L=******* O=*******; CN=*.*******.com
*  start date: Jan 11 00:00:00 2024 GMT
*  expire date: Feb 10 23:59:59 2025 GMT
*  subjectAltName: host "*************.com" matched cert's "*.*******.com"
*  issuer: C=US; O=DigiCert Inc; CN=DigiCert TLS RSA SHA256 2020 CA1
*  SSL certificate verify ok.
* Server auth using Basic with user 'admin'
* TLSv1.2 (OUT), TLS header, Unknown (23):
> GET /_cluster/health?pretty HTTP/1.1
> Host: *************.com:9200
> Authorization: Basic YWRtaW46YWRtaW4=
> User-Agent: curl/7.76.1
> Accept: */*
>
* TLSv1.2 (IN), TLS header, Unknown (23):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.2 (IN), TLS header, Unknown (23):
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 544
<
{
  "cluster_name" : "opensearch2x-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 8,
  "number_of_data_nodes" : 5,
  "discovered_master" : true,
  "discovered_cluster_manager" : true,
  "active_primary_shards" : 20,
  "active_shards" : 45,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
* Connection #0 to host *************.com left intact

@Mantas ,

We have resolved this issue.
There was an OPENSEARCH_HOSTS environment variable set in the StatefulSet definition that was overriding the custom opensearch.hosts value in opensearch_dashboards.yml.
The environment variable was pointing to the default value, http://opensearch-master:9200, which is consistent with the plain-HTTP request to the master service seen in the decoded Netty payload above.

After updating the environment variable to the required HTTPS endpoint, the issue was resolved.
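
For anyone hitting the same thing, a rough way to spot and correct the override from the CLI (a sketch with placeholder names; the dashboards workload may be a Deployment or StatefulSet depending on the chart):

# Check whether the dashboards container has OPENSEARCH_HOSTS set
kubectl exec -n <namespace> <dashboards-pod> -- env | grep OPENSEARCH_HOSTS

# Point it at the HTTPS endpoint (or remove it so opensearch_dashboards.yml takes effect)
kubectl set env statefulset/<dashboards-workload> -n <namespace> OPENSEARCH_HOSTS="https://<opensearch-host>:9200"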


Hi @V2nD,
Glad to hear you resolved it!

That is good to know, thanks for sharing!

Best,
mj