Cluster does not initialize, javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment

Hi all,

At the moment, I am trying to create an OpenDistro 1.7 Elasticsearch cluster with 3 nodes. After testing with the demo certificates on a single node, I am now using my own PKI to manage the node and client certificates.

On a single-node server, everything runs fine.
In cluster mode, all nodes log the following error at high frequency:

[2020-05-19T14:48:15,794][ERROR][c.a.o.s.s.t.OpenDistroSecuritySSLNettyTransport] [xxxx.yyy.zzz.net] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)

Note: I have read that this is a known Java issue and that the message does not affect operation, but for me it does. (Reference: Troubleshoot - Open Distro Documentation)

In between, I can see that no master node has been elected yet:

[2020-05-19T15:09:00,369][WARN ][o.e.c.c.ClusterFormationFailureHelper] [xxxx.yyy.zzz.net] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [xxxx.yyy.zzz.net, yyyy.yyy.zzz.net, zzzz.yyy.zzz.net] to bootstrap a cluster: have discovered [{xxxx.yyy.zzz.net}{vpOXuYYNRkeqoMQ8kbv8cw}{F3vzvzkgSrqlMj_qhklEtw}{172.17.0.6}{172.17.0.6:29300}{dim}]; discovery will continue using [xx.aa.b.54:29300, xx.aa.b.55:29300, xx.aa.b.56:29300] from hosts providers and [{xxxx.yyy.zzz.net}{vpOXuYYNRkeqoMQ8kbv8cw}{F3vzvzkgSrqlMj_qhklEtw}{172.17.0.6}{172.17.0.6:29300}{dim}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
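The warning above shows the node publishing itself at 172.17.0.6:29300, which looks like a Docker bridge address, while discovery probes a different set of addresses. The following is a minimal Python sketch, assuming the log format shown above, that extracts both address sets from such a line so they can be compared. The function names are made up for illustration, and a mismatch is only a hint, not proof of the root cause.

```python
import re


def parse_discovery_warning(line):
    """Return (discovered transport addresses, seed addresses) from a
    ClusterFormationFailureHelper warning. The layout is assumed from
    the log line quoted above, not taken from any specification."""
    discovered = []
    m = re.search(r"have discovered \[(.*?)\]; discovery will continue", line)
    if m:
        # Each node is printed as {name}{id}{ephemeral id}{host}{host:port}{roles}
        for node in re.findall(r"((?:\{[^{}]*\})+)", m.group(1)):
            fields = re.findall(r"\{([^{}]*)\}", node)
            if len(fields) >= 5:
                discovered.append(fields[4])
    seeds = []
    m = re.search(r"discovery will continue using \[(.*?)\] from hosts providers", line)
    if m:
        seeds = [s.strip() for s in m.group(1).split(",") if s.strip()]
    return discovered, seeds


def addresses_missing_from_seeds(line):
    """List discovered addresses that do not appear among the seed addresses."""
    discovered, seeds = parse_discovery_warning(line)
    return [addr for addr in discovered if addr not in seeds]
```

If the node's own address is reported as missing from the seed list, it may be worth checking which address the node publishes to its peers.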

I am running 3 docker containers on 3 different VMs.

  • discovery.seed_hosts and cluster.initial_master_nodes are set to the 3 host names.
  • node.name is the FQDN of each server
  • transport.profiles.default.port is set to 29300

The certificate chain seems to be fine, since I can use securityadmin.sh with my client certificate.

When I do a TLS test connection with my node certificate, everything also seems to be fine:

openssl s_client -connect xxx.yyy.zzz.net:29300 -cert ./xxx.yyy.zzz.net.crt.pem -key ./xxx.yyy.zzz.key.pem
CONNECTED(00000003)

=> works
Leaving out the client cert/key:
139939766322832:error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate:s3_pkt.c:1498:SSL alert number 42
=> fails (as expected)
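Note that openssl s_client does not abort on server certificate verification errors by default, so the check above mainly confirms that the client certificate is accepted. As a stricter variant, here is a minimal Python sketch using the standard ssl module that presents the node certificate and also verifies the server certificate and host name. The function name and all paths are placeholders, and Python's verification is not guaranteed to behave exactly like the JVM's.

```python
import socket
import ssl


def probe_mutual_tls(host, port, certfile, keyfile, cafile, timeout=5.0):
    """Attempt a TLS handshake that presents a client certificate.

    Returns (True, "<protocol> <cipher>") on success or
    (False, "<error type>: <message>") on any failure.
    """
    try:
        # PROTOCOL_TLS_CLIENT enables certificate and host name checks by default.
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        ctx.load_verify_locations(cafile=cafile)
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return True, f"{tls.version()} {tls.cipher()[0]}"
    except OSError as exc:
        # ssl.SSLError, certificate errors, timeouts and missing files
        # are all subclasses of OSError.
        return False, f"{type(exc).__name__}: {exc}"
```

Running it from each node against every other node's transport port would exercise the same direction as node-to-node traffic. With TLS 1.3, a rejected client certificate may only surface on the first read after the handshake, so a success here is a hint rather than a guarantee.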

I am using a JKS keystore and JKS truststore for OpenDistro.
Checking the stores with keytool, everything seems to be fine.
The PKI was created using Search Guard's PKI scripts.

opendistro_security.nodes_dn is also configured to the DNs of the node certs.

My “feeling” is that OpenDistro does not use the node certificate as a client certificate when negotiating with the other nodes?

Any help would be highly appreciated!

Thanks
Chris

Finally got that working now. It must have been one of the following parameters that was not set right.

discovery.seed_hosts: "{{ ansible_play_hosts_all|join(',') }}"
cluster.initial_master_nodes: "{{ ansible_play_hosts_all[0] }}"
transport.profiles.default.port: "{{ group.elasticsearch.transport_port }}"
transport.port: "{{ group.elasticsearch.transport_port }}"
http.port: "{{ group.elasticsearch.rest_port }}"
network.publish_host: "{{ ansible_eth0.ipv4.address }}"
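Since it is unclear which of the settings above was the culprit, a quick consistency check across the rendered per-node settings can narrow things down. This is a minimal Python sketch under stated assumptions: each node's settings are available as a dict keyed by the setting names above, cluster.initial_master_nodes entries are expected to match a node.name, and an unset network.publish_host is only flagged because it is worth a look when running in Docker. The function name is hypothetical.

```python
def check_node_settings(nodes):
    """Return a list of human-readable findings for a list of per-node
    settings dicts (keys are elasticsearch.yml setting names)."""
    findings = []
    names = {n.get("node.name") for n in nodes}
    for n in nodes:
        who = n.get("node.name", "<unnamed>")
        # Entries here are assumed to refer to nodes by their node.name.
        for master in n.get("cluster.initial_master_nodes", []):
            if master not in names:
                findings.append(f"{who}: initial master {master!r} matches no node.name")
        transport = n.get("transport.port")
        profile = n.get("transport.profiles.default.port")
        if transport is not None and profile is not None and transport != profile:
            findings.append(f"{who}: transport.port {transport} != profile port {profile}")
        if not n.get("network.publish_host"):
            findings.append(f"{who}: network.publish_host is not set")
    return findings
```

An empty result does not prove the configuration is correct; it only rules out the mismatches the sketch knows about.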

Troubleshooting is really hard if only the following error occurs:

javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)

Is there any way to debug/trace the Elastic node-to-node communication better?


@shakazulu you can set the property in log4j2.properties as below:

rootLogger.level = trace

This will be very verbose, however, so it is recommended to stop anything else that talks to the nodes (Kibana, Logstash, etc.) while tracing.
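Because trace output is large, it can help to filter it down to the loggers of interest afterwards. The following is a minimal Python sketch, assuming the log line layout seen earlier in this thread ([timestamp][LEVEL][logger] [node] message). The function name and the default keywords are illustrative choices, not part of any tool.

```python
import re

# Layout assumed from the log lines quoted in this thread.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>[A-Z]+)\s*\]\[(?P<logger>[^\]]+)\]"
    r"\s*\[(?P<node>[^\]]*)\]\s?(?P<msg>.*)$"
)


def interesting_entries(lines, logger_keywords=("SecuritySSL", "ClusterFormation")):
    """Yield parsed entries whose logger name contains one of the keywords.

    Lines that do not match the layout (for example stack trace
    continuation lines) are skipped.
    """
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if not m:
            continue
        entry = {key: value.strip() for key, value in m.groupdict().items()}
        if any(keyword in entry["logger"] for keyword in logger_keywords):
            yield entry
```

For example, iterating over an open log file with `interesting_entries(open(path))` keeps only the TLS transport and cluster formation messages, where `path` is wherever the node writes its log.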

I faced a similar issue while upgrading to OpenSearch 1.2.1. In my case, I had to set "majorVersion": "7" in the opensearch.yml file to make it work.

This issue is described in more detail in "[BUG] Opensearch SSL transport error, master not discovered or elected yet" (Opensearch-Project/Helm-Charts).

This seems to be an issue related to the JDK.

https://bugs.openjdk.java.net/browse/JDK-8221218

It is fixed in JDK 17.

@shakazulu I am seeing the same issue. I checked the settings from your answer that fixed your problem, but the issue is still there.
We observe it after adding 3 nodes to an existing cluster of 3 nodes, where all nodes are “dm”.

After adding the nodes, we found that the settings below in the existing nodes' opensearch.yml files still listed only the 3 existing nodes. Even after listing all 6 nodes in the opensearch.yml file of every node, I still see the issue.

discovery.seed_hosts: "{{ ansible_play_hosts_all|join(',') }}"
cluster.initial_master_nodes: "{{ ansible_play_hosts_all[0] }}"

I am using JDK 11. Please advise if you have faced the issue again.
