Fluent Bit TLS handshake error

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

OpenSearch 2.13.0

Describe the issue:

I ran into the problem described in "[Bug] SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment" (Issue #3299, opensearch-project/security on GitHub) after the certs were renewed for the OpenSearch cluster. Specifically, we started seeing that error whenever a Fluent Bit client tried to write to OpenSearch. Among the workarounds I tried from that issue, one was to restrict connections to TLS v1.2, which then resulted in a different error:

[2024-05-09T04:57:52,821][WARN ][o.o.h.AbstractHttpServerTransport] [opensearch-cluster-master-0] caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/xxxx:9200, remoteAddress=/xxxx:38362}
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_expired
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:499) ~[netty-codec-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) ~[netty-codec-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) [netty-transport-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.107.Final.jar:4.1.107.Final]
	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_expired
	at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:130) ~[?:?]
	at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:117) ~[?:?]
	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:365) ~[?:?]
	at java.base/sun.security.ssl.Alert$AlertConsumer.consume(Alert.java:287) ~[?:?]
	at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:204) ~[?:?]
	at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172) ~[?:?]
	at java.base/sun.security.ssl.SSLEngineImpl.decode(SSLEngineImpl.java:736) ~[?:?]
	at java.base/sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:691) ~[?:?]
	at java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:506) ~[?:?]
	at java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:482) ~[?:?]
	at java.base/javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:679) ~[?:?]
	at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:310) ~[netty-handler-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1445) ~[netty-handler-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338) ~[netty-handler-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387) ~[netty-handler-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529) ~[netty-codec-4.1.107.Final.jar:4.1.107.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468) ~[netty-codec-4.1.107.Final.jar:4.1.107.Final]
	... 16 more

I added -Djavax.net.debug=all, which seemed to suggest that the OpenSearch cluster is closing the connection during the TLS handshake with Fluent Bit. I'm a bit confused about why this would happen. For one, setting clientauth_mode: NONE doesn't help. For another, the Fluent Bit client's certificates are definitely not expired. I suppose OpenSearch thinks its own certificates are expired? Anyway, this made me wonder whether it has something to do with certificate hot reloading. I tried using

curl --cacert <ca.pem> --cert <admin.pem> --key <admin.key> -XPUT https://localhost:9200/_plugins/_security/api/ssl/transport/reloadcerts

and

curl --cacert <ca.pem> --cert <admin.pem> --key <admin.key> -XPUT https://localhost:9200/_plugins/_security/api/ssl/http/reloadcerts

to reload the certs. Both commands return 200 and the appropriate message, but the OpenSearch node logs display

I'm not sure whether that means the reload worked correctly.
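
One way to check whether a reload took effect, independent of the node logs, is to compare the fingerprint of the certificate file on disk with the one the node serves on port 9200. This is a minimal sketch, assuming bash and the OpenSSL CLI are available. The function names, the certificate path, and the host are hypothetical and not taken from this thread.

```shell
#!/usr/bin/env bash
# Strip the "... Fingerprint=" label (its capitalisation can differ between
# OpenSSL versions) and keep only the hex value.
fingerprint_value() {
  sed 's/^[^=]*=//'
}

# SHA-256 fingerprint of the first certificate in a PEM file on disk.
file_fingerprint() {
  openssl x509 -noout -fingerprint -sha256 -in "$1" | fingerprint_value
}

# SHA-256 fingerprint of the leaf certificate served at host:port.
# </dev/null closes stdin so s_client exits after the handshake.
served_fingerprint() {
  openssl s_client -connect "$1:$2" </dev/null 2>/dev/null \
    | openssl x509 -noout -fingerprint -sha256 | fingerprint_value
}

# Example (path and host are placeholders):
#   [ "$(file_fingerprint /path/to/node-http.pem)" = "$(served_fingerprint localhost 9200)" ] \
#     && echo "node is serving the certificate on disk"
```

If the two values differ after a reload call, the node is probably still serving the old certificate.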

Any thoughts or suggestions appreciated.

@mlathara The easiest way to find out whether the OpenSearch node has reloaded the certs successfully is to open https://<Opensearch_node_IP_or_FQDN>:9200 in a browser. The browser will let you view the node certificate.

Alternatively, you can run the openssl command below.

openssl s_client -connect <Opensearch_node_IP_or_FQDN>:9200
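
To read the validity window directly instead of scanning the full s_client output, the served certificate can be piped into openssl x509. This is a minimal sketch, assuming bash, GNU date, and the OpenSSL CLI. The function names and the example host are illustrative and not part of this thread.

```shell
#!/usr/bin/env bash
# Print notBefore/notAfter of the certificate the node actually serves.
# $1 = host, $2 = port (placeholders; substitute your node's address).
served_cert_dates() {
  # </dev/null closes stdin so s_client exits once the handshake is done.
  openssl s_client -connect "$1:$2" </dev/null 2>/dev/null \
    | openssl x509 -noout -dates
}

# Days between "now" and a notAfter value such as "Aug  1 05:27:07 2024 GMT",
# truncated toward zero. A negative result means the certificate expired
# at least one full day ago.
# $2 is an optional epoch timestamp for "now" (defaults to the current time).
# Assumes GNU date (-d); BSD/macOS date needs different flags.
days_until() {
  local end now
  end=$(date -u -d "$1" +%s) || return 1
  now=${2:-$(date -u +%s)}
  echo $(( (end - now) / 86400 ))
}

# Example (requires network access to the node):
#   served_cert_dates opensearch.example.internal 9200
#   days_until "Aug  1 05:27:07 2024 GMT"
```

Running days_until on the notAfter line gives a quick answer to whether the served certificate is current.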

Thanks. I was able to confirm that the certs were reloaded. Here is an excerpt from the check:

verify return:1
DONE
notBefore=May  3 05:27:07 2024 GMT
notAfter=Aug  1 05:27:07 2024 GMT

Anything else I could check?

@mlathara, could you please share your Fluent Bit configuration file? How do you authenticate your Fluent Bit user in OpenSearch?

@Eugene7 The issue was resolved for me after I switched from self-signed certs to Let's Encrypt certs for the OpenSearch HTTP layer. I'm still not quite sure why the self-signed certs worked until renewal and then started causing problems (and then only for Fluent Bit, while OpenSearch Dashboards kept working).

For completeness, I use basic auth between Fluent Bit and OpenSearch. Here is my config:

    [OUTPUT]
        Name opensearch
        Match *
        Host ${OPENSEARCH_HOST}
        Port ${OPENSEARCH_PORT}
        HTTP_User ${OPENSEARCH_USER}
        HTTP_Passwd ${OPENSEARCH_PASSWORD}
        Index logs
        Suppress_Type_Name on
        tls on
        tls.ca_file /etc/ssl/certs/ca-certificates.crt
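
Because the server log reports a received certificate_expired alert, the alert came from the peer. That suggests the client side judged some certificate in the chain (leaf or CA) to be expired. One way to approximate what Fluent Bit sees is to validate the served chain against the same CA bundle that tls.ca_file points to. This is a minimal sketch, assuming bash and the OpenSSL CLI. The function names are illustrative, and OpenSSL's verification may not match Fluent Bit's TLS backend exactly.

```shell
#!/usr/bin/env bash
# Pull the first "Verify return code" line out of s_client output on stdin,
# e.g. "Verify return code: 0 (ok)" or
#      "Verify return code: 10 (certificate has expired)".
verify_result() {
  grep -m1 'Verify return code:' | sed 's/^[[:space:]]*//'
}

# Handshake with host:port using the CA bundle in $3 as the trust store.
# $1 = host, $2 = port, $3 = CA file (the config above uses
# /etc/ssl/certs/ca-certificates.crt).
check_with_bundle() {
  openssl s_client -connect "$1:$2" -CAfile "$3" </dev/null 2>/dev/null \
    | verify_result
}

# Example (host is a placeholder; run it from the Fluent Bit host or container):
#   check_with_bundle opensearch.example.internal 9200 /etc/ssl/certs/ca-certificates.crt
```

A result other than 0 (ok) when run from the Fluent Bit host would point to the client's trust bundle rather than to the OpenSearch node itself.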

I'm not sure my workaround counts as a solution, but this issue can probably be closed. Thanks.
