401 Errors in OpenSearch 2.11

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

OpenSearch 2.11.0-1 on Rocky 9

Describe the issue:

Since the 2.11 update, web browser login (Basic Auth, internal users) works as expected, but then if I reload the page, it re-prompts for Basic Auth login again (over and over and over). This issue doesn’t occur with 2.10.

Configuration:

We have a 7-node cluster with TLS set up between the nodes using our own CA cert to sign, with no issues. Logins work fine using curl with an internal user, or with the admin certificate.

The configuration wasn’t changed between 2.10 and 2.11.

Relevant Logs or Screenshots:

The only hint in the logs there is any kind of problem are the 2 messages below, which occur in both v2.10 and v2.11 (although in v2.10, the actual client IP/port is included)… and since I’m logging in with Basic Auth and not using a client certificate, they don’t really make sense…

[2023-10-18T09:40:35,671][ERROR][o.o.s.s.h.n.SecuritySSLNettyHttpServerTransport] [log-app1.escs.udel.edu] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
at sun.security.ssl.Alert.createSSLException(Alert.java:131) ~[?:?]
at sun.security.ssl.Alert.createSSLException(Alert.java:117) ~[?:?]
at sun.security.ssl.TransportContext.fatal(TransportContext.java:347) ~[?:?]
at sun.security.ssl.Alert$AlertConsumer.consume(Alert.java:293) ~[?:?]
at sun.security.ssl.TransportContext.dispatch(TransportContext.java:186) ~[?:?]
at sun.security.ssl.SSLTransport.decode(SSLTransport.java:172) ~[?:?]
at sun.security.ssl.SSLEngineImpl.decode(SSLEngineImpl.java:681) ~[?:?]
at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:636) ~[?:?]
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:454) ~[?:?]
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:433) ~[?:?]
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:637) ~[?:?]
at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:309) ~[netty-handler-4.1.100.Final.jar:4.1.100.Final]
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1441) ~[netty-handler-4.1.100.Final.jar:4.1.100.Final]
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1334) ~[netty-handler-4.1.100.Final.jar:4.1.100.Final]
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1383) ~[netty-handler-4.1.100.Final.jar:4.1.100.Final]
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529) ~[netty-codec-4.1.100.Final.jar:4.1.100.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468) ~[netty-codec-4.1.100.Final.jar:4.1.100.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) ~[netty-codec-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.100.Final.jar:4.1.100.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.100.Final.jar:4.1.100.Final]
at java.lang.Thread.run(Thread.java:829) [?:?]
[2023-10-18T09:40:35,673][WARN ][o.o.h.AbstractHttpServerTransport] [log-app1.escs.udel.edu] caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=0.0.0.0/0.0.0.0:9200, remoteAddress=null}
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:499) ~[netty-codec-4.1.100.Final.jar:4.1.100.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) ~[netty-codec-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.100.Final.jar:4.1.100.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.100.Final.jar:4.1.100.Final]
at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
at sun.security.ssl.Alert.createSSLException(Alert.java:131) ~[?:?]
at sun.security.ssl.Alert.createSSLException(Alert.java:117) ~[?:?]
at sun.security.ssl.TransportContext.fatal(TransportContext.java:347) ~[?:?]
at sun.security.ssl.Alert$AlertConsumer.consume(Alert.java:293) ~[?:?]
at sun.security.ssl.TransportContext.dispatch(TransportContext.java:186) ~[?:?]
at sun.security.ssl.SSLTransport.decode(SSLTransport.java:172) ~[?:?]
at sun.security.ssl.SSLEngineImpl.decode(SSLEngineImpl.java:681) ~[?:?]
at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:636) ~[?:?]
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:454) ~[?:?]
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:433) ~[?:?]
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:637) ~[?:?]
at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:309) ~[netty-handler-4.1.100.Final.jar:4.1.100.Final]
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1441) ~[netty-handler-4.1.100.Final.jar:4.1.100.Final]
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1334) ~[netty-handler-4.1.100.Final.jar:4.1.100.Final]
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1383) ~[netty-handler-4.1.100.Final.jar:4.1.100.Final]
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529) ~[netty-codec-4.1.100.Final.jar:4.1.100.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468) ~[netty-codec-4.1.100.Final.jar:4.1.100.Final]
… 16 more

More testing…

I did a complete wipe of OpenSearch on all 7 nodes, and reinstalled from scratch with 2.10 - no 401 errors.

Did another complete wipe of OpenSearch and reinstalled with 2.11 - the 401 errors return, and I get re-prompted for Basic Auth login every time I reload a simple page like:

https://thehost.example.com:9200/_cat/nodes

I should also note that each time I get the 401 error in the browser, I can see the same exact “Authorization” header being sent (and then OpenSearch 2.11 ignores it and returns a 401).

@ed-osrch Did you solve your issue?
If not, do you see all the nodes in the _cat/nodes?

Did you use the same rootCA to sign all the node certificates in the cluster?
Did you join any new nodes to the cluster recently?

Not solved yet.
Yes, all the nodes are working fine.
Yes, I use the same rootCA to sign all the node certs and it is configured to be trusted by all nodes.
I created the cluster from scratch multiple times… 2.10 - no 401 errors, but with 2.11 - 401 errors.

@ed-osrch How do you deploy your cluster?
Could you share your config.yml and opensearch.yml files?

Is the opensearch.yml the same across all the nodes?

@ed-osrch Did the reported error appear one a single OpenSearch node or multiple? If single, is it always the same node or random?
If multiple, is it always the same nodes or random?

Do you reload the page after a while or just after successfull authentication?
Additionally, please share your opensearch_dashboards.yml file.

I’m not using OpenSearch Dashboards.

After logging in with BasicAuth to the internal user account, it displays _cat/nodes. If I reload the page immediately, it always re-prompts for credentials (now with 2.11). If I wait 5 minutes or so and reload the page, it works without re-prompting (and if I wait ANOTHER 5 minutes it works without re-prompting, but if I reload the page twice in a row it re-prompts for credentials). I can see the Authorization header being sent with the same value for each request. OpenSearch ignores the valid header unless I wait 5 minutes or so before making another request (i.e. reload the page). The problem started happening in 2.11 (doesn’t happen in 2.10), and in 2.11 there were some new features introduced that seem related - new permissions system for the REST API calls? - but I’m not familiar enough with the source code to sleuth out the problem to determine what’s happening. The bug I submitted hasn’t produced a solution: [BUG] OpenSearch 2.11 401 errors even though valid Authorization header is sent · Issue #3678 · opensearch-project/security · GitHub

@ed-osrch I wasn’t aware that you connect OpenSearch directly. Thanks for sharing the bug link.
I’ll do testing on my side and comment on the bug if I get the same result.

What browser did you use during the testing?
Do you test with the private window?

@ed-osrch I’ve just checked 2.10 and 2.11 and had no issues. I can reload the page just after login and after a few minutes with no authentication prompt.

How do you connect to your OpenSearch nodes? Do you have a reverse proxy in front of the OpenSearch nodes or do you connect directly?