Dashboards stops working as intended after an OpenSearch node stops in a cluster

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch - 2.11.1 (I upgraded to the latest version because I thought the issue might be version-related)

Docker
Client: Docker Engine - Community
Version: 24.0.7
API version: 1.43
Go version: go1.20.10
Git commit: afdd53b
Built: Thu Oct 26 09:07:41 2023
OS/Arch: linux/amd64
Context: default

Server: Docker Engine - Community
Engine:
Version: 24.0.7
API version: 1.43 (minimum version 1.12)
Go version: go1.20.10
Git commit: 311b9ff
Built: Thu Oct 26 09:07:41 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.25
GitCommit: d8f198a4ed8892c764191ef7b3b06d8a2eeb5c7f
runc:
Version: 1.1.10
GitCommit: v1.1.10-0-g18a0cb0
docker-init:
Version: 0.19.0
GitCommit: de40ad0

Hosting Operating System:
Ubuntu 22.04.3 LTS with EPYC cpu

Describe the issue:
(From the original post, "[weird behaviour] Dashboards doesn't work as intended after a opensearch node fails in a cluster", Issue #1712 in opensearch-project/security-dashboards-plugin on GitHub):
I have a Docker Swarm with 5 nodes, running OpenSearch on each node. I followed this guide: Docker - OpenSearch documentation. The cluster is healthy, all nodes are working fine, and I have a process indexing data at a rate of 15 GB per day across multiple indices. All nodes have joined the cluster; I verified this with _cat/nodes.

The problem begins when I shut down a node. At that point Dashboards stops working properly: I get kicked to the login page, and every login attempt fails.
The cluster enters a yellow state. I checked through the CLI and waited until it changed back to green, then tried to log in again, and it still doesn't let me in. But I can issue commands through the CLI with no problem, and I confirmed that documents are still being inserted into OpenSearch.
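For reference, the CLI checks mentioned above look roughly like the following (a sketch; the admin certificate paths and hostname are placeholders, not the exact commands used):

# Sketch of the CLI checks referred to above; hostname and certificate paths are placeholders.
curl -sk --cert admin.pem --key admin-key.pem "https://opensearch-node01:9200/_cluster/health?pretty"
curl -sk --cert admin.pem --key admin-key.pem "https://opensearch-node01:9200/_cat/nodes?v"
# Confirm documents are still being indexed while Dashboards is unavailable
curl -sk --cert admin.pem --key admin-key.pem "https://opensearch-node01:9200/_cat/indices?v"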

Configuration:
Every node has the following environment:

  OPENSEARCH_JAVA_OPTS: "-Xms4096m -Xmx4096m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
  node.name: opensearch-node05
  discovery.seed_hosts: opensearch-node01,opensearch-node02,opensearch-node03,opensearch-node04,opensearch-node05
  cluster.initial_master_nodes: opensearch-node01,opensearch-node02,opensearch-node03,opensearch-node04,opensearch-node05
  plugins.security.ssl.transport.pemcert_filepath: certs/opensearch-nodeXX/nodeXX.pem
  plugins.security.ssl.transport.pemkey_filepath: certs/opensearch-nodeXX/nodeXX-key.pem
  plugins.security.ssl.http.pemcert_filepath: certs/opensearch-nodeXX/nodeXX.pem
  plugins.security.ssl.http.pemkey_filepath: certs/opensearch-nodeXX/nodeXX-key.pem
  DISABLE_INSTALL_DEMO_CONFIG: "true"
  bootstrap.memory_lock: "true" # along with the memlock settings below, disables swapping

The XX are replaced by the corresponding node number.

Every OpenSearch node shares the same opensearch.yml:

network.bind_host: "0.0.0.0"
network.host: "0.0.0.0"
plugins.security.ssl.transport.pemtrustedcas_filepath: certs/ca/root-ca.pem
plugins.security.ssl.http.pemtrustedcas_filepath: certs/ca/root-ca.pem
plugins.security.ssl.transport.enforce_hostname_verification: false
plugins.security.ssl.http.enabled: true
plugins.security.allow_default_init_securityindex: true
plugins.security.authcz.admin_dn:
    - CN=AAAAAAAAA-ADMIN,OU=AAAAAAAAA,O=AAAAAAAAA,L=AAAAAAAAA,ST=AAAAAAAAA,C=AAAAAAAAA
plugins.security.nodes_dn:
    - CN=opensearch-node01,OU=AAAAAAAAA,O=AAAAAAAAA,L=AAAAAAAAA,ST=AAAAAAAAA,C=AAAAAAAAA
    - CN=opensearch-node02,OU=AAAAAAAAA,O=AAAAAAAAA,L=AAAAAAAAA,ST=AAAAAAAAA,C=AAAAAAAAA
    - CN=opensearch-node03,OU=AAAAAAAAA,O=AAAAAAAAA,L=AAAAAAAAA,ST=AAAAAAAAA,C=AAAAAAAAA
    - CN=opensearch-node04,OU=AAAAAAAAA,O=AAAAAAAAA,L=AAAAAAAAA,ST=AAAAAAAAA,C=AAAAAAAAA
    - CN=opensearch-node05,OU=AAAAAAAAA,O=AAAAAAAAA,L=AAAAAAAAA,ST=AAAAAAAAA,C=AAAAAAAAA
plugins.security.audit.type: internal_opensearch
plugins.security.enable_snapshot_restore_privilege: true
plugins.security.check_snapshot_restore_write_privileges: true
plugins.security.restapi.roles_enabled: ["all_access", "security_rest_api_access"]
cluster.routing.allocation.disk.threshold_enabled: false
opendistro_security.audit.config.disabled_rest_categories: NONE
opendistro_security.audit.config.disabled_transport_categories: NONE

Environment of Dashboards:

  OPENSEARCH_HOSTS: '["https://opensearch-node01:9200","https://opensearch-node02:9200","https://opensearch-node03:9200","https://opensearch-node04:9200","https://opensearch-node05:9200"]' # must be a string with no spaces when specified as an environment variable
  DISABLE_INSTALL_DEMO_CONFIG: "true"

opensearch_dashboards.yml:

opensearch.ssl.verificationMode: none
opensearch.username: kibanaserver
opensearch.password: XXXXXXXXXXXXXXXXXXXXXXXXXX
opensearch.requestHeadersWhitelist: [authorization, securitytenant]
opensearch_security.multitenancy.enabled: true
opensearch_security.multitenancy.tenants.enable_global: false
opensearch_security.multitenancy.tenants.enable_private: false
opensearch_security.multitenancy.tenants.preferred: [Private, Global]
opensearch_security.readonly_mode.roles: [kibana_read_only]
opensearch_security.cookie.secure: false
server.host: '0.0.0.0'
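
One way to separate a Dashboards problem from a backend problem is to hit the Security plugin's authinfo endpoint on each node with the kibanaserver credentials directly, bypassing Dashboards (a hedged sketch; the password is a placeholder):

# Hedged sketch: check that the kibanaserver user can authenticate against each
# backend node directly (bypassing Dashboards). Password is a placeholder.
for n in 01 02 03 04 05; do
  curl -sk -u "kibanaserver:XXXXXXXX" \
    -o /dev/null -w "opensearch-node${n}: HTTP %{http_code}\n" \
    "https://opensearch-node${n}:9200/_plugins/_security/authinfo"
done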

I also made changes to the config.yml of the OpenSearch Security plugin:

_meta:
  type: "config"
  config_version: 2

config:
  dynamic:
    kibana:
      multitenancy_enabled: true
      private_tenant_enabled: true
      default_tenant: Global
      server_username: kibanaserver
      index: '.kibana'
    do_not_fail_on_forbidden: false
    http:
      anonymous_auth_enabled: false
      xff:
        enabled: false
    authc:
      basic_auth_internal:
        http_enabled: true
        transport_enabled: true
        order: 1
        http_authenticator:
          type: basic
          challenge: true
        authentication_backend:
          type: internal

Relevant Logs or Screenshots:

(The following entries repeat many times in the Dashboards log with the same timestamp.)

type=log @timestamp=2023-12-27T11:16:18Z tags=error,plugins,securityDashboards pid=1
Failed to resolve user tenant: Error: Request Timeout after 30000ms

type=log @timestamp=2023-12-27T11:16:18Z tags=error,http,server,OpenSearchDashboards pid=1
Error: Request Timeout after 30000ms
    at SecurityClient.dashboardsinfo (/usr/share/opensearch-dashboards/plugins/securityDashboards/server/backend/opensearch_security_client.ts:130:13)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at BasicAuthentication.resolveTenant (/usr/share/opensearch-dashboards/plugins/securityDashboards/server/auth/types/authentication_type.ts:249:28)
    at /usr/share/opensearch-dashboards/plugins/securityDashboards/server/auth/types/authentication_type.ts:172:24
    at Object.interceptAuth [as authenticate] (/usr/share/opensearch-dashboards/src/core/server/http/lifecycle/auth.js:133:22)
    at exports.Manager.execute (/usr/share/opensearch-dashboards/node_modules/@hapi/hapi/lib/toolkit.js:60:28)
    at module.exports.internals.Auth._authenticate (/usr/share/opensearch-dashboards/node_modules/@hapi/hapi/lib/auth.js:273:30)
    at Request._lifecycle (/usr/share/opensearch-dashboards/node_modules/@hapi/hapi/lib/request.js:371:32)
    at Request._execute (/usr/share/opensearch-dashboards/node_modules/@hapi/hapi/lib/request.js:281:9)

type=error @timestamp=2023-12-27T11:28:02Z tags= pid=1 error=[object Object]
ERR Internal Server Error
url=http://XXXXXXXXX/app/home

Hi @goncalo.bpedras

Do you have any tenants other than private and global?

I noticed that you enabled the private tenant and set the global tenant as the default on the OpenSearch side, but disabled the private and global tenants on the OpenSearch Dashboards side. I would change that.

Please try to enable the global and private tenants for both OpenSearch and OpenSearch Dashboards.
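
If it helps, the tenants that actually exist can be listed through the Security REST API (a hedged sketch; admin credentials are placeholders):

# Hedged sketch: list the configured tenants via the Security plugin's REST API.
# Admin credentials are placeholders.
curl -sk -u "admin:XXXXXXXX" "https://opensearch-node01:9200/_plugins/_security/api/tenants?pretty"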

Hi @Eugene7 , thanks for the reply.

The problem persists: I get kicked to the login page, but documents are still being inserted. I changed the configuration and restarted every node, including the Dashboards instance. Here is what I've done:

The new opensearch_dashboards.yml:

opensearch.ssl.verificationMode: none
opensearch.username: kibanaserver
opensearch.password: XXXXXXXXXXXXXXXXXXX
opensearch.requestHeadersWhitelist: [authorization, securitytenant]

opensearch_security.multitenancy.enabled: true
opensearch_security.multitenancy.tenants.enable_global: true
opensearch_security.multitenancy.tenants.enable_private: true
opensearch_security.multitenancy.tenants.preferred: [Private, Global]
opensearch_security.readonly_mode.roles: [kibana_read_only]
opensearch_security.cookie.secure: false
server.host: '0.0.0.0'

The new config.yml of opensearch-security:

_meta:
  type: "config"
  config_version: 2

config:
  dynamic:
    kibana:
      multitenancy_enabled: true
      private_tenant_enabled: true
      default_tenant: Global
      server_username: kibanaserver
      index: '.kibana'
    do_not_fail_on_forbidden: false
    http:
      anonymous_auth_enabled: false
      xff:
        enabled: false

    authc:
      basic_auth_internal:
        http_enabled: true
        transport_enabled: true
        order: 1
        http_authenticator:
          type: basic
          challenge: true
        authentication_backend:
          type: internal

I also enabled the private tenant here:
[screenshot]

To answer your question: yes, I do have other tenants. Each tenant is for a specific group of users, and users cannot access other tenants. I also do not want users to have a private tenant.

The problem begins when I shut down a node.

Slightly different angle: this sounds like the connection from Dashboards to the OpenSearch backend is 'stuck' attempting to connect to the node that was shut down. Maybe using a load balancer between these hosts would resolve the issue?

It looks like docker-compose has many features built around swarms that seem tailor-made for your scenario.

Hi @peternied ,

I don't think that makes sense, because after I shut down an OpenSearch node the backend still works: I can index and issue commands with no problems at all. It's Dashboards that struggles for some reason. Also, when all OpenSearch nodes are up, everything works fine.

Only one OpenSearch node is running on each Docker Swarm host, so it's pretty balanced. Dashboards is the only service with an exposed port. I'm not talking about a Docker node going down, but an OpenSearch node.

@pperesbr Could you please share your docker pull request?

Hi @pperesbr

Plugin security options can't be passed as environment variables and must be placed in the opensearch.yml file.

@goncalo.bpedras
Hi, did you find a solution to this problem? I have the same issue (but using Podman on RHEL).
I have a cluster based on the latest version, 2.19.1.

Hi.
It was a long time ago, but I think the problem was that the nodes couldn't talk to each other for some reason. The Docker configuration on the official website had a problem in the port-exposure settings; for some reason it was creating some kind of conflict in the network layer.
After I removed the port-exposure configuration, it worked fine.
I can't give you more details because it was some time ago.
Does this help?
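
In case it is useful, a hedged sketch of what removing those published ports amounts to in a swarm deployment (stack, file, and service names are placeholders):

# Hedged sketch: after removing the ports: mappings for 9200/9600 from the stack
# file (keeping only Dashboards' published port), redeploy the stack.
docker stack deploy -c docker-compose.yml opensearch
# Alternatively, a published port can be removed from a running service directly:
docker service update --publish-rm 9200 opensearch_opensearch-node01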

So now, when you turn off one of the OpenSearch nodes, the Dashboards GUI is still fine?

You have only one dashboard instance running?

My cluster is in a green state (with one node down), so I guess it works fine and the nodes are communicating.

Yes. I have a 5-node cluster. When 2 nodes go down, the cluster still works.
And yes, I have only one Dashboards instance, and it still works when one of the nodes goes down.

@goncalo.bpedras
Thanks for confirming. We have different versions (2.11 vs 2.18 and 2.19, as I noticed this on both), and no idea if it matters.

I don't think you should have different versions across the nodes.
I have all of them at the same version (2.11.1).

I have the same version on all nodes; I simply noticed this problem on both 2.18 and 2.19.

@piotrfrq Can you give more details on the issue you are having? Are you deploying the OpenSearch cluster using docker-compose? If so, could you share the docker-compose.yml? If you are taking down a master node in your tests, are you sure there are enough nodes left to elect a master?
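
For example, a quick hedged check of whether a cluster manager is still elected after a node goes down (credentials are placeholders):

# Hedged sketch: confirm a cluster manager is still elected and the cluster is
# healthy after taking a node down. Credentials are placeholders.
curl -sk -u "admin:XXXXXXXX" "https://localhost:9200/_cat/cluster_manager?v"
curl -sk -u "admin:XXXXXXXX" "https://localhost:9200/_cluster/health?pretty"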

I use Ansible to deploy a 3-node cluster, using Podman containers. All nodes have the same roles. Network communication is OK. The cluster is working fine, as the health check API returns green status and 3 nodes.

But I'm trying to follow up on the advice that the OpenSearch instances have communication problems, because in the OpenSearch logs I can see timed-out SSL connections like this:

[2025-05-12T08:50:47,435][ERROR][o.o.t.n.s.SecureNetty4Transport] [plg-lms24-manager] Exception during establishing a SSL connection: java.io.IOException: Connection timed out
java.io.IOException: Connection timed out
        at java.base/sun.nio.ch.SocketDispatcher.read0(Native Method) ~[?:?]
        at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:47) ~[?:?]
        at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:340) ~[?:?]
        at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:294) ~[?:?]
        at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:269) ~[?:?]
        at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:425) ~[?:?]
        at org.opensearch.transport.CopyBytesSocketChannel.readFromSocketChannel(CopyBytesSocketChannel.java:156) ~[transport-netty4-client-2.14.0.jar:2.14.0]
        at org.opensearch.transport.CopyBytesSocketChannel.doReadBytes(CopyBytesSocketChannel.java:141) ~[transport-netty4-client-2.14.0.jar:2.14.0]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.109.Final.jar:4.1.109.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.109.Final.jar:4.1.109.Final]
        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
[2025-05-12T08:50:47,448][INFO ][o.o.c.c.FollowersChecker ] [plg-lms24-manager] FollowerChecker{discoveryNode={plg-lms24-idx2}{d0ntBqLWSJSESP2Q4gNvSw}{UZ3CJp8iRyKM_pmDE3r9_g}{10.17.229.224}{10.17.229.224:9300}{dim}{shard_indexing_pressure_enabled=true}, failureCountSinceLastSuccess=0, [cluster.fault_detection.follower_check.retry_count]=3} disconnected
[2025-05-12T08:50:47,449][INFO ][o.o.c.c.FollowersChecker ] [plg-lms24-manager] FollowerChecker{discoveryNode={plg-lms24-idx2}{d0ntBqLWSJSESP2Q4gNvSw}{UZ3CJp8iRyKM_pmDE3r9_g}{10.17.229.224}{10.17.229.224:9300}{dim}{shard_indexing_pressure_enabled=true}, failureCountSinceLastSuccess=0, [cluster.fault_detection.follower_check.retry_count]=3} marking node as faulty

But when I check the cluster health it is still green, with 3 nodes connected. Really hard to understand why.

Those timeouts repeat every ~8 minutes, but the cluster looks operational: logs are processed, indices are updated, etc.
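
One hedged way to probe the transport port between nodes and see whether the TLS handshake reaches the certificate exchange (the IP and CA file name are taken from the config quoted in this thread; adjust paths as needed):

# Hedged sketch: probe a peer's transport port (9300) and print the certificate
# it presents. IP and CA file are from the config quoted in this thread.
openssl s_client -connect 10.17.229.224:9300 -CAfile root-ca.pem </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates
# Note: transport TLS is mutual, so the handshake may still be rejected without a
# client certificate, but the peer certificate is usually printed anyway.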

So I started investigating the OpenSearch config. Here is part of it, for one of the nodes:

node.name: plg-lms24-idx
network.host: 0.0.0.0
network.publish_host: 10.17.229.248
discovery.seed_hosts: [plg-lms24-idx,plg-lms24-manager,plg-lms24-idx2]
cluster.initial_cluster_manager_nodes: [plg-lms24-manager,plg-lms24-idx,plg-lms24-idx2]
node.roles: [cluster_manager,data,ingest]

plugins.security.allow_default_init_securityindex: true
plugins.security.audit.type: internal_opensearch
plugins.security.enable_snapshot_restore_privilege: true
plugins.security.check_snapshot_restore_write_privileges: true
plugins.security.restapi.roles_enabled: ["all_access", "security_rest_api_access"]

plugins.security.ssl.transport.pemcert_filepath: plg-lms24-idx.pem
plugins.security.ssl.transport.pemkey_filepath: plg-lms24-idx.key
plugins.security.ssl.transport.pemtrustedcas_filepath: root-ca.pem
plugins.security.ssl.transport.enforce_hostname_verification: false
plugins.security.ssl.transport.resolve_hostname: false
plugins.security.ssl.http.enabled: true
plugins.security.ssl.http.pemcert_filepath: plg-lms24-idx_http.pem
plugins.security.ssl.http.pemkey_filepath: plg-lms24-idx_http.key
plugins.security.ssl.http.pemtrustedcas_filepath: root-ca.pem
plugins.security.nodes_dn:
- CN=plg-lms24-idx.localdomain,OU=Ops,O=localdomain\, Inc.,DC=localdomain
- CN=plg-lms24-manager.localdomain,OU=Ops,O=localdomain\, Inc.,DC=localdomain
- CN=plg-lms24-idx2.localdomain,OU=Ops,O=localdomain\, Inc.,DC=localdomain
plugins.security.authcz.admin_dn:
- CN=admin.localdomain,OU=Ops,O=localdomain\, Inc.,DC=localdomain

And I wonder: in nodes_dn the CN contains an FQDN, while discovery.seed_hosts uses only short hostnames.
Does it matter?

@piotrfrq To answer your last question:

discovery.seed_hosts controls how nodes discover each other during cluster formation. It uses hostnames or IP addresses resolvable by DNS or /etc/hosts. Whether you use short hostnames (plg-lms24-idx) or FQDNs (plg-lms24-idx.localdomain) here doesn't matter, as long as they resolve properly.
plugins.security.nodes_dn defines which nodes are allowed to join the cluster, by validating the Subject DN (Distinguished Name) of their TLS certificates.
Here, the exact DN string matters, including the FQDN in the CN. This value must match exactly what's in the certificate: OpenSearch uses strict string matching, not DNS resolution.
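
A hedged way to print the certificate subject in a comma-separated order that is easier to compare with the nodes_dn entries (file name taken from your config):

# Hedged sketch: print the certificate subject in RFC2253 order for comparison
# with plugins.security.nodes_dn.
openssl x509 -in plg-lms24-idx.pem -noout -subject -nameopt RFC2253
# e.g. subject=CN=plg-lms24-idx.localdomain,OU=Ops,O=localdomain\, Inc.,DC=localdomain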

But going back to the issue you are having: are you seeing the errors you provided after you disconnect one of the nodes and OpenSearch Dashboards is no longer accessible?

If so, can you provide the logs from Dashboards (attempting to connect) and also the opensearch_dashboards.yml file (redact anything sensitive)?

No. After a fresh deployment I just wait until the cluster is established; I can log in to Dashboards and it looks fine. I do nothing, and after ~20 minutes I check the OpenSearch logs and see those connection timeouts.

Dashboards is up and I don't touch it, because I guess its problem is related to the OpenSearch/cluster problem. Once I figure that out, I hope Dashboards will automagically behave correctly.

So for now I'm trying to focus on OpenSearch only. Where should I pay special attention? In the OpenSearch logs, for example, maybe right before those timeouts there is some hint of the root cause?
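
A hedged sketch of pulling the log context immediately preceding each timeout out of the containers (container names are placeholders):

# Hedged sketch: grab the log lines leading up to each SSL timeout on every node.
# Container names are placeholders.
for c in plg-lms24-idx plg-lms24-manager plg-lms24-idx2; do
  echo "== $c =="
  podman logs "$c" 2>&1 | grep -B 20 "Exception during establishing a SSL connection"
done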

I'm afraid I don't follow. Are you seeing any actual issues with the cluster apart from the error logs?

Can you confirm that the Subject DN of the certificates matches exactly the configured plugins.security.nodes_dn?

When you see the error logs, is the cluster down to 2 nodes? And is Dashboards fully accessible throughout these timeouts?

No, the cluster seems to be working fine.

Confirmed. Otherwise the error would be different, and the cluster health would not be green, I guess.

plugins.security.nodes_dn:
- CN=plg-lms24-idx.localdomain,OU=Ops,O=localdomain,DC=localdomain
- CN=plg-lms24-manager.localdomain,OU=Ops,O=localdomain,DC=localdomain
- CN=plg-lms24-idx2.localdomain,OU=Ops,O=localdomain,DC=localdomain
openssl x509 -in plg-lms24-manager.pem -text -noout
....
Subject: DC=localdomain, O=localdomain, OU=Ops, CN=plg-lms24-manager.localdomain

The errors appear when all nodes are up. Dashboards are accessible.

GET /_cat/nodes?v
ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role node.roles                  cluster_manager name
10.17.229.248           52          79   1    0.00    0.01     0.06 dim       cluster_manager,data,ingest *               plg-lms24-idx
10.17.229.238           16          81   1    0.04    0.01     0.03 dim       cluster_manager,data,ingest -               plg-lms24-manager
10.17.229.224           50          75   0    0.01    0.04     0.07 dim       cluster_manager,data,ingest -               plg-lms24-idx2


GET _cluster/health?pretty
{
  "cluster_name" : "lms-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "discovered_master" : true,
  "discovered_cluster_manager" : true,
  "active_primary_shards" : 17,
  "active_shards" : 37,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}