OpenSearch cluster becomes unresponsive when multiple LDAP users log in

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): OpenSearch and OpenSearch Dashboards 3.3.x (latest)

Describe the issue:

When a single user or a local account browses Dashboards, everything is fine. When multiple LDAP users browse concurrently, other users cannot log in and OpenSearch nodes start dropping from the cluster.

Configuration:

Load Balancer → OpenSearch Dashboards → OpenSearch Cluster

Load Balancer uses SSL based persistence for OpenSearch Dashboards.

OpenSearch Dashboards config:


---
opensearch:
  hosts:
  - https://opensearch-1.example.com:29200
  - https://opensearch-2.example.com:29200
  - https://opensearch-3.example.com:29200
  username: kibanaserver
  password: ...
  ssl:
    verificationMode: full
    key: "/usr/share/opensearch-dashboards/config/certs/node-key.pem"
    certificate: "/usr/share/opensearch-dashboards/config/certs/node.pem"
    certificateAuthorities:
    - "/usr/share/opensearch-dashboards/config/certs/ca.pem"
server:
  host: 0.0.0.0
  ssl:
    enabled: true
    key: "/usr/share/opensearch-dashboards/config/certs/node-key.pem"
    certificate: "/usr/share/opensearch-dashboards/config/certs/node.pem"
    certificateAuthorities:
    - "/usr/share/opensearch-dashboards/config/certs/ca.pem"
  port: 25601

OpenSearch config:


---
action:
  auto_create_index: ".watches,.triggered_watches,.watcher-history-*,logstash-*-stream"
bootstrap:
  memory_lock: 'true'
cluster:
  name: coordinating.prod.opensearch
  routing:
    allocation:
      awareness:
        attributes:
        - host
  filecache.remote_data_ratio: 5
  initial_cluster_manager_nodes:
  - opensearch-1-coordinating
  - opensearch-2-coordinating
  - opensearch-3-coordinating
  remote:
    vf:
      seeds:
      - opensearch-1.example.com:19300
      - opensearch-2.example.com:19300
      - opensearch-3.example.com:19300
http:
  compression: true
  port: 29200
  publish_host: opensearch-1.example.com
network:
  publish_host: opensearch-1.example.com
  bind_host: _site_
node:
  attr:
    host: opensearch-1.example.com
  roles:
  - ingest
  - remote_cluster_client
  - data
  - cluster_manager
  name: opensearch-1-coordinating
plugins:
  security:
    allow_default_init_securityindex: true
    restapi:
      roles_enabled:
      - all_access
    ssl:
      transport:
        pemcert_filepath: node.pem
        pemkey_filepath: node-key.pem
        pemtrustedcas_filepath: root-ca.pem
        enforce_hostname_verification: false
      http:
        enabled: true
        pemcert_filepath: node.pem
        pemkey_filepath: node-key.pem
        pemtrustedcas_filepath: root-ca.pem
    nodes_dn:
    - CN=opensearch-*.example.com,O=IDM.EXAMPLE.COM
    unsupported:
      passive_intertransport_auth_initially: true
discovery:
  seed_hosts:
  - opensearch-1.example.com:29300
  - opensearch-2.example.com:29300
  - opensearch-3.example.com:29300
prometheus:
  indices: false
  cluster:
    settings: true
  nodes:
    filter: _all
transport:
  port: 29300
  publish_host: opensearch-1.example.com

OpenSearch Security config.yml:

---
_meta:
  type: config
  config_version: 2
config:
  dynamic:
    http:
      anonymous_auth_enabled: false
    authc:
      internal_auth:
        order: 0
        description: HTTP basic authentication using the internal user database
        http_enabled: true
        transport_enabled: true
        http_authenticator:
          type: basic
          challenge: false
        authentication_backend:
          type: internal
      ldap_auth:
        order: 1
        description: Authenticate using LDAP
        http_enabled: true
        transport_enabled: true
        http_authenticator:
          type: basic
          challenge: false
        authentication_backend:
          type: ldap
          config:
            enable_ssl: false
            enable_start_tls: false
            enable_ssl_client_auth: false
            verify_hostnames: true
            hosts:
            - freeipa-1.example.com:389
            - freeipa-2.example.com:389
            - freeipa-3.example.com:389
            bind_dn: uid=svc_opensearch,cn=users,cn=accounts,dc=idm,dc=example,dc=com
            password: ...
            userbase: cn=users,cn=accounts,dc=idm,dc=example,dc=com
            usersearch: "(uid={0})"
            username_attribute: uid
    authz:
      ldap_roles:
        description: Authorize using LDAP
        http_enabled: true
        transport_enabled: true
        authorization_backend:
          type: ldap
          config:
            enable_ssl: false
            enable_start_tls: false
            enable_ssl_client_auth: false
            verify_hostnames: true
            hosts:
            - freeipa-1.example.com:389
            - freeipa-2.example.com:389
            - freeipa-3.example.com:389
            bind_dn: uid=svc_opensearch,cn=users,cn=accounts,dc=idm,dc=example,dc=com
            password: ...
            userbase: cn=users,cn=accounts,dc=idm,dc=example,dc=com
            usersearch: "(uid={0})"
            username_attribute: uid
            skip_users:
            - admin
            - kibanaserver
            - kibana_server
            - logstash_internal
            - gitlab_manager
            - nrpe
            rolesearch_enabled: true
            rolebase: cn=groups,cn=accounts,dc=idm,dc=example,dc=com
            rolesearch: "(member={0})"
            userroleattribute:
            userrolename: none
            rolename: cn
            resolve_nested_roles: true

Relevant Logs or Screenshots:

OpenSearch Dashboards logs the following two items, which seem related:

	ResponseError: Response Error
    at onBody (/usr/share/opensearch-dashboards/node_modules/@opensearch-project/opensearch/lib/Transport.js:426:23)
    at IncomingMessage.onEnd (/usr/share/opensearch-dashboards/node_modules/@opensearch-project/opensearch/lib/Transport.js:341:11)
    at IncomingMessage.emit (node:events:530:35)
    at endReadableNT (node:internal/streams/readable:1698:12)
    at processTicksAndRejections (node:internal/process/task_queues:82:21) {
  meta: {
    body: 'Authentication finally failed',
    statusCode: 401,
    headers: {
      'x-opaque-id': '7631be5b-1284-4396-bc31-5c5c9732291b',
      'x-opensearch-version': 'OpenSearch/3.3.2 (opensearch)',
      'content-type': 'text/plain; charset=UTF-8',
      'content-length': '29'
    },
    meta: {
      context: null,
      request: [Object],
      name: 'opensearch-js',
      connection: [Object],
      attempts: 0,
      aborted: false
    }
  },
  isBoom: true,
  isServer: false,
  data: null,
  output: {
    statusCode: 401,
    payload: {
      statusCode: 401,
      error: 'Unauthorized',
      message: 'Response Error'
    },
    headers: {}
  },
  [Symbol(SavedObjectsClientErrorCode)]: 'SavedObjectsClient/notAuthorized'
}

----

[TimeoutError]: Request timed out

OpenSearch eventually starts evicting nodes and this is logged:

{"type": "opensearch-log", "timestamp": "2025-12-17T15:04:18,132Z", "level": "ERROR", "component": "o.o.h.n.s.SecureNetty4HttpServerTransport", "cluster.name": "coordinating.prod.opensearch", "node.name": "opensearch-1-coordinating", "message": "Exception during establishing a SSL connection: java.net.SocketException: Connection reset", "cluster.uuid": "dZfy28OhQnqmXXXOujRg0A", "node.id": "eIFIXGcTRj25Wyf948ld-w" , 

I have the Prometheus plugin pulling metrics and I can see the cluster lose the member. Nothing looks wrong with heap usage or CPU. Any further ideas would be helpful.

I found a similar issue today. I deployed OpenSearch 3.3.0 on a Kubernetes cluster, but when connecting through LDAPS the cluster froze and the nodes then started removing each other.

@goldensand @doug_f Do you know how many LDAP groups your test user is a member of?
What LDAP server do you use?

@goldensand @doug_f Have you tried testing against a single OpenSearch node in opensearch_dashboards.yml?
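
For reference, a minimal sketch of what that single-node test could look like in opensearch_dashboards.yml, reusing the hosts and credentials from the config above and simply trimming the list to one node (the ssl section stays unchanged):

opensearch:
  hosts:
  # single node only, to isolate whether the fault follows one backend node
  - https://opensearch-1.example.com:29200
  username: kibanaserver
  password: ...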

This only seems to happen once a user with a large number of nested groups tries to log in.

Once the user tries to log in, he is not authorized and continues to be blocked on subsequent login attempts. The user has 10 directly assigned groups and is a member of 75 indirect (nested) groups.

Once the user tries to log in, the cluster writes trace logs similar to the following.


DBGTRACE (10): escapedDn cn=bastion,cn=groups,cn=accounts,dc=example,dc=com
result nested attr count for depth 30 : 0
Results for LDAP group search for cn=bastion,cn=groups,cn=accounts,dc=example,dc=com in base convertedOldStyleSettings:
[]

DBGTRACE (10): escapedDn cn=bastion,cn=groups,cn=accounts,dc=example,dc=com
result nested attr count for depth 29 : 0
Results for LDAP group search for cn=bastion,cn=groups,cn=accounts,dc=example,dc=com in base convertedOldStyleSettings:
[]

This continues for several of the groups in our LDAP. Another user with fewer groups can log in and be authorized without issues. The logging stops if I restart the cluster and starts again as soon as the user tries to log in.
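
If the nested group resolution is the bottleneck, two documented knobs on the LDAP authorization backend may be worth testing: turning off resolve_nested_roles, or keeping it on but excluding the largest groups via nested_role_filter. A minimal sketch against the config.yml above (only the changed keys of the ldap_roles backend are shown; the DN in the filter is just the example taken from the trace, and max_nested_depth is an assumption, so verify it exists in your plugin version before relying on it):

    authz:
      ldap_roles:
        authorization_backend:
          type: ldap
          config:
            # ...existing connection and search settings unchanged...
            resolve_nested_roles: true
            # skip transitive resolution for known-large groups
            nested_role_filter:
            - cn=bastion,cn=groups,cn=accounts,dc=example,dc=com
            # cap recursion depth (assumed setting; the trace counting down from 30 suggests that default)
            max_nested_depth: 5

Independently of that, raising plugins.security.cache.ttl_minutes in opensearch.yml keeps successful LDAP lookups cached longer, which may reduce how often the full group walk is repeated for requests coming from Dashboards.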

This issue persists on v3.4.0.

@pablo Have I answered your questions? Any other ideas on what needs to be done to fix this? Should I file an issue against the security plugin repo?