Performance impact using client certificate authentication

sezuan2 · August 23, 2021, 7:40am

Hello,

I’m posting this question here because I suspect it also concerns OpenSearch.

I noticed some strange behavior when multiple clients connect to an Elasticsearch/Open Distro cluster, for example a service scaled up by the Kubernetes autoscaler. The connections were accepted slowly and were eventually rejected because the TCP backlog was full. After some investigation I noticed that connections using client certificates are established more slowly than those without a client certificate.

A small test script visualizes the difference. It tries to establish 5000 connections and sends /_cluster/health every 15 seconds; when a request times out, it retries after 5 seconds. (‘ok’ means the /_cluster/health request was successful; ‘connections’ are established connections.)

[Graph: test results with client certificate]

[Graph: test results without client certificate]

Is this due to a configuration error? Can the behaviour be improved?

Regards,
Matthias

pablo · August 23, 2021, 10:06am

Hello @sezuan2,

I think this behaviour would be expected as certificate-based connections will need extra time to encrypt and decrypt the packets.

How many ES nodes do you have in the cluster? Have you monitored RAM and Java Heap usage in those nodes?

sezuan2 · August 23, 2021, 12:20pm

Hello,

It’s a 6-node cluster, and the connections all go to just one of the nodes. RAM and Java heap usage looked good, and GC times are good, too.

pablo · August 23, 2021, 3:26pm

What is the version of ODFE?
Do you use Kibana to connect to ES, or do you have a custom app?

sezuan2 · August 24, 2021, 7:48am

Load test and the real application are custom.

pablo · August 24, 2021, 10:11am

What about the ODFE ES version?

pablo · August 25, 2021, 3:35pm

@sezuan2

Could you share your config.yml content?

sezuan2 · August 26, 2021, 7:04am

It’s Elasticsearch 7.8.0 and Open Distro 1.9.0.

sezuan2 · August 31, 2021, 12:31pm

Here it is, with redacted parts:

  dynamic:
    filtered_alias_mode: "warn"
    disable_rest_auth: false
    disable_intertransport_auth: false
    respect_request_indices_options: false
    license: null
    kibana:
      multitenancy_enabled: true
      server_username: "kibanaserver"
      index: ".kibana"
    http:
      anonymous_auth_enabled: true
      xff:
        enabled: true
        internalProxies: "<redacted:regex>"
        remoteIpHeader: "X-Forwarded-For"
    authc:
      clientcert_auth_domain:
        description: "Authenticate via SSL client certificates"
        http_enabled: true
        transport_enabled: false
        order: 3
        http_authenticator:
          type: clientcert
          config:
            username_attribute: cn  #optional, if omitted DN becomes username
          challenge: false
        authentication_backend:
          type: "noop"
      ldap:
        http_enabled: true
        transport_enabled: false
        order: 1
        http_authenticator:
          challenge: false
          type: "basic"
          config: {}
        authentication_backend:
          type: "ldap"
          config:
            enable_ssl: true
            enable_start_tls: false
            enable_ssl_client_auth: false
            verify_hostnames: true
            hosts:
            - "<redacted:ldap-server>"
            bind_dn: "<redacted:bind_dn>"
            password: "<redacted:password>"
            userbase: "<redacted:userbase>"
            usersearch: "(uid={0})"
            username_attribute: "uid"
        description: "Migrated from v6"
      basic_internal_auth_domain:
        http_enabled: true
        transport_enabled: true
        order: 2
        http_authenticator:
          challenge: false
          type: "basic"
          config: {}
        authentication_backend:
          type: "intern"
          config: {}
        description: "Migrated from v6"
    authz:
      roles_from_myldap:
        http_enabled: true
        transport_enabled: false
        authorization_backend:
          type: "ldap"
          config:
            enable_ssl: true
            enable_start_tls: false
            enable_ssl_client_auth: false
            verify_hostnames: true
            hosts:
            - "<redacted:ldap-server>"
            bind_dn: "<redacted:bind_dn>"
            password: "<redacted:password>"
            rolesearch: "(member={0})"
            userroleattribute: null
            userrolename: "disabled"
            rolename: "cn"
            resolve_nested_roles: true
            rolebase: "<redacted:rolebase>"
            usersearch: "(uid={0})"
            skip_users:
            - <redacted:various internal_users>
            - <redacted:*.domain which matches the client certs>
            - "opendistro_security_anonymous"
        description: "Migrated from v6"
    auth_failure_listeners: {}
    do_not_fail_on_forbidden: false
    multi_rolespan_enabled: false
    hosts_resolver_mode: "ip-only"
    transport_userrname_attribute: null
    do_not_fail_on_forbidden_empty: false

pablo · August 31, 2021, 2:29pm

@sezuan2

According to that config, you’re using LDAP over SSL. As far as I understood, you were testing LDAP with and without a secured connection. Without a secured connection (plain LDAP, port 389) you have no performance issues (no timeouts). With SSL enabled (LDAPS, port 636), some requests time out.

Could you tell me which LDAP server you’re using?

sezuan2 · August 31, 2021, 2:50pm

I’m testing SSL-encrypted connections to Elasticsearch, with and without a client certificate. I assume the LDAP server should never be queried, because the client certificate names and the anonymous user are in the skip_users list:

            skip_users:
            - <redacted:various internal_users>
            - <redacted:*.domain which matches the client certs>
            - "opendistro_security_anonymous"

pablo · September 3, 2021, 2:57pm

@sezuan2

skip_users only applies to authorization. The plugin will still try to authenticate client-certificate users against LDAP and basic authentication. Could you try changing the authentication order as follows:

1. basic_auth
2. client_cert
3. ldap
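Applied to the config posted above, that order might look roughly like the fragment below. This is a sketch that shows only the `order` keys. Every other key of each domain is assumed to stay exactly as posted. Authentication domains are tried in ascending `order`.

```yaml
# Fragment of config.dynamic.authc (sketch: only the `order` keys change)
authc:
  basic_internal_auth_domain:
    order: 1   # tried first (was 2)
    # ...other keys unchanged...
  clientcert_auth_domain:
    order: 2   # tried second (was 3)
    # ...other keys unchanged...
  ldap:
    order: 3   # tried last (was 1)
    # ...other keys unchanged...
```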

sezuan2 · September 5, 2021, 7:23am

I’ll test the new order. However, in my tests I only compared requests with and without a client certificate. None of these tests sent a basic authentication header, so I would expect the authc ldap part to be ignored.

sezuan2 · September 10, 2021, 11:50am

Hi Pablo!

this was a hint in the right direction. Removing the authz->ldap section made the client certificate requests fast. This is still confusing as no significant amount of ldap requests are visible with tcpdump.
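A less destructive variant of the same experiment is to disable the LDAP authorization domain for REST requests instead of deleting it. This assumes you want to keep the LDAP role settings around for later. It is an untested sketch: the poster deleted the section outright, and the fragment below only flips the `http_enabled` flag that already appears in the posted config.

```yaml
# Fragment of config.dynamic.authz (sketch; other keys as posted above)
authz:
  roles_from_myldap:
    http_enabled: false       # was true: no LDAP role lookup for REST requests
    transport_enabled: false  # unchanged
    # authorization_backend: ...unchanged...
```

As with any change to config.yml, this only takes effect after the configuration has been uploaded to the security index, for example with securityadmin.sh.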

Do you have any idea how to limit the LDAP role lookup to LDAP users?

Anthony · September 10, 2021, 1:20pm

@sezuan2 if you change the authentication order so that ldap is last, the lookup should only be done when ldap is actually used, i.e. when basic_auth and client_cert have both failed.

Have you tried changing the order?

sezuan2 · September 10, 2021, 1:38pm

Yes, but it didn’t help. I also removed the ldap section from authentication, but that didn’t help either. For an unknown reason, it seems to do an LDAP role lookup for client-certificate users but not for basic-authenticated users.

sezuan2 · September 13, 2021, 6:58am

I did some more investigation. I observed that when the authz.ldap section is configured, Elasticsearch spends a lot of time accessing the cache:

[Flame graph: without the LDAP role section]

[Flame graph: with the LDAP role section]

Anthony · September 13, 2021, 6:55pm

@sezuan2 after looking further into this, I can see that the call to LDAP is performed by design, even for certificate users. However, I am not able to reproduce the delay you are experiencing. You should be able to skip users with a wildcard, as you already do with *.domain, but it needs to match the full CN. Can you try "*" as a starting point to see whether this skips the LDAP section altogether, and then work backwards from there?
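The diagnostic suggested here amounts to temporarily replacing the skip_users list with a single catch-all wildcard. This sketch assumes wildcard matching on the full user name behaves as described. If requests become fast, the list can be narrowed back down toward the `*.domain` entry.

```yaml
# Fragment of config.dynamic.authz.roles_from_myldap.authorization_backend.config
# (diagnostic sketch: skip the LDAP role lookup for every user)
skip_users:
  - "*"
```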

sezuan2 · September 14, 2021, 1:12pm

@Anthony
I tried:

- ""
- "/.*/"

but to no avail.

I don’t think it’s caused by the LDAP lookup itself. The flame graphs above show a suspiciously large amount of time spent in getEntry and lockedOrGetLoad. It’s strange that this doesn’t happen with basic auth. I also removed all skip_users entries and retested with basic auth, but the request time was still good.

If it’s really caused by lock contention, a high number of threads is probably required to reproduce it. I’m testing this on a cluster whose nodes have 52 cores / 104 threads.

Anthony · September 15, 2021, 5:43pm

The issue seems to be with caching on the authz side; caching on the authc side works as expected. I’d advise raising a bug ticket here.
