Data node throws errors continuously after restart

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
opensearch.x86_64 2.15.0-1

Describe the issue:
The data node logs the following errors:

[2024-09-27T09:09:27,983][ERROR][o.o.s.f.SecurityFilter   ] [es8-data-h-01] OpenSearch Security not initialized for indices:admin/mapping/auto_put

[2024-09-27T09:11:36,485][ERROR][o.o.s.f.SecurityFilter   ] [es8-data-h-01] OpenSearch Security not initialized for indices:admin/delete
[2024-09-27T09:11:36,485][ERROR][o.o.i.i.s.d.AttemptDeleteStep] [es8-data-h-01] Failed to delete index [index=security-auditlog-2024.09.27]

[2024-09-27T09:16:14,450][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [es8-data-h-01] Cancelling the migration process.
[2024-09-27T09:16:16,262][INFO ][o.o.j.s.JobScheduler     ] [es8-data-h-01] Will delay 13432 miliseconds for next execution of job security-auditlog-2024.09.27

Configuration:

---
action.destructive_requires_name: "false"
bootstrap.memory_lock: "false"
cluster.name: "es8-infra"
discovery.seed_hosts:
- "es8-master-1.mycompany.com"
- "es8-master-2.mycompany.com"
- "es8-master-3.mycompany.com"
http.port: "9200"
network.host:
- "es8-data-h-02.mycompany.com"
- "_local_"
network.publish_host: "es8-data-h-02.mycompany.com"
node.name: "es8-data-h-02"
node.roles:
- "ingest"
- "data"
- "data_hot"
path.data:
- "/mnt/es8-infra/opensearch"
path.logs: "/var/log/opensearch"
compatibility.override_main_response_version: true
plugins.security.audit.type: internal_opensearch
plugins.security.allow_unsafe_democertificates: false
plugins.security.ssl_cert_reload_enabled: true
plugins.security.allow_default_init_securityindex: true
plugins.security.system_indices.enabled: true
plugins.security.system_indices.indices: [".opendistro-security", ".opensearch-observability"]
plugins.security.ssl.transport.enforce_hostname_verification: false
plugins.security.ssl.transport.enabled: true
plugins.security.ssl.transport.pemcert_filepath: "/etc/opensearch/config/data-h-02.pem"
plugins.security.ssl.transport.pemkey_filepath: "/etc/opensearch/config/data-h-02-key.pem"
plugins.security.ssl.transport.pemtrustedcas_filepath: "/etc/opensearch/config/root-ca.pem"
plugins.security.ssl.http.enabled: true
plugins.security.ssl.http.pemcert_filepath: "/etc/opensearch/config/data-h-02.pem"
plugins.security.ssl.http.pemkey_filepath: "/etc/opensearch/config/data-h-02-key.pem"
plugins.security.ssl.http.pemtrustedcas_filepath: "/etc/opensearch/config/root-ca.pem"
plugins.security.authcz.admin_dn:
- CN=es8.mycompany.com,OU=Infra,O=mycompany,L=Ho Chi Minh City,ST=District 3,C=VN
plugins.security.nodes_dn:
- 'CN=es8-*.mycompany.com,OU=Infra,O=mycompany,L=Ho Chi Minh City,ST=District 3,C=VN'

- 'CN=es8-master-1.mycompany.com,OU=Infra,O=mycompany,L=Ho Chi Minh City,ST=District 3,C=VN'
- 'CN=es8-master-2.mycompany.com,OU=Infra,O=mycompany,L=Ho Chi Minh City,ST=District 3,C=VN'
- 'CN=es8-master-3.mycompany.com,OU=Infra,O=mycompany,L=Ho Chi Minh City,ST=District 3,C=VN'
- 'CN=es8-data-h-02.hoiio.info,OU=Infra,O=mycompany,L=Ho Chi Minh City,ST=District 3,C=VN'
- 'CN=es8-data-h-01.hoiio.info,OU=Infra,O=mycompany,L=Ho Chi Minh City,ST=District 3,C=VN'

It seems the data node needs more permissions to interact with the admin indices. Has anybody encountered a similar error before?

@huynguyenb3 Do you run any ISM policy against the audit logs? What initiates the deletion of the security-auditlog-2024.09.27 index?

You are right, I set up an ISM policy a few days ago. I did not expect this policy to actually delete the index, because I configured it to delete only indices older than 60 days.

@huynguyenb3 This policy has two separate states. The first is Delete with no condition, so it runs immediately, and the second is a Transition after 60 days with an action.
Try this instead.
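A minimal sketch of a policy with the intended shape, assuming the goal is to delete audit log indices once they are 60 days old: the index waits in a first state whose only transition fires at `min_index_age: 60d`, and only the second state carries the delete action. The state names, the policy ID `auditlog-60d`, the index pattern, and the `OS_HOST`/`OS_USER`/`OS_PASS` variables are placeholders, not values from this cluster.

```shell
# Hypothetical ISM policy: stay in "hot" until the index is 60 days old,
# then move to "delete", which is the only state that deletes anything.
policy_json='{
  "policy": {
    "description": "Delete audit log indices older than 60 days (illustrative)",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": [
          { "state_name": "delete", "conditions": { "min_index_age": "60d" } }
        ]
      },
      {
        "name": "delete",
        "actions": [ { "delete": {} } ],
        "transitions": []
      }
    ],
    "ism_template": { "index_patterns": ["security-auditlog-*"], "priority": 100 }
  }
}'

# Upload the policy through the ISM REST API.
# Host, credentials, and policy ID are placeholders (assumptions).
put_policy() {
  curl -sS -k -u "${OS_USER:-admin}:${OS_PASS:?set OS_PASS first}" \
    -X PUT "https://${OS_HOST:-localhost}:9200/_plugins/_ism/policies/auditlog-60d" \
    -H 'Content-Type: application/json' \
    -d "$policy_json"
}
```

Calling `put_policy` after exporting `OS_PASS` would upload it. Indices that already exist are not picked up by `ism_template` and would need the policy attached separately.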

@huynguyenb3 Also check if the user that created this ISM policy has permission to delete security-auditlog indices.

I removed the ISM policy to make debugging easier, but the data node still logs the error:

[2024-09-27T10:30:50,449][ERROR][o.o.s.f.SecurityFilter   ] [es8-data-h-01] OpenSearch Security not initialized for indices:admin/mapping/auto_put

@huynguyenb3 Can you log in to the cluster?
If so, can you download the current configuration from the OpenSearch cluster using securityadmin.sh and then reapply it?

Double check the config files before reapplying to the OpenSearch cluster.
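A rough sketch of that download-then-reapply flow, assuming the tool and certificate paths of a default RPM install (they match the paths used later in this thread). The backup directory is hypothetical; `-backup` writes the live security configuration to a directory, and `-cd` uploads a directory of config files.

```shell
# Paths are assumptions based on a default RPM install; adjust as needed.
TOOL=/usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh
CERTS=/etc/opensearch/certs
CONFIG_DIR=/etc/opensearch/opensearch-security
BACKUP_DIR="${BACKUP_DIR:-/tmp/security-backup}"   # hypothetical location

# Download the configuration currently stored in the security index.
backup_security_config() {
  mkdir -p "$BACKUP_DIR"
  "$TOOL" -backup "$BACKUP_DIR" -icl -nhnv \
    -cacert "$CERTS/root-ca.pem" \
    -cert "$CERTS/admin.pem" \
    -key "$CERTS/admin-key.pem"
}

# Upload the files in CONFIG_DIR back to the cluster after reviewing them.
reapply_security_config() {
  "$TOOL" -cd "$CONFIG_DIR" -icl -nhnv \
    -cacert "$CERTS/root-ca.pem" \
    -cert "$CERTS/admin.pem" \
    -key "$CERTS/admin-key.pem"
}
```

Running `backup_security_config` first and comparing the result with the files in `CONFIG_DIR` (for example with `diff -r`) shows what a reapply would change before anything is overwritten.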

Yes, I can log in to OpenSearch Dashboards normally.
I reapplied the configuration from a master node:

[root@es8-master-2 opensearch-security]# /usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh -cd /etc/opensearch/opensearch-security/ -cacert /etc/opensearch/certs/root-ca.pem -cert /etc/opensearch/certs/admin.pem -key /etc/opensearch/certs/admin-key.pem -icl -nhnv -arc --diagnose
**************************************************************************
** This tool will be deprecated in the next major release of OpenSearch **
** https://github.com/opensearch-project/security/issues/1755           **
**************************************************************************
Security Admin v7
Will connect to localhost:9200 ... done
Connected as "CN=es8.mycompany.com,OU=Infra,O=mycompanys,L=Ho Chi Minh City,ST=District 3,C=VN"
OpenSearch Version: 2.15.0
Diagnostic trace written to: /etc/opensearch/opensearch-security/securityadmin_diag_trace_2024-Sep-27_10-49-46.txt
Contacting opensearch cluster 'opensearch' ...
Clustername: es8-infra
Clusterstate: GREEN
Number of nodes: 5
Number of data nodes: 2
.opendistro_security index already exists, so we do not need to create one.
Populate config from /etc/opensearch/opensearch-security/
Will update '/config' with /etc/opensearch/opensearch-security/config.yml
   SUCC: Configuration for 'config' created or updated
Will update '/roles' with /etc/opensearch/opensearch-security/roles.yml
   SUCC: Configuration for 'roles' created or updated
Will update '/rolesmapping' with /etc/opensearch/opensearch-security/roles_mapping.yml
   SUCC: Configuration for 'rolesmapping' created or updated
Will update '/internalusers' with /etc/opensearch/opensearch-security/internal_users.yml
   SUCC: Configuration for 'internalusers' created or updated
Will update '/actiongroups' with /etc/opensearch/opensearch-security/action_groups.yml
   SUCC: Configuration for 'actiongroups' created or updated
Will update '/tenants' with /etc/opensearch/opensearch-security/tenants.yml
   SUCC: Configuration for 'tenants' created or updated
Will update '/nodesdn' with /etc/opensearch/opensearch-security/nodes_dn.yml
   SUCC: Configuration for 'nodesdn' created or updated
Will update '/whitelist' with /etc/opensearch/opensearch-security/whitelist.yml
   SUCC: Configuration for 'whitelist' created or updated
Will update '/audit' with /etc/opensearch/opensearch-security/audit.yml
   SUCC: Configuration for 'audit' created or updated
Will update '/allowlist' with /etc/opensearch/opensearch-security/allowlist.yml
   SUCC: Configuration for 'allowlist' created or updated
SUCC: Expected 10 config types for node {"updated_config_types":["allowlist","tenants","rolesmapping","nodesdn","audit","roles","whitelist","actiongroups","config","internalusers"],"updated_config_size":10,"message":null} is 10 (["allowlist","tenants","rolesmapping","nodesdn","audit","roles","whitelist","actiongroups","config","internalusers"]) due to: null
SUCC: Expected 10 config types for node {"updated_config_types":["allowlist","tenants","rolesmapping","nodesdn","audit","roles","whitelist","actiongroups","config","internalusers"],"updated_config_size":10,"message":null} is 10 (["allowlist","tenants","rolesmapping","nodesdn","audit","roles","whitelist","actiongroups","config","internalusers"]) due to: null
SUCC: Expected 10 config types for node {"updated_config_types":["allowlist","tenants","rolesmapping","nodesdn","audit","roles","whitelist","actiongroups","config","internalusers"],"updated_config_size":10,"message":null} is 10 (["allowlist","tenants","rolesmapping","nodesdn","audit","roles","whitelist","actiongroups","config","internalusers"]) due to: null
SUCC: Expected 10 config types for node {"updated_config_types":["allowlist","tenants","rolesmapping","nodesdn","audit","roles","whitelist","actiongroups","config","internalusers"],"updated_config_size":10,"message":null} is 10 (["allowlist","tenants","rolesmapping","nodesdn","audit","roles","whitelist","actiongroups","config","internalusers"]) due to: null
SUCC: Expected 10 config types for node {"updated_config_types":["allowlist","tenants","rolesmapping","nodesdn","audit","roles","whitelist","actiongroups","config","internalusers"],"updated_config_size":10,"message":null} is 10 (["allowlist","tenants","rolesmapping","nodesdn","audit","roles","whitelist","actiongroups","config","internalusers"]) due to: null
Done with success

And this is the data node log:

[2024-09-27T10:49:48,387][ERROR][o.o.s.f.SecurityFilter   ] [es8-data-h-01] OpenSearch Security not initialized for indices:admin/mapping/auto_put
[2024-09-27T10:49:48,543][WARN ][o.o.s.c.ConfigurationRepository] [es8-data-h-01] Unable to reload configuration, initalization thread has not yet completed.
[2024-09-27T10:49:50,256][ERROR][o.o.s.f.SecurityFilter   ] [es8-data-h-01] OpenSearch Security not initialized for indices:admin/mapping/auto_put

The opensearch.yml on the master nodes is the same as on the data nodes. Did I do this correctly?

@huynguyenb3 Have you tried restarting the data node again?
Do you see these errors in all data nodes?
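One quick way to compare the nodes is to ask each one for its security plugin health, which normally answers without credentials. This is only a sketch: the host names passed in are whatever your data nodes are called, and `-k` skips certificate verification, so it is meant for troubleshooting only.

```shell
# Query the security plugin health endpoint on each host given as an argument.
# A healthy node should report "status":"UP"; a node whose security
# configuration has not been loaded is expected to report otherwise.
check_security_health() {
  for host in "$@"; do
    printf '%s: ' "$host"
    curl -sS -k --max-time 5 "https://${host}:9200/_plugins/_security/health" \
      || printf 'request failed'
    printf '\n'
  done
}
```

For example, `check_security_health es8-data-h-01.mycompany.com es8-data-h-02.mycompany.com` (the first host name is assumed from the node name in the logs) prints one line per node.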

Good morning. I have 2 data nodes and only 1 of them shows these errors at a time. If I restart the data node that has the errors, the errors move to the other one :joy:
E.g. the errors happen on data node 1, I restart data node 1, and then data node 2 starts logging the errors, and vice versa.
I don't restart both data nodes at the same time because that would lose shard data.

Hi guys, after restarting both master nodes, I got this error on a master node:

[2024-09-30T02:38:44,337][ERROR][o.o.t.n.s.SecureNetty4Transport] [es8-master-2] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown

[2024-09-30T02:38:44,338][WARN ][i.n.c.AbstractChannelHandlerContext] [es8-master-2] An exception 'OpenSearchSecurityException[The provided TCP channel is invalid.]; nested: DecoderException[javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown]; nested: SSLHandshakeException[Received fatal alert: certificate_unknown];' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:

Is this related to the previous error? :joy:

@huynguyenb3 Did you replace any TLS certificates recently? New certificates won't apply until the node restarts.
Maybe your certs have expired.

Thank you for looking into this. I haven't updated any TLS certs on the master or data nodes. In theory, if a TLS cert were invalid, expired, or revoked, the cluster would be down because the nodes could not communicate with each other. In my case the cluster status is green, but all master nodes throw the certificate_unknown error every millisecond. That is very weird :smiling_face_with_tear:
Also, I've checked the expiry of all the node certs:

[root@es8-master-3 certs]# openssl x509 -enddate -noout -in root-ca.pem
notAfter=Jul 11 03:27:05 2029 GMT
[root@es8-master-3 certs]# openssl x509 -enddate -noout -in master-3.pem
notAfter=Jul 11 10:18:08 2029 GMT

@huynguyenb3 What about certs in the other nodes? Your error is in the transport layer (9300-9400). This is node-to-node communication.

Do you see any other transport errors? Do you see any transport errors in other nodes?
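To answer the question about the other nodes, something like the following could be run on each node against its own transport certificate. It is a sketch using standard `openssl` subcommands; `cert_report` is just a helper name chosen here.

```shell
# Print subject, issuer, and expiry of a certificate, then check that it
# chains to the given CA file. Arguments: <cert.pem> <ca.pem>
cert_report() {
  cert="$1"
  ca="$2"
  echo "== $cert"
  # RFC2253 prints the DN in comma-separated form, which is easier to
  # compare with the entries under plugins.security.nodes_dn.
  openssl x509 -noout -nameopt RFC2253 -subject -issuer -enddate -in "$cert"
  # Prints "<cert>: OK" when the certificate is signed by the CA.
  openssl verify -CAfile "$ca" "$cert"
}
```

On the data node from the posted config this would be `cert_report /etc/opensearch/config/data-h-02.pem /etc/opensearch/config/root-ca.pem`. If the printed subject does not match any `plugins.security.nodes_dn` entry, or the verify step fails on one node, that node is a likely source of the `certificate_unknown` alerts.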