May I know how OpenSearch performs its peer checks? Is it possible to run the same check that NodeConnectionsService runs against the remote targets (client, data, or master) via curl, to validate the network or pod condition?
Instances are added/removed randomly.
Is there a known bug in the 1.2.4 release with this symptom? In a small-scale setup we don't see this problem. The problem is observed in an OpenSearch cluster consisting of 140 data, 80 master, and 80 client nodes.
OpenSearch version: 1.2.4
[2022-04-08T08:35:23,752][WARN ][o.o.c.NodeConnectionsService] [opensearch-cluster-master-1] failed to connect to {opensearch-cluster-client-51}{eo7Rd_i8Q-GATVzpQ--zmg}{sJmfxsuGSmOCPwSaTUxD5w}{10.42.4.6}{10.42.4.6:9300}{ir}{shard_indexing_pressure_enabled=true} (tried [1] times)
org.opensearch.transport.ConnectTransportException: [opensearch-cluster-client-51][10.42.4.6:9300] connect_timeout[30s]
at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:1070) ~[opensearch-1.2.4.jar:1.2.4]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:733) ~[opensearch-1.2.4.jar:1.2.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
[2022-04-08T08:05:37,583][INFO ][o.o.c.s.ClusterApplierService] [opensearch-cluster-master-73] added {{opensearch-cluster-client-69}{nigWd0MeSN66jOw3YwPXEA}{qZgQ6wE8TkC8aNxcdtNeMQ}{10.42.4.11}{10.42.4.11:9300}{ir}{shard_indexing_pressure_enabled=true},{opensearch-cluster-client-54}{XucxhUbPRWOtL_OZR0qQ9Q}{YywOjo3dQ5iuDphaPpr8Qg}{10.42.3.206}{10.42.3.206:9300}{ir}{shard_indexing_pressure_enabled=true},{opensearch-cluster-client-70}{9bQn9ukQS5uL6Iv2Udk-ng}{6era5uDjQpO2Xo19mRiBMw}{10.42.6.223}{10.42.6.223:9300}{ir}{shard_indexing_pressure_enabled=true},{opensearch-cluster-client-56}{p4IdoCTZSaqQI5n1NlAOag}{JSLy_WXGTNaF_hlNElYqyQ}{10.42.0.116}{10.42.0.116:9300}{ir}{shard_indexing_pressure_enabled=true}}, term: 304, version: 19993, reason: ApplyCommitRequest{term=304, version=19993, sourceNode={opensearch-cluster-master-61}{fsmqOP7NQ9Ku0Pk3tmRf2g}{fTD_OX6QQTKbNJg7qXOVrw}{10.42.3.187}{10.42.3.187:9300}{m}{shard_indexing_pressure_enabled=true}}
[2022-04-08T08:07:43,336][INFO ][o.o.c.s.ClusterApplierService] [opensearch-cluster-master-73] removed {{opensearch-cluster-client-69}{nigWd0MeSN66jOw3YwPXEA}{qZgQ6wE8TkC8aNxcdtNeMQ}{10.42.4.11}{10.42.4.11:9300}{ir}{shard_indexing_pressure_enabled=true}}, term: 304, version: 19999, reason: ApplyCommitRequest{term=304, version=19999, sourceNode={opensearch-cluster-master-61}{fsmqOP7NQ9Ku0Pk3tmRf2g}{fTD_OX6QQTKbNJg7qXOVrw}{10.42.3.187}{10.42.3.187:9300}{m}{shard_indexing_pressure_enabled=true}}
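On the peer-check question: NodeConnectionsService holds a transport connection to every node in the cluster state and periodically retries disconnected ones. It speaks the binary transport protocol on 9300, which curl cannot, but a plain TCP connect from the same pod network exercises the same path, and the HTTP port can be probed with curl. A minimal sketch (the target IP is the failing client node from the log above; credentials and TLS flags are assumptions for this setup):

```shell
# tcp_check: raw TCP connect to a node's transport port (9300).
# This is not the transport-protocol handshake, only reachability.
tcp_check() {
  local host=$1 port=$2
  timeout 5 bash -c "cat < /dev/null > /dev/tcp/${host}/${port}"
}

# Example, run from a master pod (IP taken from the warning above):
#   tcp_check 10.42.4.6 9300 && echo reachable || echo unreachable

# The HTTP layer (9200) can be probed with curl for a coarse health signal:
#   curl -sk --connect-timeout 5 -u admin:admin "https://10.42.4.6:9200/"
```

A success here only rules out the network/pod path; a node can be reachable on 9300 yet still too busy to complete the transport handshake within the 30s connect_timeout seen above.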
I would check your logs for more detail. Your instances might be undersized, and some load event is pushing them over, so they are crashing and restarting, and likewise leaving and rejoining your cluster. I suspect that is actually what's happening here, but I can't say for certain based on the info provided.
Be sure to check your syslog if there isn't any evidence in OpenSearch's log; often there will be some diagnostic output in your Linux distro's syslog and/or systemd's journal.
Another thought might be that cluster_state is very large, causing timeouts on connection attempts. I saw your host is named master-18… Your cluster doesn't have 18 masters, does it? The recommended number of master-eligible nodes in a cluster is 3, with a quorum of 2.
Another thought might be that your acting master instance is too loaded and cannot keep up with heartbeat-type traffic, so nodes are dropping and rejoining for that reason.
Either way, you need to look at more logs. Look for evidence of crashing, or focus on the log from your acting master node.
Hi mhoydis,
Thanks for the response. Some more context about this cluster and use case.
This cluster is built with the Helm chart and runs on a k8s cluster consisting of 12 powerful physical machines. Each machine has 12 NVMe drives, and each NVMe backs one opensearch-data node, so we have around 104 data-role nodes (k8s pods). There are also 80 opensearch-client pods and 80 opensearch-master pods.
We index 10 TB of documents per day, which is around 16 billion documents per day.
green open filebeat-proxy-server-2022.04.07 OK-Y_e0RT-iksJWuj2C6FA 50 1 4,559,306,980 0 5.5tb 2.7tb
green open filebeat-object-server-2022.04.07 uW2SV62rRfqAx14kPrRK_A 50 0 12,049,821,773 0 4.8tb 4.8tb
green open filebeat-object-server-2022.04.08 EIX_9FHqTAqlMrBfJAX1vQ 50 0 12,765,540,385 0 5.1tb 5.1tb
Another thought might be that cluster_state is very large
How can I check the cluster_state size?
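The cluster state can be dumped over the REST API and its serialized size measured. A rough sketch (host, credentials, and TLS flags are assumptions for this setup):

```shell
# state_size: print the serialized size, in bytes, of a cluster-state
# section (_all, metadata, routing_table, nodes, blocks, ...).
state_size() {
  local section=${1:-_all}
  curl -sk -u admin:admin \
    "https://localhost:9200/_cluster/state/${section}" | wc -c
}

# Example: compare sections to see which one dominates.
#   state_size _all
#   state_size metadata
#   state_size routing_table
```

With 50-shard daily indices and ~300 nodes, the routing_table and nodes sections are the likely places for the state to balloon.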
check your logs for more detail
Unfortunately, these instances run as k8s pods, so there is no system log, and the image lacks basic troubleshooting tools.
Your cluster doesn’t have 18 masters, does it?
Well, we had 80, and I lowered it to 9 after reading your response.
9 is still too many masters. You want exactly 3 masters with a quorum of 2. It could be a reason why your nodes drop - there is a lot of overhead involved in having so many masters.
After lowering the master count from 80 to 9, new data nodes could be added to the cluster after a while.
9 is still too many masters. You want exactly 3 masters with a quorum of 2.
Got it. I’m going to lower it to 3 now.
The problem seems related to the number of concurrent connections between a new node and the existing instances. They establish 22–30 connections peer to peer, which results in thousands of connections in parallel during initialization of the new node.
Why do nodes establish so many connections to a new peer? Can this be improved?
Given this behavior with many nodes in the cluster, does it mean there is a scale limitation for an OpenSearch cluster? What is the best practice for storing 16 billion documents per day?
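On the connection-count question: each node opens a small pool of transport connections per peer, partitioned by channel type (recovery, bulk, regular, state, ping) rather than a single socket, so the total grows with the square of the node count. If my memory of the 1.x defaults is right, they look like the snippet below and can be lowered in opensearch.yml at the cost of less parallelism per peer (treat the exact values as assumptions to verify against your version's documentation):

```yaml
# opensearch.yml — per-peer transport channel counts (assumed 1.x defaults)
transport.connections_per_node.recovery: 2
transport.connections_per_node.bulk: 3
transport.connections_per_node.reg: 6
transport.connections_per_node.state: 1
transport.connections_per_node.ping: 1
```

Note that shrinking the cluster (fewer, larger data nodes rather than one pod per NVMe) reduces the connection fan-out quadratically, which is usually more effective than tuning these channel counts.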
16 billion documents per day is not very much. I do that in about 15 minutes.
The problem seems related to the number of concurrent connections between a new node and the existing instances. They establish 22–30 connections peer to peer, which results in thousands of connections in parallel during initialization of the new node.
^ This doesn't really make a lot of sense; I'm not sure what you're talking about. "Thousands of connections" is not a lot. You might just need to make sure you allow enough TCP connections to your process, via the systemd unit definition.
Even if there is some case that I can't imagine based on the description, you still should have only 3 master nodes, with a quorum of 2. If you are talking about having something external to your cluster connect to the API en masse, then use client nodes, not master nodes, for that purpose. A "client node" is just a node with master=false and data=false.
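For the systemd route mentioned above (the non-k8s case), the relevant knobs are the per-service file-descriptor and process limits; a sketch of a drop-in override (the path and values are assumptions, not recommendations for this cluster):

```ini
# /etc/systemd/system/opensearch.service.d/limits.conf
[Service]
LimitNOFILE=65535
LimitNPROC=4096
```

Inside a pod, the rough equivalents are the container runtime's ulimit settings plus host-level sysctls such as net.core.somaxconn and the ephemeral port range, which have to be tuned on the k8s node itself.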
Yeah, a few thousand connections is not a big deal for a physical machine, but these instances run as k8s pods. We'll try to tune k8s at the host level (we already do, but may have missed some parameters).
By launching the OpenSearch cluster from the Helm chart, we have dedicated master, data, and ir (ingest & remote_cluster_client) roles. All documents are the output of Filebeat from about 1k servers, for centralized logging purposes.
I think we might have misunderstood the data flow. All Filebeat instances dump their logs as JSON to these ports. I guess the ir nodes (port 9200) receive the data stream from Filebeat and then determine which shard and data node should store the indexed data, and the ir nodes forward the data bytes to the data nodes (port 9300). Am I right?
Thanks for your advice. I have a much clearer picture now than at the beginning. I'll need to read the documentation in more detail to understand how it works underneath. We have brought the masters down to 3 and are watching it. There are 100 more NVMe drives about to be added to the cluster, and I need to think about the best way to make this cluster more efficient.
A new data pod takes a long time to respond. Is there a known bug in the community Docker image?
[opensearch@opensearch-cluster-data-0 tools]$ time ./securityadmin.sh -h 10.42.13.60 -icl -nhnv -cacert ~/config/root-ca.pem -cert ~/config/kirk.pem -key ~/config/kirk-key.pem
Security Admin v7
Will connect to 10.42.13.60:9300 ... done
ERR: Cannot connect to OpenSearch. Please refer to opensearch logfile for more information
Trace:
NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{ouBSFZOJQO-LMjniQVlzTw}{10.42.13.60}{10.42.13.60:9300}]]
at org.opensearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:381)
at org.opensearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:272)
at org.opensearch.client.transport.TransportProxyClient.execute(TransportProxyClient.java:79)
at org.opensearch.client.transport.TransportClient.doExecute(TransportClient.java:484)
at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:433)
at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:419)
at org.opensearch.security.tools.SecurityAdmin.execute(SecurityAdmin.java:524)
at org.opensearch.security.tools.SecurityAdmin.main(SecurityAdmin.java:157)
real 0m41.568s
user 0m6.263s
sys 0m0.345s
[opensearch@opensearch-cluster-data-0 tools]$ time ./securityadmin.sh -h 10.42.13.60 -icl -nhnv -cacert ~/config/root-ca.pem -cert ~/config/kirk.pem -key ~/config/kirk-key.pem
Security Admin v7
Will connect to 10.42.13.60:9300 ... done
ERR: Cannot connect to OpenSearch. Please refer to opensearch logfile for more information
Trace:
NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{YCxjrb7kREOuwrUnexs5eg}{10.42.13.60}{10.42.13.60:9300}]]
at org.opensearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:381)
at org.opensearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:272)
at org.opensearch.client.transport.TransportProxyClient.execute(TransportProxyClient.java:79)
at org.opensearch.client.transport.TransportClient.doExecute(TransportClient.java:484)
at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:433)
at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:419)
at org.opensearch.security.tools.SecurityAdmin.execute(SecurityAdmin.java:524)
at org.opensearch.security.tools.SecurityAdmin.main(SecurityAdmin.java:157)
real 0m38.634s
user 0m6.394s
sys 0m0.431s
The data node is not ready for several minutes after it starts.
The Docker CPU usage of the newly spawned pod sticks at 100% indefinitely. What is consuming the CPU?
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
61a81b48eff8 k8s_opensearch-data_opensearch-cluster-data-123_cs-opensearch_2d665deb-1221-4cbd-b7d0-788c2967f750_0 101.64% 16.88GiB / 32GiB 52.76% 0B / 0B 0B / 515MB 74
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
61a81b48eff8 k8s_opensearch-data_opensearch-cluster-data-123_cs-opensearch_2d665deb-1221-4cbd-b7d0-788c2967f750_0 100.73% 16.88GiB / 32GiB 52.76% 0B / 0B 0B / 515MB 74
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
61a81b48eff8 k8s_opensearch-data_opensearch-cluster-data-123_cs-opensearch_2d665deb-1221-4cbd-b7d0-788c2967f750_0 100.73% 16.88GiB / 32GiB 52.76% 0B / 0B 0B / 515MB 74
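To see what a pinned node is actually doing, the hot threads API is the usual first step; it asks the JVM itself which threads are burning CPU, which works even in a stripped-down image with no debugging tools. A sketch (node name, credentials, and TLS flags are assumptions for this setup):

```shell
# hot_threads: report the busiest threads on the given node (or _all).
hot_threads() {
  local node=${1:?node name or _all}
  curl -sk -u admin:admin \
    "https://localhost:9200/_nodes/${node}/hot_threads?threads=5&interval=500ms"
}

# Example:
#   hot_threads opensearch-cluster-data-123
```

If the node has not joined the cluster yet, the REST layer may not answer; in that case `jstack` against the OpenSearch PID inside the container gives the same thread-level view.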