org.opensearch.transport.ConnectTransportException: [opensearch-cluster-master-18][10.42.2.20:9300] connect_timeout[30s]

16 billion documents per day is not very much. I do that in about 15 minutes.

The problem seems related to the number of concurrent connections between the new node and the existing instances. They establish 22~30 connections per peer, which results in thousands of parallel connections while the new node initializes.

^ This doesn’t really make a lot of sense; I’m not sure what you’re talking about. “Thousands of connections” is not a lot. You might just need to make sure you allow enough TCP connections to your process, via the systemd unit definition.
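To put rough numbers on that, here is a minimal Python sketch that multiplies peers by connections per peer and compares the result with the open-file limit of the current process (each TCP socket uses one file descriptor). The peer count of 100 is a made-up figure, since the thread does not give the cluster size, and the 50% headroom is an arbitrary choice of mine. Under systemd the limit itself is set with the `LimitNOFILE=` directive in the unit.

```python
import resource


def estimated_transport_connections(peer_nodes: int, connections_per_peer: int) -> int:
    """Rough number of sockets a joining node opens to its peers."""
    return peer_nodes * connections_per_peer


def fits_in_fd_limit(needed: int, soft_limit: int, headroom: float = 0.5) -> bool:
    """True if `needed` sockets use at most `headroom` of the descriptor limit.

    A negative limit is treated as "unlimited" (RLIM_INFINITY is -1 on Linux).
    """
    if soft_limit < 0:
        return True
    return needed <= soft_limit * headroom


if __name__ == "__main__":
    # Soft and hard limits on open files for this process (Unix only).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # 100 peers is hypothetical; 30 is the upper figure quoted above.
    needed = estimated_transport_connections(peer_nodes=100, connections_per_peer=30)
    print(f"estimated connections: {needed}, soft limit: {soft}, hard limit: {hard}")
    print("fits with headroom:", fits_in_fd_limit(needed, soft))
```

Run it as the same user and in the same environment as the OpenSearch process, otherwise the limits you see may not be the ones the node actually gets.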
Even if there is some case that I can’t imagine based on the description, you should still have only 3 master nodes, with a quorum of 2. If you are talking about having something external to your cluster connect to the API en masse, then use client nodes, not master nodes, for that purpose. A “client node” is just a node with master=false and data=false.
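The quorum figure is just a strict majority of the master-eligible nodes. A small Python sketch of that arithmetic, with function names of my own choosing (they are not OpenSearch settings or APIs):

```python
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} master-eligible: quorum {quorum(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Note that 4 master-eligible nodes tolerate no more failures than 3, which is one reason small odd counts are the usual choice.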