Setting up a three-node cluster for high availability

I need to set up a fresh OpenSearch install, and I want to use a cluster with nodes on multiple VMs for better availability. According to the docs:

There are many ways to design a cluster

So I’m wondering what is the best cluster architecture for my requirements:

  • High availability for ingest
  • OpenSearch Dashboards available at a single domain name
  • Distribution of the data over multiple VMs in case one of them goes down

Hi @merlinz01,

To get your architecture right, check the docs here: Intro to OpenSearch - OpenSearch Documentation and https://opensearch.org/docs/latest/getting-started/intro/#shards

For best practice ideas, you can check: Operational best practices for Amazon OpenSearch Service - Amazon OpenSearch Service

You might also be interested in Optimize OpenSearch index shard sizes · OpenSearch for some performance optimization in your OpenSearch cluster.

best,
mj

Thank you, that is helpful.

Some more questions:
How do I decide what roles to give to each node?
Do I need a dedicated coordinating node (vs. a cluster manager) if I have more than one data node?
For high availability, can I ingest documents into any of the data nodes, or must they all go through a certain node?
Which node should I install Dashboards on?

@merlinz01, your nodes will form a cluster that is managed by the cluster manager, which is elected from the cluster-manager-eligible nodes in your cluster. By default, each node is a cluster-manager-eligible, data, ingest, and coordinating node. You can configure the roles as needed; more here: Creating a cluster - OpenSearch Documentation.
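
As a rough sketch of what that could look like on one of three VMs, here is a minimal opensearch.yml. The cluster name and the hostnames os-node-1 through os-node-3 are placeholders I made up, and the setting names assume OpenSearch 2.x (older releases use "master" in place of "cluster_manager"), so check them against the docs for your version:

```yaml
# opensearch.yml on the first VM (all names below are hypothetical placeholders)
cluster.name: my-cluster        # must be identical on every node in the cluster
node.name: os-node-1            # must be unique per node

# Every node is cluster-manager-eligible and also holds data and runs ingest pipelines.
# Any node can additionally act as a coordinating node for the requests it receives.
node.roles: [cluster_manager, data, ingest]

# Listen on all interfaces so the other VMs can reach this node;
# restrict this to a private address if the VM is publicly reachable.
network.host: 0.0.0.0

# Addresses the node contacts to discover the rest of the cluster.
discovery.seed_hosts: ["os-node-1", "os-node-2", "os-node-3"]

# Used only when bootstrapping a brand-new cluster for the first time.
cluster.initial_cluster_manager_nodes: ["os-node-1", "os-node-2", "os-node-3"]
```

The other two VMs would use the same file with only node.name changed. With three cluster-manager-eligible nodes, the cluster can still elect a cluster manager if one VM goes down.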

OpenSearch Dashboards is installed and configured separately from your nodes. You point it at the cluster in its config file:

# The URLs of the OpenSearch instances to use for all your queries.
opensearch.hosts: ["http://localhost:9200"]

more here: OpenSearch-Dashboards/config/opensearch_dashboards.yml at main · opensearch-project/OpenSearch-Dashboards · GitHub

Best,
mj

Will all the ingest network traffic have to route through the manager node? Or will the clients be able to load-balance which node they send the logs to? If I send data to various nodes with the ingest role on the same cluster, will that cause problems?

In the opensearch.hosts configuration, do I put all the cluster nodes, or just the manager node?

Thanks for your help.