At the core of Open Distro for Elasticsearch’s ability to provide a seamless scaling experience, lies its ability distribute its workload across machines. This is achieved via
sharding. When you create an index you set a primary and replica shard count for that index. Elasticsearch distributes your data and requests across those shards, and the shards across your data nodes.
The capacity and performance of your cluster depend critically on how Elasticsearch allocates shards on nodes. If all of your traffic goes to one or two nodes because they contain the active indexes in your cluster, those nodes will show high CPU, RAM, disk, and network use. You might have tens or hundreds of nodes in your cluster sitting idle while these few nodes melt down.
In this post, I will dig into Elasticsearch’s shard allocation strategy and discuss the reasons for “hot” nodes in your cluster. With this understanding, you can fix the root cause to achieve better performance and a more stable cluster.
This is a companion discussion topic for the original entry at https://opendistro.github.io/for-elasticsearch/blog/open%20distro%20for%20elasticsearch%20updates/2019/08/Demystifying-Elasticsearch-Shard-Allocation-categories/