Possibility of autoscaling an OpenSearch cluster

Is it possible to autoscale an OpenSearch cluster without data loss (using Docker or Kubernetes)? For example, scale up the number of data nodes when under heavy load and scale down (delete the nodes) when they're not necessary.

I am thinking that a HorizontalPodAutoscaler to scale the OpenSearch StatefulSet should be possible, but I'm doubtful whether it's possible to change the replicas dynamically.
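For reference, this is roughly what I had in mind, using the kubernetes Python client (just a sketch; the StatefulSet name, namespace, and CPU threshold are placeholders):

```python
# Hypothetical sketch: an HPA that targets the OpenSearch data-node StatefulSet.
# An HPA can point at a StatefulSet via scaleTargetRef, but it only changes the
# replica count; it does not drain shards before a pod is removed.
from kubernetes import client, config

config.load_kube_config()
autoscaling = client.AutoscalingV1Api()

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="opensearch-data-hpa", namespace="opensearch"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1",
            kind="StatefulSet",       # HPAs can target StatefulSets, not only Deployments
            name="opensearch-data",   # placeholder StatefulSet name
        ),
        min_replicas=3,
        max_replicas=6,
        target_cpu_utilization_percentage=70,  # scale on average CPU utilization
    ),
)
autoscaling.create_namespaced_horizontal_pod_autoscaler(namespace="opensearch", body=hpa)
```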

Hi. Yes, you can; here's a blog post on doing this for logs: Autoscaling Elasticsearch for Logs with a Kubernetes Operator.

Thanks for your work on the operator! Unfortunately in my case, it was decided that the cluster is to be deployed on AWS Fargate.

Therefore, I would now like to know if I can set up the nodes to scale down automatically without any data loss.

My first idea is to set the number of primary shards to the maximum number of nodes we would scale out to, and keep a single replica. However, I think this config might lead to data loss when a node gets scaled down, since shards are allocated across all of the nodes.
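Roughly, the index settings for that idea would look like this (a sketch with an assumed index name, endpoint, and credentials):

```python
# Sketch: create an index whose primary shard count matches the maximum number
# of data nodes we would ever scale out to, with one replica per primary.
import requests

OPENSEARCH = "https://localhost:9200"  # assumed endpoint
AUTH = ("admin", "admin")              # assumed credentials

resp = requests.put(
    f"{OPENSEARCH}/logs-000001",       # hypothetical index name
    json={
        "settings": {
            "index.number_of_shards": 6,    # maximum number of data nodes we scale out to
            "index.number_of_replicas": 1,  # one replica of each primary
        }
    },
    auth=AUTH,
    verify=False,  # for the sketch only; verify TLS properly in a real setup
)
resp.raise_for_status()
print(resp.json())
```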

My other approach is to keep one or two dedicated data nodes that hold all the shards and spin up new ingest nodes when scaling (we use ingest pipelines as well). But this approach leaves the data nodes facing the entire indexing and query load, and it is only a partial scaling solution. Is there a way to scale the data nodes out and in without any data loss?

Is there any other method to scale out so that indexing and search do not take a performance hit at peak traffic, while optimizing cost when the cluster is not heavily used?

I’m not really familiar with AWS Fargate, but I imagine you could have some sort of hook when you scale down (and shut down a node). If you do, then you can exclude that node from allocation (via shard allocation filtering), then wait for the relocation to finish (i.e. the node will be empty) before shutting it down.
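As a minimal sketch of such a hook against the OpenSearch REST API (the endpoint, credentials, and node name here are assumptions, not something Fargate gives you out of the box):

```python
# Sketch of a scale-down hook: exclude a node from allocation, wait until it is
# empty, then it is safe to terminate it.
import time
import requests

OPENSEARCH = "https://localhost:9200"  # assumed endpoint
AUTH = ("admin", "admin")              # assumed credentials
NODE_TO_REMOVE = "data-node-3"         # hypothetical node name

# 1. Exclude the node from shard allocation so its shards relocate elsewhere.
requests.put(
    f"{OPENSEARCH}/_cluster/settings",
    json={"transient": {"cluster.routing.allocation.exclude._name": NODE_TO_REMOVE}},
    auth=AUTH, verify=False,
).raise_for_status()

# 2. Wait until relocation finishes and the node holds no shards.
while True:
    health = requests.get(f"{OPENSEARCH}/_cluster/health", auth=AUTH, verify=False).json()
    shards = requests.get(
        f"{OPENSEARCH}/_cat/shards?format=json", auth=AUTH, verify=False
    ).json()
    left_on_node = [s for s in shards if s.get("node") == NODE_TO_REMOVE]
    if health["relocating_shards"] == 0 and not left_on_node:
        break
    time.sleep(5)

# 3. The node is now empty and can be shut down without losing data.
print(f"{NODE_TO_REMOVE} is drained and can be terminated")
```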

This should work no matter what your sharding strategy is: you mentioned oversharding, so you can scale out up to the number of shards, or you can have as many replicas as you have nodes. In fact, if you have as many replicas as nodes, things should just work out of the box: it’s just that the extra replicas (for nodes that aren’t up when you’re not completely scaled out) will be unassigned.
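For the "as many replicas as nodes" variant, the only thing to change is the replica count; a small sketch, assuming a maximum of 6 data nodes and a hypothetical index name:

```python
# Sketch: with a maximum of 6 data nodes, 5 replicas per primary means every
# node can hold a copy of every shard. When fewer nodes are up, the surplus
# replica shards simply stay unassigned until the cluster scales out again.
import requests

OPENSEARCH = "https://localhost:9200"  # assumed endpoint
AUTH = ("admin", "admin")              # assumed credentials

requests.put(
    f"{OPENSEARCH}/logs-000001/_settings",      # hypothetical index name
    json={"index": {"number_of_replicas": 5}},  # max_nodes - 1
    auth=AUTH,
    verify=False,  # sketch only
).raise_for_status()
```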

@radu.gheorghe
A couple of questions here: I'm also planning to build a similar system from scratch.
It needs to be highly scalable and available, with a lot of concurrent users accessing the system. Could you please help me with an initial understanding of the architecture? I'm planning to use OpenSearch.

What should be the initial implementation/PoC starting point?
Can I go with K8s with ELK pods deployed on it, or should I go VM-based (which I guess is not recommended for high scalability and availability)?
If I go with K8s, what should my volume mapping be: StatefulSets or managed volumes to store the Elasticsearch data?