Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): 1.2.3
Describe the issue: I have been tasked with improving the performance and stability of an OpenSearch cluster and I am trying to understand the following:
- Is it detrimental to the performance of the cluster if the data nodes do not hold an equal amount of data, i.e. some are at 70-80%+ disk usage and others at 20-30%?
- Why are the data nodes at such a high RAM percentage (95%+ or even 100%) the whole time, and is that detrimental to performance?
- Currently there are 23k+ shards on the cluster, with sizes spanning from a few KB to 1100 MB+. Are the number of shards and such a huge difference in their sizes detrimental to performance?
- Overall, do you see anything else in the configuration and screenshots that seems wrong to you or that could be problematic?
Configuration:
data node:
replicas: 28
heap_size: "10240m"
storage: "200Gi"
resources:
limits:
cpu: "1"
memory: 12960Mi
requests:
cpu: 500m
memory: 10240Mi
max shards per node: 2000
Relevant Logs or Screenshots:
- Usually it is, just because more data typically means queries on these shards are more expensive, so you'll have imbalanced load.
- It's because OpenSearch will (by default) memory map the indices. So if your indices are larger than your RAM, you'd normally see this behavior and it's OK, it's just the OS using RAM as caches (the quick check sketched after this list shows how to tell heap pressure apart from page cache).
- The difference in shard sizes isn't a problem in itself, it's just that:
- if shards of different sizes get allocated to different nodes (see 1), you typically have imbalanced load
- if you have tons of shards in total (you're not there, but you seem to be getting there), the cluster will have to deal with a big cluster state, which will slow down non-data operations such as adding an index or recovering from a full cluster restart. I would try to keep the number of shards under 10K if possible, as a rule of thumb.
- Yes, OpenSearch will likely need more "reserved" memory than your heap size. So if you request 10GB and allocate 10GB as heap, you're not likely to bump into the 12GB limit, but you might exceed the 10GB you requested and maybe the node won't have it.
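To verify that this is what's happening, one quick check (just a sketch, assuming you can reach the cluster's REST API; these are standard _cat columns) is to compare JVM heap usage with OS-level RAM and disk usage per node:

GET _cat/nodes?v&h=name,node.role,heap.percent,ram.percent,disk.used_percent

If heap.percent stays comfortably below its limit while ram.percent is pinned near 100%, the "missing" memory is mostly the OS page cache holding memory-mapped index files, which is expected and not a problem.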
If you're looking for other tunables, especially for logs and other time-series data, I think you'll find this presentation useful (oldie but hopefully goldie): Elasticsearch for logs and metrics: A deep dive - Velocity 2016, O'REILLY CONFERENCES - YouTube
Thank you for the thorough answer with helpful insights. I have a few follow-up questions.
As I learn more and more about how OpenSearch works and about the current cluster, I found out that the sharding strategy applied right now is 1:1 with 1 replica. So these almost 26k shards represent 13k indices plus their replicas.
- Let's say I try to keep the shards under 10k (which in our case would mean 5k indices + replicas); that would mean the incoming data (somewhere in Filebeat or Logstash) should be restructured, if I am understanding correctly?
- Or, if we abide by the 10k max shards rule, should we just add another OpenSearch instance?
- The number of shards, as I understand it, reflects on the number of requests the allocation system makes, and that in turn reflects on the RAM usage of the client nodes, correct?
You're welcome! To answer your follow-up questions:
- You don't have to restructure your data, it's just that you'll want to use Index State Management to roll your indices by size. There's some info in the video above about that, but since you say you're using Kubernetes, maybe this post (there's a video there, too) will be more useful: Autoscaling Elasticsearch for Logs with a Kubernetes Operator
The main idea there is to spread your indices (those that have significant traffic) across all your nodes. This also implies that fewer, bigger nodes will be easier to manage than many small nodes.
- You can also have multiple clusters if that works for your use case (e.g. different "classes" of logs are searched separately). Judging by the total size, I don't think that's something you have to do in order to stay under 10K (you'll likely be able to achieve that just by rotating indices by size). But if you can afford to do that, it might make things easier to manage and more independent. And certainly easier to scale.
- There is a RAM (heap) overhead for each shard. And - given the same merge policy - more shards will imply more segments, so data will be less "compacted", occupying more disk space and cache memory, and it will be slower to search (just the full-text search part, the aggregation performance shouldn't suffer). So I think the short answer is "no, not significantly", but I usually recommend limiting the number of shards because:
A) Once you have too many, the cluster will become unstable (e.g. some master-related operations will time out because the master may be too slow to replicate the cluster state changes).
B) It's easier to balance the number of shards across your data nodes if you rotate indices by size. And really the balance of load across your data nodes is - in my experience - the biggest factor when it comes to indexing & query performance (a quick way to check that balance is sketched below).
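If you want to see how (im)balanced the cluster currently is, two quick checks (a sketch; both are standard _cat APIs) are:

GET _cat/allocation?v
GET _cat/shards?v&s=store:desc&h=index,shard,prirep,store,node

The first shows the shard count and disk usage per data node; the second lists shards sorted by size, which makes it easy to spot the handful of very large shards that drive the imbalance.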
Thank you!
From all the information I was able to gather, I devised the following strategy.
- About 1,000 containers send data towards Filebeat.
- Logstash then filters the data by time and creates a monthly index for each container. The main reason is that a huge portion of those indices contain small amounts of data, between 5-10 MB and less than 1 GB for an entire month.
- OpenSearch then keeps a 1:1 sharding strategy; having a single shard for the majority of indices should help with performance (see the template sketch below).
Result: ~2000 shards.
However, shards would still be very unbalanced because of the portion of indices receiving up to 450 GB of data per month, not to mention the CPU limits.
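For reference, pinning that 1:1 strategy down could look like the composable index template below (just a sketch; the template name logs-monthly and the logs-* pattern are placeholders for whatever naming Logstash actually uses):

PUT _index_template/logs-monthly
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 1,
      "index.number_of_replicas": 1
    }
  }
}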
Is it possible to apply a policy to all indices that would say something like:
- if an index goes over 10 GB → rollover (so that large indices are basically broken down into more shards automatically)
- and if the data in an index is older than 7 days → delete the data
If so, what would it look like?
Yes, it's possible. You can roll over on min_age and min_primary_shard_size, and whichever comes first will apply (a sketch of such a policy is below).
For indices that are very large, you can spread them across your cluster. But this is problematic with 28 nodes, because you'd need 14 shards to be balanced. And if you have e.g. 14GB/day, it means you'd rotate after 10 days in order to get 10GB per shard. That's too much, IMO, so you can either rotate at 5GB per shard, or live with some unbalance. Or maybe you can have fewer bigger nodes, then it will be easier to balance (you'd need fewer shards).
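Here is a minimal sketch of such an ISM policy (the policy name rollover_then_delete, the logs-* pattern and the thresholds are placeholders; also double-check that your OpenSearch version supports the min_primary_shard_size rollover condition):

PUT _plugins/_ism/policies/rollover_then_delete
{
  "policy": {
    "description": "Roll over at ~10 GB per primary shard, delete old generations after 7 days",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          { "rollover": { "min_primary_shard_size": "10gb" } }
        ],
        "transitions": [
          { "state_name": "delete", "conditions": { "min_index_age": "7d" } }
        ]
      },
      {
        "name": "delete",
        "actions": [ { "delete": {} } ]
      }
    ],
    "ism_template": [
      { "index_patterns": ["logs-*"], "priority": 100 }
    ]
  }
}

Two caveats: the rollover action needs a write alias on the managed indices (the plugins.index_state_management.rollover_alias index setting, usually set from an index template), and min_index_age in the transition is measured from index creation, so each generation is deleted roughly 7 days after it was created rather than exactly 7 days after the data stopped being written.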
Thank you for all of the suggestions!