I want to have an ISM policy with a ‘min_size’ condition for rollover.
The documentation for ‘min_size’ says “The minimum size of the total primary shard storage (not counting replicas) required to transition”.
So let’s say I have 5 shards in my index and a min_size of 100 GB.
One possibility is that each shard holds about 20 GB before rollover.
Another possibility, in a multi-tenant system for example, is that a specific tenant’s data is routed to a specific shard and this tenant is much noisier than all the others.
That means one shard can grow to 80 GB while the other 4 shards are only 5 GB each.
I don’t want 80 GB in a single shard; I want to keep each shard to a maximum of 30 GB.
Is there any recommendation for preventing this from happening?
Can I somehow make the policy roll over based on the size of the largest shard (maybe by introducing a ‘min_shard_size’ condition)?
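To make the numbers above concrete, here is a rough Python sketch comparing the two cases. The shard sizes are the illustrative figures from the question, the helper names are made up, and treating the total-size check as "greater than or equal" is an assumption. Nothing here calls a real API.

```python
# Illustrative only: shard sizes are the hypothetical figures from the question.
GB = 1024 ** 3

balanced = [20 * GB] * 5            # five primaries at about 20 GB each
skewed = [80 * GB] + [5 * GB] * 4   # one noisy tenant fills a single shard

MIN_SIZE = 100 * GB   # total primary storage threshold (the 'min_size' value)
SHARD_CAP = 30 * GB   # desired per-shard ceiling


def total_condition_met(shard_sizes, threshold=MIN_SIZE):
    """True once the summed primary storage reaches the threshold (assumed >=)."""
    return sum(shard_sizes) >= threshold


def largest_shard_exceeds(shard_sizes, cap=SHARD_CAP):
    """True when any single primary shard is larger than the cap."""
    return max(shard_sizes) > cap


for name, sizes in (("balanced", balanced), ("skewed", skewed)):
    print(name,
          "total:", sum(sizes) // GB, "GB,",
          "largest:", max(sizes) // GB, "GB,",
          "cap exceeded:", largest_shard_exceeds(sizes))
```

Both layouts reach the same 100 GB total, so a condition on the total cannot tell them apart. Only a check on the largest shard flags the skewed case.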
One way is to create a separate index for each tenant, so rather than routing docs to an individual shard you route them to an index. You can still search docs across all tenants with a wildcard index pattern if you follow a naming convention.
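The naming-convention idea could look roughly like the sketch below. The "tenant-<id>" convention and the index names are hypothetical, and Python's fnmatch only approximates how the cluster resolves index name patterns.

```python
from fnmatch import fnmatchcase

# Hypothetical naming convention: one index per tenant, named "tenant-<id>".
indices = ["tenant-acme", "tenant-globex", "audit-logs"]
pattern = "tenant-*"

# Local approximation of which index names the pattern would cover.
matched = [name for name in indices if fnmatchcase(name, pattern)]

# Request path for a search that spans every tenant index.
search_path = f"/{pattern}/_search"

print(matched)
print(search_path)
```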
If you have a large number of tenants, you can instead add a tenant field to each document and let Elasticsearch spread docs across all shards of your index (the default, which routes by document ID). That way you will have almost equal-sized shards in the index.
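With a tenant field, each search would filter on that field instead of relying on routing. A minimal sketch of such a request body follows. The field names "tenant" and "message" are assumptions, and "tenant" is assumed to be mapped as a keyword.

```python
import json


def tenant_search_body(tenant_id, text):
    """Build a search body restricted to one tenant via a term filter."""
    return {
        "query": {
            "bool": {
                # "message" is a hypothetical full-text field.
                "must": [{"match": {"message": text}}],
                # "tenant" is the hypothetical per-document tenant field.
                "filter": [{"term": {"tenant": tenant_id}}],
            }
        }
    }


print(json.dumps(tenant_search_body("tenant-42", "error"), indent=2))
```

A search sent without a routing value is executed on every shard of the index, so the filter narrows the results but not the number of shards queried.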
I have over 1k tenants (and growing), so a separate index per tenant is not suitable for my use case. I started with that approach and then migrated to the current one: a single index with multiple shards and routing.
The other approach of randomly distributed docs over the shards also doesn’t seem optimal. I fear it will hurt performance badly since every search will have to be performed over all shards rather than a single one.
In most cases, shards are more or less balanced. There are only some extreme cases when there is a noisy tenant.
Is there any other option? Do you think a ‘min_shard_size’ policy can be beneficial to this case and maybe to other people?
This PR added a “min_primary_shard_size” condition to rollover.
“It will evaluate to true when there is a single primary shard in the index that has a greater size than the condition.”
It should be coming out in the next 1.3 release.
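For the 30 GB per-shard limit from the question, a policy using the new condition might look roughly like this. It is a sketch only: it assumes the condition sits under the rollover action with the name quoted above, and the description and the state name "hot" are hypothetical.

```python
import json

# Sketch of an ISM policy body; description and state name are hypothetical.
policy = {
    "policy": {
        "description": "Roll over when any primary shard grows past 30 GB",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [
                    # Assumed placement of the condition added by the PR.
                    {"rollover": {"min_primary_shard_size": "30gb"}}
                ],
                "transitions": [],
            }
        ],
    }
}

print(json.dumps(policy, indent=2))
```

The printed JSON would be sent as the body when creating the policy, for example with PUT _plugins/_ism/policies/<policy_id>, and the managed index would still need its rollover alias configured.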
@dbbaughe, that’s awesome!!