Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.14
Describe the issue:
We have two OpenSearch clusters (prod-1 and prod-2) with a lot of indices. Most indices are created on a daily pattern (i.e. index_name-YYYY-MM-DD) and managed by an ISM policy:
{
  "_id": "delete_policy",
  "_seq_no": 917888812,
  "_primary_term": 11,
  "policy": {
    "policy_id": "delete_policy",
    "description": "Delete policy",
    "last_updated_time": 1745321459747,
    "schema_version": 21,
    "error_notification": null,
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": [
          {
            "state_name": "delete",
            "conditions": {
              "min_index_age": "718h"
            }
          }
        ]
      },
      {
        "name": "delete",
        "actions": [
          {
            "retry": {
              "count": 3,
              "backoff": "exponential",
              "delay": "1m"
            },
            "delete": {}
          }
        ],
        "transitions": []
      }
    ],
    "ism_template": [
      {
        "index_patterns": [
          "*"
        ],
        "priority": 0,
        "last_updated_time": 1745321459747
      }
    ]
  }
}
For some index patterns we created a new rollover policy and index template.
Policy:
{
  "_id": "ocp-prod-int-prod",
  "_seq_no": 917886289,
  "_primary_term": 11,
  "policy": {
    "policy_id": "ocp-prod-int-prod",
    "description": "Rollover policy for ocp-prod-int-prod",
    "last_updated_time": 1745321440295,
    "schema_version": 21,
    "error_notification": null,
    "default_state": "rollover",
    "states": [
      {
        "name": "rollover",
        "actions": [
          {
            "retry": {
              "count": 3,
              "backoff": "exponential",
              "delay": "1m"
            },
            "rollover": {
              "min_primary_shard_size": "15gb",
              "copy_alias": false
            }
          }
        ],
        "transitions": [
          {
            "state_name": "delete",
            "conditions": {
              "min_rollover_age": "30d"
            }
          }
        ]
      },
      {
        "name": "delete",
        "actions": [
          {
            "retry": {
              "count": 3,
              "backoff": "exponential",
              "delay": "1m"
            },
            "delete": {}
          }
        ],
        "transitions": []
      }
    ],
    "ism_template": [
      {
        "index_patterns": [
          "ocp-prod-int-prod-*"
        ],
        "priority": 10,
        "last_updated_time": 1745321440295
      }
    ]
  }
}
Template:
{
  "index_templates": [
    {
      "name": "ocp-prod-int-prod",
      "index_template": {
        "index_patterns": [
          "ocp-prod-int-prod*"
        ],
        "template": {
          "settings": {
            "index": {
              "number_of_shards": "5",
              "opendistro": {
                "index_state_management": {
                  "policy_id": "ocp-prod-int-prod",
                  "rollover_alias": "ocp-prod-int-prod"
                }
              },
              "plugins": {
                "index_state_management": {
                  "policy_id": "ocp-prod-int-prod",
                  "rollover_alias": "ocp-prod-int-prod"
                }
              }
            }
          }
        },
        "composed_of": [],
        "priority": 5,
        "version": 1
      }
    }
  ]
}
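For context: the rollover action requires the alias to already point at a write index before the first rollover. We bootstrap each pattern roughly like this (a sketch; our actual bootstrap request may differ):

```
PUT ocp-prod-int-prod-000001
{
  "aliases": {
    "ocp-prod-int-prod": {
      "is_write_index": true
    }
  }
}
```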
The indices and the alias were created and initially worked correctly. But after a few rollover cycles, the newest indices get the wrong ISM policy (delete_policy) and just grow in size without rolling over. After we manually change the ISM policy to the correct one (ocp-prod-int-prod), the index rolls over to new indices again; then, after a few more rollovers, it happens again.
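The manual fix we apply is a change_policy call via the ISM API, roughly as follows (the index name here is an example):

```
POST _plugins/_ism/change_policy/ocp-prod-int-prod-000008
{
  "policy_id": "ocp-prod-int-prod",
  "state": "rollover"
}
```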
For indices with the wrongly applied policy, we see a difference in the index settings:
"number_of_shards": "5",
"plugins": {
  "index_state_management": {
    "policy_id": "ocp-prod-int-prod",
    "rollover_alias": "ocp-prod-int-prod",
    "auto_manage": "false"
  }
}
and in the ISM explain output:
{
  "ocp-prod-int-prod-000008": {
    "index.plugins.index_state_management.policy_id": "delete_policy",
    "index.opendistro.index_state_management.policy_id": "delete_policy",
    "index": "ocp-prod-int-prod-000008",
    "index_uuid": "0iAeiA9RTzuCwLmvfNzVCw",
    "policy_id": "delete_policy",
    "policy_seq_no": 874381947,
    "policy_primary_term": 11,
    "index_creation_date": 1745059907830,
    "state": {
      "name": "hot",
      "start_time": 1745223088184
    }
  }
}
As you can see, the index settings contain the correct policy_id (and rollover_alias) from the index template, but ISM matched the index to delete_policy.
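For reference, the explain output above was taken from the ISM explain API:

```
GET _plugins/_ism/explain/ocp-prod-int-prod-000008
```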
index pri.store.size pri
ocp-prod-int-prod-000018 38.6gb 5
ocp-prod-int-prod-000005 75.4gb 5
ocp-prod-int-prod-000004 75.5gb 5
ocp-prod-int-prod-000014 75.6gb 5
ocp-prod-int-prod-000006 75.7gb 5
ocp-prod-int-prod-000017 75.8gb 5
ocp-prod-int-prod-000002 76gb 5
ocp-prod-int-prod-000012 76.1gb 5
ocp-prod-int-prod-000003 76.4gb 5
ocp-prod-int-prod-000011 76.8gb 5
ocp-prod-int-prod-000016 77.5gb 5
ocp-prod-int-prod-000007 78gb 5
ocp-prod-int-prod-000009 78.1gb 5
ocp-prod-int-prod-000010 112.8gb 5
ocp-prod-int-prod-000013 115.5gb 5
ocp-prod-int-prod-000001 121.3gb 5
ocp-prod-int-prod-000015 169.5gb 5
ocp-prod-int-prod-000008 792.6gb 5
As you can see, on cluster prod-2 this happened with indices 000001, 000008, 000010, 000013 and 000015.
On cluster prod-1 it happened only once, with index 000004:
index pri.store.size pri
ocp-prod-int-prod-000016 59.3gb 5
ocp-prod-int-prod-000002 75.3gb 5
ocp-prod-int-prod-000011 75.8gb 5
ocp-prod-int-prod-000007 75.9gb 5
ocp-prod-int-prod-000001 76.3gb 5
ocp-prod-int-prod-000010 76.4gb 5
ocp-prod-int-prod-000013 76.4gb 5
ocp-prod-int-prod-000012 76.5gb 5
ocp-prod-int-prod-000009 77.2gb 5
ocp-prod-int-prod-000015 77.3gb 5
ocp-prod-int-prod-000006 77.5gb 5
ocp-prod-int-prod-000003 77.6gb 5
ocp-prod-int-prod-000014 77.7gb 5
ocp-prod-int-prod-000008 77.7gb 5
ocp-prod-int-prod-000005 78.1gb 5
ocp-prod-int-prod-000004 843.8gb 5
We double-checked the configs, master node stats, etc., and have no clue. Why does this happen, and why so randomly?
The only related entry in the logs is:
[2025-04-21T23:50:52,103][INFO ][o.o.i.i.ManagedIndexCoordinator] [prod-cluster-opensearch-cluster-masters-1] Index [ocp-prod-int-prod-000015] matched ISM policy template and will be managed by delete_policy
Why does ISM periodically match the index to delete_policy, when both the template and the index settings contain
"plugins": {
  "index_state_management": {
    "policy_id": "ocp-prod-int-prod"
  }
}
and the ocp-prod-int-prod policy has the higher ism_template priority (10 vs 0)?