ISM policy not always correctly applied after index rollover

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

OpenSearch 2.14

Describe the issue:

We have two OpenSearch clusters (prod-1 and prod-2) with a lot of indices. Most indices are created on a daily pattern (i.e. index_name-YYYY-MM-DD) and managed by an ISM policy:

    {
      "_id": "delete_policy",
      "_seq_no": 917888812,
      "_primary_term": 11,
      "policy": {
        "policy_id": "delete_policy",
        "description": "Delete policy",
        "last_updated_time": 1745321459747,
        "schema_version": 21,
        "error_notification": null,
        "default_state": "hot",
        "states": [
          {
            "name": "hot",
            "actions": [],
            "transitions": [
              {
                "state_name": "delete",
                "conditions": {
                  "min_index_age": "718h"
                }
              }
            ]
          },
          {
            "name": "delete",
            "actions": [
              {
                "retry": {
                  "count": 3,
                  "backoff": "exponential",
                  "delay": "1m"
                },
                "delete": {}
              }
            ],
            "transitions": []
          }
        ],
        "ism_template": [
          {
            "index_patterns": [
              "*"
            ],
            "priority": 0,
            "last_updated_time": 1745321459747
          }
        ]
      }
    }

For some index patterns we created a new rollover policy and index template.
Policy:

    {
      "_id": "ocp-prod-int-prod",
      "_seq_no": 917886289,
      "_primary_term": 11,
      "policy": {
        "policy_id": "ocp-prod-int-prod",
        "description": "Rollover policy for ocp-prod-int-prod",
        "last_updated_time": 1745321440295,
        "schema_version": 21,
        "error_notification": null,
        "default_state": "rollover",
        "states": [
          {
            "name": "rollover",
            "actions": [
              {
                "retry": {
                  "count": 3,
                  "backoff": "exponential",
                  "delay": "1m"
                },
                "rollover": {
                  "min_primary_shard_size": "15gb",
                  "copy_alias": false
                }
              }
            ],
            "transitions": [
              {
                "state_name": "delete",
                "conditions": {
                  "min_rollover_age": "30d"
                }
              }
            ]
          },
          {
            "name": "delete",
            "actions": [
              {
                "retry": {
                  "count": 3,
                  "backoff": "exponential",
                  "delay": "1m"
                },
                "delete": {}
              }
            ],
            "transitions": []
          }
        ],
        "ism_template": [
          {
            "index_patterns": [
              "ocp-prod-int-prod-*"
            ],
            "priority": 10,
            "last_updated_time": 1745321440295
          }
        ]
      }
    }

Template:

    {
      "name": "ocp-prod-int-prod",
      "index_template": {
        "index_patterns": [
          "ocp-prod-int-prod*"
        ],
        "template": {
          "settings": {
            "index": {
              "number_of_shards": "5",
              "opendistro": {
                "index_state_management": {
                  "policy_id": "ocp-prod-int-prod",
                  "rollover_alias": "ocp-prod-int-prod"
                }
              },
              "plugins": {
                "index_state_management": {
                  "policy_id": "ocp-prod-int-prod",
                  "rollover_alias": "ocp-prod-int-prod"
                }
              }
            }
          }
        },
        "composed_of": [],
        "priority": 5,
        "version": 1
      }
    }

The indices and alias were created and started rolling over correctly.

But after a few rollover cycles, the newest indices get the wrong ISM policy (delete_policy) and just grow in size without rolling over. After we manually change the ISM policy back to the correct one (ocp-prod-int-prod), the index rolls over to new indices, and then after a few more rollovers it happens again.
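The manual fix is done via the ISM change_policy API, roughly like this (the index name here is just an example):

    POST _plugins/_ism/change_policy/ocp-prod-int-prod-000008
    {
      "policy_id": "ocp-prod-int-prod"
    }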
For indices with the wrongly applied policy we see a difference in the index settings:

        "number_of_shards": "5",
        "plugins": {
          "index_state_management": {
            "policy_id": "ocp-prod-int-prod",
            "rollover_alias": "ocp-prod-int-prod",
            "auto_manage": "false"
          }
        }

and in the ISM explain output:

    {
      "ocp-prod-int-prod-000008": {
        "index.plugins.index_state_management.policy_id": "delete_policy",
        "index.opendistro.index_state_management.policy_id": "delete_policy",
        "index": "ocp-prod-int-prod-000008",
        "index_uuid": "0iAeiA9RTzuCwLmvfNzVCw",
        "policy_id": "delete_policy",
        "policy_seq_no": 874381947,
        "policy_primary_term": 11,
        "index_creation_date": 1745059907830,
        "state": {
          "name": "hot",
          "start_time": 1745223088184
        }
      }
    }
As you can see, the index settings contain the correct policy_id (and rollover_alias) from the index template, but ISM matched the index to "delete_policy".
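For reference, the explain output above and the size listing below come from the standard ISM and cat APIs, roughly:

    GET _plugins/_ism/explain/ocp-prod-int-prod-000008
    GET _cat/indices/ocp-prod-int-prod-*?v&h=index,pri.store.size,pri&s=pri.store.size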

index                    pri.store.size pri
ocp-prod-int-prod-000018         38.6gb   5
ocp-prod-int-prod-000005         75.4gb   5
ocp-prod-int-prod-000004         75.5gb   5
ocp-prod-int-prod-000014         75.6gb   5
ocp-prod-int-prod-000006         75.7gb   5
ocp-prod-int-prod-000017         75.8gb   5
ocp-prod-int-prod-000002           76gb   5
ocp-prod-int-prod-000012         76.1gb   5
ocp-prod-int-prod-000003         76.4gb   5
ocp-prod-int-prod-000011         76.8gb   5
ocp-prod-int-prod-000016         77.5gb   5
ocp-prod-int-prod-000007           78gb   5
ocp-prod-int-prod-000009         78.1gb   5
ocp-prod-int-prod-000010        112.8gb   5
ocp-prod-int-prod-000013        115.5gb   5
ocp-prod-int-prod-000001        121.3gb   5
ocp-prod-int-prod-000015        169.5gb   5
ocp-prod-int-prod-000008        792.6gb   5

As you can see, this happened with indices 000001, 000008, 000010, 000013, and 000015 on cluster prod-2.

On cluster prod-1 it has happened only once, with index 000004:

index                    pri.store.size pri
ocp-prod-int-prod-000016         59.3gb   5
ocp-prod-int-prod-000002         75.3gb   5
ocp-prod-int-prod-000011         75.8gb   5
ocp-prod-int-prod-000007         75.9gb   5
ocp-prod-int-prod-000001         76.3gb   5
ocp-prod-int-prod-000010         76.4gb   5
ocp-prod-int-prod-000013         76.4gb   5
ocp-prod-int-prod-000012         76.5gb   5
ocp-prod-int-prod-000009         77.2gb   5
ocp-prod-int-prod-000015         77.3gb   5
ocp-prod-int-prod-000006         77.5gb   5
ocp-prod-int-prod-000003         77.6gb   5
ocp-prod-int-prod-000014         77.7gb   5
ocp-prod-int-prod-000008         77.7gb   5
ocp-prod-int-prod-000005         78.1gb   5
ocp-prod-int-prod-000004        843.8gb   5

We have double-checked the configs, master node stats, etc., and have no clue. Why does this happen, and why is it so random?
The only thing in the logs is:

2025-04-22 02:50:52	
[2025-04-21T23:50:52,103][INFO ][o.o.i.i.ManagedIndexCoordinator] [prod-cluster-opensearch-cluster-masters-1] Index [ocp-prod-int-prod-000015] matched ISM policy template and will be managed by delete_policy

Why does ISM periodically match the index to delete_policy, if the template and index settings contain

        "plugins": {
          "index_state_management": {
            "policy_id": "ocp-prod-int-prod"

and the ocp-prod-int-prod policy has a higher priority (10 versus 0)?
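The priorities can be double-checked by fetching both policies and comparing their ism_template blocks:

    GET _plugins/_ism/policies/delete_policy
    GET _plugins/_ism/policies/ocp-prod-int-prod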

@Oldo How many nodes do you have in your cluster? Do you see any errors in OpenSearch logs regarding ISM policy execution?
If you noticed any ISM errors, were they located on the same node?

@pablo
Two clusters, 49 nodes each (40 data, 3 masters, the rest are coordinators).

There are no ISM errors at all. The rollover that got the wrong ISM policy, as seen in the logs:

2025-04-22 02:50:48	
[2025-04-21T23:50:48,205][INFO ][o.o.p.PluginsService     ] [prod-cluster-opensearch-cluster-masters-1] PluginService:onIndexModule index:[ocp-prod-int-prod-000015/ViLj75CJTC2QRinHu2y_BA]
2025-04-22 02:50:48	
[2025-04-21T23:50:48,206][INFO ][o.o.c.m.MetadataCreateIndexService] [prod-cluster-opensearch-cluster-masters-1] [ocp-prod-int-prod-000015] creating index, cause [rollover_index], templates [ocp-prod-int-prod], shards [5]/[1]
2025-04-22 02:50:50	
[2025-04-21T23:50:50,546][INFO ][o.o.p.PluginsService     ] [prod-cluster-opensearch-cluster-data-6] PluginService:onIndexModule index:[ocp-prod-int-prod-000015/ViLj75CJTC2QRinHu2y_BA]
2025-04-22 02:50:50	
[2025-04-21T23:50:50,554][INFO ][o.o.p.PluginsService     ] [prod-cluster-opensearch-cluster-data-22] PluginService:onIndexModule index:[ocp-prod-int-prod-000015/ViLj75CJTC2QRinHu2y_BA]
2025-04-22 02:50:50	
[2025-04-21T23:50:50,561][INFO ][o.o.p.PluginsService     ] [prod-cluster-opensearch-cluster-data-1] PluginService:onIndexModule index:[ocp-prod-int-prod-000015/ViLj75CJTC2QRinHu2y_BA]
2025-04-22 02:50:50	
[2025-04-21T23:50:50,561][INFO ][o.o.p.PluginsService     ] [prod-cluster-opensearch-cluster-data-0] PluginService:onIndexModule index:[ocp-prod-int-prod-000015/ViLj75CJTC2QRinHu2y_BA]
2025-04-22 02:50:50	
[2025-04-21T23:50:50,573][INFO ][o.o.p.PluginsService     ] [prod-cluster-opensearch-cluster-data-33] PluginService:onIndexModule index:[ocp-prod-int-prod-000015/ViLj75CJTC2QRinHu2y_BA]
2025-04-22 02:50:52	
[2025-04-21T23:50:52,103][INFO ][o.o.i.i.ManagedIndexCoordinator] [prod-cluster-opensearch-cluster-masters-1] Index [ocp-prod-int-prod-000015] matched ISM policy template and will be managed by delete_policy
2025-04-22 02:50:52	
[2025-04-21T23:50:52,125][INFO ][o.o.j.s.JobScheduler     ] [prod-cluster-opensearch-cluster-data-17] Will delay 10153 miliseconds for next execution of job ocp-prod-int-prod-000015

Logs with correct policy applied:

2025-04-22 12:28:56	
[2025-04-22T09:28:56,087][INFO ][o.o.p.PluginsService     ] [prod-cluster-opensearch-cluster-masters-1] PluginService:onIndexModule index:[ocp-prod-int-prod-000017/EVL0ptf_S86PK2CycnaiaA]
2025-04-22 12:28:56	
[2025-04-22T09:28:56,088][INFO ][o.o.c.m.MetadataCreateIndexService] [prod-cluster-opensearch-cluster-masters-1] [ocp-prod-int-prod-000017] creating index, cause [rollover_index], templates [ocp-prod-int-prod-new], shards [5]/[1]
2025-04-22 12:28:57	
[2025-04-22T09:28:57,872][INFO ][o.o.p.PluginsService     ] [prod-cluster-opensearch-cluster-data-36] PluginService:onIndexModule index:[ocp-prod-int-prod-000017/EVL0ptf_S86PK2CycnaiaA]
2025-04-22 12:28:57	
[2025-04-22T09:28:57,888][INFO ][o.o.p.PluginsService     ] [prod-cluster-opensearch-cluster-data-24] PluginService:onIndexModule index:[ocp-prod-int-prod-000017/EVL0ptf_S86PK2CycnaiaA]
2025-04-22 12:28:57	
[2025-04-22T09:28:57,894][INFO ][o.o.p.PluginsService     ] [prod-cluster-opensearch-cluster-data-20] PluginService:onIndexModule index:[ocp-prod-int-prod-000017/EVL0ptf_S86PK2CycnaiaA]
2025-04-22 12:28:57	
[2025-04-22T09:28:57,951][INFO ][o.o.p.PluginsService     ] [prod-cluster-opensearch-cluster-data-31] PluginService:onIndexModule index:[ocp-prod-int-prod-000017/EVL0ptf_S86PK2CycnaiaA]
2025-04-22 12:28:57	
[2025-04-22T09:28:57,964][INFO ][o.o.p.PluginsService     ] [prod-cluster-opensearch-cluster-data-30] PluginService:onIndexModule index:[ocp-prod-int-prod-000017/EVL0ptf_S86PK2CycnaiaA]
2025-04-22 12:29:03	
[2025-04-22T09:29:03,060][INFO ][o.o.i.i.ManagedIndexCoordinator] [prod-cluster-opensearch-cluster-masters-1] Index [ocp-prod-int-prod-000017] matched ISM policy template and will be managed by ocp-prod-int-prod

@pablo
Can we use conditions inside ism_template, something like this for delete_policy:

    {
      "ism_template": [{
        "index_patterns": ["*"],
        "priority": 0,
        "conditions": {
          "exclude": ["ocp-prod-int-prod-*"]
        }
      }]
    }

So that delete_policy wouldn't take precedence over the ocp-prod-int-prod policy?

Your delete_policy is applied to all indices (*). Your ocp-prod-int-prod policy is applied to ocp-prod-int-prod-*.
I bet the two index patterns are fighting. Make your delete_policy pattern more specific to avoid the overlap.
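For example, a narrowed delete_policy template could look like this (index_name-* is a placeholder — use patterns that match your actual daily index naming):

    "ism_template": [
      {
        "index_patterns": [
          "index_name-*"
        ],
        "priority": 0
      }
    ]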

We found the root cause.
It's the opensearch-k8s-operator. delete_policy was created via the OpenSearch API, while ocp-prod-int-prod was created via a CRD.
Due to this bug (Fix ISM policy reconcile condition by evheniyt · Pull Request #972 · opensearch-project/opensearch-k8s-operator · GitHub), the operator periodically validated the policy by requesting the wrong key in the CRD, which triggered repeated policy updates. Each update bumped the policy's creation timestamp, and an ISM template is only applied to indices created after the policy's last update time.
This is a race condition, which is why delete_policy was only sometimes applied. When we moved delete_policy into the operator CRD as well, we ended up with indices that had no policy applied at all.
We have now moved all policies to the OpenSearch API, with none in the CRD, and everything works as intended.
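For anyone hitting the same issue: a policy can be created directly via the ISM API instead of the operator CRD, roughly like this (body taken from our delete_policy above):

    PUT _plugins/_ism/policies/delete_policy
    {
      "policy": {
        "description": "Delete policy",
        "default_state": "hot",
        "states": [
          {
            "name": "hot",
            "actions": [],
            "transitions": [
              { "state_name": "delete", "conditions": { "min_index_age": "718h" } }
            ]
          },
          {
            "name": "delete",
            "actions": [ { "delete": {} } ],
            "transitions": []
          }
        ],
        "ism_template": [
          { "index_patterns": ["*"], "priority": 0 }
        ]
      }
    }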