Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch v2.4.1
Describe the issue:
ISM fails to complete an index rollover even though the rollover condition ("min_primary_shard_size" : "32gb") is met. The action reports an "Action timed out" error, and the step_status is "condition_not_met". I had to manually click "Retry policy" in Dashboards to trigger "Evaluating transition conditions"; only then did the index roll over. I am not sure whether the action timed out before the shards reached 32 GB, so that ISM stopped evaluating the condition at that point. GET _plugins/_ism/explain doesn't give me enough information to troubleshoot this.
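The "Retry policy" button presumably corresponds to the ISM retry API, POST _plugins/_ism/retry/<index>, which (as I understand the docs) also accepts an optional "state" body. A minimal Python sketch that builds that request for container-log-000002 without sending it; the endpoint http://localhost:9200 and the helper name build_retry_request are assumptions, and TLS/authentication are omitted.

```python
import json
import urllib.request
from typing import Optional

# Assumption: cluster endpoint; add TLS/authentication as your deployment requires.
OPENSEARCH_URL = "http://localhost:9200"


def build_retry_request(index: str, state: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) POST _plugins/_ism/retry/<index>.

    The optional "state" body asks ISM to retry from that state.
    """
    body = json.dumps({"state": state}).encode("utf-8") if state else None
    return urllib.request.Request(
        f"{OPENSEARCH_URL}/_plugins/_ism/retry/{index}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_retry_request("container-log-000002")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```

Scripting the retry this way is only a stopgap; it does the same thing as the button and does not explain why the action timed out.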
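Here are the shards. I first noticed the timeout when container-log-000001 had reached 52 GB; after I manually retried, it rolled over to container-log-000002. Now container-log-000002 never rolls over to container-log-000003, even though all of its primary shards are above 32 GB.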
GET _cat/shards/container-log
container-log-000001 2 p STARTED 202966859 52.7gb 10.42.8.56 nodes-15
container-log-000001 2 r STARTED 202966859 52.7gb 10.42.17.45 nodes-10
container-log-000001 1 p STARTED 202965843 52.7gb 10.42.6.31 nodes-13
container-log-000001 1 r STARTED 202965843 52.7gb 10.42.12.148 nodes-11
container-log-000001 0 r STARTED 202993646 52.7gb 10.42.4.153 nodes-6
container-log-000001 0 p STARTED 202993646 52.7gb 10.42.13.42 nodes-7
container-log-000002 2 p STARTED 206571388 38.2gb 10.42.7.158 nodes-3
container-log-000002 2 r STARTED 206582267 36.9gb 10.42.9.33 nodes-5
container-log-000002 1 p STARTED 206611403 37gb 10.42.14.38 nodes-12
container-log-000002 1 r STARTED 206611403 36.9gb 10.42.10.35 nodes-8
container-log-000002 0 r STARTED 206594416 36.9gb 10.42.5.31 nodes-4
container-log-000002 0 p STARTED 206610735 38.2gb 10.42.17.45 nodes-10
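To double-check that the condition really holds, the sketch below parses the _cat/shards rows above and compares the largest primary shard of each index with 32gb. It assumes the default _cat/shards column order (index, shard, prirep, state, docs, store, ip, node) and that ISM checks min_primary_shard_size against the largest primary shard; to_bytes and largest_primary_bytes are hypothetical helpers.

```python
from collections import defaultdict

# Rows from GET _cat/shards/container-log above. Column order is assumed to be the
# _cat/shards default: index shard prirep state docs store ip node
CAT_SHARDS = """\
container-log-000001 2 p STARTED 202966859 52.7gb 10.42.8.56 nodes-15
container-log-000001 2 r STARTED 202966859 52.7gb 10.42.17.45 nodes-10
container-log-000001 1 p STARTED 202965843 52.7gb 10.42.6.31 nodes-13
container-log-000001 1 r STARTED 202965843 52.7gb 10.42.12.148 nodes-11
container-log-000001 0 r STARTED 202993646 52.7gb 10.42.4.153 nodes-6
container-log-000001 0 p STARTED 202993646 52.7gb 10.42.13.42 nodes-7
container-log-000002 2 p STARTED 206571388 38.2gb 10.42.7.158 nodes-3
container-log-000002 2 r STARTED 206582267 36.9gb 10.42.9.33 nodes-5
container-log-000002 1 p STARTED 206611403 37gb 10.42.14.38 nodes-12
container-log-000002 1 r STARTED 206611403 36.9gb 10.42.10.35 nodes-8
container-log-000002 0 r STARTED 206594416 36.9gb 10.42.5.31 nodes-4
container-log-000002 0 p STARTED 206610735 38.2gb 10.42.17.45 nodes-10
"""

# Binary units, matching how OpenSearch interprets "32gb".
UNITS = {"tb": 1024**4, "gb": 1024**3, "mb": 1024**2, "kb": 1024, "b": 1}


def to_bytes(size: str) -> float:
    """Convert a size string such as '38.2gb' to bytes."""
    size = size.strip().lower()
    for unit, factor in UNITS.items():  # longest suffixes first, so 'gb' wins over 'b'
        if size.endswith(unit):
            return float(size[: -len(unit)]) * factor
    raise ValueError(f"unrecognized size: {size!r}")


def largest_primary_bytes(cat_output: str) -> dict:
    """Largest primary shard store size, in bytes, for each index."""
    largest = defaultdict(float)
    for line in cat_output.strip().splitlines():
        index, _shard, prirep, _state, _docs, store, *_ = line.split()
        if prirep == "p":  # replica copies are skipped; only primaries are compared
            largest[index] = max(largest[index], to_bytes(store))
    return dict(largest)


THRESHOLD = to_bytes("32gb")  # min_primary_shard_size from the hot-state rollover action

for index, size in sorted(largest_primary_bytes(CAT_SHARDS).items()):
    print(f"{index}: largest primary {size / 1024**3:.1f} GiB, condition met: {size >= THRESHOLD}")
```

Under those assumptions, both indices report the condition as met, which matches what I see in the shard listing.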
Configuration:
Below is my policy:
"policies" : [
{
"_id" : "container_log_policy",
"_seq_no" : 1,
"_primary_term" : 1,
"policy" : {
"policy_id" : "container_log_policy",
"description" : "A default policy for container log",
"last_updated_time" : 1673373909185,
"schema_version" : 17,
"error_notification" : null,
"default_state" : "hot",
"states" : [
{
"name" : "hot",
"actions" : [
{
"timeout" : "1h",
"retry" : {
"count" : 3,
"backoff" : "constant",
"delay" : "1h"
},
"rollover" : {
"min_index_age" : "30d",
"min_primary_shard_size" : "32gb"
}
}
],
"transitions" : [
{
"state_name" : "warm",
"conditions" : {
"min_index_age" : "30d"
}
}
]
},
{
"name" : "warm",
"actions" : [
{
"timeout" : "1d",
"retry" : {
"count" : 3,
"backoff" : "constant",
"delay" : "1h"
},
"force_merge" : {
"max_num_segments" : 1
}
}
],
"transitions" : [
{
"state_name" : "snapshot",
"conditions" : {
"min_index_age" : "7d"
}
}
]
},
{
"name" : "snapshot",
"actions" : [
{
"timeout" : "1d",
"retry" : {
"count" : 3,
"backoff" : "constant",
"delay" : "1h"
},
"snapshot" : {
"repository" : "s3_repository",
"snapshot" : "container-log"
}
}
],
"transitions" : [
{
"state_name" : "delete",
"conditions" : {
"min_index_age" : "540d"
}
}
]
},
{
"name" : "delete",
"actions" : [
{
"timeout" : "1h",
"retry" : {
"count" : 3,
"backoff" : "constant",
"delay" : "1h"
},
"delete" : { }
}
],
"transitions" : [ ]
}
],
"ism_template" : [
{
"index_patterns" : [
"*container*"
],
"priority" : 20,
"last_updated_time" : 1673373909185
}
]
}
}
]
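The hot-state rollover action lists two conditions. As I understand ISM, rollover happens as soon as any one of them is satisfied, and min_primary_shard_size is compared against the largest primary shard. The sketch below mirrors that logic under those assumptions; duration_ms, gb_to_bytes, and rollover_due are hypothetical helpers, not ISM code, and the sketch does not model the separate "timeout" : "1h" on the action.

```python
import re

# The hot-state rollover conditions from the policy above.
ROLLOVER = {"min_index_age": "30d", "min_primary_shard_size": "32gb"}

_DURATION_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def duration_ms(value: str) -> int:
    """Parse a simple duration such as '1h' or '30d' into milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _DURATION_MS[unit]


def gb_to_bytes(value: str) -> float:
    """Parse a '<n>gb' size; only the unit used in this policy is handled."""
    if not value.endswith("gb"):
        raise ValueError(f"unsupported size: {value!r}")
    return float(value[:-2]) * 1024**3


def rollover_due(conditions: dict, index_age_ms: int, largest_primary: float) -> bool:
    """Assumed any-of semantics: one satisfied condition is enough to roll over.

    `conditions` is assumed to be non-empty, as it is in this policy.
    """
    checks = []
    if "min_index_age" in conditions:
        checks.append(index_age_ms >= duration_ms(conditions["min_index_age"]))
    if "min_primary_shard_size" in conditions:
        checks.append(largest_primary >= gb_to_bytes(conditions["min_primary_shard_size"]))
    return any(checks)


# A one-day-old index whose largest primary is 38.2 GiB (like container-log-000002).
print(rollover_due(ROLLOVER, duration_ms("1d"), 38.2 * 1024**3))
```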
Relevant Logs or Screenshots:
GET _plugins/_ism/explain
"container-log-000002" : {
"index.plugins.index_state_management.policy_id" : "container_log_policy",
"index.opendistro.index_state_management.policy_id" : "container_log_policy",
"index" : "container-log-000002",
"index_uuid" : "CZo4H3bZQfWmKEyirxnAow",
"policy_id" : "container_log_policy",
"policy_seq_no" : -2,
"policy_primary_term" : 0,
"rolled_over" : false,
"index_creation_date" : 1673456547419,
"state" : {
"name" : "hot",
"start_time" : 1673456940996
},
"action" : {
"name" : "rollover",
"start_time" : 1673457258959,
"index" : 0,
"failed" : true,
"consumed_retries" : 0,
"last_retry_time" : 0
},
"step" : {
"name" : "attempt_rollover",
"start_time" : 1673457258959,
"step_status" : "condition_not_met"
},
"retry_info" : {
"failed" : false,
"consumed_retries" : 0
},
"info" : {
"message" : "Action timed out"
},
"enabled" : false
}
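The explain output uses epoch milliseconds. Converting them to UTC makes the timeline of container-log-000002 easier to read. Assuming the rollover action's "timeout" : "1h" is counted from action.start_time, the action would have been declared timed out about 72 minutes after the index was created, possibly long before any primary reached 32 GB. That would fit "Action timed out" being reported together with "condition_not_met", but it is a hypothesis, not a confirmed diagnosis; utc is a hypothetical helper.

```python
from datetime import datetime, timezone

# Epoch-millisecond timestamps copied from the explain output above.
INDEX_CREATED_MS = 1673456547419  # index_creation_date
STATE_START_MS = 1673456940996    # state.start_time (hot)
ACTION_START_MS = 1673457258959   # action.start_time (rollover)
TIMEOUT_MS = 60 * 60 * 1000       # "timeout": "1h" on the rollover action


def utc(ms: int) -> str:
    """Format epoch milliseconds as a UTC timestamp."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


# Assumption: the timeout is measured from action.start_time.
deadline_ms = ACTION_START_MS + TIMEOUT_MS

for label, ms in [
    ("index created", INDEX_CREATED_MS),
    ("hot state entered", STATE_START_MS),
    ("rollover action started", ACTION_START_MS),
    ("1h timeout would expire", deadline_ms),
]:
    print(f"{label:<24} {utc(ms)}")
```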
I don't see any event about container-log-000003 in the ISM history:
GET .opendistro-ism-managed-index-history*/_search
{
"query": {
"match_all": {}
}
}
{
"_index" : ".opendistro-ism-managed-index-history-2023.01.10-1",
"_id" : "iVHOoYUB6ArgHadVeg0z",
"_score" : 1.0,
"_source" : {
"managed_index_meta_data" : {
"index" : "container-log-000002",
"index_uuid" : "CZo4H3bZQfWmKEyirxnAow",
"policy_id" : "container_log_policy",
"policy_seq_no" : -2,
"policy_primary_term" : 0,
"index_creation_date" : 1673456547419,
"state" : {
"name" : "hot",
"start_time" : 1673456940996
},
"retry_info" : {
"failed" : false,
"consumed_retries" : 0
},
"info" : {
"message" : "Successfully initialized policy: container_log_policy"
},
"history_timestamp" : 1673456941619
}
}
}
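The match_all query returns history entries for every managed index. To list every recorded step for one index, newest first, the search body could be narrowed as in the sketch below. It assumes that managed_index_meta_data.index is mapped as a keyword and managed_index_meta_data.history_timestamp as a date (field names taken from the hit above); history_query is a hypothetical helper.

```python
import json


def history_query(index_name: str, size: int = 50) -> dict:
    """Search body for .opendistro-ism-managed-index-history*/_search.

    Field names come from the hit above; the keyword/date mappings are assumed.
    """
    return {
        "size": size,
        "query": {"term": {"managed_index_meta_data.index": index_name}},
        "sort": [{"managed_index_meta_data.history_timestamp": {"order": "desc"}}],
    }


print(json.dumps(history_query("container-log-000002"), indent=2))
```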