ISM policy fails on StateMetaData

OpenSearch 2.11.1

We have an ISM policy for indices matching `network-*-rollover-*` that rolls them over after they are 1 hour old and have a primary shard size of at least 1 GB. The indices are kept for 1 day before being deleted.
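For reference, the policy is roughly equivalent to the following (a simplified sketch rather than a verbatim copy; the state names and template priority here are illustrative):

```
PUT _plugins/_ism/policies/network
{
  "policy": {
    "description": "Rollover at 1h / 1gb primary shard, delete after 1 day",
    "default_state": "rollover",
    "states": [
      {
        "name": "rollover",
        "actions": [
          { "rollover": { "min_index_age": "1h", "min_primary_shard_size": "1gb" } }
        ],
        "transitions": [
          { "state_name": "delete", "conditions": { "min_index_age": "1d" } }
        ]
      },
      {
        "name": "delete",
        "actions": [ { "delete": {} } ]
      }
    ],
    "ism_template": [
      { "index_patterns": ["network-*-rollover-*"], "priority": 100 }
    ]
  }
}
```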

This succeeds for two of the three matching indices, but fails on the third index with the error message:
Failed to find state=StateMetaData(name=rollover, startTime=1716488483625) in policy=network

My guess is that this is somehow related to the startTime on the index, but I can't imagine how or why this is failing.
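For anyone looking into this, the managed index metadata (current state, action, step status, and the error message above) can be inspected with the ISM explain API; the index name below is the FCODEX one from the logs further down, used as an example:

```
GET _plugins/_ism/explain/network-FCODEX-2.3-rollover-000001
```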

I do see this SSL error in the logs, recorded around that startTime (1716488483625 = Thursday, May 23, 2024, 6:21:23.625 PM):

[2024-05-23T18:16:52,824][ERROR][o.o.s.s.h.n.SecuritySSLNettyHttpServerTransport] [flow-app] Exception during establishing a SSL connection: java.net.SocketException: Connection reset
java.net.SocketException: Connection reset
        at sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:394) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:426) ~[?:?]
        at org.opensearch.transport.CopyBytesSocketChannel.readFromSocketChannel(CopyBytesSocketChannel.java:156) ~[transport-netty4-client-2.11.1.jar:2.11.1]
        at org.opensearch.transport.CopyBytesSocketChannel.doReadBytes(CopyBytesSocketChannel.java:141) ~[transport-netty4-client-2.11.1.jar:2.11.1]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.100.Final.jar:4.1.100.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.100.Final.jar:4.1.100.Final]
        at java.lang.Thread.run(Thread.java:833) [?:?]
[2024-05-23T18:17:27,955][INFO ][o.o.p.PluginsService     ] [flow-app] PluginService:onIndexModule index:[network-FCODEX-2.3-rollover-000001/hJ8hRwvATWWxfSJBjbzemA]
[2024-05-23T18:17:28,081][INFO ][o.o.c.m.MetadataMappingService] [flow-app] [network-FCODEX-2.3-rollover-000001/hJ8hRwvATWWxfSJBjbzemA] update_mapping [_doc]
[2024-05-23T18:17:28,213][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [flow-app] Detected cluster change event for destination migration
[2024-05-23T18:19:12,901][INFO ][o.o.j.s.JobSweeper       ] [flow-app] Running full sweep
[2024-05-23T18:20:41,765][INFO ][o.o.j.s.JobScheduler     ] [flow-app] Will delay 136366 miliseconds for next execution of job network-PCODEX-2.3-rollover-000001
[2024-05-23T18:20:42,165][INFO ][o.o.i.i.ManagedIndexRunner] [flow-app] Executing attempt_rollover for network-PCODEX-2.3-rollover-000001
[2024-05-23T18:20:42,170][INFO ][o.o.i.i.ManagedIndexRunner] [flow-app] Finished executing attempt_rollover for network-PCODEX-2.3-rollover-000001
[2024-05-23T18:21:06,649][INFO ][o.o.j.s.JobScheduler     ] [flow-app] Will delay 78082 miliseconds for next execution of job network-TLMTRY_FCODEX-2.3-rollover-000001
[2024-05-23T18:21:07,197][INFO ][o.o.i.i.ManagedIndexRunner] [flow-app] Executing attempt_rollover for network-TLMTRY_FCODEX-2.3-rollover-000001
[2024-05-23T18:21:07,200][INFO ][o.o.i.i.ManagedIndexRunner] [flow-app] Finished executing attempt_rollover for network-TLMTRY_FCODEX-2.3-rollover-000001
[2024-05-23T18:21:23,586][INFO ][o.o.j.s.JobScheduler     ] [flow-app] Will delay 1970 miliseconds for next execution of job network-FCODEX-2.3-rollover-000001
[2024-05-23T18:21:23,617][INFO ][o.o.j.s.JobScheduler     ] [flow-app] Descheduling jobId: hJ8hRwvATWWxfSJBjbzemA

Any help interpreting the error message, or restarting the policy so that it picks up the current indices, would be appreciated.
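For the "restarting the policy" part, the two options I'm aware of are the retry API and detaching/re-attaching the policy, along these lines (index name taken from the logs above as an example; I'm not sure which of the two is appropriate here):

```
# Retry the failed step on the managed index
POST _plugins/_ism/retry/network-FCODEX-2.3-rollover-000001

# Or detach and re-attach the policy so ISM starts over from the initial state
POST _plugins/_ism/remove/network-FCODEX-2.3-rollover-000001
POST _plugins/_ism/add/network-FCODEX-2.3-rollover-000001
{
  "policy_id": "network"
}
```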

@dxturner Do you get any other connectivity errors? Is the reported error only visible during the rollover process?
How many nodes do you have in your cluster?
Is your cluster status always Green?