Delete documents older than 30 days

Hello,

I want to implement a policy that deletes documents older than 30 days. From the ISM docs (Index State Management - OpenSearch documentation), it seems ISM only supports deleting entire indices. Is there an automatic way to delete only old documents?

Thanks in advance!


Hi @guang
There is no automatic way to delete only old documents in Elasticsearch (ILM) or Open Distro (ISM) so far.
In Elasticsearch, deleting an entire index is more efficient than deleting individual documents.
Deleting too many documents might cause some problems, and it can affect query performance. Check this
If you still want to implement it yourself, you can try the delete_by_query API.
But planning your indices to roll over by day / week / month and deleting indices older than 30 days would be a better approach.
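
For example, assuming your logs have a @timestamp field and live in an index called app-logs (both names are just placeholders), a call like the sketch below would delete everything older than 30 days. You would still have to run it on a schedule yourself (e.g. from a cron job), since there is no built-in scheduler for it.

POST /app-logs/_delete_by_query
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": "now-30d/d"
      }
    }
  }
}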


Hi @mudboyzh
My current use case for deleting old documents is only for application logs, so the only queries are interactive but not time sensitive. I will give delete_by_query a try and see whether performance degrades over time, and switch over to rollover if needed.
Thank you for your help!

Hi @mudboyzh ,
But if I delete the index, what is the procedure for recreating an index with the same name?
I mean, can this process be automatic (deleting the old index and creating a new one)?

If you add your documents to an index name that has the date in it, e.g. myindex-2021.12.19, and use index templates on an index pattern of myindex-* to automatically create indices when new documents get added, then this becomes simple. Just use the index pattern myindex-* when you query, too.
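
As a rough sketch, such a template could be created like this (the template name and the settings here are just placeholders to adjust for your cluster):

PUT /_index_template/myindex-template
{
  "index_patterns": ["myindex-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  }
}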

If you are using Logstash to ingest the data, you can specify the index name like index => "myindex-%{+YYYY.MM.dd}" to automatically generate the correct index name.

You could index by day or week, depending on how many documents you have, as you don’t want to have too many indices in your cluster.


Hi @AndreyB ,
I am not sure about your exact case, but I usually use rollover for that.
It lets you access the data through an alias, something like custom-index.
You can index and search data through the alias.
The real index names look like custom-index-000001 / custom-index-000002.
An index management policy decides when to roll over to the next index.
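
As a rough sketch, such a policy could look like the one below. The endpoint shown is the OpenSearch one (_plugins/_ism/policies); Open Distro uses _opendistro/_ism/policies instead, and the policy name and the ages are just placeholders:

PUT /_plugins/_ism/policies/custom-index-policy
{
  "policy": {
    "description": "Roll over daily, delete indices after 30 days",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          { "rollover": { "min_index_age": "1d" } }
        ],
        "transitions": [
          { "state_name": "delete", "conditions": { "min_index_age": "30d" } }
        ]
      },
      {
        "name": "delete",
        "actions": [
          { "delete": {} }
        ],
        "transitions": []
      }
    ]
  }
}

You still need to attach the policy to your indices, for example through an ism_template section in the policy or the ISM add policy API.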

An alias can map to many indices, so you can search multiple indices with one alias.
But only one index can be written to when you access it through the alias.
That depends on which index has is_write_index set.
Just like the example in the index management documentation.
Once you set the rollover policy and initialize the first index as in the example, index management will automatically create the new index and move is_write_index to it for your alias when the policy condition is matched.
And you can set a delete state in the policy to delete the oldest index.
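
For example, bootstrapping the first index with its write alias could look roughly like this (the index and alias names follow the example above; the rollover_alias setting shown is the OpenSearch one, on Open Distro the prefix is opendistro instead of plugins):

PUT /custom-index-000001
{
  "settings": {
    "plugins.index_state_management.rollover_alias": "custom-index"
  },
  "aliases": {
    "custom-index": {
      "is_write_index": true
    }
  }
}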

If your case really needs to keep only one index, there is no automatic way so far; you would have to implement it yourself.
I have a case where data is synchronized from an RDB every month.
A script checks whether the new index has finished the data synchronization.
Once the synchronization is complete, the script uses the Alias API to move the alias to the new index.

POST /_aliases
{
  "actions": [
    {
      "remove": {
        "index": "{{old-index-name}}",
        "alias": "custom-index"
      }
    },
    {
      "add": {
        "index": "{{new-index-name}}",
        "alias": "custom-index"
      }
    }
  ]
}

The Alias API call is an atomic operation, which is important.
It lets the application always use the latest data without downtime.
