Search and Bulk Delete Operation

Hi everyone.

I am trying to delete old documents from an index, using a search followed by a bulk delete. Suppose there are 50K documents in my index that I want to delete, and I want to do this in batches of 10K (for other reasons), so the search-and-delete loop runs 5 times. In the first iteration the bulk delete returns success and the documents are deleted. In the second iteration, when I search for more documents to delete, I get the same documents back, but when I try to delete them the API returns 404 Not Found. That means the documents really were deleted in the first iteration, so the problem is that the search API is still returning documents that no longer exist. For example:
Loop-1: Search returns the document with id = 1221.
Loop-1: Bulk delete deletes the document with id = 1221 successfully.
Loop-2: Search returns the same document with id = 1221.
Loop-2: Bulk delete returns 404: document with id = 1221 not found.

Does anyone know how to get around this problem, or whether I am doing something wrong? I tried adding a delay between iterations, but that didn't help either.

I am using OpenSearch.Client (v1.7.0) for dotnet.

You can try the Delete by query API (see "Delete by query API" in the Elasticsearch Guide [7.10]); that should work for you.

Search is near real-time, so deleted documents keep showing up in search results until the index is refreshed. You need to refresh the index after the delete operation, or set refresh to true when calling the delete API.

That said, just as @hailong said, the delete_by_query API is the recommended way to delete a set of documents from an index. You can set wait_for_completion to false, in which case the API returns a task ID immediately, and you can then check the progress of the whole operation by calling the tasks API: GET _tasks/{taskId}.
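For anyone who wants to see roughly what those two options look like at the REST level, here is a minimal sketch in Python using only the standard library. It only builds the requests and does not send them. The base URL, the index name `my-index`, the `created_at` field, and the helper function names are all assumptions made up for illustration; the .NET client exposes the same endpoints through its own methods, which are not shown here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_bulk_delete_request(base_url, index, doc_ids, refresh="true"):
    """Bulk delete by ID; refresh=true makes the deletes visible to the next search."""
    # The bulk body is newline-delimited JSON and must end with a newline.
    lines = [json.dumps({"delete": {"_id": doc_id}}) for doc_id in doc_ids]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    url = f"{base_url}/{index}/_bulk?{urlencode({'refresh': refresh})}"
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/x-ndjson"})


def build_delete_by_query_request(base_url, index, query, wait_for_completion=False):
    """Delete everything matching `query` in a single call."""
    params = urlencode({
        # false -> the response carries a task ID instead of the final result
        "wait_for_completion": str(wait_for_completion).lower(),
        # refresh the affected shards once the deletes are done
        "refresh": "true",
    })
    url = f"{base_url}/{index}/_delete_by_query?{params}"
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


def build_task_status_request(base_url, task_id):
    """Poll the progress of a delete_by_query started with wait_for_completion=false."""
    return Request(f"{base_url}/_tasks/{task_id}", method="GET")


BASE_URL = "http://localhost:9200"  # assumption: a local, unsecured cluster
# Hypothetical definition of "old": documents whose created_at is over 30 days ago.
old_docs = {"range": {"created_at": {"lt": "now-30d"}}}

request = build_delete_by_query_request(BASE_URL, "my-index", old_docs)
print(request.get_method(), request.full_url)
# Sending it would be urllib.request.urlopen(request); the JSON response then
# contains a "task" field whose value goes into build_task_status_request().
```

If you need to keep the search-then-bulk-delete loop, the first helper shows the one change that matters: adding refresh=true to the bulk call so the second search no longer sees the documents deleted in the first pass. Refreshing on every batch has a cost, so delete_by_query is usually the simpler choice.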
