Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
2.14.0
Describe the issue:
We have Logstash creating daily indices of collected logs. I used curl to hit _cat/indices/$INDEX twice in about 30 seconds. The first time it reported docs.count 24347115 and docs.deleted 562865. The second time it reported docs.count 24357773 and docs.deleted 560369.
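Roughly what I ran, for reference (host and index name here are placeholders):

```bash
# First call
curl -s 'http://localhost:9200/_cat/indices/logstash-2024.06.01?v&h=index,docs.count,docs.deleted'

# ~30 seconds later, same index
curl -s 'http://localhost:9200/_cat/indices/logstash-2024.06.01?v&h=index,docs.count,docs.deleted'
```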
How is that possible?
I was initially investigating why we have such a high number of deleted documents for an index that should only ever be appended to. Am I misunderstanding what docs.deleted means?
Actually, neither OpenSearch nor Elasticsearch removes deleted documents right away. When you delete a document, the cluster only has to make sure it no longer shows up in searches, so there is no need to merge segments in real time; Lucene simply marks the deletion in a per-segment bitset of live documents.
(The tasks that merge segments need quite a lot of CPU and disk I/O, but you can still trigger one yourself with the _forcemerge API.)
Data nodes periodically merge segments in the background to keep performance up. When that happens, deleted documents are not copied into the new segments (only live documents are), and that is when they are physically removed.
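You can watch this per segment with the _cat/segments API. A minimal sketch, assuming a local cluster and a hypothetical index my-index:

```bash
# Delete one document: the segment holding it is not rewritten,
# the doc is only flagged in that segment's live-docs bitset
curl -s -XDELETE 'http://localhost:9200/my-index/_doc/1'

# Per-segment counts: the flagged doc shows up under docs.deleted
# and stays there until the segment is merged away
curl -s 'http://localhost:9200/my-index/_cat/segments?v&h=segment,docs.count,docs.deleted'
```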
Are you saying that docs.deleted means the number of documents marked for deletion, not the number of documents actually deleted?
Yes, until each data node finishes merging its segments. Once the old segments are combined into a new one, the documents that were marked as deleted are gone for good.
If that’s the case, then why does calling _forcemerge on an index whose docs.deleted value is in the hundreds of thousands have no effect on that value?
I can guarantee that nothing is being written to that index.
I can also guarantee that there is no way hundreds of thousands of documents were deleted from that index in the first place.
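For reference, this is roughly the sequence (host and index name are placeholders):

```bash
# Force a merge on the (idle) index; without max_num_segments the
# merge follows the normal merge policy
curl -s -XPOST 'http://localhost:9200/logstash-2024.06.01/_forcemerge'

# Check again afterwards: docs.deleted is still in the hundreds
# of thousands
curl -s 'http://localhost:9200/_cat/indices/logstash-2024.06.01?v&h=docs.count,docs.deleted'
```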
Can you share the number and size of your indices?
Also, your indices become read-only after a day, right? (As you explained, Logstash collects logs daily.) The reason I ask is that force-merging an index that is still receiving writes can produce very large segments and make snapshots more expensive than before. (Force merge API | Elasticsearch Guide [8.15] | Elastic)
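If they really are immutable after the day rolls over, you can also block writes explicitly before force-merging. A sketch, assuming a local cluster and a placeholder index name (index.blocks.write is a standard index setting):

```bash
# Block writes on yesterday's index so a force merge cannot race
# against ongoing indexing
curl -s -XPUT 'http://localhost:9200/logstash-2024.06.01/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.write": true}'
```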
When you call the _forcemerge API, I recommend attaching the wait_for_completion=false query parameter so the merge runs asynchronously: the task keeps running in the background instead of dying with a lost connection.
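Something like this; host, index name, and task id are placeholders, and whether _forcemerge accepts wait_for_completion depends on your version:

```bash
# Kick off the merge asynchronously; the response body contains a task id
curl -s -XPOST 'http://localhost:9200/logstash-2024.06.01/_forcemerge?wait_for_completion=false'

# Poll the task (the id below is made up) until it reports "completed": true
curl -s 'http://localhost:9200/_tasks/oTUltX4IQMOUUVeiohTt8A:12345'
```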
TieredMergePolicy, which is allowed to merge non-adjacent segments (it sorts them by size), is the default merge policy in Lucene.
In my opinion (and I’m not sure), the force-merge API doesn’t always reclaim deleted documents.
As you can see in the graph in Changing Bits: Visualizing Lucene’s segment merges, the dark-grey band on top of each segment bar grows, i.e. the proportion of deletions in the segment keeps increasing, until TieredMergePolicy decides on an optimizing merge based on its “budget”.
TieredMergePolicy first computes the allowed “budget” of how many segments should be in the index, by counting how many steps the “perfect logarithmic staircase” would require given total index size, minimum segment size (floored), mergeAtOnce, and a new configuration maxSegmentsPerTier that lets you set the allowed width (number of segments) of each stair in the staircase. This is nice because it decouples how many segments to merge at a time from how wide the staircase can be.
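For what it’s worth, those knobs surface as index-level merge settings in OpenSearch/Elasticsearch (e.g. index.merge.policy.segments_per_tier for maxSegmentsPerTier, index.merge.policy.floor_segment for the floor size; exact names may vary by version). Assuming a local cluster, you can inspect the effective values like this:

```bash
# Dump the merge-policy settings for one index; include_defaults
# also shows values you haven't overridden
curl -s 'http://localhost:9200/my-index/_settings?include_defaults=true&filter_path=*.settings.index.merge,*.defaults.index.merge'
```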