Duplicate documents in different indexes

angelogabeira · January 3, 2023, 5:36pm

Folks,
I have two indexes, A and B.
Index A has documents from 2022 and index B has documents from 2023.
The problem is that my Logstash load queries the last 10 days, so when index B was first populated, some documents ended up in both index A and index B.
My solution would be to identify which documents in index A are also in index B and delete them from one of the two, so that there is only one record for each document.
Documents have an ID, making it “easier” to find them.
But how do I do it?

jasonrojas · January 3, 2023, 7:56pm

Look up delete by query. Document IDs are unique, IIRC, so the duplicated docs will have different IDs.
You would need to find some other type of identifier for the delete by query, then run that against the index you want to remove docs from.
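
To make that concrete, here is a rough sketch in Python of what the delete by query could look like. It assumes the ID mentioned in the question is stored in a keyword field, called `doc_id` here purely for illustration, and that you have already collected the values of that field from both indexes (for example with a scroll or a terms aggregation). The function names, the `doc_id` field name, and the unauthenticated `base_url` are all assumptions, not something taken from your setup.

```python
import json
import urllib.request


def duplicated_ids(ids_in_a, ids_in_b):
    """Return the identifier values that appear in both indexes."""
    return set(ids_in_a) & set(ids_in_b)


def build_delete_by_query(shared_ids, id_field="doc_id"):
    """Build a _delete_by_query body matching any of the given identifiers.

    `id_field` is a hypothetical field name; replace it with the field that
    actually holds your document identifier.
    """
    # Sorting only makes the request body deterministic and easier to review.
    return {"query": {"terms": {id_field: sorted(shared_ids)}}}


def delete_duplicates(base_url, index, body):
    """POST the body to <base_url>/<index>/_delete_by_query.

    Assumes a cluster reachable without authentication; add auth headers
    or TLS settings as your cluster requires.
    """
    request = urllib.request.Request(
        f"{base_url}/{index}/_delete_by_query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Made-up identifier values, only to show the shape of the request body.
    shared = duplicated_ids(["a", "b", "c"], ["b", "c", "d"])
    print(json.dumps(build_delete_by_query(shared), indent=2))
```

You would call `delete_duplicates` only against the index you want to clean up, and only after checking the generated body (a `_search` with the same query shows what would be deleted). If the list of shared identifiers is large, it is probably safer to send it in batches, since a `terms` query has a limit on how many terms it accepts.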
