Folks,
I have two indexes, A and B.
Index A has documents from 2022 and index B has documents from 2023.
The problem is that the query in my Logstash load takes the last 10 days, so around the start of index B some documents ended up in both index A and index B.
My solution would be to identify which documents I have in index A that are also in index B and delete them, so that there is only one record for each document.
Documents have an ID, making it “easier” to find them.
But how do I do it?
Look up the Delete By Query API. Document IDs are unique iirc, so the duplicated docs will each have a different ID.
You would need to find some other identifier that the duplicates share, build the delete by query on that, and run it against the index you want to remove docs from.
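As a rough sketch of that suggestion (not a tested recipe for your cluster), here is how the request could be built in Python. It assumes you have already collected the shared identifier values that appear in both indexes. The index name `index-a` and the field name `doc_id` are made up for illustration; swap in your own. The IDs are sent in batches because a `terms` query is capped at 65,536 values by default.

```python
from typing import Dict, Iterator, List, Tuple


def chunked(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield successive batches of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def delete_by_query_request(index: str, field: str, ids: List[str]) -> Tuple[str, Dict]:
    """Build the URL path and JSON body for one _delete_by_query call.

    `index` and `field` are placeholders here; the body matches every
    document whose `field` equals one of the given IDs.
    """
    path = f"/{index}/_delete_by_query"
    body = {"query": {"terms": {field: ids}}}
    return path, body


if __name__ == "__main__":
    # Hypothetical identifiers found in both index A and index B.
    duplicate_ids = ["order-1001", "order-1002", "order-1003"]

    # Batch size of 2 only to show the batching; real batches can be far larger.
    for batch in chunked(duplicate_ids, 2):
        path, body = delete_by_query_request("index-a", "doc_id", batch)
        print("POST", path, body)
```

Each printed path and body can then be sent as a POST with whatever client you already use. It is worth running the same query through `_search` or `_count` first to confirm it matches only the documents you expect to delete.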