asten · August 23, 2023, 7:19am

I have 2 indices, index_a and index_b.

The 2 indices have documents with almost exactly the same template. index_b has some extra fields, which were introduced as part of a new feature, but it also has all the fields already present in index_a.

We have noticed that index_b is missing about 2000 documents which are present in index_a. We found this out by using the _count API.

Now the question is: is there a way to find out the actual missing documents? Just the missing IDs would be enough for a start.

Both indices have a field called member_id, which is unique for each document and is the same as the document ID, so retrieving only the missing IDs would be enough.

I cannot compare the index directly to the source database because this data comes from an external API.

radu.gheorghe · August 31, 2023, 5:34am

What I would do:

Have a script that scrolls through all the data in index_a.
For every page, take the list of IDs and run a terms query for them against index_b.
If you get the whole page back (say your page size is 1000 and you get 1000 docs back from your search), then all the documents in that page are in index_b. If not, compare the two lists in your script to identify the missing docs, and add them (or their IDs) to a list in your script.
Repeat all of the above until you have scrolled through all the data. In the end, the list in your script should hold your missing documents.

All this shouldn’t put too much load on OpenSearch.
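
For illustration, here is a rough Python sketch of those steps. It is only one possible way to write it, under a few assumptions: `client` is anything exposing `search`, `scroll` and `clear_scroll` the way the opensearch-py `OpenSearch` client does; the function name `find_missing_ids` and its parameters are made up for this example; and an `ids` query stands in for the terms query, which should be equivalent here since member_id is the same as the document ID.

```python
from typing import Any, List


def find_missing_ids(client: Any, source_index: str, target_index: str,
                     page_size: int = 1000, keep_alive: str = "2m") -> List[str]:
    """Return IDs present in source_index but absent from target_index.

    Hypothetical helper: `client` is assumed to behave like an
    opensearch-py OpenSearch client (search / scroll / clear_scroll).
    """
    missing: List[str] = []

    # Open a scroll over the source index. Only IDs are needed, so skip _source.
    page = client.search(
        index=source_index,
        scroll=keep_alive,
        body={"size": page_size, "_source": False,
              "sort": ["_doc"], "query": {"match_all": {}}},
    )
    scroll_id = page.get("_scroll_id")

    try:
        while page["hits"]["hits"]:
            ids = [hit["_id"] for hit in page["hits"]["hits"]]

            # Look the whole page of IDs up in the target index in one request.
            found = client.search(
                index=target_index,
                body={"size": len(ids), "_source": False,
                      "query": {"ids": {"values": ids}}},
            )["hits"]["hits"]

            # A full page back means nothing is missing; otherwise diff the lists.
            if len(found) != len(ids):
                present = {hit["_id"] for hit in found}
                missing.extend(i for i in ids if i not in present)

            page = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the scroll context on the cluster.
        if scroll_id:
            client.clear_scroll(scroll_id=scroll_id)

    return missing
```

With a real cluster you would pass in a connected opensearch-py client and call something like `find_missing_ids(client, "index_a", "index_b")`. The page size of 1000 follows the example above; a smaller value trades more requests for smaller responses.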
