Deduplication of search results

thedraketaylor · February 14, 2022, 1:59am

I have a few indices that contain fields like the following:

"download_link" : [
  {
    "link" : "http://thingiverse.com/download-url-1",
    "domain" : "thingiverse.com"
  },
  {
    "link" : "http://thingiverse.com/download-url-2",
    "domain" : "thingiverse.com"
  },
  {
    "link" : "http://thingiverse.com/download-url-3",
    "domain" : "thingiverse.com"
  }
],

The problem is that several documents contain this same field with identical data, because they are scraped from sites. When I search for a term, every document that contains the exact same data is returned. Is there a way to remove the duplicates from the results? Is there a better way to structure the documents so that removing the duplicates would be easier? I need to re-index the documents anyway, so if I need to alter them, now's the time.

Thanks!
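
One possible direction, sketched here rather than confirmed for this setup: since the documents are being re-indexed anyway, each one could get a single keyword field holding a hash of its `download_link` array, and the search could then use field collapsing (`collapse`) on that field so only one hit per distinct set of links comes back. The field name `download_fingerprint`, the helper names, and the `title` field in the query are all hypothetical; the sketch also assumes `download_fingerprint` is mapped as `keyword` and that two documents count as duplicates when their link/domain pairs match regardless of order.

```python
import hashlib
import json


def link_fingerprint(download_links):
    """Return a stable SHA-1 hex digest for a list of {"link", "domain"} objects."""
    # Sort the pairs so the same links in a different order hash identically.
    canonical = sorted((d["link"], d["domain"]) for d in download_links)
    payload = json.dumps(canonical, separators=(",", ":"))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def with_fingerprint(doc):
    """Copy a document and add the hypothetical download_fingerprint field."""
    out = dict(doc)
    out["download_fingerprint"] = link_fingerprint(doc["download_link"])
    return out


def collapsed_search_body(term):
    """Build a search body that returns one hit per distinct fingerprint."""
    return {
        # "title" is an assumed field name; substitute the field being searched.
        "query": {"match": {"title": term}},
        # collapse expects a single-valued keyword or numeric field.
        "collapse": {"field": "download_fingerprint"},
    }
```

Each document would be passed through `with_fingerprint` before indexing, and the dictionary from `collapsed_search_body` would be sent as the search request body. Note that collapsing only changes which hits are returned; the duplicate documents stay in the index.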
