Duplicate record check in dashboard repository

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

Version 2.15

Describe the issue:

I am inserting data into the dashboard repository as bulk records from a file, and the document ID for each record is auto-generated by OpenSearch.
If I want to reprocess that file, how do I delete the old records or update the existing ones? At present the records are inserted again as new documents, which creates duplicates in the repository.
Is there any solution for this issue?

Configuration:

Relevant Logs or Screenshots:


Any help would be highly appreciated.

Is there any way to throw an error if a duplicate record already exists?

For your case, there are two ways to avoid duplicate documents:

  1. Before reprocessing the file, delete the index.
  2. Do not use auto-generated document IDs; specify a custom document ID for each document instead.
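As a rough illustration of the second option, here is a minimal Python sketch that builds a `_bulk` request body with explicit document IDs. It assumes the ID can be derived from the file name plus the record's position in the file, which only holds if a reprocessed file keeps the same record order. The function names, the index name, and the hashing scheme are hypothetical and not something OpenSearch prescribes.

```python
import hashlib
import json


def make_doc_id(file_name, record_no):
    """Derive a stable document ID from the file name and record position.

    Assumption: the same record sits at the same position when the file
    is reprocessed. If records carry a natural key, hash that instead.
    """
    key = f"{file_name}:{record_no}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()


def build_bulk_body(index, file_name, records):
    """Build an NDJSON body for the _bulk API with explicit document IDs."""
    lines = []
    for record_no, record in enumerate(records):
        # The "index" action overwrites any existing document with the same _id.
        action = {"index": {"_index": index, "_id": make_doc_id(file_name, record_no)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(record))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"
```

Because the IDs are deterministic, sending the same file twice overwrites the earlier documents instead of adding new ones. Swapping the `index` action for `create` should instead make OpenSearch reject any record whose ID already exists, which may be useful if an error on duplicates is preferred.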

@gaobinlong

I cannot delete the index, because we process multiple files a day. If I remove the index, I will lose all the data, not only the data from that specific file.

Even if I generate the document ID myself, I may not be able to map the document ID of the old record already in the dashboard to the document ID that will be generated for the new file.

Does each document have a unique identifier, such as a UUID or an auto-increment ID? If so, you can use that identifier as the document ID directly. If you do not want to use it as the document ID, run a query to fetch the document ID that OpenSearch generated before inserting the document, and then insert the document with that ID. The old document with that ID will then be overwritten by the new content.
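The lookup-then-overwrite approach could be sketched roughly as follows, in Python. The helpers only build the request bodies and read the search response; sending them to the cluster is left out. The field name `record_key` used in the usage note is hypothetical, and the sketch assumes that field is mapped as a keyword so that a `term` query matches it exactly.

```python
def build_lookup_query(field, value):
    """Search body that looks up one existing document by its unique field."""
    return {"size": 1, "query": {"term": {field: value}}}


def extract_existing_id(search_response):
    """Return the _id of the first hit, or None if nothing matched."""
    hits = search_response.get("hits", {}).get("hits", [])
    return hits[0]["_id"] if hits else None


def build_index_action(index, existing_id=None):
    """Bulk action line: reuse the existing _id if one was found.

    Without an _id, OpenSearch auto-generates one, as for a new record.
    """
    meta = {"_index": index}
    if existing_id is not None:
        meta["_id"] = existing_id
    return {"index": meta}
```

For example, the body from `build_lookup_query("record_key", "abc-123")` would go to the index's `_search` endpoint, and the ID returned by `extract_existing_id` would be passed to `build_index_action` so that the reprocessed record replaces the old one. This costs one extra query per record, so using the unique identifier as the document ID directly is likely cheaper for large files.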