Question about _reindex behavior when using custom _id values

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): OpenSearch-1.3.6

Describe the issue:

Hi everyone,

In our project, we use custom _id values based on a numeric sequence (1, 2, 3, …) instead of the automatically generated IDs.

I have a question regarding the behavior of the _reindex API:

  • If a document with a given _id already exists in the target index, and we run a _reindex operation from another index that contains the same _id, can you confirm that the existing document will be overwritten?

  • If we rely on automatically generated IDs instead, how does OpenSearch ensure that IDs won’t collide during a reindex into an index that already contains data?

  • More generally, what is the recommended way to guarantee that reindexing into an index with existing documents won’t accidentally overwrite a document that happens to share the same _id as one coming from the source index?

Thanks in advance for your help!

@Sylex-io By default, reindex uses the index operation, so it will overwrite any existing documents if the IDs collide.

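You can set "op_type": "create" in the dest section of the reindex request so that existing documents are not overwritten. Note that a reindex aborts on the first version conflict by default; adding "conflicts": "proceed" makes it skip the colliding documents and continue.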
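As a rough sketch, the request body for this could be built as below. The index names and the helper name build_reindex_body are placeholders of mine, not something from your setup; the printed JSON is what you would POST to _reindex.

```python
import json


def build_reindex_body(source_index, dest_index, skip_existing=True):
    """Build a _reindex request body (hypothetical helper, placeholder index names)."""
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    if skip_existing:
        # "create" only writes documents whose _id is not yet in the destination.
        body["dest"]["op_type"] = "create"
        # Count version conflicts and keep going instead of aborting on the first one.
        body["conflicts"] = "proceed"
    return body


# Body to send with POST _reindex
print(json.dumps(build_reindex_body("source-index", "target-index"), indent=2))
```

With skip_existing=False the body falls back to the default index operation, which overwrites on matching IDs.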

But the question is: what do you want to happen when IDs collide? The reindex response reports only the number of collisions, not the IDs themselves.
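To illustrate what you get back, here is a small sketch that pulls the relevant counters out of a reindex response. The sample response is made up for illustration and the helper name summarize_reindex is hypothetical; the field names (created, updated, version_conflicts, failures) are the ones the reindex response uses.

```python
def summarize_reindex(response):
    """Extract the counters that matter for collision handling (hypothetical helper)."""
    return {
        "created": response.get("created", 0),
        "updated": response.get("updated", 0),
        # Only a count of colliding documents is available here, not their IDs.
        "skipped_conflicts": response.get("version_conflicts", 0),
        "failures": len(response.get("failures", [])),
    }


# Made-up sample response, for illustration only.
sample = {
    "took": 12,
    "total": 3,
    "created": 2,
    "updated": 0,
    "version_conflicts": 1,
    "failures": [],
}
summary = summarize_reindex(sample)
print(summary)
```

If you need to know which documents collided, you would have to work that out separately, since the response alone does not tell you.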

Generally, it is recommended to let OpenSearch generate the IDs, both for performance reasons and for better shard distribution.

Hope this helps