Reindexing with zero downtime - update document

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OS 2.7 to OS 2.12

Describe the issue:
Hello,

I am looking for a way to reindex with zero downtime, so that reads, writes, deletes and updates all continue without interruption or data loss.

In our case, the data in the index is encrypted with encryption keys, and we rotate the keys every 6 months.

More info
We maintain a mapping between encryption key and index name. We do not keep any field in the document to identify the encryption key. Each field value is encrypted with the key that is mapped to the index holding the document. When we want to rotate the key for existing data, we create another index, map a new key to it, and re-encrypt the data with the new key using the _reindex API.
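
For illustration, a minimal Python sketch of the key-to-index mapping and the _reindex request body described above. The key ids and helper names are hypothetical, and the re-encryption step is not shown because the post does not say how it is applied. Setting op_type to create together with conflicts set to proceed is an assumption rather than part of the original setup: it asks _reindex to skip documents that already exist in the destination instead of overwriting them.

```python
# Hypothetical mapping from index name to encryption key id.
# The real key store and key names are not described in the post.
KEY_ID_FOR_INDEX = {
    "index1": "key-a",  # old key
    "index2": "key-b",  # new key, destination of the reindex
    "index3": "key-b",  # new key, new write index
}


def key_id_for_index(index_name):
    """Look up which key encrypts the documents stored in index_name."""
    return KEY_ID_FOR_INDEX[index_name]


def build_reindex_body(source_index, dest_index):
    """Build the JSON body for POST /_reindex.

    op_type=create only creates documents that are missing in the
    destination; conflicts=proceed keeps the task running when a
    document already exists there (assumption, see lead-in).
    """
    return {
        "conflicts": "proceed",
        "source": {"index": source_index},
        "dest": {"index": dest_index, "op_type": "create"},
    }
```

The body would be sent as POST /_reindex with whatever HTTP or OpenSearch client the application already uses.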

Say we initially create index1 with a separate alias per operation type, so the application uses a different alias for read, write and delete operations.

index1

  • read_alias - for search operations
  • write_alias - for write operations
  • delete_alias - for delete operations

So when we want to rotate the key, we create index2 and reindex from index1 into index2 (new encryption key).
We also create index3 for writing new documents (with the new encryption key).

Before the reindex starts, the aliases on each index are as follows:

1. index1 (original index)

  • read_alias
  • delete_alias

2. index2 (dest index for reindex)

  • delete_alias

3. index3 (new write index)

  • read_alias
  • write_alias
  • delete_alias (also points to the new index so that newly written documents can be deleted)
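
As a sketch of how this state could be reached in one step, the Python below computes the add and remove actions for a single POST /_aliases request from a "current" and a "desired" alias layout. It assumes that index1 initially holds all three aliases, as described earlier; the function and variable names are illustrative only.

```python
# Alias layout while only index1 exists (from the description above).
INITIAL_STATE = {
    "index1": ["read_alias", "write_alias", "delete_alias"],
}

# Alias layout just before the reindex starts.
BEFORE_REINDEX_STATE = {
    "index1": ["read_alias", "delete_alias"],
    "index2": ["delete_alias"],
    "index3": ["read_alias", "write_alias", "delete_alias"],
}


def alias_actions(current, desired):
    """Build the body for POST /_aliases that turns `current` into `desired`.

    Both arguments map index name -> list of alias names.
    """
    actions = []
    for index in sorted(set(current) | set(desired)):
        old = set(current.get(index, []))
        new = set(desired.get(index, []))
        for alias in sorted(old - new):
            actions.append({"remove": {"index": index, "alias": alias}})
        for alias in sorted(new - old):
            actions.append({"add": {"index": index, "alias": alias}})
    return {"actions": actions}
```

Because write_alias is removed from index1 and added to index3 in the same request, the alias never points to two indices at once.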

We designed this earlier to get zero-downtime reads, deletes and writes for another application that has no update operation. Now we need updates in this application as well.

After the reindex, we switch the aliases as follows:

1. index1 (original index)

  • all aliases will be removed

2. index2 (dest index for reindex)

  • read_alias (added)
  • delete_alias

3. index3 (new write index)

  • read_alias
  • write_alias
  • delete_alias
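
The switch above can be sketched as a single POST /_aliases body, since all actions in one such request are applied atomically. The small apply_actions helper below is only a local simulation (it does not talk to a cluster) that checks the body produces the layout listed above; both function names are made up for this example.

```python
def post_reindex_switch_body():
    """Body for POST /_aliases that retires index1 and exposes index2 to reads."""
    return {
        "actions": [
            {"add": {"index": "index2", "alias": "read_alias"}},
            {"remove": {"index": "index1", "alias": "read_alias"}},
            {"remove": {"index": "index1", "alias": "delete_alias"}},
        ]
    }


def apply_actions(state, body):
    """Simulate alias actions locally on a dict of index name -> set of aliases."""
    result = {index: set(aliases) for index, aliases in state.items()}
    for action in body["actions"]:
        # Each action is a one-entry dict such as {"add": {...}}.
        kind, target = next(iter(action.items()))
        aliases = result.setdefault(target["index"], set())
        if kind == "add":
            aliases.add(target["alias"])
        elif kind == "remove":
            aliases.discard(target["alias"])
    return result


before = {
    "index1": {"read_alias", "delete_alias"},
    "index2": {"delete_alias"},
    "index3": {"read_alias", "write_alias", "delete_alias"},
}
after = apply_actions(before, post_reindex_switch_body())
```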

So we also want to handle updates, through aliases, to documents in the existing index (index1) and in index2.

Limitations we have faced:

  • When an alias points to multiple indices, we can’t do write operations through the alias
  • Get by id and delete by id can’t be done through an alias that points to multiple indices (for that we are thinking of using search and delete_by_query on the _id field)
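
A minimal sketch of the search and delete_by_query workaround mentioned in the second point, using an ids query so that the request can go through an alias that spans several indices. The helper names are hypothetical; the functions only build the method, path and body, and sending them is left to the client in use.

```python
def ids_query(doc_ids):
    """Query body that matches documents by _id in every index behind the alias."""
    return {"query": {"ids": {"values": list(doc_ids)}}}


def search_by_id_request(alias, doc_ids):
    """Replacement for get-by-id: POST /<alias>/_search with an ids query."""
    return ("POST", f"/{alias}/_search", ids_query(doc_ids))


def delete_by_id_request(alias, doc_ids):
    """Replacement for delete-by-id: POST /<alias>/_delete_by_query."""
    return ("POST", f"/{alias}/_delete_by_query", ids_query(doc_ids))
```

For example, delete_by_id_request("delete_alias", ["42"]) targets the document with _id 42 in every index that delete_alias points to.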

I have searched for how to reindex from index1 to index2 while handling updates on both index1 and index2 during the reindex, so that no data is lost.

I couldn’t find any clean solution for updating existing documents. All the solutions I found are along these lines:

  • Block updates on index1 while reindexing to index2
  • Or look up which indices hold the document and update it in each one by index name
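
The second option could look roughly like the sketch below: run an ids search through an alias, read the concrete index name from each hit, and build one update request per index. Because each index is mapped to its own encryption key, the changed fields have to be encrypted separately for each target index; encrypt_for_index is a hypothetical callable standing in for that step, and the other names are illustrative as well.

```python
def locate_documents(search_response):
    """Return (index, id) pairs for every hit in a _search response."""
    return [(hit["_index"], hit["_id"]) for hit in search_response["hits"]["hits"]]


def build_update_requests(search_response, fields, encrypt_for_index):
    """Build one POST /<index>/_update/<id> request per index holding the document.

    fields            -- plain-text field values to change
    encrypt_for_index -- hypothetical callable (index_name, fields) -> encrypted
                         fields, using the key mapped to that index
    """
    requests = []
    for index, doc_id in locate_documents(search_response):
        body = {"doc": encrypt_for_index(index, fields)}
        requests.append(("POST", f"/{index}/_update/{doc_id}", body))
    return requests
```

This only covers copies that exist at the time of the search. On its own it does not cover a document that is updated in index1 before its copy reaches index2, which is the gap the question is about.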

Any help/pointers would be appreciated.

CC: @reta @cwperks

Not sure of the best way to solve this, sorry. Maybe this post on Stack Overflow can offer a suggestion: "Elasticsearch Reindexing while updating documents?"