Hi all, my team is currently building a search service for a platform, and we are weighing different ways of keeping the documents in the index up to date. One option that was raised is the _update_by_query API.
Our documents have 2 relevant fields:
- Urn is our identifying field; it is a string.
- Id indicates the revision of the file; it is an integer and increases with each version of the file.
Our concern is that update events from the platform may arrive out of order, and we want to make sure we do not overwrite the document in the OpenSearch index with an outdated version. We would only like to write the document for a given Urn if it doesn't already exist, or if it exists and its Id is less than the Id we are trying to insert.
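To make the rule concrete, here is a rough Python sketch of the decision we want applied on every write. The function name `should_write` and the use of `None` for "no document with this Urn yet" are made up for illustration; they are not part of any OpenSearch API.

```python
from typing import Optional


def should_write(stored_id: Optional[int], incoming_id: int) -> bool:
    """Decide whether an incoming revision may be written to the index.

    stored_id is the Id of the document currently indexed under the Urn,
    or None when no such document exists (hypothetical convention).
    """
    if stored_id is None:
        # Nothing indexed for this Urn yet, so this is a plain insert.
        return True
    # Only a strictly newer revision may replace the stored one.
    return stored_id < incoming_id
```

So an event with Id 3 should be written when nothing is stored or when Id 2 is stored, and dropped when Id 3 or Id 5 is already in the index.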
After some investigation we found that it is possible to create a script in OpenSearch that copies all params into the document, and to specify in the _update_by_query query that the stored Id must be less than the Id we are about to insert. However, this doesn't seem to work for inserting a new document, and it doesn't seem to be usable with the bulk API.
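For reference, this is a sketch of the kind of _update_by_query request body we mean, built as a plain Python dict and matching only stored documents whose Id is lower than the incoming one. The helper name `build_update_by_query_body` is hypothetical, and the `term` filter assumes Urn is mapped as a keyword field; our real mapping may differ.

```python
import json
from typing import Any, Dict


def build_update_by_query_body(urn: str, incoming_id: int,
                               doc: Dict[str, Any]) -> Dict[str, Any]:
    """Build a body for POST <index>/_update_by_query (illustrative only)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Assumes Urn is a keyword field, so an exact match works.
                    {"term": {"Urn": urn}},
                    # Only touch documents holding an older revision.
                    {"range": {"Id": {"lt": incoming_id}}},
                ]
            }
        },
        "script": {
            "lang": "painless",
            # Copy every field of the incoming document over the stored one.
            "source": "ctx._source.putAll(params.doc)",
            "params": {"doc": doc},
        },
    }


if __name__ == "__main__":
    new_doc = {"Urn": "urn:example:file:1", "Id": 7, "title": "hypothetical"}
    print(json.dumps(build_update_by_query_body(new_doc["Urn"], new_doc["Id"], new_doc), indent=2))
```

Since _update_by_query only modifies documents that the query matches, a Urn that is not in the index yet matches nothing and nothing gets created, which is the insert problem described above.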
I have come to the community in search of ideas and solutions for how to “upsert” a document only if it does not exist, or if it exists with a field value less than the one we are trying to insert. Interested to hear your thoughts!