In reading about the Remote Store feature, I was wondering if this was laying the groundwork for additional functionality to be built on top of it?
We are currently looking at solutions to offload some very large fields inside of our _source field that are currently not indexed, but that we would like available to return if requested. One idea is to store the contents of some fields in an object bucket store (S3, GCS, etc.) and then fetch them as required to return the full document.
This could even apply to an entire _source document. In cases where the user has chosen not to store _source, this would allow them to keep _source in a remote location so that it is still available, while most of their OpenSearch queries and needs would not rely on having _source.
In our case, we have some documents in which a single field can be over 1 MB. This field is unindexed, but it is used in some of our applications. To simplify the application code, we store the data in OpenSearch. However, we end up paying increased storage costs (and likely other costs as well) for storing and returning large fields that we only sometimes need.
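To make the idea concrete, here is a rough application-level sketch of the offloading pattern described above. It is not an OpenSearch plugin or API: `InMemoryBucket`, `offload_large_fields`, `hydrate`, the `_remote_ref` marker, and the 1 MB threshold are all hypothetical names and values chosen for illustration, and the in-memory class only stands in for a real object store such as S3 or GCS.

```python
import hashlib
import json

# Hypothetical threshold: serialized fields larger than this are offloaded.
OFFLOAD_THRESHOLD_BYTES = 1024 * 1024


class InMemoryBucket:
    """Stand-in for an object store (S3, GCS, ...). Not a real client."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def offload_large_fields(doc: dict, bucket: InMemoryBucket,
                         threshold: int = OFFLOAD_THRESHOLD_BYTES) -> dict:
    """Return a slim copy of doc with oversized fields replaced by a pointer."""
    slim = {}
    for name, value in doc.items():
        payload = json.dumps(value).encode("utf-8")
        if len(payload) > threshold:
            # Content-addressed key, so identical payloads share one object.
            key = hashlib.sha256(payload).hexdigest()
            bucket.put(key, payload)
            slim[name] = {"_remote_ref": key}  # hypothetical pointer format
        else:
            slim[name] = value
    return slim


def hydrate(slim: dict, bucket: InMemoryBucket) -> dict:
    """Rebuild the full document by fetching every offloaded field."""
    full = {}
    for name, value in slim.items():
        if isinstance(value, dict) and set(value) == {"_remote_ref"}:
            raw = bucket.get(value["_remote_ref"])
            full[name] = json.loads(raw.decode("utf-8"))
        else:
            full[name] = value
    return full
```

In this sketch, the slim document is what would be indexed in OpenSearch, and `hydrate` would only be called on the occasions when the full document is actually needed, accepting the extra latency of the remote fetch.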
I’m happy to start digging in with code if there is a suggested starting point for a plugin. It seems to me that a potential starting point could be an option to have the stored data for a given field point to the Remote Store shard. Being able to index our large documents so that we can locate them in searches, and use docvalues, etc. for aggregations, but have the _source data stored elsewhere (with the expected latency in full retrieval), is very intriguing to me.