Sometimes I want to create an index with a field that looks like a timestamp but isn’t always one, and I want to prevent OpenSearch from automatically mapping it as a “date” field instead of a “text” field.
If the first document in the index has a field with a value such as “12:00:00T12-03-2022”, OpenSearch automatically maps the field as a date. My problem is that I sometimes need to store dates that are malformed (for example, because of data entry errors). Once the field is mapped as a date, documents with malformed values fail to index, which effectively breaks the index, and the only way to fix it is to manually change the index mapping through the “Index Management” screen.
Is there any way to prevent this automatic date detection so that my index doesn’t end up broken?
I’ve tried writing code that retrieves the index mapping, reads the schema, and returns a corrected schema with the mappings I want, but I found that an existing “date” field can’t be converted to a “text” field.
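A minimal sketch of the mapping-rewrite step described above, assuming the mapping is available as the Python dict found under "properties" in a GET <index>/_mapping response. The helper name retype_date_fields is hypothetical. Because OpenSearch will not change the type of a field that already exists, a corrected mapping like this can only be applied to a new index that the data is then reindexed into.

```python
import copy
from typing import Any, Dict


def retype_date_fields(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of a mapping's "properties" in which
    every field of type "date" is mapped as "text" instead.

    The result is meant for creating a new index; OpenSearch will not
    change the type of a field that already exists in an index."""
    fixed: Dict[str, Any] = {}
    for name, field in properties.items():
        if field.get("type") == "date":
            # Date-only parameters such as "format" are dropped on purpose.
            fixed[name] = {"type": "text"}
        elif "properties" in field:
            # Object/nested field: copy its own settings, rewrite sub-fields recursively.
            sub = copy.deepcopy({k: v for k, v in field.items() if k != "properties"})
            sub["properties"] = retype_date_fields(field["properties"])
            fixed[name] = sub
        else:
            # Any other field is copied unchanged.
            fixed[name] = copy.deepcopy(field)
    return fixed


# Example "properties" section, shaped like part of a GET <index>/_mapping response.
current = {
    "created": {"type": "date"},
    "name": {"type": "keyword"},
    "meta": {"properties": {"seen": {"type": "date", "format": "yyyy-MM-dd"}}},
}
print(retype_date_fields(current))
```

This sketch does not look inside multi-field "fields" sections; a date sub-field there would be copied unchanged.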
It seems like there should be an efficient way to solve this problem without any complicated migrations, since it is fundamentally just a bookkeeping issue. I believe Elasticsearch has an index setting to enable or disable automatic date detection, but OpenSearch currently lacks an equivalent feature.
I thought about using index patterns, but this approach has problems too.
Index patterns require knowing the exact schema in advance. There is no way to, for example, set the type of all fields whose names match a regex. I believe this is another area where Elasticsearch is more capable than OpenSearch.
I think this problem likely reflects a shortcoming in the current API, and I am hoping to open a discussion about improving the current design.
Set template.mappings.dynamic to false so that OpenSearch does not automatically assign a type (such as ‘date’) to new fields; only the fields you map explicitly, for example as ‘text’ or ‘date’, are indexed.
→ This can stand in for the setting you mentioned that disables automatic date detection.
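A sketch of an index template body following this suggestion, assuming the composable index template API (PUT _index_template/<name>). The pattern logs-* and the field name event_time are hypothetical. With dynamic set to false, fields that are not listed under properties are kept in _source but are not indexed, so OpenSearch never infers a type for them.

```python
import json

# Hypothetical index pattern and field name; adjust to your data.
index_template = {
    "index_patterns": ["logs-*"],
    "template": {
        "mappings": {
            # Unmapped fields stay in _source but are not indexed,
            # so no type (such as date) is ever inferred for them.
            "dynamic": False,
            "properties": {
                # Mapped explicitly, so the first document cannot turn it into a date.
                "event_time": {"type": "text"},
            },
        }
    },
}

# Request body for something like: PUT _index_template/logs-template
print(json.dumps(index_template, indent=2))
```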
Specify the on_failure parameter on your “date” processor.
When you create the index template, map your field as a multi-field with both a ‘text’ and a ‘date’ type. The ‘text’ mapping will never cause a problem, because a value like “12:00:00T12-03-2022” can always be read as a string.
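A sketch of such a multi-field mapping, assuming a hypothetical field event_time and assuming values like “12:00:00T12-03-2022” are time followed by day-month-year. The ignore_malformed flag on the date sub-field is an addition of this sketch, not part of the suggestion above: without it, a value that fails to parse in the date sub-field would make the whole document fail to index.

```python
import json

# Hypothetical field name and date format ("HH:mm:ss'T'dd-MM-yyyy" assumes
# time first, then day-month-year).
mappings = {
    "properties": {
        "event_time": {
            "type": "text",  # any string can be indexed as text
            "fields": {
                "as_date": {
                    "type": "date",
                    "format": "HH:mm:ss'T'dd-MM-yyyy",
                    # Assumption: skip malformed values in this sub-field
                    # instead of rejecting the whole document.
                    "ignore_malformed": True,
                }
            },
        }
    }
}

# Usable as the "mappings" section of a create-index request or an index template.
print(json.dumps({"mappings": mappings}, indent=2))
```

Full-text queries would then target event_time, while range queries would target event_time.as_date.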
If a value cannot be parsed as a ‘date’ (e.g. “illegal type: not date type”), you can handle the error with the on_failure configuration while the ingest pipeline runs.
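A sketch of an ingest pipeline using a date processor with on_failure, assuming the hypothetical field names event_time, event_time_parsed, and event_time_error and the same assumed date format. If parsing fails, the set processor records the error message instead of the document being rejected.

```python
import json

# Hypothetical pipeline; field names and the date format are assumptions.
pipeline = {
    "description": "Parse event_time into a date field; tolerate malformed values",
    "processors": [
        {
            "date": {
                "field": "event_time",
                "target_field": "event_time_parsed",
                "formats": ["HH:mm:ss'T'dd-MM-yyyy"],
                # Runs only if the date processor fails; the document is then
                # indexed without event_time_parsed instead of being rejected.
                "on_failure": [
                    {
                        "set": {
                            "field": "event_time_error",
                            "value": "{{ _ingest.on_failure_message }}",
                        }
                    }
                ],
            }
        }
    ],
}

# Request body for something like: PUT _ingest/pipeline/parse-event-time
print(json.dumps(pipeline, indent=2))
```

Documents would be sent through it with ?pipeline=parse-event-time or by setting index.default_pipeline, with event_time mapped as text and event_time_parsed mapped as date.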
Is template.mappings.dynamic currently undocumented? I can’t find any mention of it on the index settings page.
I haven’t tried using an on_failure handler. Wouldn’t this method require reindexing? It seems like it would work, but it’s more complicated and more of a workaround for a limitation in the current design.