Preventing automatic 'date' fields when I want 'text' fields in index mappings

Describe the issue:

Sometimes, I want to create an index with a field that looks like a timestamp, but isn’t always a timestamp, and I want to prevent OpenSearch from automatically marking it as a “date” field rather than a “text” field in the index mappings.

If the first document in the index has a field with a value such as “12:00:00T12-03-2022”, OpenSearch will automatically map the field as a date. My problem is that sometimes I want to store dates that are malformed (for example, due to data entry errors). This results in a broken index, and the only way to fix it is to manually change the index mapping through the “Index Management” screen.

Is there any way to prevent this automatic date detection so that my index doesn’t end up broken?

I’ve tried writing code that retrieves the index mapping, reads the schema, and then returns a corrected schema that has the mappings that I want, but I found that it can’t convert “date” fields to “text” fields.

It seems like there should be an efficient way to solve this problem without performing any complicated migrations, since it is fundamentally just a bookkeeping issue. I believe that Elasticsearch has an index setting to enable/disable automatic date detection, but OpenSearch currently lacks an equivalent feature.
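For reference, the Elasticsearch feature mentioned above is the `date_detection` mapping parameter. OpenSearch was forked from Elasticsearch 7.10, so it may accept the same parameter, but treat that as an assumption to verify against your own version. A minimal sketch of the request body one would send when creating an index (the index name `my-index` is hypothetical):

```python
import json


def mapping_without_date_detection() -> dict:
    """Build a body for `PUT /my-index` that turns off dynamic date detection.

    Assumption: the cluster honors the `date_detection` mapping parameter,
    as Elasticsearch 7.10 does. Verify this on your OpenSearch version.
    """
    return {"mappings": {"date_detection": False}}


if __name__ == "__main__":
    # Print the JSON body; sending it to the cluster is left to your HTTP client.
    print(json.dumps(mapping_without_date_detection(), indent=2))
```

With this in place, strings that look like dates would be dynamically mapped as text rather than date, if the parameter is supported.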

OpenSearch (and Elasticsearch) provide index templates, which let you initialize new indexes with predefined mappings and settings.

You can create templates using either:

  1. OpenSearch Dashboards:
  • Index Management > Templates > Create template

  2. API:
PUT _index_template/<template name>
POST _index_template/<template name>

In your case, define the {date_field_name} field as ‘date’ type in a template before any index is created, and then start indexing your data.
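A minimal sketch of what such a template body could look like, built as a Python dict and printed as JSON. The template pattern `logs-*` and the field name `event_time` are hypothetical. The example maps the field as `text`, since that is what the original question asks for; pass `"date"` instead to follow the suggestion above literally.

```python
import json


def build_index_template(index_pattern: str, field_name: str, field_type: str) -> dict:
    """Build a body for `PUT _index_template/<template name>`.

    The explicit mapping takes precedence over dynamic type detection
    for the named field in every new index matching `index_pattern`.
    """
    return {
        "index_patterns": [index_pattern],
        "template": {
            "mappings": {
                "properties": {
                    field_name: {"type": field_type},
                }
            }
        },
    }


# Hypothetical names, for illustration only.
body = build_index_template("logs-*", "event_time", "text")
print(json.dumps(body, indent=2))
```

The template only affects indexes created after it exists, so it has to be in place before the first document arrives.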

Thanks for the reply,

I thought about using index templates, but there are problems with this solution also.

Index templates require knowing the exact schema in advance. There is no way to do things like set the type of fields that match a regex. This, I believe, is another area where Elasticsearch is more capable than OpenSearch.
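For comparison, Elasticsearch covers the regex case with `dynamic_templates` and `match_pattern: "regex"` inside the mappings. Since OpenSearch was forked from Elasticsearch 7.10, the same mapping may well be accepted there, but that is an assumption to check on your version. A sketch, with a hypothetical template name and field-name regex:

```python
import json
import re

# Hypothetical regex: any field whose name ends in _time or _timestamp.
NAME_REGEX = r".*_(time|timestamp)"


def build_dynamic_template_mappings(name_regex: str = NAME_REGEX) -> dict:
    """Build a `mappings` object that maps regex-matched field names to text.

    Assumption: the cluster supports `dynamic_templates` with
    `match_pattern: "regex"`, as Elasticsearch 7.10 does.
    """
    return {
        "dynamic_templates": [
            {
                "timestamp_like_as_text": {  # hypothetical template name
                    "match_pattern": "regex",
                    "match": name_regex,
                    "mapping": {"type": "text"},
                }
            }
        ]
    }


def would_match(field_name: str, name_regex: str = NAME_REGEX) -> bool:
    """Local approximation of the server-side check (whole-name regex match)."""
    return re.fullmatch(name_regex, field_name) is not None


print(json.dumps(build_dynamic_template_mappings(), indent=2))
print(would_match("created_time"), would_match("title"))
```

The resulting object would go under `template.mappings` in an index template, so the exact field names do not need to be known in advance.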

I think this problem is likely a shortcoming in the current API, and I am hoping to open a discussion about the possibility of improving the current design.

I can think of two possible improvements that would address this problem:

  • a way to specify the type of fields when you index a document
  • (simplest) an index setting that disables automatic date detection

Hmm… then how about the below idea?

  1. Set template.mappings.dynamic to false so that fields not declared in the template are not mapped dynamically; only the ‘date’ and ‘text’ fields you define get indexed.
    → This covers your second suggestion: an index setting that disables automatic date detection.

  2. Specify the on_failure parameter on your “date” processor.

  • When you create the index template, define your field as a multi-field with both ‘text’ and ‘date’ types. The ‘text’ part will never be a problem, because “12:00:00T12-03-2022” can always be read as a string.
  • When a value cannot be indexed as ‘date’ (e.g. “illegal type: not date type”), you can handle the error with the on_failure configuration while the ingest pipeline runs.
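The two steps above can be sketched roughly as follows. This is one possible reading, not a tested configuration: it swaps the multi-field for two sibling fields (a raw `text` field and a parsed `date` field), because the `date` processor writes its result to a `target_field`. The field names, the date format, and the `date_parse_failed` flag are all hypothetical.

```python
import json


def build_template_mappings(raw_field: str, parsed_field: str) -> dict:
    """Mappings for the `template.mappings` section of an index template.

    `dynamic: False` stops undeclared fields from being mapped automatically.
    """
    return {
        "dynamic": False,
        "properties": {
            raw_field: {"type": "text"},     # always accepts the original string
            parsed_field: {"type": "date"},  # filled only when parsing succeeds
        },
    }


def build_pipeline(raw_field: str, parsed_field: str, formats: list) -> dict:
    """Body for `PUT _ingest/pipeline/<pipeline name>`.

    The date processor parses `raw_field` into `parsed_field`; if parsing
    fails, on_failure sets a flag instead of rejecting the document.
    """
    return {
        "description": "Parse timestamp-like strings, tolerate malformed ones",
        "processors": [
            {
                "date": {
                    "field": raw_field,
                    "target_field": parsed_field,
                    "formats": formats,
                    "on_failure": [
                        {"set": {"field": "date_parse_failed", "value": True}}
                    ],
                }
            }
        ],
    }


# Hypothetical names; the format is a guess at the "12:00:00T12-03-2022" layout.
mappings = build_template_mappings("event_time", "event_time_parsed")
pipeline = build_pipeline("event_time", "event_time_parsed", ["HH:mm:ss'T'dd-MM-yyyy"])
print(json.dumps({"mappings": mappings, "pipeline": pipeline}, indent=2))
```

Because `dynamic` is false, the `date_parse_failed` flag would be kept in the stored document source but not indexed unless it is also declared in the mappings.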

Is template.mappings.dynamic currently undocumented? I can’t find any mention of it on the index settings page.

I haven’t tried using an on_failure handler. Wouldn’t this method require reindexing? It seems like this would work, but it’s more complicated, and more of a workaround for an issue with the current design.