Getting rid of unwanted commas in long formatted data

Versions 2.9.0

At the moment I am using a dynamic template for my indices. In reality, the only thing I specify in there is the primary/replica shard count. Everything else is dynamically generated. At the moment, fields that are written as long have commas in them. Is there a way to get rid of them? Do I need to convert them to integer or double? My mapping may change over time as I write different events to the index, but would it be possible to explicitly define the field types in the mapping template for the fields that I know, and still allow other fields to be generated dynamically?

For example, take this snippet from the mapping:

                    "inode" : {
                      "type" : "long"
                    },

Can I define mapping for this node as integer or double?

If I’m reading this correctly, you have multiple questions :slight_smile:

At the moment, fields that are written as long have commas in them. Is there a way to get rid of them?

Do you mean that the field names have commas in them? I think you can fix that with ingest pipelines.

Do I need to convert them to integer or double?

Or maybe the values have a comma in them? Then maybe you can change them (again via ingest pipelines) to have a dot instead of a comma, and then they'll be parsed as double.
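To make the ingest pipeline idea concrete, here is a minimal sketch, written as a Python dict that could be sent as the body of `PUT _ingest/pipeline/<name>`. It chains the `gsub` and `convert` processors. The field name `duration` is hypothetical (the thread does not say which field is affected), and the `preview` helper only approximates locally what the two processors would do; it is not part of OpenSearch.

```python
import json

# Hypothetical field name; the thread does not name the affected field.
FIELD = "duration"

# Pipeline body as it might be sent to PUT _ingest/pipeline/<name>.
pipeline = {
    "description": "Replace a decimal comma with a dot, then convert to double",
    "processors": [
        # gsub works on string values: "3,14" becomes "3.14"
        {"gsub": {"field": FIELD, "pattern": ",", "replacement": "."}},
        # convert turns the cleaned-up string into a double
        {"convert": {"field": FIELD, "type": "double"}},
    ],
}


def preview(doc):
    """Local approximation of the two processors, for illustration only."""
    out = dict(doc)
    out[FIELD] = float(str(out[FIELD]).replace(",", "."))
    return out


print(json.dumps(pipeline, indent=2))
print(preview({FIELD: "3,14"}))
```

Note that this only helps if the comma really is part of the stored value and acts as a decimal separator. If it is a thousands separator, the replacement would presumably need to be an empty string instead.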

My mapping may change over time as I write different events to index, but would it be possible to explicitly define the field types in the mapping template for the fields that I know, and still allow for other fields to be genrated dynamically?

You can do that (in the index template) and I think it's a good idea. Rely as little as possible on OpenSearch's "guessing" the right mapping for you. Define the fields you know explicitly, and for the rest use dynamic templates and, if you can, naming conventions.
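As a rough sketch of what that could look like, the dict below is a possible body for `PUT _index_template/<template-name>`. It maps `inode` explicitly and leaves every other field to dynamic templates. The index pattern, the shard and replica values, and the `*_count` naming convention are assumptions made up for illustration, not taken from the original setup.

```python
import json

# Body for PUT _index_template/<template-name>; names and values are illustrative.
index_template = {
    "index_patterns": ["events-*"],  # hypothetical index naming pattern
    "template": {
        "settings": {
            "number_of_shards": 1,    # assumed value
            "number_of_replicas": 1,  # assumed value
        },
        "mappings": {
            # Explicit types for the fields known in advance.
            "properties": {
                "inode": {"type": "long"},
            },
            # Rules for fields that are not listed above.
            "dynamic_templates": [
                {
                    "counts_as_long": {
                        "match": "*_count",  # hypothetical naming convention
                        "mapping": {"type": "long"},
                    }
                },
                {
                    "strings_as_keyword": {
                        "match_mapping_type": "string",
                        "mapping": {"type": "keyword"},
                    }
                },
            ],
        },
    },
}

print(json.dumps(index_template, indent=2))
```

Fields listed under `properties` always get the declared type, while anything else still gets mapped dynamically, just with rules you control rather than the default detection.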

As a warning about how field type detection can go wrong, I always tell this story from almost 12 years ago :see_no_evil: with Elasticsearch: we were indexing logs into daily indices, and of course all logs had a message field. One day I came to work and there were hardly any logs in the new index. It turned out that the first message after midnight (which created the index for that day) had a value of 2 in the message field, so the field was mapped as long. As a result, most messages that came after it (which weren't numeric) failed to get indexed, throwing a NumberFormatException. Needless to say, that was the day message got defined in the index template :slight_smile: