Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
2.11
Describe the issue:
In the transform job, when a field has no values in a bucket, the max, min, and avg aggregations on it result in -Infinity, Infinity, and NaN respectively, while value_count and sum result in 0.
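These values match the usual convention for aggregating an empty set: max starts from -Infinity, min starts from +Infinity, and avg divides a sum of 0 by a count of 0. The Python sketch below shows that convention. It only illustrates the arithmetic and is not the transform plugin's actual code.

```python
import math

def bucket_metrics(values):
    """Compute max, min, avg, sum, and value_count over a list of numbers.

    For an empty list this yields the identity values seen in the target index:
    max -> -inf, min -> +inf, avg -> NaN, sum -> 0.0, value_count -> 0.
    """
    count = len(values)
    total = float(sum(values))
    return {
        # max starts from -inf, so with no values it stays -inf
        "max": max(values, default=-math.inf),
        # min starts from +inf, so with no values it stays +inf
        "min": min(values, default=math.inf),
        # avg = sum / count; 0/0 is NaN under IEEE 754, but Python raises
        # ZeroDivisionError, so the empty case is handled explicitly
        "avg": total / count if count else math.nan,
        "sum": total,
        "value_count": count,
    }

print(bucket_metrics([]))  # {'max': -inf, 'min': inf, 'avg': nan, 'sum': 0.0, 'value_count': 0}
```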
The problem is that the target index gets populated with these new fields holding such values (and some of the fields end up mapped as text). Setting "missing": 0 for numeric fields in the aggregation behaves better, but it is not ideal because it misrepresents the data.
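To illustrate how a substituted value skews the results: avg, min, and max normally skip documents that lack the field, whereas "missing": 0 makes each such document count as a 0. The Python sketch below uses a hypothetical bucket with made-up numbers (not data from this job); the helper name is also hypothetical.

```python
def avg_with_missing(values, doc_count, missing=None):
    """Average a field across doc_count documents, len(values) of which contain it.

    If `missing` is None, documents without the field are skipped (the default
    aggregation behavior); otherwise each such document contributes `missing`.
    """
    filled = list(values)
    if missing is not None:
        filled += [missing] * (doc_count - len(values))
    return sum(filled) / len(filled) if filled else float("nan")

# Hypothetical bucket: 4 documents, only 2 of which contain the field
print(avg_with_missing([10.0, 20.0], 4))             # 15.0
print(avg_with_missing([10.0, 20.0], 4, missing=0))  # 7.5
# A bucket where the field is entirely absent looks like a genuine 0 reading
print(avg_with_missing([], 22, missing=0))           # 0.0
```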
What I really want is for the fields derived from a missing source field not to appear in the target index at all for those documents. Is there a way to accomplish this?
Here is an example:
Transform job:
{
  "transform": {
    "enabled": true,
    "continuous": true,
    "schedule": {
      "interval": {
        "period": 5,
        "unit": "Minutes"
      }
    },
    "description": "Sample transform job",
    "source_index": "sample",
    "target_index": "sample_transform",
    "data_selection_query": {
      "match_all": {}
    },
    "page_size": 1,
    "groups": [
      {
        "date_histogram": {
          "source_field": "timestamp",
          "fixed_interval": "60m",
          "timezone": "UTC"
        }
      },
      {
        "terms": {
          "source_field": "device.keyword",
          "target_field": "device"
        }
      }
    ],
    "aggregations": {
      "m1_value_count": {
        "value_count": {
          "field": "m1"
        }
      },
      "m1_avg": {
        "avg": {
          "field": "m1"
        }
      },
      "m1_max": {
        "max": {
          "field": "m1"
        }
      },
      "m1_min": {
        "min": {
          "field": "m1"
        }
      },
      "m1_sum": {
        "sum": {
          "field": "m1"
        }
      },
      "m3_value_count": {
        "value_count": {
          "field": "m3"
        }
      },
      "m3_avg": {
        "avg": {
          "field": "m3"
        }
      },
      "m3_max": {
        "max": {
          "field": "m3"
        }
      },
      "m3_min": {
        "min": {
          "field": "m3"
        }
      },
      "m3_sum": {
        "sum": {
          "field": "m3"
        }
      }
    }
  }
}
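The aggregations block repeats the same five metrics for each source field. The Python sketch below generates an equivalent block, which may be handy if more metric fields are added. The helper and the METRICS list are hypothetical conveniences, not part of the transform API.

```python
import json

# The five metrics used per field in the job above, in the same order
METRICS = ["value_count", "avg", "max", "min", "sum"]

def build_aggregations(fields):
    """Build a transform "aggregations" object with one entry per (field, metric).

    Entry names follow the "<field>_<metric>" pattern used in the job above.
    """
    return {
        f"{field}_{metric}": {metric: {"field": field}}
        for field in fields
        for metric in METRICS
    }

aggs = build_aggregations(["m1", "m3"])
print(json.dumps(aggs, indent=2))
```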
In the target index, the m3-related fields show up like this when m3 is missing from the time interval:
"_source": {
  "transform._id": "metric_all_3_transform_job",
  "_doc_count": 22,
  "transform._doc_count": 22,
  "timestamp": 1728007200000,
  "device": "1.1.1.1",
  "m1_max": 99.13,
  "m1_min": 17.66,
  "m1_avg": 56.58500000000001,
  "m1_value_count": 22.0,
  "m1_sum": 1244.8700000000001,
  "m3_max": "-Infinity",
  "m3_min": "Infinity",
  "m3_avg": "NaN",
  "m3_sum": 0.0,
  "m3_value_count": 0.0
}
What I am hoping for is a way to not generate any of the m3* fields in such cases.
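Until there is a server-side option, one possible stopgap is to post-process the documents after the transform writes them, for example when reading from sample_transform or while reindexing into another index. The Python sketch below drops every <field>_* metric for a field whose <field>_value_count is 0. It assumes the "<field>_<metric>" naming used in the job above, the function name is hypothetical, and this is not a transform plugin feature.

```python
METRICS = ("value_count", "avg", "max", "min", "sum")

def drop_empty_metric_fields(source, fields):
    """Return a copy of `source` without the <field>_<metric> keys of empty fields.

    A field counts as empty when its <field>_value_count is 0; those buckets are
    exactly the ones carrying -Infinity / Infinity / NaN placeholders.
    """
    cleaned = dict(source)  # leave the caller's document untouched
    for field in fields:
        # 0.0 == 0 in Python, so this also matches the 0.0 the transform writes
        if cleaned.get(f"{field}_value_count") == 0:
            for metric in METRICS:
                cleaned.pop(f"{field}_{metric}", None)
    return cleaned

# The _source document from the example above
doc = {
    "transform._id": "metric_all_3_transform_job",
    "_doc_count": 22,
    "transform._doc_count": 22,
    "timestamp": 1728007200000,
    "device": "1.1.1.1",
    "m1_max": 99.13,
    "m1_min": 17.66,
    "m1_avg": 56.58500000000001,
    "m1_value_count": 22.0,
    "m1_sum": 1244.8700000000001,
    "m3_max": "-Infinity",
    "m3_min": "Infinity",
    "m3_avg": "NaN",
    "m3_sum": 0.0,
    "m3_value_count": 0.0,
}
print(sorted(drop_empty_metric_fields(doc, ["m1", "m3"])))
```

Checking value_count rather than matching the "Infinity"/"NaN" strings avoids depending on how the placeholders happen to be serialized.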