Transform job aggregations for missing field behavior

akkina9 · October 5, 2024, 2:27am

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
2.11

Describe the issue:
In a transform job, the min, max, and avg aggregations on a missing field result in Infinity, -Infinity, and NaN, respectively. Also, value_count and sum result in 0.

The problem is that the target index is populated with the new fields holding these values, and some of those fields end up mapped as text. Setting "missing": 0 for numeric fields in the aggregation behaves better, but it is not ideal because it misrepresents the data.
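For reference, this is roughly what the "missing": 0 workaround looks like for one of the aggregations. It is only a fragment of the "aggregations" object, shown for m3_avg; the same parameter would be repeated on the other m3 aggregations.

```json
"m3_avg": {
    "avg": {
        "field": "m3",
        "missing": 0
    }
}
```

With this in place, documents without m3 are treated as if m3 were 0, so the results stay numeric, but the average and minimum are skewed toward 0.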

What I really want is for the fields derived from a missing source field to be absent from the target index for those documents. Is there a way to accomplish this?

Here is an example:
Transform job:

{
    "transform": {
        "enabled": true,
        "continuous": true,
        "schedule": {
            "interval": {
                "period": 5,
                "unit": "Minutes"
            }
        },
        "description": "Sample transform job",
        "source_index": "sample",
        "target_index": "sample_transform",
        "data_selection_query": {
            "match_all": {}
        },
        "page_size": 1,
        "groups": [
            {
                "date_histogram": {
                    "source_field": "timestamp",
                    "fixed_interval": "60m",
                    "timezone": "UTC"
                }
            },
            {
                "terms": {
                    "source_field": "device.keyword",
                    "target_field": "device"
                }
            }
        ],
        "aggregations": {
            "m1_value_count": {
                "value_count": {
                    "field": "m1"
                }
            },
            "m1_avg": {
                "avg": {
                    "field": "m1"
                }
            },
            "m1_max": {
                "max": {
                    "field": "m1"
                }
            },
            "m1_min": {
                "min": {
                    "field": "m1"
                }
            },
            "m1_sum": {
                "sum": {
                    "field": "m1"
                }
            },
            "m3_value_count": {
                "value_count": {
                    "field": "m3"
                }
            },
            "m3_avg": {
                "avg": {
                    "field": "m3"
                }
            },
            "m3_max": {
                "max": {
                    "field": "m3"
                }
            },
            "m3_min": {
                "min": {
                    "field": "m3"
                }
            },
            "m3_sum": {
                "sum": {
                    "field": "m3"
                }
            }
        }
    }
}

In the target index, the m3-related fields show up as follows when m3 is missing from the documents in the time interval.

                "_source": {
                    "transform._id": "metric_all_3_transform_job",
                    "_doc_count": 22,
                    "transform._doc_count": 22,
                    "timestamp": 1728007200000,
                    "device": "1.1.1.1",
                    "m1_max": 99.13,
                    "m1_min": 17.66,
                    "m1_avg": 56.58500000000001,
                    "m1_value_count": 22.0,
                    "m1_sum": 1244.8700000000001,
                    "m3_max": "-Infinity",
                    "m3_min": "Infinity",
                    "m3_avg": "NaN",
                    "m3_sum": 0.0,
                    "m3_value_count": 0.0
                }

I am hoping there is a way to avoid generating any of the m3* fields in such cases.
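One possible direction, sketched here as an untested assumption rather than a confirmed solution: run the m3 aggregations in a separate transform job whose data_selection_query only selects documents that actually contain m3. The fragment below would replace the match_all query in that second job.

```json
"data_selection_query": {
    "exists": {
        "field": "m3"
    }
}
```

Buckets with no m3 documents should then produce no output for that job at all. The trade-off is that the m1 and m3 results end up in separate target documents, so this does not give a single document per bucket with the m3* fields simply omitted.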

Configuration:

Relevant Logs or Screenshots:

