Calendar awareness in the context of Date Histogram Aggregations

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
Opensearch 2.6 and Opensearch 2.11 (I tested it on both)

Describe the issue:
[Warning: this is a long post, I tested a lot of different scenarios]
I am currently looking at date-histogram aggregations for a project and researching the difference between calendar-interval and fixed-interval. The Opensearch documentation uses the deprecated “interval” parameter which to my understanding just uses fixed-interval unless the unit is Month, Quartal, Week, Year and possibly if the number of days is 1.

According to Elasticsearch documentation (now I haven’t tested it on ES) the difference is very clear. A fixed-interval is “always multiples of SI units and do not change based on calendaring context”. A calendar-interval on the other hand “understands that daylight savings change the length of specific days, months have different amount of days and leap seconds can be tackled onto a particular year.”

Now this is fairly easy to test. I did 2 different tests. One to test DST crossings and one to test the Leap seconds.
(For both my tests I used a “range” query-filter in combination with “extended_bounds” in the aggregation and a “min_doc_count” of 0 so I don’t actually have to insert fake data.

DST Crossing Test
For this test I used this years crossing into Daylight savings (March 12 2023 [USA] / March 26 2023 [London])
There are 4 scenarios, all four are tested using an interval of 1 day between the day before and after the DST crossing (3 days):

  1. Calendar-interval with time zone (I tested it both on “America/Los_Angeles” and “Europe/London”)
  2. Calendar-interval without time zone
  3. Fixed-interval with time zone
  4. Calendar-interval without time zone

Expected results:

  1. The first bucket boundaries (key) should start at midnight PST then switch to midnight PDT. Timestamps should have a difference of 23 hours on the day of the crossing.
  2. All 3 boundaries should start at midnight UTC and stay at midnight UTC. Timestamps should have a difference of 24 hours.
  3. The first boundaries should start at midnight PST and then switch to 1am PDT. Timestamps should have a difference of 24 hours. This would be because a day is defined as 24 h (86400000ms) for fixed_interval (ES doc)
  4. All 3 boundaries should start at midnight UTC and stay at midnight UTC. Timestamps should have a difference of 24 hours.

Now these are the 4 queries:

1. Calendar with TZ

URL: …{myServer}/{index}/_search

{
    "size": 0,
    "query": {
        "range": {
            "date_of_last_load": {
                "from": "2023-03-11T00:00:00.000Z",
                "to": "2023-03-13T00:00:00.000Z",
                "include_lower": true,
                "include_upper": true
            }
        }
    },
    "_source": false,
    "aggregations": {
        "dateHistogram": {
            "date_histogram": {
                "field": "date_of_last_load",
                "time_zone": "America/Los_Angeles",
                "calendar_interval": "1d",
                "order": {
                    "_key": "asc"
                },
                "keyed": false,
                "min_doc_count": 0,
                "extended_bounds": {
                    "min": 1678579200000,
                    "max": 1678752000000
                }
            }
        }
    }
}
2. Calendar without TZ

URL: …{myServer}/{index}/_search

{
    "size": 0,
    "query": {
        "range": {
            "date_of_last_load": {
                "from": "2023-03-11T00:00:00.000Z",
                "to": "2023-03-13T00:00:00.000Z",
                "include_lower": true,
                "include_upper": true
            }
        }
    },
    "_source": false,
    "aggregations": {
        "dateHistogram": {
            "date_histogram": {
                "field": "date_of_last_load",
                "calendar_interval": "1d",
                "order": {
                    "_key": "asc"
                },
                "keyed": false,
                "min_doc_count": 0,
                "extended_bounds": {
                    "min": 1678492800000,
                    "max": 1678665600000
                }
            }
        }
    }
}
3. Fixed with TZ

URL: …{myServer}/{index}/_search

{
    "size": 0,
    "query": {
        "range": {
            "date_of_last_load": {
                "from": "2023-03-11T00:00:00.000Z",
                "to": "2023-03-13T00:00:00.000Z",
                "include_lower": true,
                "include_upper": true
            }
        }
    },
    "_source": false,
    "aggregations": {
        "dateHistogram": {
            "date_histogram": {
                "field": "date_of_last_load",
                "time_zone": "America/Los_Angeles",
                "fixed_interval": "1d",
                "order": {
                    "_key": "asc"
                },
                "keyed": false,
                "min_doc_count": 0,
                "extended_bounds": {
                    "min": 1678579200000,
                    "max": 1678752000000
                }
            }
        }
    }
}
4. Fixed without TZ

URL: …{myServer}/{index}/_search

{
    "size": 0,
    "query": {
        "range": {
            "date_of_last_load": {
                "from": "2023-03-11T00:00:00.000Z",
                "to": "2023-03-13T00:00:00.000Z",
                "include_lower": true,
                "include_upper": true
            }
        }
    },
    "_source": false,
    "aggregations": {
        "dateHistogram": {
            "date_histogram": {
                "field": "date_of_last_load",
                "fixed_interval": "1d",
                "order": {
                    "_key": "asc"
                },
                "keyed": false,
                "min_doc_count": 0,
                "extended_bounds": {
                    "min": 1678492800000,
                    "max": 1678665600000
                }
            }
        }
    }
}

And here are the results:

The results were as expected, the crossing of DST was compensated for (as described in the ES doc) and the timestamp difference was 23Hours

1. Calendar with TZ
{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "dateHistogram": {
            "buckets": [
                {
                    "key_as_string": "2023-03-11T00:00:00.000-08:00",
                    "key": 1678521600000,
                    "doc_count": 0
                },
                {
                    "key_as_string": "2023-03-12T00:00:00.000-08:00",
                    "key": 1678608000000,
                    "doc_count": 0
                },
                {
                    "key_as_string": "2023-03-13T00:00:00.000-07:00",
                    "key": 1678690800000,
                    "doc_count": 0
                }
            ]
        }
    }
}

Also as expected, UTC doesn’t have any DST and therefore the time stayed at midnight and timestamp diff was 24h

2. Calendar without TZ
{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "dateHistogram": {
            "buckets": [
                {
                    "key_as_string": "2023-03-11T00:00:00.000Z",
                    "key": 1678492800000,
                    "doc_count": 0
                },
                {
                    "key_as_string": "2023-03-12T00:00:00.000Z",
                    "key": 1678579200000,
                    "doc_count": 0
                },
                {
                    "key_as_string": "2023-03-13T00:00:00.000Z",
                    "key": 1678665600000,
                    "doc_count": 0
                }
            ]
        }
    }
}

Now here is the issue. The result was identical to #1. Meaning that even tho this is not supposed to be calendar-aware, the results were. (see #1)

3. Fixed with TZ
{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "dateHistogram": {
            "buckets": [
                {
                    "key_as_string": "2023-03-11T00:00:00.000-08:00",
                    "key": 1678521600000,
                    "doc_count": 0
                },
                {
                    "key_as_string": "2023-03-12T00:00:00.000-08:00",
                    "key": 1678608000000,
                    "doc_count": 0
                },
                {
                    "key_as_string": "2023-03-13T00:00:00.000-07:00",
                    "key": 1678690800000,
                    "doc_count": 0
                }
            ]
        }
    }
}

As expected same results as #2

4. Fixed without TZ
{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "dateHistogram": {
            "buckets": [
                {
                    "key_as_string": "2023-03-11T00:00:00.000Z",
                    "key": 1678492800000,
                    "doc_count": 0
                },
                {
                    "key_as_string": "2023-03-12T00:00:00.000Z",
                    "key": 1678579200000,
                    "doc_count": 0
                },
                {
                    "key_as_string": "2023-03-13T00:00:00.000Z",
                    "key": 1678665600000,
                    "doc_count": 0
                }
            ]
        }
    }
}

Now what is happening here? Well the documentation states in a note under the chapter Time zone:

Many time zones shift their clocks for daylight savings time. Buckets close to the moment when those changes happen can have slightly different sizes than you would expect from the calendar_interval or fixed_interval

And even worse than that, if using “hours” instead of days the time zone catches up with that. For example 12 hours over a daylight crossing will become 11 hours both on calendar_interval and fixed_interval.

I tried many different timezones and got the same results. Therefore, other than calendar_interval offering week, month, quartal, year and fixed_interval offering multiples of microseconds, seconds, minutes, hours and days. There is not difference. (If there is an actual difference please let me know)

Leap Seconds

I did a second test, and I am going to omit the “code”. If you need it let me know and I’ll provide it in the comments. But for the leap seconds I tested it with hours instead of days and I tested it around December 31 2016. That is the last time a leap second was added.

Now again the results were the same for fixed and calendar intervals but worse no leap second was detectable. This might be because epoch numbers don’t actually show leap seconds. But then there is no way of telling that anything different was done.

Thank you for reading this whole post. Let me know what you think. Did I do something wrong, did I read the documentation wrong? Is there a bug? Or what is happening? Thank you :slight_smile:

1 Like