Behavior of datetime objects at DST gap when converting to UTC

Niko · October 18, 2023, 12:34pm

While writing some unit tests I bumped into the issue of DST gaps. Let’s say I have some timestamps in the Europe/London timezone, which has a DST change at 2023-03-26 01:00 UTC that introduces a gap in local time. During this night the local time 01:00 - 01:59 does not exist.

I would like to take an arbitrary local timestamp ts which is inside the DST gap and convert it into a valid timestamp. Let’s say the time is “2023-03-26 01:20” (London):

import datetime as dt 
from zoneinfo import ZoneInfo

ts = dt.datetime.strptime("2023-03-26 01:20:00", "%Y-%m-%d %H:%M:%S").replace(
    tzinfo=ZoneInfo("Europe/London")
)

Now I would expect that if I convert a timestamp to UTC and back to local time, I would get 02:00 which is what I’m after. As a picture:

That is
a) The original timestamp, which is 01:20 local
b) Original timestamp projected onto UTC axis, getting the value of 01:00 UTC
c) Local timestamp based on the UTC value 01:00, which is 02:00

But what I get instead is

>>> ts.astimezone(ZoneInfo('UTC')).astimezone(ts.tzinfo)
datetime.datetime(2023, 3, 26, 2, 20, tzinfo=zoneinfo.ZoneInfo(key='Europe/London'))

This is because converting a local time inside a DST gap to UTC produces a UTC timestamp value of 01:20:

>>> ts.astimezone(ZoneInfo('UTC'))
datetime.datetime(2023, 3, 26, 1, 20, tzinfo=zoneinfo.ZoneInfo(key='UTC'))

In pictures, this conversion logic seems to be:

That is
a) The original timestamp, which is 01:20 local
b) Original timestamp converted to UTC, getting value of 01:20
c) Local timestamp based on the UTC value 01:20, which is 02:20

Questions

1. How would I get the value 02:00 in this case? I can check whether a local timestamp is within a DST gap, and I could round down and add an hour, but what if some country decides its DST change will be 30 minutes? Should I look into some listing of DST changes? Or is there an easier way?
2. Perhaps the harder one to answer: why is a local timestamp converted to UTC not a projection onto the UTC axis but a simple 1 hour addition? Is this a bug, or a designed feature with some use case I cannot see?
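
A minimal sketch of the gap check mentioned in the first question, assuming Python 3.9+ with `zoneinfo`. The helper name `is_in_gap` is hypothetical. It relies on the PEP 495 rule that, inside a gap, `utcoffset()` returns the pre-transition offset for `fold=0` and the post-transition offset for `fold=1`. That rule is also why the default conversion reads 01:20 with the GMT offset and lands on 01:20 UTC.

```python
import datetime as dt
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")
UTC = dt.timezone.utc


def is_in_gap(ts):
    """Return True if the aware datetime's wall time does not exist in its zone.

    A nonexistent wall time does not survive a round trip through UTC.
    """
    round_trip = ts.astimezone(UTC).astimezone(ts.tzinfo)
    return round_trip.replace(tzinfo=None) != ts.replace(tzinfo=None)


ts = dt.datetime(2023, 3, 26, 1, 20, tzinfo=LONDON)
offset_before = ts.replace(fold=0).utcoffset()  # offset before the switch (GMT)
offset_after = ts.replace(fold=1).utcoffset()   # offset after the switch (BST)
gap_size = offset_after - offset_before         # length of the gap, not hard-coded
```

Because `gap_size` is derived from the two offsets, the same check should work for zones whose shift is not a full hour.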

kpfleming · October 18, 2023, 2:26pm

You’re going to struggle to get the result you want, because this timestamp is not valid; that timepoint (in local time) does not exist.

You’ll have the reverse situation for the autumnal change, where a single local timepoint maps to two UTC timepoints.
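
For the autumn case, here is a short sketch of how the `fold` attribute selects between the two occurrences of a repeated local time. The date 2023-10-29 is the Europe/London switch back to GMT; the variable names are illustrative only.

```python
import datetime as dt
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")
UTC = dt.timezone.utc

# 01:30 local occurs twice on 2023-10-29; `fold` selects which occurrence.
first = dt.datetime(2023, 10, 29, 1, 30, tzinfo=LONDON)  # fold=0, still BST
second = first.replace(fold=1)                            # after the switch, GMT

first_utc = first.astimezone(UTC)
second_utc = second.astimezone(UTC)
print(first_utc.time(), second_utc.time())  # 00:30:00 01:30:00
```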

barry-scott · October 18, 2023, 4:55pm

You have two ways to avoid the ambiguity.

1. Always use UTC
2. Write the local time and the timezone offset, like 2023-10-18T17:50:39+0100

(1) is the usual design choice these days.
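
As a sketch of the second option, a local time can be written together with its offset and read back without needing the zone name. The zone Europe/London is an assumption here, chosen because it matches the +0100 offset of the example timestamp on that date.

```python
import datetime as dt
from zoneinfo import ZoneInfo

# Assumed zone for illustration; any zone at UTC+1 on this date behaves the same.
local = dt.datetime(2023, 10, 18, 17, 50, 39, tzinfo=ZoneInfo("Europe/London"))

# Write local time plus numeric offset, then parse it back.
stamp = local.strftime("%Y-%m-%dT%H:%M:%S%z")
parsed = dt.datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")
print(stamp)  # 2023-10-18T17:50:39+0100
```

The parsed value carries a fixed offset rather than a named zone, so it identifies the instant unambiguously but no longer knows about DST rules.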

Niko · October 19, 2023, 8:41am

@kpfleming I understand that it is not a valid timestamp (within the DST gap) and that there is the DST fold during fall. But think, for example, of a case where you might need to accept user configuration with an arbitrary time string, like "03:00:00", tied to an arbitrary timezone like America/Halifax, Indian/Mauritius or Pacific/Tongatapu, and this should be parsed and converted to UTC for further processing. There it would be handy to be able to either raise an exception or coerce the value to the next possible valid time (or the previous, whatever the logic should be). The fold issue is simpler to handle as it defaults to 0 (first occurrence) and could easily be configured to 1 (second occurrence) if a user wishes so.
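
The "raise an exception" variant could be sketched roughly like this. Both `NonExistentTimeError` and `local_to_utc_strict` are hypothetical names, not part of the standard library, and the date has to be supplied because a bare time string cannot be checked against DST rules.

```python
import datetime as dt
from zoneinfo import ZoneInfo

UTC = dt.timezone.utc


class NonExistentTimeError(ValueError):
    """Raised when a wall time falls inside a DST gap (hypothetical name)."""


def local_to_utc_strict(day, time_string, zone_name):
    """Combine a date with an "HH:MM:SS" string in zone_name and convert to UTC.

    Raises NonExistentTimeError if that wall time is skipped on that day.
    """
    tz = ZoneInfo(zone_name)
    wall = dt.datetime.combine(day, dt.time.fromisoformat(time_string))
    utc = wall.replace(tzinfo=tz).astimezone(UTC)
    # A skipped wall time does not survive the round trip back to local time.
    if utc.astimezone(tz).replace(tzinfo=None) != wall:
        raise NonExistentTimeError(f"{wall} does not exist in {zone_name}")
    return utc
```

With this sketch, `local_to_utc_strict(dt.date(2023, 3, 26), "01:20:00", "Europe/London")` raises, while "03:00:00" on the same day converts normally.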

@barry-scott Point 1: I like that. For making everyone’s life easier it is better to always use UTC in all parts of any application code. The exceptions are just the possible input from user or printout to user.

Point 2, writing everything with a constant timezone offset, like 2023-10-18T17:50:39+0100, is useful when you want something to occur at 03:00 AM during winter and at 02:00 AM during summer. But if you want something to occur at some exact local time, it is in my opinion easiest to work with the tzdata database names.

barry-scott · October 19, 2023, 9:00am

You want to know how to schedule an event in the DST overlap/gap?
You can make a rule that is predictable for your use case can’t you?

I was addressing the issue of timestamps in an audit trail where the timezone name is not enough information.

Rosuav · October 19, 2023, 9:04am

It’s hard to pin down what SHOULD happen here. Let’s take a straight-forward, if deliberately perverse, example. Suppose I schedule an automated action (like a nightly backup) at 2:30AM every day, local time. What should happen around a DST switch? Logically, the action still needs to happen, and only once. But when? If you ask four people what they intuitively expect to happen, you’ll get five different answers.

Niko · October 19, 2023, 9:36am

Good point. I would also guess there are almost as many opinions as there are people.

I’ll use the Europe/London timezone and the spring DST switch on 2023-03-26 as an example. Currently the local → UTC → local conversion works like this (UTC to local has no ambiguity, but local → UTC may divide opinions):

import datetime as dt 
from zoneinfo import ZoneInfo

timestamps = [
  "2023-03-26 00:00:00",
  "2023-03-26 00:59:00",
  "2023-03-26 01:00:00", # does not exist
  "2023-03-26 01:20:00", # does not exist
  "2023-03-26 01:59:00", # does not exist
  "2023-03-26 02:00:00",
  "2023-03-26 02:30:00",
]


def get_ts(x):
  return dt.datetime.strptime(x, "%Y-%m-%d %H:%M:%S").replace(
      tzinfo=ZoneInfo("Europe/London")
  )

def to_valid_local(ts):
  return ts.astimezone(ZoneInfo('UTC')).astimezone(ts.tzinfo)
  
for ts in map(get_ts, timestamps):
  print(ts.time(), '->', to_valid_local(ts).time())

This prints out (local input → local coerced):

00:00:00 -> 00:00:00
00:59:00 -> 00:59:00
01:00:00 -> 02:00:00
01:20:00 -> 02:20:00
01:59:00 -> 02:59:00
02:00:00 -> 02:00:00
02:30:00 -> 02:30:00

Plotting that out shows what the coercion looks like:

Left (a) shows the behaviour of ts.astimezone(ZoneInfo('UTC')).astimezone(ts.tzinfo). This is useful in many situations.

Con: Events A and B configured to happen at 01:20 and 02:10 occur at 02:20 (A), 02:10 (B), so the order is reversed

The right (b) shows an alternative.

Pro: Events A and B configured to happen at 01:20 and 02:10 occur at 02:00 (A), 02:10 (B), so the order is the same as expected.
Con*: Events A, B, C, D configured to happen at 01:10, 01:20, 01:30, 01:40 all occur at 02:00 (simultaneously). These could first be ordered by the non-coerced local timestamp, with some wait in between if running sequentially.

* depends?

So I understand that option (a) has the upside that you won’t accidentally make many events occur at the same time, and you do not lose the minutes and seconds information. The downside is that the order can get tangled: events configured inside the DST gap get mixed with events in the hour after it.

It might be useful if datetime.astimezone had an option for the strategy to be used in DST gaps. Or perhaps a separate datetime.asutc, if the strategy makes sense only for local → UTC conversions.
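
As a rough sketch of what such an option could look like as a standalone function: `as_utc` and its `gap` parameter are hypothetical and do not exist in `datetime`. `"shift"` keeps the current behaviour (a), and `"next"` implements (b) by locating the transition instant with a binary search. It assumes a single transition within the gap-sized search window and transitions that fall on whole seconds.

```python
import datetime as dt
from zoneinfo import ZoneInfo

UTC = dt.timezone.utc


def as_utc(ts, gap="shift"):
    """Convert an aware datetime to UTC with a selectable DST-gap strategy.

    gap="shift": Python's default (wall time read with the pre-gap offset).
    gap="next":  move a nonexistent wall time to the first valid instant.
    """
    if gap not in ("shift", "next"):
        raise ValueError(f"unknown gap strategy: {gap!r}")
    before = ts.replace(fold=0).utcoffset()
    after = ts.replace(fold=1).utcoffset()
    # Per PEP 495, the fold=1 offset exceeds the fold=0 offset only in a gap.
    if gap == "shift" or after <= before:
        return ts.astimezone(UTC)
    # The transition lies in (wall - after, wall - before] on the UTC axis.
    start = ts.replace(fold=1, microsecond=0).astimezone(UTC)
    lo, hi = 0, int((after - before).total_seconds())
    while hi - lo > 1:
        mid = (lo + hi) // 2
        probe = start + dt.timedelta(seconds=mid)
        if probe.astimezone(ts.tzinfo).utcoffset() == after:
            hi = mid  # probe is at or after the transition
        else:
            lo = mid  # probe is still before the transition
    return start + dt.timedelta(seconds=hi)


london = ZoneInfo("Europe/London")
ts = dt.datetime(2023, 3, 26, 1, 20, tzinfo=london)
print(as_utc(ts, gap="next").astimezone(london).time())   # 02:00:00
print(as_utc(ts, gap="shift").astimezone(london).time())  # 02:20:00
```

The gap length is derived from the two offsets rather than hard-coded, so a 30 minute shift would be handled the same way.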

Thoughts?

Niko · October 19, 2023, 12:08pm

Continuing with more examples.

Case: Selecting datetime ranges. Again using the Europe/London timezone and spring DST offset switch at 2023-03-26.

Example 1: 01:30 to 02:10

If you select 01:30 to 02:10 (remember: gap from 01:00 to 01:59), you would probably expect to get 10 minutes worth of selection (b in figure below), but you get a range of minus 20 minutes (a in figure below):

Example 2: 01:30 vs 02:00 as starting point to selection

If you select 01:30 to some distant point in future:

with (a) you get 02:30 to some distant point in future (half an hour less than if selecting from 02:00 to future)
with (b) you get 02:00 to some distant point in future. (same as if selecting from 02:00 to future)

I think option (b) makes a bit more sense in this kind of scenario, where you take a local timestamp as input, convert it to UTC and use that UTC timestamp to select a data range. This of course depends on the application, but I would guess I have more use cases requiring the logic of (b).
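
The minus 20 minutes from Example 1 can be reproduced with a short sketch. One detail worth noting: subtracting two aware datetimes that share the same tzinfo compares wall clocks only, so both endpoints are converted to UTC first.

```python
import datetime as dt
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")
UTC = dt.timezone.utc

start = dt.datetime(2023, 3, 26, 1, 30, tzinfo=LONDON)  # inside the gap
end = dt.datetime(2023, 3, 26, 2, 10, tzinfo=LONDON)

# Same tzinfo on both sides: the offsets are ignored, plain wall-clock difference.
wall_span = end - start                                 # 0:40:00
# Converting to UTC first gives the elapsed time under behaviour (a).
utc_span = end.astimezone(UTC) - start.astimezone(UTC)  # -1 day, 23:40:00
print(wall_span, utc_span)
```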

barry-scott · October 19, 2023, 7:04pm

Bear in mind that the time jumps happen at a time when most people are asleep and not at work. You may find that you just need a simple policy and you are done.

Rosuav · October 19, 2023, 9:18pm

This is true, but we can’t fully dodge the issues. Recurring events (where this is the most likely to become an issue) can occur at any point on the clock. Your “simple policy” might be very different from my “simple policy” because we’re solving slightly different problems, and our requirements are going to be slightly different. Using the hypothetical backup example from before, here are some likely expectations:

A backup WILL occur once per day, regardless.
A backup WILL NOT be delayed by more than one hour (or, putting it another way: Successive backups will not be more than 25 hours apart even in a worst-case scenario.)
To reduce the likelihood of problems, the backup SHOULD take place some time later than 1:30AM, when a different system is scheduled.
For the convenience of those defining all the schedules around this time, everything MUST be done in local time - the backup shouldn’t be happening consistently an hour earlier or later depending on season.

What’s the best way to handle this? If 2:30AM doesn’t exist, should you use 2AM, 3AM, or 3:30AM? Each of them will weaken one of those rules, so it’s a matter of picking. Which rules are the most important? My rankings might very well differ from yours.
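
To make the trade-off concrete, here is a sketch of what Python's default conversion does to a daily 02:30 schedule across a spring switch. The zone Europe/Berlin and the dates are assumptions, picked only because 02:30 falls inside the gap there on 2023-03-26.

```python
import datetime as dt
from zoneinfo import ZoneInfo

BERLIN = ZoneInfo("Europe/Berlin")  # assumed zone where 02:30 is skipped
UTC = dt.timezone.utc

days = [dt.date(2023, 3, 25), dt.date(2023, 3, 26), dt.date(2023, 3, 27)]
# Daily 02:30 local, converted to UTC with the default gap handling.
runs = [
    dt.datetime.combine(day, dt.time(2, 30), tzinfo=BERLIN).astimezone(UTC)
    for day in days
]
intervals = [later - earlier for earlier, later in zip(runs, runs[1:])]

print([run.astimezone(BERLIN).time().isoformat() for run in runs])
# ['02:30:00', '03:30:00', '02:30:00']
print([interval.total_seconds() / 3600 for interval in intervals])
# [24.0, 23.0]
```

Under these assumptions the default amounts to the "3:30AM" choice: the backup still runs once per day and never more than 25 hours apart, but on the switch day it runs an hour later on the wall clock.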

So this is a hard problem. The best thing for Python to do is to be internally consistent and reasonably sane; if your requirements don’t gel with what Python picked, it’s best to add some dedicated DST-handling code to ensure that you get the result you need.