PEP 615: Support for the IANA Time Zone Database in the Standard Library

pganssle · February 28, 2020, 2:54pm

Thank you for taking the time to comment on the proposal!

I understand where you are coming from here, but there are a lot of reasons to use the cache, and good reasons to believe that using the cache won’t be a problem.

The question of “reloading time zone data transparently” could mean that existing datetimes would be updated if the data on disk changes (which would be problematic from a datetimes-are-immutable point of view), or it could mean that newly-constructed datetimes are always pulled from the latest data. Assuming we can only do the second thing, that means that if you get time zone data updates during a run of your program, you will end up with a mixture of stale and non-stale time zones, which is also pretty non-ideal.

I think there’s also a lot of precedent for this kind of thing:

- It is already the case that if system local time changes, you must call time.tzset() to invalidate the cache, and this only works on some platforms. We’re basically already in a situation where you must actively take action to get the “latest time zone information” during a run of the interpreter.
- Right now more or less everyone uses a cache and there are not really any complaints. pytz and dateutil both use similar caching behavior (and AFAIK pytz doesn’t even expose a way to opt out of it; everything is unconditionally cached). I don’t think I’ve heard of anyone complaining about this behavior or even noticing it much.
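To make the tzset precedent concrete, here is a minimal sketch of refreshing the process-wide local time zone. It assumes a Unix-like platform where time.tzset() exists; the helper name tzname_after_tzset is made up for illustration, and the POSIX-style TZ strings are only example values.

```python
import os
import time


def tzname_after_tzset(tz_string):
    """Hypothetical helper: set TZ, call tzset(), and report the zone names.

    The previous TZ setting is restored before returning.
    """
    previous = os.environ.get("TZ")
    os.environ["TZ"] = tz_string
    time.tzset()  # without this call, the change to TZ is not picked up
    try:
        return time.tzname
    finally:
        if previous is None:
            del os.environ["TZ"]
        else:
            os.environ["TZ"] = previous
        time.tzset()


eastern = tzname_after_tzset("EST5EDT,M3.2.0,M11.1.0")
utc = tzname_after_tzset("UTC0")
print(eastern, utc)
```

The point is only that nothing changes until the program explicitly asks for a refresh, which is the same contract the cache proposes.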

And one of the main drivers for this cache behavior is that the semantics of datetime explicitly assume that time zones are singletons, and you can run into some weird situations if you don’t use singletons. Consider this case:

>>> from datetime import datetime, timedelta
>>> from zoneinfo import ZoneInfo
>>> dt0 = datetime(2020, 3, 8, tzinfo=ZoneInfo.nocache("America/New_York"))
>>> dt1 = dt0 + timedelta(1)
>>> dt2 = dt1.replace(tzinfo=ZoneInfo.nocache("America/New_York"))
>>> dt2 == dt1
True
>>> print(dt2 - dt1)
0:00:00
>>> print(dt2 - dt0)
23:00:00
>>> print(dt1 - dt0)
1 day, 0:00:00

Note that this makes no use of the cache: I used the .nocache constructor to simulate what the semantics of a non-caching constructor would look like. It could cause strange path dependencies, where you get different answers depending on whether you arrived at a datetime by arithmetic or by construction / replace operations.
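For contrast, here is a sketch of the same steps using the cached primary constructor. It assumes Python 3.9+ with the IANA data available (system tzdata or the tzdata package). Because every ZoneInfo("America/New_York") call returns the same object, datetime treats all three values as being in the same zone and the path dependence goes away.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

NYC = "America/New_York"

# The cached constructor hands back the same object for the same key.
same_object = ZoneInfo(NYC) is ZoneInfo(NYC)

dt0 = datetime(2020, 3, 8, tzinfo=ZoneInfo(NYC))
dt1 = dt0 + timedelta(days=1)             # reached by arithmetic
dt2 = dt1.replace(tzinfo=ZoneInfo(NYC))   # reached by replace()

# Both subtractions involve identical tzinfo objects, so both use
# wall-clock semantics and agree with each other.
via_arithmetic = dt1 - dt0
via_replace = dt2 - dt0
print(same_object, via_arithmetic, via_replace)
```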

So, to summarize my position:

- Cache by default, because most people will want such a cache, even if they don’t know it.
- Document the various strategies and their trade-offs for people with long-running applications, including the use of ZoneInfo.clear_cache() for tzset-like behavior and ZoneInfo.nocache for “always give me a fresh copy” behavior.
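As a rough sketch of those two strategies, assuming Python 3.9+ with IANA data installed: note that the zoneinfo module as eventually released spells the alternate constructor ZoneInfo.no_cache, whereas this post uses the draft spelling nocache. The variable names are illustrative only.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

KEY = "America/New_York"

cached = ZoneInfo(KEY)
existing = datetime(2020, 3, 8, tzinfo=cached)

# Strategy 1: tzset-like refresh. Later constructor calls build a new
# object (re-reading the data), but datetimes created earlier keep the
# old one, so a long-running program can hold a mixture of both.
ZoneInfo.clear_cache()
refreshed = ZoneInfo(KEY)
refreshed_is_old = refreshed is cached
existing_keeps_old = existing.tzinfo is cached

# Strategy 2: always get a fresh copy that never enters the cache.
fresh = ZoneInfo.no_cache(KEY)
fresh_is_cached = fresh is ZoneInfo(KEY)

print(refreshed_is_old, existing_keeps_old, fresh_is_cached)
```

Either way, objects obtained before and after the refresh are distinct, so arithmetic that mixes them falls back to the inter-zone semantics shown in the earlier example.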