PEP 615: Support for the IANA Time Zone Database in the Standard Library

Okay, you got some backwards incompatibilities I didn’t think of :slight_smile: Thanks for confirming that it’s a reasonable idea though - I’m still not sure when I’m right or when I’m crazy in this area yet.

So this essentially means that a datetime instance is identified by both its UTC and its local time, right? (Which I get is the same as saying it’s identified by its naive local time and offset, but that phrasing makes more sense to me.)

And also we acknowledge that this means all your live datetime instances need to be recreated if we learn something new about the transitions? That would seem to suggest that the only feasible caching strategies are “interpreter lifetime” or “completely managed by the application”. Because unless I’ve designed thoroughly for invalidation, I’m going to end up with inconsistencies.

Assuming we keep the two constructors for either cached or non-cached lookup, which should libraries and frameworks use?

You can create a function that calls ZoneInfo.clear_cache() and register it as the handler for a signal of your choice.
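A minimal sketch of that idea, assuming the cache-clearing entry point is the ZoneInfo.clear_cache() classmethod. The function names and the choice of SIGHUP are illustrative only, and SIGHUP does not exist on Windows:

```python
import signal
from zoneinfo import ZoneInfo


def clear_zoneinfo_cache(signum, frame):
    """Signal handler (hypothetical name): drop all cached ZoneInfo objects."""
    # Zones constructed after this call are new objects; datetimes created
    # earlier keep pointing at the old ones.
    ZoneInfo.clear_cache()


def install_handler(signum=None):
    """Register the handler; signal.signal() must run in the main thread."""
    if signum is None:
        # SIGHUP is an arbitrary choice and is unavailable on Windows.
        signum = getattr(signal, "SIGHUP", None)
    if signum is None:
        return False
    signal.signal(signum, clear_zoneinfo_cache)
    return True
```

Calling install_handler() once at startup would be enough; whether clearing the cache in the middle of a running application is safe is exactly the invalidation question raised above.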

Sorry, forgot to respond to this. If libraries and frameworks accept a string as a specifier for a time zone (rather than a tzinfo object, in which case the point is moot), I would expect them to use the primary constructor - that is almost always what you’d want.

I think the cache-bypassing constructor will be a niche use case for people who need fine-tuned control over the cache invalidation because their applications are sensitive to certain edge cases and they want to make different trade-offs. Libraries and frameworks that do the time zone construction for you should probably also accept arbitrary tzinfo objects as well, to support those use cases.
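As a rough sketch of what that could look like in a library, here is a hypothetical helper (resolve_tz is not part of any real API) that takes either a string key or a ready-made tzinfo:

```python
from datetime import tzinfo
from zoneinfo import ZoneInfo


def resolve_tz(tz):
    """Hypothetical helper: accept an IANA key or a ready-made tzinfo."""
    if isinstance(tz, tzinfo):
        # The caller made their own caching trade-off (e.g. ZoneInfo.no_cache).
        return tz
    if isinstance(tz, str):
        # Primary constructor: repeated calls with one key share one object.
        return ZoneInfo(tz)
    raise TypeError(f"expected a str key or a tzinfo, got {type(tz).__name__}")
```

String specifiers go through the cache, while anyone who needs different invalidation behavior can pass in an object they constructed themselves.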

While implementing the C extension, I’ve realized that I’m not actually sure about the situation with subinterpreters – if I use a static type for ZoneInfo rather than a heap type, I think that the ZoneInfo cache also ends up being a per-process rather than a per-interpreter cache, and all ZoneInfo objects (not just the class) that hit the cache would end up being shared among all subinterpreters.

I see some examples in PEP 554 of sharing objects via marshal and pickle. If this is the primary way that objects are passed between interpreters, I think it is safe to use a per-interpreter cache. If objects are sometimes shared directly between interpreters, though, it might be preferable to use a process-wide cache for the constructor; otherwise identical time zones constructed with the primary constructor could be passed between interpreters in a way that violates the invariant that ZoneInfo(key) is ZoneInfo(key).

@eric.snow or @nanjekyejoannah – do either of you have any thoughts or clarifications on this?

I more had in mind libraries that read from a DB or file and construct a tzinfo from that, but never actually give it out to the application developer.

So I agree using the caching constructor is the right default in every case, but I think the “niche” cache management functions should either come with a big doc warning (e.g. 1) or just be omitted/internal (and hence not necessary to make an equivalent API).

(1: “This function may cause any datetime instances in your application to become incomparable, including those created by third party libraries, at unexpected times in the future. Check your dependencies before using.”)

Thanks, @pganssle, for keeping subinterpreters in mind. That really helps. :slight_smile:

That is correct. Static types are shared between interpreters. This is actually one of the things we have to solve, given how we have a bunch of static types in CPython. Using a heap type would avoid the problem.

Going with a static type would be fine until we reach the point that subinterpreters stop sharing the GIL. So in the short term you would probably be fine. However, if you can instead do it in a per-interpreter way, that would be great. It would save us later work.

Take a look at PEP 489 and the newly accepted PEP 573.

PEP 554 is aiming for minimal functionality, including only a basic set of types that can be passed between interpreters. There is no proposed support for actually sharing objects between interpreters. In fact not even their underlying data is shared.

I expect that later we’ll look into broadening the scope of inter-interpreter sharing. However, assume for now that objects in each interpreter are entirely independent of other interpreters.

Thanks for the response Eric.

For now I can try and go with a per-interpreter cache, but maybe I’ll avoid specifying the behavior exactly as part of the PEP, so we have a bit more freedom to make changes as needed in response to changes in how subinterpreters work.

The one thing I’ll note is that I don’t want to make this broadening of the scope harder, and this cache is not a cache for performance purposes – it could cause bugs in people’s code if ZoneInfo objects from different caches were passed between interpreters.
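To make the kind of bug concrete, here is a small sketch that relies on the documented datetime rule that subtraction between two aware datetimes uses wall times when both share the same tzinfo object and converts to UTC otherwise. The function names are made up for illustration, and demo() needs time zone data to be installed:

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def gap_across_dst(tz_early, tz_late):
    """Subtract 01:00 from 03:00 on 2020-03-08, when US clocks skipped 02:00."""
    early = datetime(2020, 3, 8, 1, 0, tzinfo=tz_early)
    late = datetime(2020, 3, 8, 3, 0, tzinfo=tz_late)
    return late - early


def demo(key="America/New_York"):
    cached = ZoneInfo(key)
    uncached = ZoneInfo.no_cache(key)
    # Same tzinfo object: datetime ignores the zone and subtracts wall times.
    same_object = gap_across_dst(cached, cached)
    # Different tzinfo objects: both sides are converted to UTC first.
    different_objects = gap_across_dst(cached, uncached)
    return same_object, different_objects
```

With the default key, demo() should give a two-hour gap for the shared object and a one-hour gap for the mixed pair, even though both zones describe the same rules.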

That said, I believe that if the ability to share objects between interpreters becomes broader, we may be able to switch to a per-process cache shared between all interpreters (or a more complicated design with a per-process cache and per-interpreter caches that query the per-process cache), so I suppose there’s not much need to worry that the choices we make here will make it harder to allow the sharing of objects between interpreters.

Apparently I can no longer edit the post with the PEP in it :frowning:, so that text is now out of date. The latest version of the PEP has moved the open issues for Windows ICU support and for different PYTHONTZPATH configuration options into the “Rejected Ideas” section, and I have one more PR to move the “Using the datetime module” section there as well.

I am also thinking that it might be a good idea to rename .nocache() to .no_cache(), since that would be more consistent with the naming convention used with .from_file().

Other than that, I believe this is ready to be submitted to the SC for approval, but if anyone has further comments or believes I have missed something, please let me know.

As I mentioned on python-dev, I was sort of hoping this could get approved next Sunday during one of the southern hemisphere’s DST -> STD transitions, so that the “accepted” datetime is an ambiguous datetime somewhere on earth :slight_smile:.

When adjusting the PEP to clarify this, I realized why I originally wanted ZoneInfo.__str__ to work this way: it allows for an easy way to check whether the zone can be serialized by string, since str(zi) will be "" if no key was supplied.

Upon further consideration and in discussion with @barry, I decided that we’ll have __str__ fall back to __repr__ when no key is supplied, and add a key attribute to ZoneInfo, which will be None if no key was supplied, so zi.key is None can replace str(zi) == "".
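A short sketch of how the key attribute could be used for that check. Both helper names are hypothetical, and getattr is used so that tzinfo objects without a key attribute (such as datetime.timezone.utc) are simply treated as key-less:

```python
from zoneinfo import ZoneInfo


def has_key(tz):
    """Hypothetical check: True if tz has a key attribute that is not None."""
    return getattr(tz, "key", None) is not None


def rebuild_from_key(tz):
    """Reconstruct a zone from its key via the primary (cached) constructor."""
    if not has_key(tz):
        raise ValueError("zone has no key, so it cannot be rebuilt by name")
    return ZoneInfo(tz.key)
```

A zone loaded with ZoneInfo.from_file() and no key argument would fail the check, which is the case str(zi) == "" was originally meant to detect.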

There has been a decent amount of discussion about this PEP on the steering council thread on Github, and right now one of the remaining questions is @vstinner’s concerns about the __eq__ and __hash__ implementation, with the discussion starting here.

In the current implementation, I do not override __eq__ and __hash__, because the semantics of these objects are very much geared around object identity, and so I think it makes sense to have equality correspond to identity. That said, in the abstract there are at least four valid ways to consider two ZoneInfo objects equal. Given ZoneInfo objects z1 and z2, I would say the most reasonable choices are:

  1. z1 == z2 if and only if z1 is z2
  2. z1 == z2 if z1.key == z2.key
  3. z1 == z2 if z1 and z2 have the same behavior for all datetimes - which is to say that all the transition information is the same.
  4. A combination of 2 and 3: z1 == z2 if the keys are the same and all the transitions are identical.

In almost all real cases, these will all give the same answer, because most people will be calling zoneinfo.ZoneInfo, which will always return the same object for the same key. However, there are some implications around the notion of equality that compares all transition information.

Unlike options 1 and 2, options 3 and 4 do provide extra, otherwise inaccessible, information about the zones, so while you can easily write a comparison function to mimic options 1 and 2 in a world using option 3, you cannot write a comparison function using option 3 in a world where we use option 1 or 2.
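For reference, the two comparisons that can be written from the outside are tiny. This is only a sketch with made-up function names; option 3 has no counterpart here because the transition data is not exposed:

```python
def same_object(z1, z2):
    """Option 1: equal only if both names refer to the same object."""
    return z1 is z2


def same_key(z1, z2):
    """Option 2: equal if the keys match.

    Note: two key-less zones (key is None) also compare equal here.
    """
    return z1.key == z2.key
```

For ZoneInfo("UTC") and ZoneInfo.no_cache("UTC"), same_key would report a match while same_object would not.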

We would also presumably have the option of making it so that zoneinfo.ZoneInfo("UTC") == datetime.timezone.utc if we have a custom, value-based comparison method, which might conceivably be convenient for trying to “normalize” your UTC or other fixed-offset time zones (though I suspect this would only be really meaningful for UTC, and you can special-case that by checking against str(zi) == "UTC", which, incidentally, would work for pytz as well).

I think the most important thing about this is how it would affect how these things get hashed. If we go with option 2, then it would not be possible to hold two different instances of zones with the same key together in a set:

>>> s = {ZoneInfo("America/New_York"),
...      ZoneInfo.no_cache("America/New_York")}
>>> s
{ZoneInfo('America/New_York')}

Which means that {dt.tzinfo for dt in bunch_of_datetimes} won’t necessarily give you all the ZoneInfo objects used in bunch_of_datetimes.
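The collapse under option 2 can be tried out with a subclass. This is a sketch only: KeyEqualZoneInfo and count_distinct are hypothetical names, not part of the proposal, and constructing zones requires time zone data to be installed:

```python
from zoneinfo import ZoneInfo


class KeyEqualZoneInfo(ZoneInfo):
    """Hypothetical subclass: equality and hashing by key (option 2)."""

    def __eq__(self, other):
        if not isinstance(other, ZoneInfo):
            return NotImplemented
        return self.key == other.key

    def __hash__(self):
        # Must agree with __eq__: equal keys give equal hashes.
        return hash(self.key)


def count_distinct(zones):
    """Number of zones that survive being put into a set."""
    return len(set(zones))
```

With this subclass, count_distinct([KeyEqualZoneInfo("UTC"), KeyEqualZoneInfo.no_cache("UTC")]) comes out as 1, whereas the same call with plain ZoneInfo gives 2 because the default comparison is by identity.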

If we go with option 3, then zones that are links to one another or are distinct zones with the same behavior could not co-exist in a set together:

>>> s = {ZoneInfo("America/New_York"),
...      ZoneInfo("US/Eastern")}
>>> s
{ZoneInfo('America/New_York')}

If we go with option 4, though, you wouldn’t be able to tell whether two zones are identical to one another even if they have different keys, so you can’t do something like this:

with open(some_file, "rb") as f:
    unknown_zi = ZoneInfo.from_file(f)

print(unknown_zi == ZoneInfo("America/New_York"))

You also wouldn’t have any way to detect whether two zones have the same behavior but different names (e.g. "US/Eastern" and "America/New_York").

In the end, I can sort of imagine uses for having some sort of value-based equality in ZoneInfo, but there’s no one obvious choice here. I don’t know why people would want to use these things as keys in a dictionary, but maybe they would. I can also see some reasons for putting them in a set, but nothing so common that there’s one obvious use case.

In terms of performance, option 1 is the cheapest for both hashing and equality, and options 3 and 4 are most expensive, but we can use a cache to at least make the hash comparison a one-time cost.

My proposal: I think that we should stick with option 1 (default implementation - comparison by object identity) for equality, because that most closely matches the semantics people will care about (and for the same reasons that we have pickle serializing by key).

If a lot of people are chafing at the inability to do “comparison by value”, in a future version we can offer an .equivalent_transitions() method that exposes the results of option 3. We would also have the option of changing __hash__ to be value-based in the future, since hash values aren’t guaranteed, and all we’d be doing is introducing some hash collisions, but that would allow people to create subclasses (or wrapper functions) with the __eq__ and __hash__ behavior described in either options 3 or 4.

I agree. It’s easier to add the right value comparison later than to take away over-engineering (also, YAGNI).

I particularly disagree with exposing all the transition data via eq. These should be opaque objects for most people, so at most the key is the identifier. Then you run into trouble with (not-)caching, but that’s the trouble with allowing people to bypass the cache, and I think we agreed that they just have to deal with extra complications? Object identity with the default/recommended usage gives the right result.

If we’re not doing value comparisons, then “equality by object” and “equality by key” are equivalent for people using the standard constructor, so the distinction between those two is really only meaningful for people bypassing the cache.

I think people bypassing the cache are hopefully more likely to understand that object identity matters (though this may not necessarily be true if they’ve inherited a framework that does this), but they may not have a good intuition about what “equality” means. I think people will make the reasonable assumption that “object equality means the results are always the same when using these two zones”, which is something we can only guarantee with object identity or equivalent-transitions, so I think that from that perspective, you run into less trouble with __eq__ being equivalent to a is b than if you go with __eq__ being equivalent to a.key == b.key (plus if you make a mistake about what it means, you’ll probably notice a whole lot more quickly this way).

My concern was about loading the same zone by two different means and getting objects that are not identical (zone1 is not zone2) but have equal transition data (what I wrote as “zone1 == zone2”).

The problem is that defining the equal operator opens a can of worms. For me, Europe/Paris and Europe/Brussels are the same because they have the same UTC offset today. But Europe has a long history, and France and Belgium didn’t always have the same time (I didn’t check, but I’m quite sure that these neighbors have had different times on at least a few days since 1900).

So yeah, I understand that a new “equivalent_transitions()” function or method should be designed to take into account what the user “expects”. It’s also ok to leave this problem out of the stdlib for now. For example, I can put such code in my application:

import datetime

def same_utcoffset_now(zone1, zone2):
    # The naive "now" is read as a wall-clock time in each zone.
    now = datetime.datetime.now()
    return zone1.utcoffset(now) == zone2.utcoffset(now)
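If a single-instant check is too weak (the Paris/Brussels case), an application could sample a whole period instead. This is only a sketch with a hypothetical helper name; it works with any tzinfo objects, and a True result says nothing about instants between the samples or outside the range:

```python
from datetime import timedelta, timezone


def same_offsets_between(zone1, zone2, start, end, step=timedelta(hours=1)):
    """Hypothetical helper: compare UTC offsets at sampled instants.

    start and end must be aware datetimes. True only means no difference
    was seen at the sampled instants, not that the zones are equivalent.
    """
    instant = start.astimezone(timezone.utc)
    stop = end.astimezone(timezone.utc)
    while instant <= stop:
        offset1 = instant.astimezone(zone1).utcoffset()
        offset2 = instant.astimezone(zone2).utcoffset()
        if offset1 != offset2:
            return False
        instant += step
    return True
```

Sampling real instants in UTC and converting them avoids the ambiguity of naive wall-clock values, at the cost of a loop that gets slow for long ranges with a small step.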

Let’s start simple with the uncontroversial part and complete the API later :wink:

Is this in Python 3.9.0a6?

Not yet. I hope that the implementation will land before Python 3.9.0 beta1. Currently, the implementation lives at: https://github.com/pganssle/zoneinfo

Yes, this will be available before the beta. It’s nearly ready now, it mostly just needs documentation.

I’m also planning on re-using the reference implementation as a stand-alone backport installable from PyPI on Python 3.7+, which will likely be ready even before the first beta release.
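Code that wants to run on both old and new versions can use the usual import fallback. This sketch assumes the backport is the backports.zoneinfo package from PyPI and that it is installed on versions before 3.9:

```python
try:
    import zoneinfo  # standard library, Python 3.9+
except ImportError:
    from backports import zoneinfo  # PyPI backport, assumed to be installed

ZoneInfo = zoneinfo.ZoneInfo
```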

Would you consider TZPATH as the environment variable name? The aim being to establish a convention that other tools/languages can adopt, perhaps one that’s eventually standardized. Or is that too much to pile on at once?

A search found one clashing use in the Tezos test suite, but it’s fairly obscure.

Would calling the module timezoneinfo be a nice middle ground? It clarifies which type of zone is in question (as opposed to DNS zones), and it helps to mentally group it with datetime, but on the downside it’s quite long.

We can always add that later if other language stacks want to standardize on it, but keeping it namespaced under PYTHON seems like a safer bet (particularly if other language stacks use a different convention, e.g. they don’t allow a full search path, or they do allow relative paths, etc).
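For context, the search path that PYTHONTZPATH feeds into can be inspected and replaced from Python. A minimal sketch, with hypothetical wrapper names around the module's TZPATH attribute and reset_tzpath() function:

```python
import zoneinfo


def current_search_path():
    """Return the directories zoneinfo searches for time zone data."""
    return zoneinfo.TZPATH


def use_search_path(paths):
    """Replace the search path at runtime; every entry must be absolute."""
    zoneinfo.reset_tzpath(to=list(paths))
    return zoneinfo.TZPATH
```

Calling zoneinfo.reset_tzpath() with no arguments goes back to the default, which is derived from PYTHONTZPATH when that variable is set.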

This PEP was accepted some time ago (yay). The implementation PR is here: https://github.com/python/cpython/pull/19909

Please comment on it ASAP, as the feature freeze is coming soon, and I plan to merge it this weekend if not sooner.

This is now available on the 3.9 and master branches (importing it on *nix is broken in b1, unfortunately, but it works on Windows), and a backport to Python 3.7 is available on PyPI. :tada:

Thanks for all the work and comments everyone!
