PEP 615: Support for the IANA Time Zone Database in the Standard Library

stub42 · February 27, 2020, 8:47am

Hi. First, thanks for working on this. I’ve managed to put off similar work for about a decade now. I look forward being able to deprecate pytz, making it a thin wrapper around the standard library when run with a supported Python. This kind of needs to happen before 2038, as pytz dates from before the newest tzfile format and does not handle the later timestamps.

On the serialization section, what is really being discussed is the difference between timestamps (fixed instances in time), and wallclock times (time in a location, subject to changes made by politicians, bureaucrats and religious leaders). If I serialize ‘2022-06-05 14:00 Europe/Berlin’ today, and deserialize it in two years time after Berlin has ratified EU recommendations and abolished DST, then there are two possible results. If my application requires calendaring semantics, when deserializing I want to apply the current timezone definition, and my appointment at 2pm in Berlin is still at 2pm in Berlin. Because I need wallclock time (the time a clock hung on the wall in that location should show). If I wanted a fixed timestamp, best practice is to convert it to UTC to avoid all the potential traps, but it would also be ok to deserialize the time using the old, incorrect offset it was stored with and end up with 1pm wallclock time.

The PEP specifies that datetimes get serialized with all transition data. That seems unnecessary, as the transition data is reasonably likely to be wrong when it is de-serialized, and I can’t think of any use cases where you want to continue using the wrong data. To deserialize a local timestamp as a fixed point in time, you only need the local timestamp and the offset. Perpetuating the use of wrong data is going to end up with all sorts of problems and confusion, where you will end up with several versions of a timezone each with unique rules and giving different results. At some point, you are going to need to convert the data using the old timezone rules to the current timezone rules, which seems to be exactly the sort of problem we had with the pytz API. Failing to normalize the datetime will cause apps to spit out nonsense to end users, such as timestamps that no longer exist (skipped over by new DST transition rules), or ordering issues (wallclock timestamps using old rules compare earlier or later than wallclock timestamps using current rules).

I think better options are to either serialize as a) wallclock time (datetime + zoneinfo key), or b) local timestamp (datetime + offset + optional zoneinfo key), or c) UTC timestamp (utcdatetime + optional offset + optional zoneinfo key). Even if this means special casing custom zoneinfo datafiles, which I suspect will be rare or non-existent outside of the Python test suite.

While a) is often what you want for calendaring applications (and what you get with pytz), it could cause problems in general use because there is no fixed ordering. Data structures will fail if they rely on stable ordering of local timestamps, and I can’t see a way of forcing people to use fixed timestamps instead of wall time beyond hoping they read the documentation.

b) & c) store fixed timestamps, and let you round trip if all three components are included. With b) a question needing to be answered is if the fixed timestamp is corrected when deserialized (if the offset doesn’t match the current zoneinfo rules, it can be adjusted), or if the current zoneinfo rules only take affect when arithmetic starts happening. ie. is ‘repr(unpickle(d)) == repr(unpickle(d) + timedelta(0)’ true ? With c) timestamps would be adjusted to current rules when deserialized.

For comparision, PostgreSQL went with c). Storing a ‘timestamp with timezone’ just stores a UTC timestamp, and information about the source timezone and offset is lost. See https://www.postgresql.org/docs/12/datatype-datetime.html#DATATYPE-TIMEZONES

This all affects how ZoneInfo.nocache and arithmetic work too. As proposed, we can have multiple Europe/Berlin ZoneInfo with different rules. They are sticky, so a datetime referencing an obsolete ZoneInfo is going to keep doing calculations using the obsolete rules. I’m thinking that it would be better if ZoneInfo.nocache would replace the existing cached version, flagging the existing cached version as expired. Existing datetime instances would be unaffected, as their tzinfo would still reference the obsolete ZoneInfo data. But arithmetic would notice the ZoneInfo has been superseded and the result would be using the latest ZoneInfo.