Expanding range of inputs for %Z in strptime

eugenetriguba · August 5, 2022, 11:22pm

Hi all, an issue with the behavior of %Z in strptime has come up here and there over the years. I wanted to reach out to discuss the best way to improve its current behavior and how to move the existing PR forward.

Original issue: %Z in strptime doesn't match EST and others · Issue #66571 · python/cpython · GitHub

Documentation on %Z:

Technical Detail: datetime — Basic date and time types — Python 3.12.1 documentation
Format Codes: datetime — Basic date and time types — Python 3.12.1 documentation

PR: gh-66571: Expand matches for %Z in strptime by eugenetriguba · Pull Request #93486 · python/cpython · GitHub

What is the issue?

Essentially, %Z is currently only set up to match UTC, GMT, and whatever timezone abbreviations are in the user’s locale. As a result, someone using %Z can have it error out on other valid timezone abbreviations.

Not only does it not match, the error message seems like it could be improved to describe the problem a little more clearly:

ValueError: time data '2016-12-04 08:00:00 EST' does not match format '%Y-%m-%d %H:%M:%S %Z'

I would think it’d be clearer if it said something like: %Z didn’t match because EST isn’t in the list of available timezone abbreviations, and then showed which abbreviations the user does have available.
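For anyone who wants to try this locally, here is a minimal reproduction sketch. The helper name `try_parse` is just something I made up for illustration. Whether the EST line fails depends on your machine: if your local timezone is US Eastern, EST is one of your locale's abbreviations and it will parse.

```python
import time
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S %Z"


def try_parse(text):
    """Hypothetical helper: return the parsed datetime, or the ValueError raised."""
    try:
        return datetime.strptime(text, FMT)
    except ValueError as exc:
        return exc


# The abbreviations %Z accepts beyond UTC and GMT come from the local timezone.
print("local abbreviations:", time.tzname)

# UTC is always accepted; the result is a naive datetime (tzinfo is None).
print(repr(try_parse("2016-12-04 08:00:00 UTC")))

# EST is only accepted when it is one of the local abbreviations above.
print(repr(try_parse("2016-12-04 08:00:00 EST")))
```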

Prior Discussions

A while back, I sent an email to Datetime-Sig to discuss this a little more and to ask what a good way to resolve it would be. From that, I learned of a couple of challenges with %Z that I thought @pganssle put well:

TL;DR: There’s probably something more we can do before we close out #66571, but I doubt there will ever be a great way to make this work the way people think it intuitively should (mostly because people intuitively think that the 3-letter abbreviations 1. exist for all zones and 2. are globally unique).

with a lengthier treatment here on the GitHub issue.

Since timezone abbreviations aren’t unique and don’t exist for all zones, there isn’t a reliable way to parse them, nor a good way to build tz-aware datetimes from them.

Possible Enhancements

Use zoneinfo?

I had initially thought that with the addition of the zoneinfo module, we might be able to broaden the range of abbreviations that are accepted while still making sure they’re valid abbreviations. However, there doesn’t appear to be a simple way to look timezones up by abbreviation. Even setting aside the fact that a single abbreviation can match multiple timezones, I was only hoping to confirm “yes, CST is valid” and leave the result as a naive datetime unless an offset is specified with %z.

Given Paul’s comments on the GitHub issue and his response on the datetime-sig mailing list, it seemed this wouldn’t be a good way to go.
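To illustrate the lookup problem, here is a rough sketch of what a brute-force search by abbreviation could look like. The function `zones_using_abbreviation` is hypothetical (nothing like it exists in zoneinfo), it needs timezone data installed on the system, and it has to open every zone file, which is part of why this doesn't look like a good fit for strptime.

```python
from datetime import datetime
from zoneinfo import ZoneInfo, available_timezones


def zones_using_abbreviation(abbr, when):
    """Hypothetical helper: IANA keys whose abbreviation at local time `when` is `abbr`."""
    matches = set()
    for key in available_timezones():
        try:
            name = when.replace(tzinfo=ZoneInfo(key)).tzname()
        except (KeyError, ValueError, OSError):
            # Skip anything that cannot be loaded as a zone on this system.
            continue
        if name == abbr:
            matches.add(key)
    return matches


# The answer depends on the date as well, since abbreviations change with DST.
candidates = zones_using_abbreviation("CST", datetime(2016, 12, 4, 8, 0, 0))
print(len(candidates), "zones report CST at that moment")
```

With a full tz database installed, I'd expect the result to include zones as far apart as America/Chicago and Asia/Shanghai, which is exactly the ambiguity Paul described.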

Allow the end user to ensure it is valid themselves

Instead of saying only these few abbreviations are valid, Paul suggested that we could just read until the next token for %Z, or simply ensure the value is within a certain range of characters. Basically, loosen the validation on %Z and allow the end user to ensure it is valid themselves if needed.

Tentative PR

After @pganssle’s comments, I put in a PR with some additional questions (gh-66571: Expand matches for %Z in strptime by eugenetriguba · Pull Request #93486 · python/cpython · GitHub). What it does is add a 2-5 character check after the initial check we already do for UTC, GMT, and the abbreviations in the user’s locale. I still have some open questions there that I am waiting on, and I wanted to get others’ thoughts as well as confirm this is the direction we want to go to resolve this bug.
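To make the idea concrete without pointing at the PR diff, here is a user-level sketch of the same loosened check. It is not the code in the PR: the name `parse_with_loose_tzname` is made up, restricting the 2-5 characters to ASCII letters is my assumption, and it assumes the abbreviation is the last space-separated token.

```python
import re
from datetime import datetime

# Assumption: "2-5 characters" means 2 to 5 ASCII letters.
_ABBR_RE = re.compile(r"[A-Za-z]{2,5}")


def parse_with_loose_tzname(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Hypothetical helper: return (naive datetime, abbreviation).

    The abbreviation is only checked for shape, not looked up anywhere,
    so the caller decides what it means.
    """
    stamp, _, abbr = text.rpartition(" ")
    if not stamp or not _ABBR_RE.fullmatch(abbr):
        raise ValueError(f"{abbr!r} does not look like a timezone abbreviation")
    return datetime.strptime(stamp, fmt), abbr


parsed, abbr = parse_with_loose_tzname("2016-12-04 08:00:00 EST")
print(parsed, abbr)
```

The datetime stays naive here, which matches the idea of leaving validation and interpretation of the abbreviation to the end user.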