Parse "Z" timezone suffix in datetime

This is already opened as BPO 35829 but I wanted to ask about it over here for discussion.

Problem Statement

The function datetime.fromisoformat() parses a datetime in ISO-8601 format:

>>> datetime.fromisoformat('2019-08-28T14:34:25.518993+00:00')
datetime.datetime(2019, 8, 28, 14, 34, 25, 518993, tzinfo=datetime.timezone.utc)

The timezone offset in my example is +00:00, i.e. UTC. The ISO-8601 standard (for which fromisoformat() is presumably named) allows “Z” to be used instead of the zero offset, i.e. 2019-08-28T14:34:25.518993Z. However, fromisoformat() cannot parse this:

>>> datetime.fromisoformat('2019-08-28T14:34:25.518993Z')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid isoformat string: '2019-08-28T14:34:25.518993Z'

Paul Ganssle (@pganssle) is the maintainer of the dateutil library and has made numerous improvements to the standard library datetime as well. (Thanks, Paul!) Paul suggested that I should post here to brainstorm possible improvements in the API.

The dateutil library does include support for parsing the Z suffix:

>>> from dateutil import parser
>>> parser.isoparse('2019-08-28T14:34:25.518993Z')
datetime.datetime(2019, 8, 28, 14, 34, 25, 518993, tzinfo=tzutc())

This feels like a missing battery in the standard library, especially since other systems may produce dates that end in “Z”. One big example is JavaScript. (You can run this in your browser console right now!)

>> new Date().toISOString()
"2019-08-28T14:34:25Z"

If you have a web browser (or a node.js system) that sends you an ISO-8601 date in UTC, then you can’t parse it with Python’s standard library.

The obvious workaround (that my colleagues and I have committed to muscle memory at this point) is datetime.fromisoformat(my_date.replace('Z', '+00:00')). This works, but it is verbose for something that feels like it belongs in the standard library.
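For illustration, the workaround can be wrapped in a small helper. This is only a sketch: parse_iso_z is a hypothetical name, not a standard library function, and it assumes the input is otherwise in a format that fromisoformat() already accepts. It rewrites only a trailing Z rather than every Z in the string.

```python
from datetime import datetime, timezone

def parse_iso_z(text):
    """Hypothetical helper: parse an isoformat()-style string, also accepting a trailing 'Z'."""
    # Only a trailing 'Z' means UTC; str.replace() would rewrite any 'Z' in the string.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)

dt = parse_iso_z("2019-08-28T14:34:25.518993Z")
same = parse_iso_z("2019-08-28T14:34:25.518993+00:00")
```

Both calls should produce the same timezone-aware datetime in UTC.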

Rejected idea

Paul doesn’t want to break the existing contract:

datetime.fromisoformat() is the inverse operation of datetime.isoformat(), which is to say that every valid input to datetime.fromisoformat() is a possible output of datetime.isoformat(), and every possible output of datetime.isoformat() is a valid input to datetime.fromisoformat().

Therefore, if fromisoformat() can parse the Z suffix, then isoformat() would need to emit the Z suffix instead of +00:00, which could create a backwards compatibility issue. And then fromisoformat() wouldn’t be able to parse the +00:00 suffix anymore. Therefore, this idea cannot be accepted without breaking the contract.

Proposed Idea

The name fromisoformat() is a bit unfortunate because it doesn’t handle the full ISO-8601 spec. In fact, the spec is quite broad and covers issues that don’t matter in the datetime class such as representing dates, times, and intervals. Furthermore, the ISO spec isn’t an open standard as far as I know. (It looks like I would need to pay money to ISO if I wanted a copy to read?)

However there is a simplified standard that is open: RFC-3339. I suggest adding new methods datetime.rfcformat() and datetime.fromrfcformat() that implement this RFC. As a consequence, this would also allow us to parse dates ending in Z.
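To make the output half of this proposal concrete, here is a rough sketch of what such a formatter might do. The name format_rfc3339 is hypothetical and stands in for the proposed rfcformat(). The sketch only handles the Z substitution and the timezone requirement; it does not check every RFC 3339 rule (for example, it does not reject sub-minute offsets).

```python
from datetime import datetime, timedelta, timezone

def format_rfc3339(dt):
    """Hypothetical helper: format an aware datetime, writing a zero offset as 'Z'."""
    offset = dt.utcoffset()
    if offset is None:
        # RFC 3339 timestamps always carry an offset, so naive datetimes are rejected.
        raise ValueError("naive datetime has no UTC offset")
    text = dt.isoformat()
    if offset == timedelta(0) and text.endswith("+00:00"):
        text = text[:-len("+00:00")] + "Z"
    return text

utc_text = format_rfc3339(datetime(2019, 8, 28, 14, 34, 25, 518993, tzinfo=timezone.utc))
cest_text = format_rfc3339(
    datetime(2019, 8, 28, 16, 34, 25, tzinfo=timezone(timedelta(hours=2)))
)
```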

Let me know thoughts on this issue. Thanks!


Having two functions that are almost-but-not-quite-the-same sounds very confusing. I think it’s a worse solution than simply slightly extending the existing functions.


How critical is it in practice that passing a string using the Z format through fromisoformat and isoformat doesn’t give the same string as passed in, but changes the Z to +00:00? It seems a pretty minor discrepancy (and accepting both Z and +00:00 conforms to “be liberal in what you accept and strict in what you produce”). But I’m not an expert in the field, so there may be reasons why perfect round tripping is important.


To be clear, it is certainly possible to maintain the "fromisoformat is only the inverse of isoformat" without making isoformat emit Z instead of +00:00 by default since isoformat() could always grow the capability to optionally emit Z in place of +00:00 with a feature flag, in which case fromisoformat() would be required to implement Z parsing. I wouldn’t want to add a feature flag to isoformat() just to maintain an arbitrary contract, though, so I would only consider this option if there’s strong demand for emitting Z in isoformat - and I have not seen any issues on BPO requesting this, so it’s probably not that important to people.

That said, in some ways it would violate the spirit of the contract, which is that, as of right now, fromisoformat is intended to be used only on the output of .isoformat, which means that all the people who want it to parse Z are in a sense using it in an unsupported way. Modifying it to start accepting these would probably lead to more people hitting bugs in production when parsing a valid ISO 8601 string generated by something other than datetime.datetime.isoformat that happens to be in an unsupported format.

To be clear, currently the idea is that you should parse this name as "from isoformat" rather than “from ISO format”, meaning that it constructs a datetime from the output of isoformat. The same goes for datetime.fromisocalendar and datetime.fromtimestamp.

I strongly disagree with this idea, for the same reasons that @jdemeyer identifies.

This is not true, I just do not want to half-change the contract. If you look at the original issue in which I added fromisoformat, the intention was always that we would start with "reverses isoformat" because it is well-scoped and incredibly easy to explain what it does (it parses the result of isoformat()). I think it would be acceptable for it to eventually grow something like a full ISO 8601 parser, but there are many UI challenges and decisions to be made there.

There is no requirement that strings can be round-tripped from str -> datetime -> str, the only guarantee is in the other direction, so dt == datetime.fromisoformat(dt.isoformat()) must be true. We are free to expand what fromisoformat() parses and the main reason we have chosen not to is that it’s much more complicated to get it right.
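A small sketch of that one-directional guarantee, using a few arbitrary sample values. The last part shows a string that fromisoformat() accepts but that does not come back unchanged, because isoformat() writes microseconds with six digits.

```python
from datetime import datetime, timedelta, timezone

samples = [
    datetime(2019, 8, 28, 14, 34, 25),                               # naive
    datetime(2019, 8, 28, 14, 34, 25, 518993, tzinfo=timezone.utc),  # aware, UTC
    datetime(2019, 8, 28, 14, 34, 25, tzinfo=timezone(timedelta(hours=-5))),
]

# Guaranteed direction: datetime -> str -> datetime
for dt in samples:
    assert datetime.fromisoformat(dt.isoformat()) == dt

# Not guaranteed: str -> datetime -> str
text = "2019-08-28T14:34:25.518+00:00"
round_tripped = datetime.fromisoformat(text).isoformat()
# round_tripped is '2019-08-28T14:34:25.518000+00:00', which differs from text
```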

I strongly disagree with this sentiment in most library code, as it tends to take clear specifications and make them very fuzzy and implementation-defined. In this case failing loudly on common mistakes that we can still interpret as a datetime is an early warning that you are using the function in an unsupported way. As of today, if you are not parsing the output of a dt.isoformat() call or a string guaranteed to be in an equivalent format, you should not be using datetime.fromisoformat, and if it works it only does so by accident.

My goals for a “general-purpose” ISO 8601 parser:

  1. It should support the entire datetime portion of the spec (or as nearly so as we can)
  2. It should have a way to specify which deviations from the spec are not allowed (e.g. no sub-minute offsets)
  3. It should be possible to specify that you want to support certain subsets of all supported functionality (e.g. RFC 3339, which is in some ways a subset and a superset of ISO 8601).
  4. It should support a minimum of deviations from ISO 8601 - essentially those that are specified to be changeable “by agreement” plus support for sub-minute time zone offsets.
  5. It should continue to “just work” on the output of datetime.isoformat.

We will also need to decide what to do with the --MMDD and --MM-DD formats, since they represent a concept that cannot be represented with datetime.datetime. I believe the options are “don’t support at all”, “fill in the missing year from the current year” and “allow the user to specify the default value for the year”. The last two can also be combined (e.g. use current year by default but allow users to override it). I suspect if we didn’t support it no one would care, since most of the people who even know it exists are people who have tried implementing the spec.

Ah, if that’s the case then yes, fromisoformat shouldn’t accept Z. I didn’t realise that (although that’s my fault for not checking the docs). I guess the answer “if you want to parse more general ISO format dates, use a 3rd party library” stands, then. Which is fine for my needs, so I’ll stop offering uninformed opinions here :slightly_smiling_face:

Well, it’s still pretty annoying that something called fromisoformat doesn’t actually parse the ISO format. And the doc isn’t helpful, as it doesn’t give any alternative.


The way I see it:

ISO 8601 evolved from a very old standard, designed for parsing by humans. The main problem it fixes is ambiguity in traditional formats like 10/9/12. If a human who’s never heard of the standard gets an ISO 8601-encoded string, they’ll either parse it correctly or go „this is weird, I better ask the sender what they meant!“.
That’s very good news for the sender (encoder).

ISO 8601 specifies how to encode a lot more than just datetime: things like durations, repeating intervals. It allows you to use week-based counting. It’s very useful if you want to express something, but it’s not at all practical if you want to write a parser.
A complete parser for ISO 8601 is not only impractical, but also not very useful.

But that’s okay: you can define a subset of ISO 8601 and write a parser for that.
If you need week-based counting, ISO 8601 will give you the best way to encode a week-based date, with all the nice properties (unambiguity, lexical sorting) and all the relevant information (like which edge cases are solved and which are still dangerous).
Writing the encoder/decoder with all the nice properties is then trivial.

That, for me, is the point of ISO 8601: it has extremely good guidance for selecting a datetime encoding. But it is not a spec to be implemented.


Contrast with the other standard: RFC 3339. This is an encoding only of a moment in time, with timezone information – i.e. it’s limited only to what a Python datetime stores. It has nearly all the nice properties of ISO 8601, because it’s a profile/subset. (Not a 100% strict one, but the deviation is well argued.) And crucially, it’s designed to be easily implementable (and testable) – it omits the arcane parts of the ISO standard that are largely irrelevant to datetime.

(Also, RFC 3339 is an open standard: not only can it be reasonably implemented, but anyone can also check if the implementation is actually correct.)

Now contrast with datetime.__str__(), which has almost the same design goals as RFC 3339, but an additional one of being „human-friendly“. It replaces a T (a computer-friendly separator) with a space (a human-friendly separator). RFC 3339 explicitly doesn’t allow this to keep a useful property:

Assuming [important details], then the date and time strings may be sorted as strings […] and a time-ordered sequence will result. The presence of optional punctuation would violate this characteristic.

How does ISO 8601 handle this? It tells you T is the best choice, but allows other characters by „mutual agreement“ of sender and receiver. How typical of the ISO! It’s not a spec, but guidance for making your own spec.
Writing a parser that accepts T or space (or anything else) isn’t a lot of work, and so Python’s isoformat has an option to select the separator. It carefully passes the choice to define your own format on to the user.
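As a quick illustration of that separator option (the sample value is arbitrary):

```python
from datetime import datetime, timezone

dt = datetime(2019, 8, 28, 14, 34, 25, tzinfo=timezone.utc)

with_t = dt.isoformat()             # '2019-08-28T14:34:25+00:00'
with_space = dt.isoformat(sep=" ")  # '2019-08-28 14:34:25+00:00'

# str() on a datetime uses the space-separated form
assert str(dt) == with_space

# fromisoformat() reads back either separator
assert datetime.fromisoformat(with_t) == dt
assert datetime.fromisoformat(with_space) == dt
```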


In conclusion, ISO 8601 is not a good spec to implement for datetime, but RFC 3339 is, and it’s a perfect match.

I’d like to quote Paul, but substitute the RFC for the ISO:

The good news is that we’re almost there: we’re missing details like the Z.


Apologies for:

  • not having read the ISO standard
  • being all words and no work

Unfortunately RFC 3339 is not a perfect match for fromisoformat, since it requires a time zone, a requirement we most certainly do not have in datetime. Additionally, it doesn’t cover some things that isoformat() allows, such as sub-minute offsets.
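Both mismatches are easy to see with isoformat() itself. A short sketch follows; the offset of +05:30:15 is an arbitrary made-up value chosen only to show a sub-minute offset.

```python
from datetime import datetime, timedelta, timezone

# A naive datetime: isoformat() emits no offset at all, which RFC 3339 does not allow.
naive = datetime(2019, 8, 28, 14, 34, 25)
naive_text = naive.isoformat()  # '2019-08-28T14:34:25'

# An offset with a seconds component: accepted by datetime, not expressible in RFC 3339.
odd_tz = timezone(timedelta(hours=5, minutes=30, seconds=15))
odd = datetime(2019, 8, 28, 14, 34, 25, tzinfo=odd_tz)
odd_text = odd.isoformat()      # '2019-08-28T14:34:25+05:30:15'

# Both still round-trip through fromisoformat()
assert datetime.fromisoformat(naive_text) == naive
assert datetime.fromisoformat(odd_text) == odd
```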

Another wrinkle here is that RFC 3339 does not support the use of commas as a separator for fractional components, which is allowed in ISO 8601 and, unfortunately, is included in the default format for the logging module - if we’re already being liberal in accepting anything that is allowed by “mutual consent” we should probably be able to parse the logging module’s format.
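For reference, a timestamp in the logging module's default style (space separator, comma before the milliseconds) can already be read with strptime() and an explicit format. This is a sketch with a made-up sample timestamp, not a claim about what fromisoformat() does with such input.

```python
from datetime import datetime

# Shape of the logging module's default asctime: 'YYYY-MM-DD HH:MM:SS,mmm'
stamp = "2019-08-28 14:34:25,518"

# %f accepts one to six digits, so the three millisecond digits parse as 518000 microseconds
dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
```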

Restricting ourselves only to the date, time and datetime related portions of the spec, it’s actually not terribly difficult to write a fairly full-featured ISO 8601 parser once you know all the rules. I would also contend that a full-featured ISO 8601 parser is useful, just that supporting additional valid ISO 8601 formats has pretty severely diminishing marginal utility once you get away from the (pseudo-regex) forms YYYY(-?MM(-?DD)?)?(.*HH:?MM:?SS([\.,]\d+)?([+-]HH:?MM([\.,]\d+)?)?)?. The marginal utility of adding additional formats is slight and the marginal cost of accidentally accepting invalid dates is also minor - it’s probably a net positive.

This is an interesting suggestion. The main problem is that you have two types of users: one kind that has a bunch of datetimes and wants to parse them as long as they are any kind of valid format, and another kind who knows the format of the datetime (e.g. "it was generated by isoformat()" or “the spec says it’s in RFC 3339” or “the spec says it’s ISO 8601”) and wants it to be an error if it’s not that. I have gotten requests for stricter versions of both dateutil.parser.parse and dateutil.parser.isoparse (which itself exists as a “strict” version of parse).

It might not be such a bad thing to make the default fromisoformat be maximally permissive (accept anything valid that is allowed “by mutual agreement”, plus extend the timezone offsets to accept any valid format for a naive time), and point people to dateutil.parser.isoparse for a more configurable strict-subset behavior (though that still means I’d need to figure out that API for dateutil.parser.isoparse).

Does datetime.fromisoformat(my_date.replace('Z', '+00:00')) really cover full RFC3339?

It’s an important RFC that’s used in many internet APIs. It’s the recommended way to represent moments in time in JSON Schema, OpenAPI etc.
So it would be nice to have it as a battery, and the stdlib is so tantalizingly close to providing it…

(As usual, an external library is more immediately useful because it can be used today, and python’s datetime is lacking some other things so people use external libraries anyway.)


I regularly have to work with integrations and standards in the ed-tech world where zulu-terminated datetimes are required (required to accept as input, and required to produce on output), and would be happy to have this functionality in the standard library.


Paul, I know you favor the idea of a more comprehensive ISO-8601 parser, but you have stated it is tricky to design the API (e… feature flags) and nobody in this thread is asking for broader ISO-8601 support. They just want to be able to parse the date strings created by the world’s most popular programming language (JavaScript).

Would you endorse a minimal patch that just adds support for zulu dates (and updates documentation)?

No, I realize that from a practical point of view it would be nice to have something that often just works, but we have deliberately designed it this way because it has a very clear scope and by not stepping outside that scope for practicality’s sake, people are more likely to learn early on that they are using the function incorrectly (i.e. for parsing datetimes not guaranteed to produce only the formats that datetime.isoformat produces).

The preferred solution is to have a version of this function that will satisfy both the people who want to invert datetime.isoformat and the people who want to parse ISO 8601 datetimes in general. I think Petr’s suggestion of leaving the feature flags for dateutil.parser.isoparse and creating a liberal ISO 8601 parser might simplify things greatly, however.

In the meantime, I have seen no objections to using dateutil.parser.isoparse other than “but it’s in a third party library”, which is not a great justification, particularly when that library is dateutil - an incredibly popular library maintained by one of the maintainers of datetime (me) and from which datetime.fromisoformat was adapted in the first place. Best case scenario, we change the scope of fromisoformat today, PEP 602 passes and you can get the same functionality you get out of dateutil.parser.isoparse today in November 2020, assuming you are comfortable immediately upgrading your code to be Python 3.9-only. Given that timeline, I don’t think there’s enough urgency here that we should complicate the clearly-communicated scope of this function with a half-measure like supporting parsing “Z” for UTC.


As far as I’m aware, there’s no reliable means of determining the most popular programming language. The Stack Overflow 2019 survey shows JavaScript at the top, but the PYPL Index and TIOBE Index suggest otherwise. There are of course many other sources with differing results.

Therefore, I would highly recommend replacing “the world’s most popular programming language” with “one of the world’s most popular programming languages”. This is far less controversial, but still represents the same user demand for compatibility.

It’s a 1.3MB dependency that for many of us only adds one feature: the ability to parse the letter Z.

I don’t find this persuasive, because the same logic applies to every change ever made to Python.

I respect your authority on this issue, though, and thank you for the hard work on datetime and dateutil. I’ll stick with replace('Z', '+00:00') for now.