How critical is it in practice that passing a string using the
Z format through
isoformat doesn’t give the same string as passed in, but changes the Z to
+00:00? It seems a pretty minor discrepancy (and accepting both Z and
+00:00 conforms to “be liberal in what you accept and strict in what you produce”). But I’m not an expert in the field, so there may be reasons why perfect round tripping is important.
To be clear, it is certainly possible to maintain the "
fromisoformat is only the inverse of
isoformat" contract without making isoformat() emit
Z instead of
+00:00 by default since
isoformat() could always grow the capability to optionally emit
Z in place of
+00:00 with a feature flag, in which case
fromisoformat() would be required to implement
Z parsing. I wouldn’t want to add a feature flag to
isoformat() just to maintain an arbitrary contract, though, so I would only consider this option if there’s strong demand for emitting
Z from isoformat - and I have not seen any issues on BPO requesting this, so it’s probably not that important to people.
That said, in some ways it would violate the spirit of the contract, which is that, as of right now,
fromisoformat is intended to be used only on the output of
.isoformat, which means that all the people who want it to parse
Z are in a sense using it in an unsupported way. Modifying it to start accepting these would probably lead to more people hitting bugs in production when parsing a valid ISO 8601 string generated by something other than
datetime.datetime.isoformat that happens to be in an unsupported format.
To be clear, currently the idea is that you should parse this name as "from
isoformat" rather than “from ISO format”, meaning that it constructs a datetime from the output of
isoformat. The same goes for
I strongly disagree with this idea, for the same reasons that @jdemeyer identifies.
This is not true, I just do not want to half-change the contract. If you look at the original issue in which I added
fromisoformat, the intention was always that we would start with "reverses
isoformat" because it is well-scoped and incredibly easy to explain what it does (it parses the result of
isoformat()). I think it would be acceptable for it to eventually grow something like a full ISO 8601 parser, but there are many UI challenges and decisions to be made there.
There is no requirement that strings can be round-tripped from
str, the only guarantee is in the other direction, so
dt == datetime.fromisoformat(dt.isoformat()) must be true. We are free to expand what
fromisoformat() parses and the main reason we have chosen not to is that it’s much more complicated to get it right.
I strongly disagree with this sentiment in most library code, as it tends to take clear specifications and make them very fuzzy and implementation-defined. In this case failing loudly on common mistakes that we can still interpret as a datetime is an early warning that you are using the function in an unsupported way. As of today, if you are not parsing the output of a
dt.isoformat() call or a string guaranteed to be in an equivalent format, you should not be using
datetime.fromisoformat, and if it works it only does so by accident.
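A minimal sketch of the one-directional guarantee described above, using only the standard library. The sample values are arbitrary; the point is that datetime -> string -> datetime is guaranteed, while string -> datetime -> string is not.

```python
from datetime import datetime, timezone

# Guaranteed direction: datetime -> string -> datetime.
dt = datetime(2020, 1, 20, 20, 48, 30, 971000, tzinfo=timezone.utc)
round_tripped = datetime.fromisoformat(dt.isoformat())
print(round_tripped == dt)

# Not guaranteed: string -> datetime -> string.
# A millisecond-precision string, as emitted by isoformat(timespec="milliseconds"),
# comes back with microsecond precision.
text = dt.isoformat(timespec="milliseconds")
print(text)
print(datetime.fromisoformat(text).isoformat())
```

Both strings describe the same instant, so nothing is lost; only the textual form changes.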
My goals for a “general-purpose” ISO 8601 parser:
- It should support the entire
datetime portion of the spec (or as nearly so as we can)
- It should have a way to specify which deviations from the spec are not allowed (e.g. no sub-minute offsets)
- It should be possible to specify that you want to support certain subsets of all supported functionality (e.g. RFC 3339, which is in some ways a subset and a superset of ISO 8601).
- It should support a minimum of deviations from ISO 8601 - essentially those that are specified to be changeable “by agreement” plus support for sub-minute time zone offsets.
- It should continue to “just work” on the output of
We will also need to decide what to do with the
--MM-DD formats, since they represent a concept that cannot be represented with
datetime.datetime. I believe the options are “don’t support at all”, “fill in the missing year from the current year” and “allow the user to specify the default value for the year”. The last two can also be combined (e.g. use current year by default but allow users to override it). I suspect if we didn’t support it no one would care, since most of the people who even know it exists are people who have tried implementing the spec.
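To make the combined option concrete, here is a rough sketch of "use the current year by default but allow users to override it". The helper name `parse_month_day` and its `default_year` parameter are hypothetical, invented for illustration; nothing like this exists in `datetime`, and returning a midnight `datetime` is just one possible choice.

```python
import re
from datetime import date, datetime
from typing import Optional

# Matches only the extended --MM-DD form discussed above.
_MONTH_DAY = re.compile(r"--(\d{2})-(\d{2})")


def parse_month_day(text: str, default_year: Optional[int] = None) -> datetime:
    """Hypothetical helper: parse --MM-DD, filling in the missing year."""
    match = _MONTH_DAY.fullmatch(text)
    if match is None:
        raise ValueError(f"not a --MM-DD string: {text!r}")
    # Fall back to the current year unless the caller supplies one.
    year = date.today().year if default_year is None else default_year
    month, day = (int(group) for group in match.groups())
    # datetime() itself rejects impossible dates such as --02-30.
    return datetime(year, month, day)


print(parse_month_day("--12-25", default_year=2019))
```

One wrinkle this exposes: `--02-29` is only parseable when the chosen year is a leap year, so the default-year policy can change which inputs are valid.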
Ah, if that’s the case then yes,
fromisoformat shouldn’t accept
Z. I didn’t realise that (although that’s my fault for not checking the docs). I guess the answer “if you want to parse more general ISO format dates, use a 3rd party library” stands, then. Which is fine for my needs, so I’ll stop offering uninformed opinions here
Well, it’s still pretty annoying that something called
fromisoformat doesn’t actually parse the ISO format. And the doc isn’t helpful, as it doesn’t give any alternative.
The way I see it:
ISO 8601 evolved from a very old standard, designed for parsing by humans. The main problem it fixes is ambiguity in traditional formats like
10/9/12. If a human who’s never heard of the standard gets an ISO 8601-encoded string, they’ll either parse it correctly or go „this is weird, I better ask the sender what they meant!“.
That’s very good news for the sender (encoder).
ISO 8601 specifies how to encode a lot more than just datetime: things like durations, repeating intervals. It allows you to use week-based counting. It’s very useful if you want to express something, but it’s not at all practical if you want to write a parser.
A complete parser for ISO 8601 is not only impractical, but also not very useful.
But that’s okay: you can define a subset of ISO 8601 and write a parser for that.
If you need week-based counting, ISO 8601 will give you the best way to encode a week-based date, with all the nice properties (unambiguity, lexical sorting) and all the relevant information (like which edge cases are solved and which are still dangerous).
Writing the encoder/decoder with all the nice properties is then trivial.
That, for me, is the point of ISO 8601: it has extremely good guidance for selecting a datetime encoding. But it is not a spec to be implemented.
Contrast with the other standard: RFC 3339. This is an encoding only of a moment in time, with timezone information – i.e. it’s limited only to what a Python
datetime stores. It has nearly all the nice properties of ISO 8601, because it’s a profile/subset. (Not a 100% strict one, but the deviation is well argued.) And crucially, it’s designed to be easily implementable (and testable) – it omits the arcane parts of the ISO standard that are largely irrelevant to
(Also, RFC 3339 is an open standard: not only can it be reasonably implemented, but anyone can also check if the implementation is actually correct.)
Now contrast with
datetime.__str__(), which has almost the same design goals as RFC 3339, but an additional one of being „human-friendly“. It replaces a
T (a computer-friendly separator) with a space (a human-friendly separator). RFC 3339 explicitly doesn’t allow this to keep a useful property:
Assuming [important details], then the date and time strings may be sorted as strings […] and a time-ordered sequence will result. The presence of optional punctuation would violate this characteristic.
How does ISO 8601 handle this? It tells you
T is the best choice, but allows other characters by „mutual agreement“ of sender and receiver. How typical of the ISO! It’s not a spec, but guidance for making your own spec.
Writing a parser that accepts
T or space (or anything else) isn’t a lot of work, and so Python’s
isoformat has an option to select the separator. It carefully passes the choice to define your own format on to the user.
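A short illustration of the separator option mentioned above, using only documented standard-library behaviour. The sample datetime is arbitrary.

```python
from datetime import datetime, timezone

dt = datetime(2020, 1, 20, 20, 48, 30, tzinfo=timezone.utc)

print(dt.isoformat())         # 2020-01-20T20:48:30+00:00
print(dt.isoformat(sep=" "))  # 2020-01-20 20:48:30+00:00

# str() is the "human-friendly" variant: the same as isoformat with a space.
print(str(dt) == dt.isoformat(sep=" "))  # True

# fromisoformat reads back whichever separator isoformat was told to emit.
print(datetime.fromisoformat(dt.isoformat(sep=" ")) == dt)  # True
```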
In conclusion, ISO 8601 is not a good spec to implement for
datetime, but RFC 3339 is, and it’s a perfect match.
I’d like to quote Paul, but substitute the RFC for the ISO:
The good news is that we’re almost there: we’re missing details like the
- not having read the ISO standard
- being all words and no work
Unfortunately RFC 3339 is not a perfect match for
fromisoformat, since it requires a time zone, a requirement we most certainly do not have in
datetime. Additionally, it doesn’t cover some things that
isoformat() allows, such as sub-minute offsets.
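Both mismatches can be seen with a few lines of standard-library code (Python 3.7 or later). This is only a sketch with arbitrary sample values: a naive datetime has no offset at all, and a sub-minute offset is emitted with a seconds component.

```python
from datetime import datetime, timedelta, timezone

# A naive datetime: isoformat() emits no offset, which RFC 3339 requires.
naive = datetime(2020, 1, 20, 20, 48, 30)
print(naive.isoformat())  # 2020-01-20T20:48:30
print(datetime.fromisoformat(naive.isoformat()).tzinfo)  # None

# A sub-minute UTC offset: isoformat() emits +HH:MM:SS.
odd_offset = timezone(timedelta(hours=1, minutes=2, seconds=3))
aware = datetime(2020, 1, 20, 20, 48, 30, tzinfo=odd_offset)
text = aware.isoformat()
print(text)  # 2020-01-20T20:48:30+01:02:03
print(datetime.fromisoformat(text) == aware)  # True
```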
Another wrinkle here is that RFC 3339 does not support the use of commas as a separator for fractional components, which is allowed in ISO 8601 and, unfortunately, is included in the default format for the
logging module - if we’re already being liberal in accepting anything that is allowed by “mutual consent” we should probably be able to parse the
logging module’s format.
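For reference, a timestamp in the logging module's default style looks like `2003-07-08 16:49:45,896`. A minimal workaround sketch follows; `parse_logging_timestamp` is a hypothetical helper name, and it assumes the only deviation from isoformat() output is the comma.

```python
from datetime import datetime


def parse_logging_timestamp(text: str) -> datetime:
    """Hypothetical helper for timestamps like '2003-07-08 16:49:45,896'."""
    # Swap the first comma for a dot so the string matches what
    # isoformat(sep=" ", timespec="milliseconds") would have produced.
    return datetime.fromisoformat(text.replace(",", ".", 1))


print(parse_logging_timestamp("2003-07-08 16:49:45,896"))
```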
Restricting ourselves only to the
datetime related portions of the spec, it’s actually not terribly difficult to write a fairly full-featured ISO 8601 parser once you know all the rules. I would also contend that a full-featured ISO 8601 parser is useful, just that supporting additional valid ISO 8601 formats has pretty severely diminishing marginal utility once you get away from the (pseudo-regex) forms
YYYY(-?MM(-?DD)?)?(.*HH:?MM:?SS([\.,]\d+)?([+-]HH:?MM([\.,]\d+)?)?)?. The marginal utility of adding additional formats is slight and the marginal cost of accidentally accepting invalid dates is also minor - it’s probably a net positive.
This is an interesting suggestion. The main problem is that you have two types of users: one kind that has a bunch of datetimes and wants to parse them as long as they are any kind of valid format, and another kind who knows the format of the datetime (e.g. "it was generated by
isoformat()" or “the spec says it’s in RFC 3339” or “the spec says it’s ISO 8601”) and they want it to be an error if it’s not that. I have gotten requests for stricter versions of both
dateutil.parser.isoparse (which itself exists as a “strict” version of
It might not be such a bad thing to make the default
fromisoformat be maximally permissive (accept anything valid that is allowed “by mutual agreement”, plus extend the timezone offsets to accept any valid format for a naive time, and point people to
dateutil.parser.isoparse for a more configurable strict-subset behavior (though that still means I’d need to figure out that API for
datetime.fromisoformat(my_date.replace('Z', '+00:00')) really cover full RFC3339?
It’s an important RFC that’s used in many internet APIs. It’s the recommended way to represent moments in time in JSON Schema, OpenAPI etc.
So it would be nice to have it as a battery, and the stdlib is so tantalizingly close to providing it…
(As usual, an external library is more immediately useful because it can be used today, and python’s datetime is lacking some other things so people use external libraries anyway.)
I regularly have to work with integrations and standards in the ed-tech world where zulu-terminated datetimes are required (required to accept as input, and required to produce on output), and would be happy to have this functionality in the standard library.
Would you endorse a minimal patch that just adds support for zulu dates (and updates documentation)?
No, I realize that from a practical point of view it would be nice to have something that often just works, but we have deliberately designed it this way because it has a very clear scope and by not stepping outside that scope for practicality’s sake, people are more likely to learn early on that they are using the function incorrectly (i.e. for parsing datetimes not guaranteed to produce only the formats that
The preferred solution is to have a version of this function that will satisfy both the people who want to invert
datetime.isoformat and the people who want to parse ISO 8601 datetimes in general. I think Petr’s suggestion of leaving the feature flags for
dateutil.parser.isoparse and creating a liberal ISO 8601 parser might simplify things greatly, however.
In the meantime, I have seen no objections to using
dateutil.parser.isoparse other than “but it’s in a third party library”, which is not a great justification, particularly when that library is
dateutil - an incredibly popular library maintained by one of the maintainers of
datetime (me) and from which
datetime.fromisoformat was adapted in the first place. Best case scenario, we change the scope of
fromisoformat today, PEP 602 passes and you can get the same functionality you get out of
dateutil.parser.isoparse today in November 2020, assuming you are comfortable immediately upgrading your code to be Python 3.9-only. Given that timeline, I don’t think there’s enough urgency here that we should complicate the clearly-communicated scope of this function with a half-measure like supporting parsing “Z” for UTC.
Therefore, I would highly recommend replacing “the world’s most popular programming language” with “one of the world’s most popular programming languages”. This is far less controversial, but still represents the same user demand for compatibility.
It’s a 1.3MB dependency that for many of us only adds one feature: the ability to parse the letter Z.
I don’t find this persuasive, because the same logic applies to every change ever made to Python.
I respect your authority on this issue, though, and thank you for the hard work on datetime and dateutil. I’ll stick with
replace('Z', '+00:00') for now.
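A slightly more defensive version of that workaround, as a sketch: it rewrites only a trailing designator rather than every `Z` in the string, and also accepts lowercase `z`, which RFC 3339 permits. The wrapper name `fromisoformat_z` is hypothetical.

```python
from datetime import datetime, timezone


def fromisoformat_z(text: str) -> datetime:
    """Hypothetical wrapper: treat a trailing 'Z' or 'z' as UTC."""
    if text.endswith(("Z", "z")):
        # Rewrite the designator into the offset form isoformat() emits.
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


parsed = fromisoformat_z("2020-01-20T20:48:30.971Z")
print(parsed)
print(parsed.tzinfo is timezone.utc)
```

Strings without the suffix pass through unchanged, so the output of isoformat() still parses as before.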
And it violates the robustness principle: “Be conservative in what you send, be liberal in what you accept”.
I’ve just ended up here after getting sick of writing hackish code such as:
if d.endswith('Z'):
    d = d[:-1] + '+00:00'
return datetime.datetime.fromisoformat(d)
Z-suffixed date strings as the norm.
If your code interacts for instance with a Node.js server then you may see dates formatted like this:
$ node
> d = new Date()
2020-01-20T20:48:30.971Z
fromisostring(), or rely on external libraries just to get the
Z parsed properly.
Please consider your tone.
Due to unfortunate naming, this is impractical — the full ISO-8601 format is large with arcane options like ordinal days, decimal fractions on minutes and much more. We can safely assume this will never happen in stdlib. I’m not sure any of the external packages that tried ever implemented 100% of the full standard.
dateutil doesn’t either (doc says it doesn’t parse fractional minutes). A couple years ago I searched several other languages too, and didn’t find anybody doing full 8601! [I guess being a pay-to-read standard, with long prose and no BNF, makes this a goal programmers just don’t care enough about…]
What most people actually mean when they think “ISO” is “as long as I pass a valid RFC-3339 string”.
It’s a 1.3MB dependency that for many of us only adds one feature: the ability to parse the letter Z.
To be fair, there are multiple smaller modules that don’t attempt ISO 8601 but only RFC 3339 (not sure if any of them is perfect, but hey, if not, let’s perfect one before requesting that the stdlib does it):
Let’s see, what are the actual points separating
fromisoformat from full RFC 3339?
“Z” or lowercase “z” — @blacklight86 note your code above doesn’t handle “z”.
4.3. Unknown Local Offset Convention.
Not clear how to best represent with
email.utils.parsedate_to_datetime set a precedent of returning a naive datetime, which you should understand as “UTC but with no indication of the actual source timezone”, which is… meh.
Leap seconds (section 5.7)?
fromisoformat seems to accept any character between date and time. Even a digit.
\x00! Makes sense because
isoformat takes an optional arg to emit any character (space
" " is common, but not the only choice). But a “from RFC” function had better only allow
"t", and optionally
While I respectfully disagree on value of supporting RFC3339, this is a very insightful comment, thanks.
I agree that ISO 8601 is large and arcane, and it’s probably not worth implementing everything in the stdlib. However I would argue that:
Python isn’t the first language to bump into the problem of how to implement the ISO datetime standard. From what I know, Java isn’t fully ISO-8601 compliant, but at least it supports both the time offset and the
strptime). My point is that probably Python doesn’t have to reinvent the wheel, and it could see instead how other languages have tackled the problem (answer: most of them aren’t fully compliant either, but they at least do support some reasonable variations, such as the time offset and
A quick search on the internet for “
As long as those behind the ECMA standard keep saying “we’ve always returned UTC datetime strings with the
Z suffix, we won’t change it now”, and those who develop Python keep saying “we’ve always only parsed the datetime strings generated by Python itself (with time offset), we don’t care about processing in stdlib those returned by default by other languages”, the divergence can only get worse.
You are critically missing at least 4.4. Unqualified local time - RFC 3339 is only suitable for aware datetimes and requires a tzoffset.
dateutil gives a slightly different result. Compare:
import dateutil.parser as dt

date_time_obj = dt.parse('2020-02-25T13:13:43.913Z')
date_time_obj
# out: datetime.datetime(2020, 2, 25, 13, 13, 43, 913000, tzinfo=tzutc())
from datetime import datetime

date_time_obj2 = datetime.fromisoformat('2020-02-25T13:13:43.913Z'.replace('Z', '+00:00'))
date_time_obj2
# out: datetime.datetime(2020, 2, 25, 13, 13, 43, 913000, tzinfo=datetime.timezone.utc)
A difference in
Although that doesn’t seem to be a problem:
date_time_obj == date_time_obj2 # out: True
It would be really useful if the documentation explicitly stated that
Z is not parsed and must be replaced with