Parse "Z" timezone suffix in datetime

NumesSanguis · February 25, 2020, 2:48pm

Most of this has been said here already; I just wanted to add that I also arrived at this page after receiving JSON data from a JavaScript application.

Also, using dateutil gives a slightly different result. Compare:

from datetime import datetime
import dateutil.parser as dt
date_time_obj = dt.parse('2020-02-25T13:13:43.913Z')
date_time_obj
# out: datetime.datetime(2020, 2, 25, 13, 13, 43, 913000, tzinfo=tzutc())

date_time_obj2 = datetime.fromisoformat('2020-02-25T13:13:43.913Z'.replace('Z', '+00:00'))
date_time_obj2
# out: datetime.datetime(2020, 2, 25, 13, 13, 43, 913000, tzinfo=datetime.timezone.utc)

The only difference is in tzinfo: tzutc() vs datetime.timezone.utc.
That doesn’t seem to be a problem, though:

date_time_obj == date_time_obj2
# out: True
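
One caveat with a blanket .replace('Z', '+00:00') is that it rewrites every Z in the string, not just the suffix. A minimal sketch of a helper that only touches a trailing Z; the name parse_iso_z is made up for illustration, and it relies only on str.endswith and datetime.fromisoformat:

```python
from datetime import datetime, timezone


def parse_iso_z(text):
    """Parse an ISO 8601 string, treating a trailing 'Z' as '+00:00'.

    Hypothetical helper, not part of the standard library.
    """
    if text.endswith('Z'):
        # Swap only the final character; the rest of the string is untouched.
        text = text[:-1] + '+00:00'
    return datetime.fromisoformat(text)


parsed = parse_iso_z('2020-02-25T13:13:43.913Z')
print(repr(parsed))
```

Strings that already end in +00:00 pass through unchanged, so the helper can be applied to mixed input.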

borpin · June 4, 2020, 8:37am

It would be really useful if the documentation explicitly stated that Z is not parsed and must be replaced with +00:00

seb · October 27, 2020, 3:02am

I think one thing worth adding is that Django’s DjangoJSONEncoder also uses ‘Z’. This is the reason I ended up here.

So it’s not just other popular languages like JavaScript that use the format. One of the most popular Python web frameworks uses it for JSON-encoding datetimes.

ndamclean · January 13, 2021, 8:50pm

I wanted to note that the replace('Z', '+00:00') workaround makes parsing dates around 5x as slow (I ran a benchmark of various date parsing libraries and functions; see linked gist).

I think having a very fast ISO parsing function in the Python standard library is important.

Date parsing benchmark (requires packages: pytest, pytest-benchmark, python-dateutil, ciso8601, iso8601)

https://gist.github.com/ndamclean/2c1113ea40199be7edd43a506748ffd7

bench_dateparse.py

import ciso8601
import iso8601
import pytest
from datetime import datetime
from dateutil import parser


TIME_STR = '2019-08-28T14:34:25.518993+00:00'
TIME_STR_Z = '2019-08-28T14:34:25.518993Z'

This file has been truncated.

dimaqq · July 21, 2021, 12:46am

Another gotcha is that datetime.datetime.fromisoformat is picky about fractional seconds:

2020-01-01T12:33:56
2020-01-01T12:33:56.0
2020-01-01T12:33:56.000 (milliseconds)
2020-01-01T12:33:56.0000
2020-01-01T12:33:56.000000 (microseconds)
2020-01-01T12:33:56.00000 # commonly generated by JavaScript libraries
2020-01-01T12:33:56.0000000
2020-01-01T12:33:56.000000000 (nanoseconds) # golang, tc39 Temporal
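
To see which of these forms a given interpreter accepts, a quick probe along these lines can help. As far as I know, versions before 3.11 accept only no fraction, exactly 3 digits, or exactly 6 digits, while 3.11 and later accept all of them; the script assumes neither and just reports what it finds. The name accepted_by_fromisoformat is made up for illustration.

```python
from datetime import datetime

CANDIDATES = [
    '2020-01-01T12:33:56',
    '2020-01-01T12:33:56.0',
    '2020-01-01T12:33:56.000',
    '2020-01-01T12:33:56.0000',
    '2020-01-01T12:33:56.000000',
    '2020-01-01T12:33:56.00000',
    '2020-01-01T12:33:56.0000000',
    '2020-01-01T12:33:56.000000000',
]


def accepted_by_fromisoformat(text):
    """Return True if datetime.fromisoformat() parses text on this interpreter."""
    try:
        datetime.fromisoformat(text)
    except ValueError:
        return False
    return True


# Map each candidate string to whether this Python version accepts it.
results = {text: accepted_by_fromisoformat(text) for text in CANDIDATES}
for text, ok in results.items():
    print(f"{text:<32} {'ok' if ok else 'rejected'}")
```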

zaytsev · September 12, 2022, 5:55pm

So, it’s been a while now and I wanted to ask if there is still strong opposition to a minimal code and documentation patch adding support for Z?

We have to interact with other languages a lot and depending on third-party packages is very cumbersome for stdlib-only CI scripts and such.

Timezone stuff landing in Python was a huge deal, because we had to depend on third-party stuff in the past, and now we don’t have to anymore. Getting this small wart out of the way is a small step for Python developers, but a big step for humanity.

vovavili · September 13, 2022, 4:18pm

datetime.fromisoformat() is the inverse operation of datetime.isoformat(), which is to say that every valid input to datetime.fromisoformat() is a possible output of datetime.isoformat(), and every possible output of datetime.isoformat() is a valid input to datetime.fromisoformat().

I am not sure I fully agree with this objection. If “Z” is taken to denote the same referent as “+00:00”, then the symmetry still stands:

Python has symmetrical functions datetime.fromisoformat() and datetime.isoformat(), which point to the same referent.
Python adds an ad-hoc rule to datetime.fromisoformat(), which makes this function recognize one input (strings ending with “Z”) as having the same referent as another (strings ending with “+00:00”).
Python now has symmetrical functions datetime.fromisoformat() and datetime.isoformat() which point to the same referent, but the former has been enriched with an ad-hoc rule.

Do Python developers want to avoid scenarios where "2014-12-10 12:00:00Z" is transformed into a Python object, which then gets translated back into a different string with the same referent "2014-12-10 12:00:00+00:00"? I personally see nothing wrong or asymmetrical with this conversion rule, since I cannot envision a scenario where this might cause problems further down the road. Are there any reasonable scenarios where one would strictly expect a datetime object generated from a string with an explicit “Z” to never convert back to a string with an explicit “+00:00”?
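
A minimal sketch of that round trip, for concreteness. It parses the "Z" form with strptime's %z directive (which has accepted a literal Z since 3.7) rather than fromisoformat, so it should run on older interpreters too; the variable names are only illustrative.

```python
from datetime import datetime

original = '2014-12-10 12:00:00Z'

# strptime's %z directive accepts a literal 'Z' and treats it as +00:00.
parsed = datetime.strptime(original, '%Y-%m-%d %H:%M:%S%z')

# isoformat() never emits 'Z'; it spells the offset out.
round_tripped = parsed.isoformat(sep=' ')
print(round_tripped)  # 2014-12-10 12:00:00+00:00

# The two spellings differ as strings but denote the same instant.
same_instant = datetime.fromisoformat(round_tripped) == parsed
print(same_instant)  # True
```

The string changes, the referent does not, which is the only asymmetry the ad-hoc rule would introduce.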

cben · November 21, 2024, 4:51pm

[EDIT: ignore this message, see next one.

I wrote this whole message noticing mild strptime improvements but before I realized fromisoformat is much improved since 3.11. ]

Turns out datetime.datetime.strptime() does accept a Z suffix when parsing the %z directive:

Changed in version 3.7: When the %z directive is provided to the strptime() method, the UTC offsets can have a colon as a separator between hours, minutes and seconds. For example, '+01:00:00' will be parsed as an offset of one hour. In addition, providing 'Z' is identical to '+00:00'.

This emits a tz-aware object, using timedelta, so it achieves the important goal of reading an unambiguous moment in time, with stdlib, without packaging a TZ database:

>>> datetime.datetime.strptime('2024-10-12t06:29:22.1z'.upper(), '%Y-%m-%dT%H:%M:%S.%f%z')
datetime.datetime(2024, 10, 12, 6, 29, 22, 100000, tzinfo=datetime.timezone.utc)
>>> datetime.datetime.strptime('2024-10-12T06:29:22.12345+0030', '%Y-%m-%dT%H:%M:%S.%f%z')
datetime.datetime(2024, 10, 12, 6, 29, 22, 123450, tzinfo=datetime.timezone(datetime.timedelta(seconds=1800)))

So I’ll hijack this thread for a more modest goal — not touching fromisoformat(), just having ingredients in stdlib for reasonably parsing RFC3339.

Non-goal: raising strict errors for every deviation from the RFC.

Incomplete state of datetime.strptime (testing on 3.13.0, linux, glibc-2.39-22):

[I’m not talking of time.strptime which is slightly different and less suitable]

dateTtime separator & case

NOTE: Per [ABNF] and ISO8601, the “T” and “Z” characters in this syntax may alternatively be lower case “t” or “z” respectively.

This date/time format may be used in some environments or contexts that distinguish between the upper- and lower-case letters ‘A’-‘Z’ and ‘a’-‘z’ (e.g. XML).
Specifications that use this format in such environments MAY further limit the date/time syntax so that the letters ‘T’ and ‘Z’ used in the date/time syntax must always be upper case.
Applications that generate this format SHOULD use upper case letters.

NOTE: ISO 8601 defines date and time separated by “T”.
Applications using this syntax may choose, for the sake of readability, to specify a full-date and full-time separated by (say) a space character.

A literal T in the strptime format also matches a lowercase t in the input (and vice versa).
strptime T will not match a space or any other separator.
[Unlike fromisoformat which accepts any character whatsoever.]

The RFC seems vague on whether a space or other separators are conformant.
Would one know when specifically the input is e.g. “like rfc3339 only with space”, or SHOULD one accept any character whenever they parse RFC timestamps? If the latter, there is no easy way to use strptime. Well, maybe slice it:
```
datetime.datetime.strptime(s[:10] + 'T' + s[11:], '%Y-%m-%dt%H:%M:%S.%f%z')
```
strptime %z accepts uppercase Z but refuses to parse lowercase z. Easy enough to call .upper() first but would be nice if %z accepted both…
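
Putting the two workarounds from this section together, a rough sketch might look like the following. normalize_rfc3339 is a made-up name, and it assumes the input starts with a 10-character full-date followed by exactly one separator character.

```python
from datetime import datetime, timezone

RFC3339_WITH_FRACTION = '%Y-%m-%dT%H:%M:%S.%f%z'


def normalize_rfc3339(text):
    """Upper-case the string and force the date/time separator to 'T'.

    Hypothetical helper; assumes a 10-character date followed by a single
    separator character.
    """
    text = text.upper()  # 't' -> 'T', 'z' -> 'Z'; digits and punctuation are unaffected
    return text[:10] + 'T' + text[11:]


normalized = normalize_rfc3339('2024-10-12 06:29:22.1z')
parsed = datetime.strptime(normalized, RFC3339_WITH_FRACTION)
print(normalized, repr(parsed))
```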

Fractional seconds

RFC says either omit fraction, or include 1+ digits, no limit.

partial-time = time-hour ":" time-minute ":" time-second [time-secfrac]
time-secfrac = "." 1*DIGIT    # 1* is ABNF for 1 or more

Well, by now (Python 3.13.0) I see fromisoformat() accepts no fraction, 1 to 6 digits (those are parsed), as well as ANY higher number of digits (9 or 10000…), of which only the first 6 are kept, since datetime has µsec resolution.

strptime()'s %f OTOH is lacking:

.%f accepts 1–6 fractional digits.
.%f refuses to parse (“does not match format”) date without fractional seconds. Neither ...T06:29:22Z (RFC compliant) nor ...T06:29:22.Z (invalid) work.
=> You must retry with and without .%f.
Annoying, but kinda natural with the way formats work, even if %f accepted 0 digits, literal . still must match a period?
.%f refuses to parse fraction with >6 digits.
This is the worst gap IMO because optional variable-length fractions come before %z of also varying length, so are hard to trim.
Well, re.sub(r'\.([0-9]{6})[0-9]+', r'.\1', s) keeps first 6 digits.
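
A small sketch of that trimming step in context, showing that the offset after the fraction survives. trim_fraction is a hypothetical name; the regex is the one from the line above.

```python
import re
from datetime import datetime, timedelta

# A '.' followed by six digits (kept) and at least one more digit (dropped).
_EXTRA_FRACTION_DIGITS = re.compile(r'\.([0-9]{6})[0-9]+')


def trim_fraction(text):
    """Truncate fractional seconds to 6 digits so strptime's %f accepts them."""
    return _EXTRA_FRACTION_DIGITS.sub(r'.\1', text)


trimmed = trim_fraction('2024-10-12T06:29:22.123456789+01:23')
parsed = datetime.strptime(trimmed, '%Y-%m-%dT%H:%M:%S.%f%z')
print(trimmed)  # 2024-10-12T06:29:22.123456+01:23
```

Strings with 6 or fewer fractional digits do not match the pattern and come back unchanged. Note this truncates rather than rounds.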

TZ offset parsing is good

%z can parse all forms the RFC allows except lowercase z: Z, +01:23, -23:45.
%z can also parse seconds +01:23:45, without colons -0123, +012345 and even fractions 2024-10-12T06:29:22.1+01:23:45.000678 (afaik nobody needs those, it’s just supported because timedelta has µsec resolution for other uses).
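
For reference, a short sketch that checks a few of these offset spellings through strptime and prints the resulting utcoffset(). offset_of is a made-up helper, and the fractional-offset form is left out here.

```python
from datetime import datetime, timedelta

FMT = '%Y-%m-%dT%H:%M:%S%z'
BASE = '2024-10-12T06:29:22'


def offset_of(suffix):
    """Parse BASE + suffix with %z and return the resulting UTC offset."""
    return datetime.strptime(BASE + suffix, FMT).utcoffset()


for suffix in ('Z', '+01:23', '-23:45', '-0123', '+01:23:45', '+012345'):
    print(f'{suffix:>10} -> {offset_of(suffix)!r}')
```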

Naive objects

strftime %z is valid for naive objects and emits an empty string, but strptime %z refuses to parse them. If you care, you’d have to retry without %z.
But that’s OK, it’s deliberately out of scope of RFC 3339:

Since interpretation of an unqualified local time zone will fail in approximately 23/24 of the globe, the interoperability problems of unqualified local time are deemed unacceptable for the Internet.

Leap seconds

Unparsable, but datetime currently can’t represent them anyway

>>> datetime.datetime.strptime('2005-12-31T23:59:60Z', '%Y-%m-%dT%H:%M:%S%z')
Traceback (most recent call last):
  File "<python-input-188>", line 1, in <module>
    datetime.datetime.strptime('2005-12-31T23:59:60Z', '%Y-%m-%dT%H:%M:%S%z')
    ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/_strptime.py", line 584, in _strptime_datetime
    return cls(*args)
ValueError: second must be in 0..59

>>> datetime.datetime(2005, 12, 31, 23, 59, 60, tzinfo=datetime.timezone.utc)
Traceback (most recent call last):
  File "<python-input-185>", line 1, in <module>
    datetime.datetime(2005, 12, 31, 23, 59, 60, tzinfo=datetime.timezone.utc)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: second must be in 0..59

Unknown offset out of scope IMHO?

Section 4.3. Unknown Local Offset Convention suggests using -00:00 to mean “same time moment as +00:00/Z but unknown where on earth”. datetime today just has no way to represent this subtlety. We MUST use a TZ-aware object either way. So nothing to do.
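
A tiny sketch of what that means in practice: once parsed, -00:00 and +00:00 are indistinguishable. This only uses strptime's %z; the variable names are illustrative.

```python
from datetime import datetime, timedelta

FMT = '%Y-%m-%dT%H:%M:%S%z'

unknown_offset = datetime.strptime('2024-10-12T06:29:22-00:00', FMT)
known_utc = datetime.strptime('2024-10-12T06:29:22+00:00', FMT)

# Both are tz-aware with a zero offset; the "unknown" nuance is lost.
print(unknown_offset == known_utc)
print(unknown_offset.utcoffset(), known_utc.utcoffset())
```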

Summary wishlist

[EDIT: with 3.11 improvements fromisoformat(s.upper()) is nicer than any strptime approximation. The following are YAGNI]

~~But as discussed fromisoformat is bound by other goals~~, whereas these sound harmless to me:

datetime.strptime %z to accept lowercase z.
datetime.strptime %f to accept and discard >6 digits.

Or would these be considered breaking compatibility for “strict” users relying on these raising errors?

That still leaves the nuisance of seconds .fraction being optional:

try:
    d = datetime.datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f%z')
except ValueError:
    d = datetime.datetime.strptime(s, '%Y-%m-%dT%H:%M:%S%z')

In a green-field world I’d suggest making up some notation e.g. %.f…
But AFAICT no current % directive can match an empty string; would it introduce back-tracking issues?
Also, the % space is very crowded, with both the C standard and de-facto libc implementations growing in future, so we need to think twice before adding our own(?)

Well, WDYT of strptime taking a tuple of formats and trying them in order?

RFC3339_FORMATS = ('%Y-%m-%dT%H:%M:%S.%f%z', '%Y-%m-%dT%H:%M:%S%z')
d = datetime.strptime(s, RFC3339_FORMATS)
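
To be clear, datetime.strptime() only takes a single format string today, so the two lines above would not run as written. A rough stand-in for the idea, with strptime_any being a purely hypothetical name:

```python
from datetime import datetime, timezone

RFC3339_FORMATS = ('%Y-%m-%dT%H:%M:%S.%f%z', '%Y-%m-%dT%H:%M:%S%z')


def strptime_any(text, formats):
    """Try each format in order and return the first successful parse.

    Hypothetical stand-in for the proposal; not an existing stdlib function.
    """
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f'{text!r} does not match any of {formats!r}')


with_fraction = strptime_any('2024-10-12T06:29:22.5Z', RFC3339_FORMATS)
without_fraction = strptime_any('2024-10-12T06:29:22Z', RFC3339_FORMATS)
print(repr(with_fraction), repr(without_fraction))
```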

~~Is all this better than adding a tailored fromrfc3339 method, or using existing external packages?~~

For stdlib, IMHO yes — same improvements could help parse other reasonable time formats.
For example “like 3339 but with space”, other subsets of 8601, English style with 12h am/pm but long fractional seconds, etc. …

cben · November 21, 2024, 5:06pm

Wait, I missed the news that in 3.11 fromisoformat was deliberately made much more flexible:

datetime.date.fromisoformat(), datetime.time.fromisoformat() and datetime.datetime.fromisoformat() can now be used to parse most ISO 8601 formats (barring only those that support fractional hours and minutes). (Contributed by Paul Ganssle in gh-80010.)

Crucially, it now handles uppercase ‘Z’ out of the box! And fractions, with no limitations.
[Still needs .upper() to handle ‘z’?]
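
A minimal sketch of what that looks like, assuming Python 3.11 or later. As far as I can tell a lowercase z is still rejected there, so upper-casing first seems the simplest approach. parse_rfc3339 and FLEXIBLE_FROMISOFORMAT are made-up names, and the version guard is only there so the snippet fails loudly on older interpreters.

```python
import sys
from datetime import datetime, timezone

# fromisoformat() gained 'Z' and arbitrary-length fraction support in 3.11.
FLEXIBLE_FROMISOFORMAT = sys.version_info >= (3, 11)


def parse_rfc3339(text):
    """Parse an RFC 3339 timestamp via fromisoformat() (hypothetical helper)."""
    if not FLEXIBLE_FROMISOFORMAT:
        raise RuntimeError('this sketch assumes Python 3.11 or later')
    # upper() turns 't'/'z' into 'T'/'Z'; digits and punctuation are unchanged.
    return datetime.fromisoformat(text.upper())


if FLEXIBLE_FROMISOFORMAT:
    print(repr(parse_rfc3339('2024-10-12t06:29:22.123456789z')))
```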
=> All I wrote above about strptime is irrelevant (YAGNI), sorry for the noise.

Consider this thread done.