Deprecating `utcnow` and `utcfromtimestamp`

We have previously documented that utcnow and utcfromtimestamp should not be used, and I wrote a whole article about why you shouldn’t use them, but we didn’t go so far as to actually deprecate them.

The main reason I had for not deprecating them at the time was that .utcnow() is faster than .now(datetime.UTC), and if you are immediately converting the datetime to a string, like datetime.utcnow().isoformat(), there’s no danger.

I have come around to the idea that this type of use case is not important enough to leave the attractive nuisances of utcnow() and utcfromtimestamp() in place, and we should go ahead and deprecate them.

I’ve opened an issue about doing this and prepared a PR, but I also wanted to open a Discourse thread for more visibility. I’ll note that in the deprecation PR I removed all of our internal uses of utcnow and utcfromtimestamp and found that everyone was using them correctly, but I think this is atypical.

The main downside here is that for the use case of “I want the time in UTC and I immediately format it without %Z”, the alternative is slower and more unwieldy (benchmarks on 3.11.3):

>>> %timeit datetime.now(UTC).replace(tzinfo=None).isoformat(' ')
2.15 µs ± 19.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>>> %timeit datetime.now(UTC).isoformat(' ')[:-6]
1.61 µs ± 23.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>> %timeit datetime.utcnow().isoformat(' ')
919 ns ± 5.23 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
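
For reference, the two replacement spellings timed above produce the same string as each other. This is a minimal sketch that uses a fixed instant instead of the current time so that the result is reproducible. It spells the zone as `timezone.utc` because `datetime.UTC` (an alias for the same object) only exists on Python 3.11 and later:

```python
from datetime import datetime, timezone

# A fixed aware instant stands in for datetime.now(timezone.utc).
aware = datetime(2023, 5, 1, 12, 30, 45, 123456, tzinfo=timezone.utc)

# Spelling 1: drop tzinfo first, then format.
stripped = aware.replace(tzinfo=None).isoformat(" ")

# Spelling 2: format the aware value, then slice off the trailing "+00:00".
sliced = aware.isoformat(" ")[:-6]

print(stripped)
print(sliced)
```

Note that the slicing variant relies on the offset being rendered as exactly six characters, which holds for UTC but is an assumption to keep in mind if the zone ever changes.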

As an example of how this changes the speed in a real-life application, here are the before and after measurements for the change to http.cookiejar.time2isoz:

>>> t = datetime.now().timestamp()
>>> %timeit cookiejar(None)  # Uses datetime.now
1.52 µs ± 16.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>> %timeit cookiejar_utc(None)  # Uses datetime.utcnow
1.32 µs ± 6.72 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

>>> %timeit cookiejar(t)  # Uses datetime.fromtimestamp
1.77 µs ± 24.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>> %timeit cookiejar_utc(t)  # Uses datetime.utcfromtimestamp
1.4 µs ± 5.75 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
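
To make the shape of that change concrete, here is a rough sketch of a formatter in the spirit of time2isoz, written with the aware constructors. It is not the standard library source; the name `format_isoz` is made up for illustration, and only the "YYYY-MM-DD hh:mm:ssZ" output shape is taken from the real function:

```python
from datetime import datetime, timezone


def format_isoz(t=None):
    """Format a POSIX timestamp (or the current time) as 'YYYY-MM-DD hh:mm:ssZ'.

    Hypothetical helper, loosely modelled on http.cookiejar.time2isoz.
    """
    if t is None:
        dt = datetime.now(timezone.utc)  # replaces datetime.utcnow()
    else:
        dt = datetime.fromtimestamp(t, tz=timezone.utc)  # replaces utcfromtimestamp(t)
    return "%04d-%02d-%02d %02d:%02d:%02dZ" % (
        dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second
    )


print(format_isoz(0))
```

Because the value is formatted immediately and never escapes as a datetime object, the aware and naive versions are interchangeable here apart from speed.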

I’m still more or less convinced that this is useful to do, and I’d like to see if anyone complains that it’s a major problem after the deprecation before worrying about these micro-optimizations.

I think deprecation is fine, as long as there are new APIs which return aware datetime instances in the UTC timezone, which don’t require having to write datetime.datetime.now(datetime.UTC) every time you want to get the current time.

We should encourage people to always use UTC datetime values, since anything timezone related is hard, creates situations which are not future-proof (timezones can easily change in the future), not necessarily past-proof (timezone information is not always available or correct) and sometimes even ambiguous (during DST switching times or when governments adjust their timezones).

Making such datetime values harder to access and requiring more typing won’t achieve that goal.

Fortunately, this is easy to have, e.g. by adding a module-level factory datetime.utc() as a shortcut for datetime.datetime.now(datetime.UTC).
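
As a sketch of what such a shortcut could look like in user code today: the function name `utc_now` is hypothetical, and nothing like it exists in the datetime module as far as this thread establishes.

```python
from datetime import datetime, timezone


def utc_now():
    """Hypothetical shortcut: return the current time as an aware UTC datetime."""
    return datetime.now(timezone.utc)


now = utc_now()
print(now.tzinfo)
```

Unlike utcnow(), the result carries its zone, so later calls such as .timestamp() or .astimezone() interpret it correctly.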

I also think that the implementation could be optimized to make datetime instances with tzinfo set to datetime.UTC faster (there are plenty of shortcuts that can be used in this common case), so moving from non-aware UTC datetimes to aware ones won’t hurt performance much in the end.

I think deprecation is fine, as long as there are new APIs which return aware datetime instances in the UTC timezone, which don’t require having to write datetime.datetime.now(datetime.UTC) every time you want to get the current time.

I disagree that this should be blocking or even related. If people want to work in UTC it’s not like we’re taking away a good option and replacing it with a more complicated one. We’re taking away a bad option that people can and do mistake for a good option. It’s a totally separate question as to whether there is a “good” way to work in UTC.

I also disagree that it is desirable to encourage people to work in UTC. What time zone you work in depends a lot on context.

Additionally, I did make it easier to work with aware UTC objects by adding the top level UTC singleton, so you can do:

from datetime import datetime, UTC

now = datetime.now(UTC)

This is the same number of characters as datetime.utcnow() if imported that way (though it’s 2 completions in an IDE and not one).

Here is a real-world scenario in which I have commonly, and I believe validly, used utcnow():

  1. Need to give a UTC datetime to an external data store (e.g. database, spreadsheet, etc.)
  2. The external data store datetime type does not support timezones and therefore the Python library interacting with it validly rejects datetimes with tzinfo not None

As you say, this now needs to be replaced with datetime.now(UTC).replace(tzinfo=None), which seems awkward at best.

IMO this seems too disruptive because:

  • utcnow() is used widely in many code bases (looking at grep.app etc. it’s in a lot of public code)
  • The recommendation to replace it with datetime.now(UTC) does not produce the same object
  • Developers now need to choose exactly what to replace it with throughout their code base, which creates churn, extra testing, and breakage in older libraries

I mean, it should be awkward because it’s the wrong way to use datetime. I recognize that there is a spectrum of validity in the use cases, ranging from “we need to discard the time zone offset anyway and we never use this as a datetime object” (valid) to “we’re working with a legacy interface that doesn’t accept aware objects but expects the naïve object to represent a specific zone” (not valid on the part of the interface, but the right thing to do on the part of the user) to “we call .timestamp() on the result of a .utcnow() object” (totally invalid).
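
The "totally invalid" end of that spectrum can be shown with a short sketch. It uses a fixed instant and a naive copy of it (which is what utcnow() would have returned at that moment), so it does not call the deprecated function itself:

```python
from datetime import datetime, timezone

aware = datetime(2023, 1, 15, 12, 0, tzinfo=timezone.utc)
naive = aware.replace(tzinfo=None)  # same fields that utcnow() would have produced

# .timestamp() on a naive datetime interprets the fields as *local* time,
# so the result is off by the machine's UTC offset.
skew = aware.timestamp() - naive.timestamp()

# The local offset that was silently applied to the naive value.
local_offset = naive.astimezone().utcoffset().total_seconds()

print(skew, local_offset)
```

On a machine whose local zone is UTC the skew is zero, which is part of why this bug tends to go unnoticed on servers and show up on developer laptops.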

Yes, we want them to churn, because they’re probably using it wrong. That’s the point of the deprecation period, to put them on notice.

Luckily, it’s not that hard for these legacy interfaces to support the old version (e.g. “assume it’s UTC if it’s naïve because that’s what we were assuming before”) while also supporting aware time zone objects, since it’s easy to detect if tzinfo is None.
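
A minimal sketch of that kind of compatibility shim, assuming the interface has always treated naive input as UTC. The helper name `to_utc` is made up for illustration:

```python
from datetime import datetime, timedelta, timezone


def to_utc(dt):
    """Return an aware UTC datetime, treating naive input as already being UTC."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        # Legacy callers: keep the old assumption that naive means UTC.
        return dt.replace(tzinfo=timezone.utc)
    # Aware callers: convert whatever zone they used into UTC.
    return dt.astimezone(timezone.utc)


legacy = to_utc(datetime(2023, 5, 1, 12, 0))
modern = to_utc(datetime(2023, 5, 1, 8, 0, tzinfo=timezone(timedelta(hours=-4))))
print(legacy, modern)
```

Both calls describe the same instant, so old and new callers end up with identical values.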

Yes, that’s awkward, but it’s also for an awkward situation, so I’m not as concerned as I would be if this were a more common idiom. Given that the external data store isn’t recognizing timezones anyway, information WILL be lost, so whether you start with an aware datetime and explicitly strip the tzinfo or use a single function that gives you a naive datetime that happens to be approximating to UTC, you’re doing the same thing.

IMO this is like converting Python 2 code to Python 3, and doing something like map(int, stuff) and expecting a list, and now you have to use list(map(int, stuff)) to get the same effect. Yes, it’s a bit clunkier, and that’s not great; but it’s an uncommon case, and the improvement is worth it.

Yes! That’s a good thing. Most of the places where utcnow() is being used, people actually will do better with now(UTC).

Perhaps this needs a longer-than-usual deprecation period due to the extensive use? I’d rather not (special cases aren’t special enough to break the rules), but it might make things easier.

To clarify, my example is not about legacy interfaces. Many data stores (most major relational databases and spreadsheet software, I believe) support datetime data types without timezone information. In lots of cases this is for historical reasons (which is still valid, as Python should interact with historical formats), but also for efficiency reasons (in the data store, not in Python).

If a library that interacts with these external data stores accepts Python datetime objects where tzinfo isn’t None and either discards that information or does something with it implicitly, it’s likely to cause hidden bugs. Whereas if the library only accepts datetime objects whose tzinfo is None, you get the benefit that the data is validated to be a valid datetime and the user can do date logic on the datetime object in a valid way (e.g. add 1 day).
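
A sketch of the validation being described, with a hypothetical `require_naive` check standing in for whatever a real data-store library does at its boundary:

```python
from datetime import datetime, timedelta, timezone


def require_naive(dt):
    """Reject aware datetimes instead of silently dropping their tzinfo."""
    if dt.tzinfo is not None and dt.utcoffset() is not None:
        raise ValueError(
            "this column stores naive datetimes; convert and strip tzinfo explicitly"
        )
    return dt


value = require_naive(datetime(2023, 5, 1, 12, 0))
next_day = value + timedelta(days=1)  # ordinary date logic still works
print(next_day)
```

Passing an aware value such as `datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc)` raises ValueError, which pushes the decision about what the naive fields mean back to the caller.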

IMO this seems like a re-framing of the Python datetime library: it says that this historical usage of datetimes with no timezone information is something you should never have done, and that it will now be made awkward to support going forward. But that’s my point of view as a user of the library. I believe I’ve expressed what I consider to be a common valid use case and highlighted the possible disruption this might cause. I don’t have to maintain Python, so I’ll leave it there.

I’m not going to deny that it’s awkward. But the truth is that you’re starting with a point-in-time and then moving to an abstract date + time with no timestamp; it doesn’t matter where you do this, it’s going to be losing data. You could do this:

  1. Start with an aware datetime in UTC
  2. Convert to a naive datetime with the same year/month/day/hour/min/sec (discard timezone)
  3. Pass that to the library

Or this:

  1. Start with an aware datetime in UTC
  2. Pass that to the library
  3. Library discards the timezone information

Or this:

  1. Start with a timestamp in UTC, buried inside the datetime module
  2. Discard the timezone information before returning that value as a naive datetime
  3. Pass that to the library

The third one is what utcnow() does. It’s hiding the moment where the timezone is being discarded, but it is discarded nonetheless. You’re correct to say that the second option is suboptimal and could cause bugs. I put it to you that the first option is slightly less suboptimal in that it’s very CLEAR that you are discarding timezone information; and I would also suggest that, if the far end is assuming that the timestamp represents UTC, the library should accept an aware datetime and convert it into UTC.
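
To illustrate why the second option is the risky one, here is a small sketch contrasting a silent discard with an explicit convert-then-strip. A fixed UTC-5 offset stands in for a real zone so the example needs no time zone database; that choice is an assumption made for the example only:

```python
from datetime import datetime, timedelta, timezone

utc_minus_5 = timezone(timedelta(hours=-5))
local = datetime(2023, 1, 15, 8, 0, tzinfo=utc_minus_5)

# Option 2 gone wrong: a library that just drops tzinfo keeps the local
# wall-clock fields, which are not UTC.
silently_dropped = local.replace(tzinfo=None)

# Option 1: the caller converts to UTC and then visibly strips the zone.
explicit = local.astimezone(timezone.utc).replace(tzinfo=None)

print(silently_dropped, explicit)
```

The two results differ by five hours, and only the explicit version matches what a column documented as UTC expects.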

Yes, this.

The data type in the data store isn’t assuming any timezone, it doesn’t have timezones. Typically a user would label the column something like “business_datetime_utc” or “business_datetime_nyc” or nothing at all because it’s implied by the business function.

In a situation like this, the library interacting with the data store has no idea what the correct timezone is. It should reject datetimes with timezones, as implicitly converting for the user when it doesn’t know what to convert to is almost certainly going to silently introduce bad datetimes.

Yeah, fair enough. So it’s not really the library’s job to do this, and ultimately, there is NO good way to handle it (especially if the column is named “business_datetime_cst” which is ambiguous in so many ways).

It could still be done by the library (declare that the correct timezone for this column is UTC, or America/New_York, or whatever, and have the library always return aware datetimes in that timezone, and convert to that timezone before saving), but if it isn’t done that way, I would say it’s correct for your app to explicitly discard the information.
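
A rough sketch of that design, assuming the zone is declared once per column. The class name `ZonedColumn` and its methods are invented for illustration, and fixed offsets are used so the example runs without a time zone database (a real deployment would more likely pass a `zoneinfo.ZoneInfo` instance):

```python
from datetime import datetime, timedelta, timezone


class ZonedColumn:
    """Hypothetical adapter for a timezone-less column with a declared zone."""

    def __init__(self, tz):
        self.tz = tz

    def to_storage(self, dt):
        """Convert an aware datetime to the column's zone, then strip tzinfo."""
        if dt.tzinfo is None or dt.utcoffset() is None:
            raise ValueError("expected an aware datetime")
        return dt.astimezone(self.tz).replace(tzinfo=None)

    def from_storage(self, naive):
        """Re-attach the declared zone to a naive value read from the store."""
        return naive.replace(tzinfo=self.tz)


column = ZonedColumn(timezone.utc)
original = datetime(2023, 5, 1, 8, 0, tzinfo=timezone(timedelta(hours=-4)))
stored = column.to_storage(original)
loaded = column.from_storage(stored)
print(stored, loaded)
```

The round trip preserves the instant even though the stored value is naive, because the zone lives in the column declaration instead of in each value.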

I’d like to emphasize the impact of removing utcnow in one specific project because I believe that, in this thread, the amount of developer time required for this transition was somewhat underestimated. However, if my perception is inaccurate, I stand corrected.

When I mention “churn” I’ll be referring to the process of getting a pull request (PR) merged that removes utcnow, and all the resulting consequences. This therefore covers more than the direct impact of utcnow, because in my opinion this is the broader impact deprecations have.

Pip

You initially raised this issue to identify a few places where utcnow was being used, both within Pip’s own codebase and in a vendored package, cachecontrol. Looking at these two:

1. Pip’s Main Codebase

Two PRs were opened. The first attempted a direct substitution from datetime.datetime.utcnow() to datetime.datetime.now(datetime.timezone(datetime.timedelta(0))).
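
For readers puzzling over that long spelling: a zero-offset `timezone` compares equal to `timezone.utc`, so it is a way of writing UTC that does not depend on the newer `datetime.UTC` alias. A quick check:

```python
import datetime

long_form = datetime.timezone(datetime.timedelta(0))

# Both spellings describe a fixed offset of zero from UTC.
print(long_form == datetime.timezone.utc)
print(long_form.utcoffset(None))
```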

However, that PR was abandoned in favor of a second one, which not only replaced utcnow but also took the opportunity to substitute some date format parsing logic with the fromisoformat method. This led to user issues, including problems building wheels and users encountering errors during checks.

2. Updating Cachecontrol

You opened this PR to remove utcnow in the cachecontrol project. I’m not aware of any specific issues it caused but I would point out that the logic involved was more complex and required suppressing type hinting for one line.

The key aspect here is that this PR probably only got merged because the project ownership changed hands. When you initially submitted the PR, the project was unmaintained. This situation extended beyond just utcnow, leading to the project being initially forked but later taken over and moved into the PSF GitHub organization.

Opinionated conclusions

Based on “1.”, I recommend that developers who want to ensure safety in their PRs opt for a straightforward replacement of datetime.datetime.utcnow() with datetime.datetime.now(datetime.UTC).replace(tzinfo=None) (I’ve already implemented this in my company’s codebase). As observed, simple changes in complex applications with many users can easily result in issues.

Furthermore, when the deprecation turns into a removal, my plan is to create a tool to audit our entire dependency tree (currently ~300 modules) for any use of utcnow. If we find anything, we will either raise issues, fork, or vendor and patch it. Hopefully, less experienced users will be spared from these issues by the time of the release.
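
One possible starting point for such an audit is a syntax-tree scan for calls to the two method names. This is only a sketch: the function name `find_deprecated_calls` is made up, it matches on the attribute name alone (so it can flag unrelated objects that happen to have a `utcnow` method), and it will miss aliased references such as `f = datetime.utcnow; f()`.

```python
import ast

DEPRECATED_NAMES = {"utcnow", "utcfromtimestamp"}


def find_deprecated_calls(source, filename="<string>"):
    """Return sorted (line, name) pairs for calls like x.utcnow() in source."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in DEPRECATED_NAMES
        ):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)


sample = (
    "import datetime\n"
    "a = datetime.datetime.utcnow()\n"
    "b = datetime.datetime.now(datetime.timezone.utc)\n"
    "c = datetime.datetime.utcfromtimestamp(0)\n"
)
print(find_deprecated_calls(sample))
```

Running this over every `.py` file of each installed dependency would give a first list of places to inspect by hand.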

I guess if you really think that utcnow has no valid use case then all of this is reasonable, but I continue to model my Excel and database timezoneless datetime data types in Python, and mostly UTC is implied.

FYI, some recent blog posts are floating around advocating for the use of datetime.utcnow() for performance reasons:

I have no opinion on these, but thought it was relevant.