Make more of the standard library import on demand

The above were noticed when looking at import times for black, which is meant to be run as a CLI tool. There’s arguably a lot of potential to improve interpreter startup time if some of this can be made lazy.

(I’m open to doing the legwork on some of the changes the info above suggests, but I’m not sure whether there’s a compelling reason not to.)

6 Likes

An alternative version of this idea would be to speed up import for those modules. I’m guessing that the simplest way to do that[1] would be to freeze them, i.e. prevent core modules like these from being modified under most circumstances. If the code object is sure to be constant, I believe it can be loaded much faster and more efficiently?

There will always need to be a way to allow such modification for workflows that need to do something strange to inspect or shutil or whatever. But my suspicion is that the vast majority of users aren’t doing any runtime modification to the standard library, and they assume none of their dependencies are doing it either.

But maybe my assumptions here are just wrong: there isn’t a path to faster loading by freezing the modules, or that would disable some very common workflow I didn’t know about…


  1. without “just make everything fast” magic ↩︎

1 Like

Frozen modules just move the load time somewhere else. We hope it reduces it overall, but it can make the main executable larger which increases the load time for everyone.

It’ll be easier and faster in these cases just to defer the imports until they’re needed.

A quick look at json suggests that re is only being used for convenience and portability, and we could easily make a couple of native methods that are likely faster.

dataclasses is only using inspect.unwrap (which gets .__wrapped__ with cycle detection) and sometimes signature (to create a fallback docstring). It looks like unwrap is always going to be called when defining a dataclass, but signature isn’t going to matter in an app scenario. This might take a bit of inventiveness to avoid the entire import.

tempfile uses shutil.rmtree once, and could easily import it then when it needs it.

4 Likes

tempfile may need shutil.rmtree at interpreter shutdown, when imports are prohibited.

shutil can be imported in TemporaryDirectory. This will add an overhead to every temporary directory creation, but I think it is minuscule in comparison with IO latency.
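A minimal sketch of what that deferral could look like (illustrative stand-in, not the actual tempfile source; the class name is made up):

```python
import os
import tempfile


class LazyCleanupTempDir:
    """Illustrative stand-in for TemporaryDirectory with a deferred shutil import."""

    def __init__(self, suffix=""):
        self.name = tempfile.mkdtemp(suffix=suffix)

    def cleanup(self):
        # Deferred import: shutil only loads when a cleanup actually runs,
        # so a process that never creates a temp dir never pays for it.
        import shutil
        shutil.rmtree(self.name)


d = LazyCleanupTempDir()
print(os.path.isdir(d.name))   # → True
d.cleanup()
print(os.path.exists(d.name))  # → False
```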

5 Likes

Yes, the difference is three orders of magnitude.

$ ./python -m timeit -s 'import tempfile' 'with tempfile.TemporaryDirectory(): pass'
5000 loops, best of 5: 78.7 usec per loop
$ ./python -m timeit -s 'import shutil' 'import shutil'
2000000 loops, best of 5: 104 nsec per loop

You can’t test import times with timeit - you’re checking a dictionary lookup.

It’s obviously cheap to do the import on demand when it’s already been imported. The reason we’d want to defer the imports is so that processes that don’t use the module don’t pay the price.
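As a quick illustration of why a repeated import is so cheap: a second import statement resolves through sys.modules and does not re-execute the module body (json here is just a convenient example module):

```python
import sys
import json  # first import: executes the module body

marker = object()
sys.modules["json"].marker = marker

import json  # second import: no re-execution, the same module object comes back
print(json.marker is marker)  # → True

del sys.modules["json"].marker  # clean up the illustrative attribute
```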

If you go through to the original thread, we were looking at the startup time difference between python -m black --version and ruff --version, and that list of imports came from analysing python -X importtime -m black --version. Black doesn’t need shutil to print its version number.

Oh! That’s surprising: I thought there must be additional overhead from executing the module[1].


  1. which presumably must be done, because the user may have modified who-knows-what beforehand ↩︎

If the methods themselves are faster it may be worth doing, but removing the re import is mostly futile for improving import time for CLI tools. argparse relies on it both directly and via gettext. Even if those did remove the dependency, the executable wrapper generated by distlib for console scripts also imports re.

The re import itself follows the chain of enum > functools > collections making up most of its own import time.
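You can see that chain for yourself with -X importtime in a fresh child interpreter (a quick sketch; exact timings and the full module list vary between versions):

```python
import subprocess
import sys

# Run `import re` in a clean child interpreter; -X importtime writes its
# report to stderr, one "self | cumulative | name" line per module.
report = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import re"],
    capture_output=True,
    text=True,
).stderr

# The last pipe-separated field on each line is the (indented) module name.
modules = {line.rsplit("|", 1)[-1].strip() for line in report.splitlines() if "|" in line}
print(sorted(modules & {"enum", "functools", "collections"}))
```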

Ah, that unwrap usage must be new. I think a couple of versions ago it was only used for the docstring; then dataclasses switched to using inspect.get_annotations, and now on main it’s using the new annotationlib.get_annotations. That means I expect that if you remove inspect, you’ll find most of the import time shifts to ast, which annotationlib relies on.

I have my own library that is something along the lines of dataclasses, but without the imports and with lazy class generation, which I use for my own CLI tools. Unfortunately the increase in complexity for directly handling annotations in 3.14 is making it more difficult to maintain[1].


I think a lot of the import-on-demand improvement would be helped by better machinery for doing deferred imports, rather than doing everything directly as you would have to now.
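One piece of machinery the stdlib already ships for this is importlib.util.LazyLoader; the sketch below is essentially the recipe from the importlib documentation (difflib is chosen only as an example module that isn’t normally preloaded):

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose body only executes on first attribute access."""
    spec = importlib.util.find_spec(name)
    spec.loader = importlib.util.LazyLoader(spec.loader)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)  # registers the deferral; nothing runs yet
    return module


difflib = lazy_import("difflib")   # the module body has not executed yet
close = difflib.get_close_matches  # first attribute access triggers the real import
print(close("appel", ["ape", "apple", "peach", "puppy"]))  # → ['apple', 'ape']
```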


  1. __annotations__ no longer exists in the class namespace. if __annotate__(1) fails I give up and import annotationlib because handling deferred annotations correctly is… messy and the enum values don’t seem to be fixed (they changed from a2 to a3). ↩︎

4 Likes

We’ve deferred imports for a number of modules, where the deferral doesn’t make the code harder to maintain. See What’s New and python/cpython#109653 for 3.13, and python/cpython#118761 for 3.14.

See also Async imports to reduce startup times - #57 by thomas for some future work in this area.

1 Like

I’ve written and distributed apps that use none of these, so the fact that some applications exist that are going to import re anyway shouldn’t mean that all the rest have to import it. If they’re going to get it anyway, there’s no loss from deferring it through certain other paths. (Though I haven’t checked whether my use of json would trigger the import anyway. I thought we moved most of the basic JSON parsing into native code already.)

Alternatively, since “everyone” has to import re “every time”[1], we should make re as fast as possible.

Thousands of posts have been spent on this idea already; I can’t see it happening. Manually deferring the import is trivial in relative complexity and reliability. And in the cases listed above, the lazy imports would all be triggered immediately, because we do stuff like re.compile at the top level of the module.
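For completeness, the usual shape of that workaround is to compile on first use instead of at module level (an illustrative sketch; the function name and pattern are made up):

```python
_pattern = None


def find_numbers(text):
    global _pattern
    if _pattern is None:
        import re  # deferred: re (and its enum/functools chain) loads on first call
        _pattern = re.compile(r"\d+")
    return _pattern.findall(text)


print(find_numbers("3 files, 12 errors"))  # → ['3', '12']
```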


  1. Scare quotes, not literal quotes. ↩︎

7 Likes

I did say “mostly futile”. I’d be hesitant to classify it as just some applications, currently it’s every application that is generated with the distlib template, which covers most of the linting, packaging and type checking tools written in Python or with a Python wrapper[1].

Example for black, but you get the same script for most pip installed tools.

# -*- coding: utf-8 -*-
import re
import sys
from black import patched_main
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(patched_main())

I believe this comes from here: distlib/distlib/scripts.py at c6fc08e9fbc81c4350ecc5c7e9729c5b2711c422 · pypa/distlib · GitHub

So if you want to improve the start time of Python tools installed via pip you’re not going to see any improvement by removing re from modules unless this template is changed first[2].
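For illustration, the only thing that regex does is strip an optional trailing -script.pyw or .exe from argv[0], which plain string methods can handle without re (a sketch, not necessarily the form any actual distlib change would take):

```python
def strip_script_suffix(argv0):
    # Equivalent to re.sub(r'(-script\.pyw|\.exe)?$', '', argv0)
    for suffix in ("-script.pyw", ".exe"):
        if argv0.endswith(suffix):
            return argv0[: -len(suffix)]
    return argv0


print(strip_script_suffix("black-script.pyw"))  # → black
print(strip_script_suffix("black.exe"))         # → black
print(strip_script_suffix("black"))             # → black
```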


Yes I’d agree on this.


Yes I’m aware of PEP 690 and many of the other discussions around the topic.

Deferring imports manually tends to hide the fact that they can occur at all, unless someone remembers to put a comment at the top. For instance, argparse can end up importing shutil to work out the console size, but you wouldn’t know that from looking at the imports at the top. typing uses functools.cache as part of deferring the import of inspect for one function.

Much like dataclasses I’ve ended up with my own module which I use for deferred imports when making my own command line tools. My general experience using it has been that it’s easier to use when you have multiple places that something can be imported in a module and better than manually writing __getattr__ and __dir__ functions for exports[3].
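The __getattr__/__dir__ pattern referred to here is PEP 562 module-level attribute access. A minimal hand-rolled version, built on a synthetic module object so the snippet is self-contained (the module name "demo_lazy" and its export table are hypothetical):

```python
import importlib
import sys
import types

# Hypothetical module "demo_lazy" that re-exports shutil.rmtree lazily.
_exports = {"rmtree": ("shutil", "rmtree")}
demo = types.ModuleType("demo_lazy")


def _module_getattr(name):
    try:
        source, attr = _exports[name]
    except KeyError:
        raise AttributeError(f"module 'demo_lazy' has no attribute {name!r}") from None
    value = getattr(importlib.import_module(source), attr)
    setattr(demo, name, value)  # cache it, so __getattr__ isn't hit again
    return value


def _module_dir():
    return sorted(set(vars(demo)) | set(_exports))


demo.__getattr__ = _module_getattr  # PEP 562: called only on failed lookups
demo.__dir__ = _module_dir
sys.modules["demo_lazy"] = demo

import demo_lazy
print(demo_lazy.rmtree.__module__)  # shutil is only loaded at this access
```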


  1. So not uv or ruff for instance, as those provide the actual compiled binaries. ↩︎

  2. I had a project that did avoid re and almost any other import, and then discovered it was slower than expected to run after being installed which was how I noticed this re import in the first place. ↩︎

  3. That’s not to say that my module has things right though ↩︎

It is more than a dictionary lookup (the import machinery does several attribute lookups to check that the module is not partially imported).

And it is exactly what I want to measure – an overhead added by a local import to every TemporaryDirectory call. It is tiny in comparison with the rest.

2 Likes

In terms of “always imported” modules: I haven’t seen anything that would remove any of these for 3.14 (I can’t check right this moment), but here’s 3.13:

3.13, python -SsuB -X importtime -c "pass", which is optimistic in assuming people don’t need site customization and know that disabling it can speed things up slightly when unneeded.

Windows:

import time: self [us] | cumulative | imported package
import time:        84 |         84 | winreg
import time:       169 |        169 |   _io
import time:        54 |         54 |   marshal
import time:       177 |        177 |   nt
import time:      2773 |       3172 | _frozen_importlib_external
import time:       328 |        328 |   time
import time:       870 |       1197 | zipimport
import time:        40 |         40 |     _codecs
import time:      1333 |       1373 |   codecs
import time:      1601 |       1601 |   encodings.aliases
import time:      4440 |       7414 | encodings
import time:       477 |        477 | encodings.utf_8
import time:        42 |         42 | _signal
import time:        23 |         23 |     _abc
import time:       737 |        760 |   abc
import time:       685 |       1444 | io
import time:      1231 |       1231 | linecache

These imports (trimmed from a full output) represent site customization with virtual environment usage, which I expect to be nearly universal.

import time:        52 |         52 |       _stat
import time:       202 |        253 |     stat
import time:       756 |        756 |     _collections_abc
import time:        67 |         67 |       genericpath
import time:        79 |         79 |       _winapi
import time:       307 |        453 |     ntpath
import time:       755 |       2215 |   os
import time:       101 |        101 |   _sitebuiltins
import time:      1075 |       1075 |   encodings.utf_8_sig
import time:       670 |        670 |     __future__
import time:      1433 |       2103 |   _virtualenv
import time:       284 |        284 |   sitecustomize
import time:      2196 |       7972 | site

These are mostly the same on Linux, so I won’t duplicate them. The Windows-specific modules winreg, nt and ntpath (the last only with site customization) aren’t pulled in; in their place posix is pulled in, with posixpath and errno being pulled in with site customization.

re appears to be nearly guaranteed to be pulled in in practice because of the distlib script template, but that can be avoided by invoking tools with -m module, since runpy doesn’t import re directly or indirectly.

Without a generalized speedup for module import and execution times, and without a generalized laziness mechanism for imports that can be used within the standard library[1], I tend to agree that there are certain modules where the effort would be better spent making the module faster to import rather than trying to lazily import it. But there are quite a few where I think the extreme opposite is at play. For example, the typing module: it’s extremely slow considering that most people have no runtime use for it. If the entire module could be made lazy, this would likely be an improvement for the majority of applications with any inline type hints in any dependency.
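As one illustration of how much of typing’s runtime presence is already avoidable: with PEP 563’s from __future__ import annotations, annotations are stored as strings and never evaluated at definition time (PEP 649 in 3.14 defers evaluation differently, via __annotate__), so a fully annotated module needn’t touch typing at runtime at all:

```python
from __future__ import annotations  # PEP 563: annotations become strings


def greet(names: list[str] | None = None) -> str:
    # The annotation above is not evaluated when this function is defined,
    # so no typing (or other) import is needed at runtime for it.
    return ", ".join(names or ["world"])


print(greet())                         # → world
print(greet.__annotations__["names"])  # → list[str] | None
```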

This is probably going to require some amount of informed choices on which approaches best apply to each module outside of the generalized improvements that every module benefits from.


  1. more along the lines of the linked work for async imports rather than manually defined getattrs and non top level imports placed where they work properly even during interpreter shutdown ↩︎

For the specific case of other stdlib modules incurring the full inspect module import just to access inspect.unwrap, that function doesn’t have any internal dependencies on the rest of the inspect machinery, so we could migrate it down to functools, and make the inspect interface simply call the functools interface.
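For reference, unwrap really is a small self-contained loop; roughly this (a sketch of the behaviour, not the actual CPython source, which additionally caps the chain length):

```python
import functools


def unwrap(func, *, stop=None):
    """Follow the __wrapped__ chain, with cycle protection (sketch)."""
    f = func
    memo = {id(func): func}  # track visited objects to detect wrapper loops
    while hasattr(f, "__wrapped__"):
        if stop is not None and stop(f):
            break
        f = f.__wrapped__
        if id(f) in memo:
            raise ValueError(f"wrapper loop when unwrapping {func!r}")
        memo[id(f)] = f
    return f


def original():
    return 42


@functools.wraps(original)  # functools.wraps sets wrapper.__wrapped__
def wrapper():
    return original()


print(unwrap(wrapper) is original)  # → True
```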

16 Likes

Just an FYI for folks here:
Inspired by this discussion, I experimented with removing the re import from distlib’s executable wrapper: Remove `import re` from script template by hukkin · Pull Request #239 · pypa/distlib · GitHub

4 Likes

I’ll go ahead and put together a draft later to see what this and a few other options here would look like. This one in particular would decouple most of the import cost of inspect from anything using dataclasses, which would be nice.

This is a draft of what a change like that would look like in practice. There’s a notable wart with removing it in dataclasses that could be further improved by changing the logic of the class builder to also build up a text signature, rather than relying on inspect to do so after the fact: as drafted, inspect is conditionally imported based on whether your dataclass is missing a docstring, and only to generate __doc__: Draft example · mikeshardmind/cpython@d228a48 · GitHub

That’s probably an odd enough side effect that it needs better justification and documentation, or removal of the wart.

There’s a few other things like this in various modules with needing to do slightly more to get the desired outcomes, I’ll try and find time this week to collect a few of them and open PRs for each that can be reasonably done.

2 Likes

I’d actually been down this road before back on 3.11 for dataclasses, but then 3.12 started using inspect for get_annotations so I dropped the idea at that time.

I did look at this at that time and I think I went one step further and made __doc__ a descriptor so inspect would only be imported if your class was missing a docstring and something actually tried to read it.

I also didn’t manage to find this docstring generating behaviour in the dataclasses documentation? I’m not entirely sure what its purpose is?

3 Likes

It feels like there’s more that could and should be cached here. For modules which don’t do lots of dynamic behavior in their globals (they just define functions), it feels like the interpreter should be able to run a lot less bytecode and just say “here is this module’s dictionary”, “here is the interpreter’s dictionary”, “do a dictionary merge”, which my brain says is fairly fast.

It’s definitely nice to do less computation (see also: Time to run python program_name.py vs python -m program_name), where caching makes a big difference. Deferring imports by hand is nice when it’s not too hard, but it doesn’t seem like something one should have to think or worry about; there are some underlying pieces that could be improved around module bytecode that always builds the same interpreter state.

Admittedly, when you get to things which build more complex interpreter state, such as class hierarchies, dataclasses, and regex compiles, it can be more complicated, and in Python code caches commonly get built for those, but there’s a bit of a gap across the board. Most big codebases and Python modules I’ve looked at work to keep global work to just “imports, define functions, define classes, here’s the main entrypoint”, and the CPython standard library seems to do that a lot in the pieces I’ve looked at.

Being able to take a module and “reduce” its import from “run this code” to “merge these dictionaries”, plus see what is dynamic and unable to be “cached” (inspects external data like os.environ, etc.), feels like it could make broad improvements without needing to change the Python standard of “import at top of program” or add more that developers need to learn. A lot of modules do the same thing over and over once installed (e.g. regex compilation), and can’t cache that today unless they build their own caches outside the interpreter. For things which can be constant folded, make that tradeoff and do less (load from pycache/disk instead of running, possibly with “only in these cases use this cached data”).