Async imports to reduce startup times

Oh, okay. I thought you were implying that “import x, y, z” should be optimized differently. Carry on then!

1 Like

But some modules when executed modify import hooks, affecting how specs of the later imports are found, so we can’t assume that modules can be prepared in parallel in general without breaking backwards compatibility.

Depends what “prepared” means. If it means “search the file system and load the source code, but don’t execute anything”, then it’s fine, since that can be flushed when a change happens. (Which I think most people here knew; I just misinterpreted the multiple-imports-on-a-line point, thinking there was something special.)

1 Like

But it means that an existing module that modifies import hooks needs to be refactored to invalidate the cache, or to force a redo of “search the file system and load the source code” with a new API, which again means the change would break backwards compatibility.

Given that the (vast?) majority of packages aren’t going to do anything weird to the import machinery, it would probably be worth it to push “get ready to import but bail if necessary” as far as it can go. That at least would include marshalling/unmarshalling pyc files.
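A rough sketch of that split is already expressible with importlib’s building blocks: find the spec and load/unmarshal the code object without running anything, then execute later. (The prefetch/finish names are mine, and this ignores the hook-invalidation problems discussed in this thread.)

```python
import importlib.util
import sys

def prefetch(name):
    # "Search the file system and load the source code, but don't
    # execute anything": find the spec, then read and unmarshal the
    # .pyc (or compile the source). No module-level code runs here.
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        return None
    code = spec.loader.get_code(name)
    return spec, code

def finish(prepared):
    # The execution step, done later (and redone from scratch if an
    # import hook changed in the meantime).
    spec, code = prepared
    module = importlib.util.module_from_spec(spec)
    sys.modules[spec.name] = module
    exec(code, module.__dict__)
    return module

mod = finish(prefetch("json"))
```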

I don’t know if there’s a reliable way to statically analyze a package and mark it “safe to import independently” but that’d open up a lot of possibilities.

Python’s incredible flexibility makes a lot of stuff technically possible but it could still be worth optimizing the common case.

2 Likes

Marshalling or unmarshalling? You wouldn’t want to execute anything, including for the sake of building the pyc. The ONLY thing you’d do is load stuff into memory without execution. I don’t know if unmarshalling a pyc is guaranteed to be side-effect-free.

1 Like

I meant unmarshalling but I’m also not sure about side effects (I kinda just hoped there aren’t any).
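For marshal itself, at least, unmarshalling only reconstructs the code object; nothing executes until you explicitly run it. A minimal demonstration (this doesn’t rule out side effects elsewhere in the pyc-loading path, e.g. in import hooks):

```python
import marshal

code = compile("X = 42\n", "<demo>", "exec")
blob = marshal.dumps(code)        # roughly what the body of a .pyc contains
restored = marshal.loads(blob)    # unmarshalling: no module code runs here

ns = {}
exec(restored, ns)                # execution is a separate, explicit step
```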

1 Like

A unique resource identifier can be attached to prefetched code objects, e.g. the realpath.

If sys.path was modified by other modules, a “cache miss” will occur upon executing the actual import. It will render our prefetch useless, but it would not affect the correctness of the final result.

Modifying the path is not the only possibility here. The import machinery itself could be modified. It’s probably very rare though.

1 Like

The problem is that while the path of the prefetched code may still be valid, the meta path finder may have been altered by a preceding module, for example, so a different path and/or loader should be used instead. The prefetched code would not know to invalidate itself unless there is a new API that the preceding module actively calls to force a redo of preparations.

I’m not against the idea, by the way; I’m just pointing out the potential breaking changes that people need to be made aware of during the transition.

1 Like

importlib.invalidate_caches already needs to be called to ensure dynamic path config changes take full effect immediately, so adding a loaded-but-not-executed code object cache would be covered by that.

@markshannon is right that the most meaningful starting point would be better metrics:

  • splitting up import times by load/unmarshal/execute
  • splitting bytecode frequency estimates between “import time” module-level code and function/method code (class, function, and method definitions would be expected to make up a much higher proportion of the time for module-level code)
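As a starting point for the first metric, the find/load/execute phases can be timed separately with importlib primitives; a rough sketch (real measurements would also need to account for the transitive imports each phase triggers):

```python
import importlib.util
import sys
import time

def timed_import(name):
    t0 = time.perf_counter()
    spec = importlib.util.find_spec(name)       # find: file system search
    t1 = time.perf_counter()
    code = spec.loader.get_code(name)           # load: read + unmarshal the .pyc
    t2 = time.perf_counter()
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    exec(code, module.__dict__)                 # execute: run module-level code
    t3 = time.perf_counter()
    return module, {"find": t1 - t0, "load": t2 - t1, "exec": t3 - t2}

mod, phases = timed_import("statistics")
```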
2 Likes

All that importlib.invalidate_caches does is call the invalidate_caches method of each meta path finder in sys.meta_path. It should be called only if a module makes changes to what an existing meta path finder caches. It is not useful or needed at all if a module installs a new meta path finder that changes the behavior of subsequent imports.

Async imports will definitely be a breaking change to such a module, setuptools._distutils_hack being a prominent example.
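To illustrate why: importlib.invalidate_caches only reaches finders that are already installed, so a newly added finder (the _distutils_hack pattern) is exactly the case it can’t help with. A toy sketch:

```python
import importlib
import importlib.abc
import sys

class NewFinder(importlib.abc.MetaPathFinder):
    # Stand-in for a finder that some module installs at import time.
    def __init__(self):
        self.invalidated = False
    def find_spec(self, fullname, path=None, target=None):
        return None                 # always defer to the default finders
    def invalidate_caches(self):
        self.invalidated = True

finder = NewFinder()
sys.meta_path.insert(0, finder)
importlib.invalidate_caches()       # calls invalidate_caches() on each finder...
assert finder.invalidated
# ...but nothing told a hypothetical prefetch cache that specs found
# *before* the finder was installed may now be wrong.
sys.meta_path.remove(finder)
```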

One possible way to improve backwards compatibility would be to make sys.meta_path a property with a setter that triggers a redo of concurrent module preparations, and the type of the object that this property manages can be a subclass of list with a __setitem__ that also triggers a redo.

While there’s still the potential that a module can monkey-patch an existing meta path finder, and the fact that sys.meta_path is no longer a list also potentially breaks backwards compatibility, I think the approach can help make the change as transparent as possible for the vast majority of use cases.

EDIT: On second thought, sys is not a class but a module, so sys.meta_path can’t be made a property. At least making sys.meta_path a list subclass with a custom __setitem__ can still help.
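A hypothetical sketch of that list-subclass idea (a real version would have to cover every mutating method: __delitem__, extend, pop, slice assignment, and so on):

```python
class NotifyingList(list):
    # Hypothetical replacement type for sys.meta_path: any mutation
    # triggers a callback that would redo concurrent module preparations.
    def __init__(self, iterable=(), on_change=lambda: None):
        super().__init__(iterable)
        self._on_change = on_change
    def __setitem__(self, index, value):
        super().__setitem__(index, value)
        self._on_change()
    def insert(self, index, value):
        super().insert(index, value)
        self._on_change()
    def append(self, value):
        super().append(value)
        self._on_change()

redos = []
meta_path = NotifyingList(["finder_a"], on_change=lambda: redos.append(1))
meta_path.insert(0, "finder_b")     # a module installing a new finder
assert len(redos) == 1              # preparations would be redone here
```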

The idea of async imports did come up during the PEP 690 discussion. I could’ve sworn there was more, but maybe it was in another thread, or I’m just remembering our SC conversations during discussions of the PEP.

Regardless, my understanding from talking with @thomas is that Meta may be working on what will probably become a new proposal for deferring import-time work to improve startup time. Lots of Python users do want some form of the feature: for CLIs, notebooks, production applications, and to improve time spent in CI running tests (which are often targeted and execute far fewer code paths than an application would).

PEP 649 has finally landed and gives us deferred evaluation for annotations. Perhaps reusing that mechanism for import-namespace-related dicts could be involved?

PEP 649 seems excessive in complexity for this, considering what is allowed in an import statement, and not all applications will want imports to be lazy. Deferring all imports makes task execution time more variable when looking at long-lived applications: the first task to trigger a code path is measurably more expensive. I think looking at how type statements are deferred for evaluation might be a better parallel, and it might be possible to just make type import name work.

CI presents a somewhat different challenge than notebooks or import name at the command line. A lot of CI makes a clean environment and/or clears out caches between runs… Tests in a single build re-importing the same thing are relatively optimizable; otherwise you need pre-built packages…

I’m curious to see the time split for loading from disk vs. unmarshalling vs. executing. Also, is it Python code or C/Rust/etc. module loading for the projects being looked at?

I’ve been wondering if there would be a way to update module caching to build a “freeze” / “deepfreeze” version of a module that gets saved/loaded as a .so rather than a .pyc. Make it so the common case is faster if a module/environment adheres to rules that let the module be turned into a C API module with multi-phase (or maybe even single-phase) initialization. The ideal here would be to get to an “even faster import” that is validatable in the CI of various projects.

It would potentially give the benefits that less data is actually read from disk for very large modules (dynamic linkers / binary loading have supported this for a long time) and that less bytecode is read/executed (global statements still potentially need to run, though traversing the module in C is, I think, faster than running lots of defs through the interpreter?). Ideally the code could even be pre-optimized (including specialized?) when compileall is run; if run interactively, the interpreter could potentially save a newly specialized version as well.

Overall, make an “even faster import” mode with guidance for modules on how to get it (and ideally test/validate in their CI that they keep meeting the requirements), rather than more statements and modes of execution. Rather than building a special-purpose tool, reuse existing, developed, and supported infrastructure, making the common case more common and more valuable to optimize.

1 Like

Thanks, Greg, for stealing my thunder :joy: I’d been working on this post for a few days and was about to post it, but yes, what you say is pretty much true.

(For context: I recently joined Meta, and I’m part of the team that includes Germán and used to include Carl, the original authors of PEP 690. I’ve talked with Germán and the rest of the team quite a bit about lazy imports.)

Lazy imports really matter for startup time when you have dependencies that you don’t always use. They don’t have to be that large, either: even things like typing and enum, with all their metaclass magic, cause noticeable slowdowns, and delaying their imports (or the imports of the things using them) makes interactive tools a lot snappier, and process restarts less expensive. This is especially true when combining a lazily imported typing with PEP 563 (from __future__ import annotations) or PEP 649. I know we all hope we can make all of Python fast enough so that modules aren’t expensive to execute anymore, but just not doing things until (and unless) they’re actually necessary is a really simple way of achieving that goal, too.

For what it’s worth, Meta is actively using the original PEP 690 lazy import mechanism, and still sees a lot of value in it (specifically, startup time, which matters for a wide variety of things). We also see some of the downsides, as we enable lazy imports for more internal binaries. We’re mitigating some of that with internal tooling to try and determine if it seems safe to lazily import a module. While that obviously reduces the speedups (it’s fairly conservative, which is kinda the point), even the safe approach sees significant improvements. We intend to open-source that tooling when that’s useful (if lazy imports become a real thing in Python, implicitly or explicitly).

We’re also working on an alternate proposal for explicit lazy imports. The internal mechanism would remain the same (special dict entries that resolve imports when the lazy objects are accessed, so no special syntax to “resolve” a lazy import), but the way to get an import to be lazy would be explicit. The laziness still applies to everything (module finding, loading and execution), because that seems like a much more straightforward approach than trying to split those up – especially since execution usually leads to more imports, and having the delayed module already found and loaded under potentially different search paths/importers feels like a very confusing situation.
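For comparison, the stdlib already offers a coarser form of this via importlib.util.LazyLoader, which defers only execution (not finding and loading, unlike the mechanism described above); the documented recipe looks roughly like:

```python
import importlib.util
import sys

def lazy_import(name):
    # Find the spec eagerly, but wrap the loader so the module's code
    # only executes on first attribute access.
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)      # sets up laziness; runs no module code yet
    return module

lazy_json = lazy_import("json")     # nothing executed so far
result = lazy_json.dumps([1])       # first attribute access triggers execution
```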

With the explicit lazy import syntax implementation, it would be incredibly easy to also have an implicit lazy import mode, possibly with an opt-out or opt-in list of module names. We’re not sure yet whether that should be part of the PEP or not, but it seems very likely that if explicit lazy imports were added, we’d still carry the (much smaller) patch for the implicit version internally. We could have implicit lazy imports as a tool to test whether a module can be loaded lazily (e.g. by running its tests with lazy imports enabled).

(I should also point out that it looks like the lazy import mechanism, the part that evaluates imports when the lazy objects are accessed, could be used to implement the “deferred expression” syntax proposed in Backquotes for deferred expression, at least in the global namespace. I think that’s probably a bad idea, but it’s possible.)

We unfortunately have some other stuff to finish first, but we hope to have a concrete proposal soon, probably come January.

11 Likes

This is true. The current version of the demo implementation for deferred objects can already work for lazy imports. It just lacks a dedicated syntax. Here are some related posts where I mentioned lazy imports in that thread.

Since they share the same infrastructure, I think it might be a great idea to include lazy imports as one of defer expressions’ core use cases and sell them together. (Please let me know what you think about it :smiley:)

I believe Thomas meant the other way around. With some tweaks, lazy imports could implement deferred expressions:

Yes, I got that. But to me a defer expression seems like the broader concept; that’s why I proposed covering lazy imports with it (i.e. the other way around).