PEP 690: Lazy Imports

While everyone in this thread seems to be generally on board with lazy imports, I’m not clear on why so far. It seems like this thread is nothing but confusion over the implementation and ramifications, and every answer is complex and raises more questions. I consider myself an expert in Python, and I can’t imagine the trouble less experienced users are going to have understanding this change and any issues it might cause.

Another thing I’m worried about is what this means for me as a maintainer of Flask, Jinja, Click, etc. I’ve already experienced tons of extra work and frustration attempting to support asyncio and typing. Both of these were huge changes with a huge marketing push behind them, so users were constantly asking for more and more support for them, even though I had limited time to understand them and make the required changes, and the benefits weren’t clear. I really hope this doesn’t add a third huge workload to my list of things to juggle as a maintainer.

I guess my two concerns are:

  • How is this going to be communicated to users, and what will they have to do to take advantage of it?
  • What extra work will maintainers have to do, including learning, one time changes, and ongoing testing/maintenance?
3 Likes

It’s hard to tell. The pattern is so well established, and for the most part it’ll continue to work. Having said that, I believe a more explicit way of checking for the existence of modules (like the one you are proposing) is cleaner too, as opposed to relying on the import to raise the ImportError exception.

Having imports inside try blocks makes them always eager, but any imports inside that eagerly imported module are still lazy.

# foo.py
import bar

# main.py
try:
    import foo
finally:
    pass

The try block makes foo eager, so it’s immediately loaded. But the import bar inside foo.py is still lazy, even though foo itself was eagerly imported.

2 Likes

I think the main reason this feature is in general well received is that it tries to solve several long-standing problems Python has historically faced:

  • it improves startup times (and sometimes execution times too),
  • it removes a lot of cases that can lead to import cycles,
  • it solves the typing-only imports issue,
  • it helps reduce memory footprint,
  • it allows easier refactoring of modules in large codebases,
  • it helps improve code readability by making inner imports unnecessary, and
  • it removes the need for the “tricks” so many packages currently use to defer imports.

All this in the least intrusive, most transparent and efficient way, and with great real-world numbers backing it up too!

I believe the benefits greatly outweigh the burden of starting to use the lazy imports feature, which (perhaps a little over-optimistically) just works out of the box in some cases.

We need a clear explanation of how the feature works and what it changes, written simply enough that anyone can understand it. We’ve tried hard to explain things in the PEP as clearly as possible, but of course there’s a lot more to do in that area.

The feature is opt-in and very simple to turn on and off. Once on, it may or may not work out of the box when an application owner starts using it. In the cases where it doesn’t, there are clear things to look for to make the code compatible, and those should be easy for the owners of a codebase to spot (mainly reliance on import side effects, changes to sys.path, and the try / except / finally import pattern).

There will definitely be some reading to do about what exactly is changing and what to look for when making changes to a codebase. We already have some guidelines in the PEP.
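To illustrate the “reliance on import side effects” case with a self-contained sketch (all names here are hypothetical): code like the following assumes that importing a plugin module immediately runs its registration code, which is exactly the assumption lazy imports break.

```python
# Sketch of an import-time side effect (all names hypothetical).
# A shared registry that plugin modules populate when they are imported.
REGISTRY = {}


def register(name, func):
    REGISTRY[name] = func


# --- what a plugins.py module effectively does in its top-level code ---
def greet():
    return "hello"


register("greet", greet)  # side effect: runs only when the module executes

# Under eager imports, `import plugins` guarantees the registration above
# has happened by the time the next statement runs. Under lazy imports it
# is deferred until the first attribute access on `plugins`, so code that
# reads REGISTRY right after the import statement may find it empty.
```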

1 Like

I very much agree with this. My fear is that most users will get the impression that “enable lazy imports” is essentially a “go_faster=True” setting, and will enable them and have little or no ability to handle the (inevitable) problems. They will therefore push the issue back to library maintainers.

I’m lucky in that I don’t maintain any big libraries that have a programmatic API, but for the libraries I do maintain, I’ll almost certainly just refuse to support lazy imports (“if you hit an issue, reproduce it with lazy imports switched off, otherwise I will close it as unsupported”). Whether that’s a viable strategy for larger projects, though, I don’t know. And even if it is, the emotional work of repeatedly stating that position is far from negligible.

I’m all in favour of improving the startup time of Python applications, and in particular the cost of imports. But I think we should do so in a way that matches other tuning processes - provide a set of (ideally easy to use) tools that a motivated application developer can use to achieve better performance. But with full understanding of what they are doing, not by simply turning on a “magic switch” and expecting things to “just work”.

3 Likes

Can anybody profile what the slowest step of PyImport_ImportModuleLevelObject() (which implements import in Python/import.c) is, please? Is it the module search that slows imports, or the subsequent bytecode processing?

If it’s the latter, we could teach import to just load (or map) a bytecode file, scan a static hashmap of its exports (for from ... import ...), and defer the rest of the bytecode processing until the first access to an imported entity.

As a result, no user-facing lazy importing would be required.
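As a partial answer that doesn’t require instrumenting import.c: CPython already ships a -X importtime option that reports, for each imported module, the time spent executing it (self) and the time including its children (cumulative), which shows where the time goes for any given import:

```shell
# Per-module import timings, written to stderr: "self" microseconds for
# executing the module's own code, "cumulative" including its imports.
python3 -X importtime -c "import json" 2>&1 | head -n 5
```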

1 Like

That is user-facing lazy importing. When import foo is executed, the current semantics are that all of the top-level code in foo is run. This is where the performance issue lies, as that could potentially be expensive. Lazy importing defers running that code until the first access to an imported entity. But it has risks, as modules are currently entitled to assume the top-level code is run at the time of the import, and may need changes to deal with the modified semantics.

2 Likes

I forgot that uncaught exceptions in a module’s top-level code propagate to the importer. I initially had an argparse-like scenario in mind and missed the issue.

1 Like

It is almost inevitably the execution of the top-level bytecode in the module, once found. The exception here is importing built-in or extension modules, which don’t have bytecode and typically just create a bunch of objects using the C API. But as soon as it’s a .py file, executing the bytecode is typically the costliest step. (Especially if the bytecode is loaded from a .pyc file; if no current .pyc file exists, parsing the source code and generating the bytecode will also take up some time, but for a typical installed package, there is always a .pyc file.)

That would be nice, but implementing this would be an enormous project. And if you defer most of the bytecode execution, that means you would still have many of the same problems of the proposed lazy import scheme.

3 Likes

Hi Paul,

I’m trying to understand how (if at all) the PEP could concretely be modified to address this comment:

I am not clear what about the PEP makes you feel that developers won’t understand what they are doing when they use lazy imports. What specific changes would make it seem more likely to you that developers would understand what they are doing?

The intention of the PEP is precisely to “provide a set of easy to use tools that a motivated application developer can use to achieve better performance.” If you think there’s a better way to do that, I’d love to hear it!

1 Like

Hi David,

Thanks for your feedback. I think this PEP has both sizable pros (dramatic performance and memory improvements are the most notable) and sizable cons (if lazy imports are widely used, there will be two significantly different semantics for Python imports in wide use, which increases the complexity of the ecosystem.)

It’s worth reiterating that “lazy imports” semantics are already available in the standard library today (via importlib.LazyLoader) in a very similar globally opt-in way, so the PEP doesn’t introduce that. But it does make it more usable, more effective, and faster, which may make it more popular, which (if it happens) means Python library maintainers are more likely to hear about it than they are today.
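For reference, the importlib documentation gives a recipe along these lines for opting a single module into lazy loading today:

```python
import importlib.util
import sys


def lazy_import(name):
    """Load `name` lazily: its top-level code runs on first attribute access.

    Essentially the recipe from the importlib documentation.
    """
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module


# base64's module body has not run yet; the first attribute access runs it.
b64 = lazy_import("base64")
```

Note that this only stays lazy for `import foo`-style usage; a `from foo import bar` performs an attribute access right away and forces the load.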

The -L flag and the new importlib.set_lazy_imports() API will be documented in the Python documentation. Their documentation will include a clear description of what they do, probably based on what can currently be found in the (newly updated and greatly expanded based on this discussion thread) PEP text. The PEP also now specifies that:

The only extra work that maintainers have to do is possibly occasionally closing an issue with a polite “I’m sorry, I don’t support lazy imports, if you use them you’ll have to figure things out yourself.” (As Paul says, I don’t pretend that this isn’t in itself a cost; I know it is. But it’s the minimal cost that allows the feature to exist!)

If a maintainer wants to do more than that to support lazy imports, it’s a little hard to say what exactly that would entail, because it heavily depends on the details of the library. For most libraries, no code changes would be called for at all. (“Most” is supported by real-world experience here, since we have lots of Meta code using lazy imports and using many popular Python libraries.) One thing that it could entail would be choosing to run the library’s CI with lazy imports enabled (and then adding one import-the-library smoke test without lazy imports to ensure there aren’t import cycles.) It might entail an additional note in the library’s documentation that “if this feature is used in this way with lazy imports, you’ll need an opt-out.” Or it could entail code changes to reduce the library’s reliance on import side effects.

7 Likes

Specifically, I think that marking individual imports as lazy would help, because the implication would be that the developer profiles their application, notes that a particular import is slow, and then improves it by making it lazy.

What does “mark” mean? Does it mean something like

lazy import x

where only that exact import is lazy and other files that import x are eager? Or does it mean something like

set_lazy("x")

and all imports involving x in any file are lazy? If it’s the former, that’s too explicit. I have a couple of libraries I know are very slow to import the first time and may not be needed. But those libraries are also core libraries imported in hundreds of files. Marking all of those imports as lazy is bug-prone, and close to the existing workaround of manually moving all imports inside functions or a module __getattr__. It’s very easy to write normal code with one expensive import and lose the benefit of hiding it everywhere else.

The set_lazy("x") option I’d be happy using. It’s not great for the case where you have a large number of libraries that each add some import slowness, but it’s useful when you have a few very expensive libraries you want to avoid whenever possible.

edit: There’s a second issue with explicitness. Often the import slowness comes not from what you directly import but from transitive imports in library code you do not own. There’s no good way to mark those lazy explicitly without some global laziness. As an example, imagine library foo imports libraries A and B internally. B is slow to import. Your codebase never accesses B, even transitively, but does access A. How would you make B lazy explicitly, given that you do trigger the import of foo? Should you mark internal details of libraries? I think heavy explicitness would end up requiring you to examine the internal imports and private details of libraries.

I don’t honestly know or care. My point was that a command line flag that affects everything is too prone to people just using it “because it makes things go faster”. I would prefer a mechanism that encourages people to go through the analysis:

  1. My application is slow.
  2. I profiled it, and it’s the imports that are slow.
  3. Imports of libraries X and Y are the worst offenders.
  4. Let’s import those libraries lazily and see if that helps.
  5. Profiling the result, yes that fixes the issue.

Step 4 is where you explicitly flag the problem imports to make them lazy. I’d imagined marking the specific import statement, because that’s what profiling should identify for you. But I can see an argument for marking the module. The problem is that if you mark some commonly used module X as lazy, and then an unrelated module W starts failing because it’s not expecting X to be imported lazily, the author of W gets a bug report saying “please make your library work when X is imported lazily”, and we’re back to the problem I want to avoid, of library authors getting put under pressure to “support lazy imports”.

Let me ask the converse question. If you had a set_lazy("X") function, and library W broke if X was set to lazily import, then what would you do? Assume that the author of W has said they don’t want to support lazy imports. You can’t fix your import of X without breaking W. And maybe W only imports X in a way that doesn’t contribute to your performance issue…

This is all entirely theoretical of course. But so is all of the discussion here - we’re all just proposing scenarios that we’re concerned about and asking “how does the PEP handle this?” So if you want to say “I don’t care about this scenario” then I’m fine with that. But I care about it, and I’m against the PEP if it doesn’t address this problem. That’s all I’m saying, in the end.

1 Like

It would mainly depend on how W broke. I’ve encountered a number of libraries/tools that have weird edge cases/bugs with namespace packages or runtime type annotation usage, as both are uncommon. When this happens I’ll take a look at why it failed, and if the bug is small I’ve occasionally made small PRs/bug reports.

If W does not want to support lazy imports, then the bug report is skipped. If the issue is small, W is an easy library to build, and the benefit is large for me, I’ve occasionally forked libraries and uploaded them to my work’s private PyPI index. If the issue looks more fundamental, I accept them being incompatible and either give up or try to find a hack on my end. As an example, I have packages that cause pytest/pylint to crash due to an issue with namespace packages; for those cases I did a workaround. For a sphinx-autodoc-typehints bug I did a very small fork. And the tensorflow bugs I mostly give up on, as forking looks too risky to maintain there.

edit: My view is that this feature is a nice bonus for the CLI experience. I completely understand if a maintainer doesn’t want more work. I’m very happy with the existing Python ecosystem of packages.

You would use the opt-out mechanisms already described in the PEP to say “all imports in module W should continue to be eager.” You can do this as the application developer without touching the code of module W. This is a very important component of the PEP.

Unfortunately I don’t think this is a practical description of many real-world scenarios where PEP 690 can result in huge performance improvements. Often it is not some particular library that is noticeably slow to import, it is simply the sum total of many perfectly ordinary modules that add up to “slow startup” because they are all eagerly imported every time, even though only ~30% of them are needed for a given invocation of the program. How do you propose that the author of a CLI in that scenario would achieve improved startup times via lazy imports that have to be manually enabled at each import site?
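For concreteness, the shape being described is something like this sketch (subcommands and module choices are purely illustrative): every top-level import is paid on every invocation, even though each subcommand needs only one of them.

```python
# Illustrative CLI: each invocation runs one subcommand, but every
# invocation pays for every top-level import (csv, json, sqlite3 stand
# in for a tool's many ordinary dependencies).
import argparse
import csv      # only the "export" subcommand needs this
import json     # only the "dump" subcommand needs this
import sqlite3  # only the "query" subcommand needs this


def cmd_dump(args):
    return json.dumps({"ok": True})


def cmd_export(args):
    return csv.excel.delimiter


def cmd_query(args):
    return sqlite3.sqlite_version


def main(argv):
    parser = argparse.ArgumentParser(prog="mytool")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("dump").set_defaults(func=cmd_dump)
    sub.add_parser("export").set_defaults(func=cmd_export)
    sub.add_parser("query").set_defaults(func=cmd_query)
    args = parser.parse_args(argv)
    return args.func(args)
```

No single import here is “the worst offender”: the cost is spread across all of them, which is why a per-import-statement marker would mean touching every import in every module of the tool, while a global opt-in handles it in one place.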

If the PEP cannot help in that scenario (which I think is the most common relevant scenario), then IMO it is not really worth it.

Also: importlib.LazyLoader already exists and is enabled all-or-nothing, globally. The Python ecosystem doesn’t seem to have been overwhelmed by people trying it “just to see if it makes things faster.”

3 Likes

You’re right, I’ve never seen an issue due to someone using LazyLoader, but I also have no idea whether anyone has used it. If people who need startup speed aren’t applying LazyLoader (or if they are and it’s already working for them), why would they use this flag instead?

The PEP says that LazyLoader doesn’t work with from foo import a, b, and that it adds overhead to subsequent attribute lookup. Are those things that could be addressed instead? If not, should LazyLoader be deprecated if this PEP is approved?

1 Like

Because it is able to make more types of imports lazy, and do so with lower performance overhead.

Not while keeping the implementation mechanism of LazyLoader. The way you would do that would be by replacing LazyLoader with PEP 690.

Deprecated, probably. (Though I would recommend holding off on this until PEP 690 sees more real-world usage and not doing it immediately, which is why the PEP doesn’t call for this.)

I don’t think there would be any reason for a new user to choose LazyLoader over PEP 690 if both are available. But of course there’s little upside to actually removing LazyLoader from the stdlib, as it would break existing users for no real gain.

I have considered whether PEP 690 should just be a transparent under-the-hood “upgrade” to LazyLoader. But I think this would be problematic for a few reasons. It would be odd for PEP 690 to be installed as an import loader, since that’s not actually how it operates – it would be an abuse of the import loader mechanism. And while I think practically in most cases PEP 690 would work fine as a drop-in replacement for LazyLoader, there are enough observable behavior differences (behavior of from imports, what type of objects are populated in sys.modules) that I think it must be an explicitly opt-in switch for existing users.
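To illustrate why the `from` limitation is inherent to LazyLoader’s mechanism rather than a fixable bug: LazyLoader substitutes a placeholder module that executes itself on first attribute access, and a `from foo import a, b` statement performs exactly such attribute accesses at import time, so the load is merely relocated, not deferred. A small demonstration using the importlib documentation’s recipe (quopri is just an arbitrary stand-in module):

```python
import importlib.util
import sys


def lazy_import(name):
    # The importlib documentation's lazy-loading recipe.
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module


mod = lazy_import("quopri")
before = type(mod).__name__  # placeholder class; module body not yet run
_ = mod.encodestring         # the attribute access a `from` import performs
after = type(mod).__name__   # now an ordinary module: the load was forced
```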

2 Likes