PEP 690: Lazy Imports

I don’t see any evidence that sys.modules["X"] is set to None when any kind of error happens during import, in standard CPython. If you see this in Cinder it must be a Cinder feature. In standard CPython the module is explicitly deleted from sys.modules when the import ends in a failure. See the Code.
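To illustrate, here’s a runnable sketch (using a hypothetical throwaway module name, `flaky_mod`) showing that a module whose body raises is removed from `sys.modules` entirely, so a later import statement retries it from scratch:

```python
import os
import sys
import tempfile

# Create a hypothetical module "flaky_mod" whose body always raises.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "flaky_mod.py"), "w") as f:
    f.write(
        "import sys\n"
        "sys.flaky_attempts = getattr(sys, 'flaky_attempts', 0) + 1\n"
        "raise RuntimeError('module body failed')\n"
    )
sys.path.insert(0, tmp)

for _ in range(2):
    try:
        import flaky_mod
    except RuntimeError:
        pass
    # The failed module is gone from sys.modules -- not left as None:
    assert "flaky_mod" not in sys.modules

assert sys.flaky_attempts == 2  # each import statement re-ran the body
```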

You may be confused by something else – if you set sys.modules["X"] = None, then any attempts to import it will immediately raise ModuleNotFoundError without looking for it on sys.path. That’s meant as a feature (and used extensively in the test suite).
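A quick demonstration of that feature, with a hypothetical module name:

```python
import sys

# Block a (hypothetical) module name outright:
sys.modules["blocked_mod"] = None
try:
    import blocked_mod
except ImportError as exc:
    err_name = type(exc).__name__

# Raised immediately, without any sys.path search:
assert err_name == "ModuleNotFoundError"
del sys.modules["blocked_mod"]  # clean up
```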

The only thing I haven’t investigated is submodules, but I don’t see why it would be different for those.

Yes, I think I may be confusing it. I think I’ve seen it as None (even while not running the tests). Maybe at the time it was a library explicitly setting it to None, for some reason.

1 Like

It does mean that in MAL’s scenario, if there’s an error in a module, and it is lazily imported in a number of places (or a number of things are lazily imported from it), the module may be re-imported multiple times, failing with some runtime error multiple times. That feels undesirable.

1 Like

I was able to construct an example which demonstrates this sensitivity to where/how a cycle is entered with 5 files. It might be possible to make it even smaller, but this was the smallest version I could come up with including a bool toggle to see the pass/fail behavior.

In a dummy package dir:

# __init__.py
SHOULD_FAIL = True  # toggle order to see pass/fail

if SHOULD_FAIL:
    from .foo import Foo
    from .bar import Bar, BarWithFoo
else:
    from .bar import Bar, BarWithFoo
    from .foo import Foo

# foo.py
from .bar import Bar

class Foo:
    def __init__(self, bar: Bar):
        self.bar = bar

# bar/__init__.py
from .base import Bar
from .withfoo import BarWithFoo

# bar/base.py
class Bar:
    pass

# bar/withfoo.py
from ..foo import Foo

class BarWithFoo:
    def __init__(self, foo: Foo):
        self.foo = foo

To summarize why/how this passes and fails:
If the bar subpackage imports run first, then bar.Bar is successfully imported before foo.py is read, so Bar is a usable name when foo.Foo is defined. However, if foo is imported first, it attempts to import all of the bar subpackage before resolving. This hits bar/withfoo.py before foo.Foo is available.


Based on the PEP, it sounds like the above case would be handled gracefully by the lazy imports. Because __getattr__-based laziness doesn’t handle from ... import ... specially, it may be more vulnerable to these sorts of issues.

1 Like

This is the same behavior as occurs today if you have inline imports of the same module in multiple places in your codebase, or even module-level imports of the same module in multiple places in the codebase (and, in either case, only if something catches the first error so the system continues running after it rather than just crashing, as would be most likely). It feels like kind of an inevitable consequence of how Python imports work.

Put another way: the simplest way to explain lazy imports is “it’s the same as if you wrote an import inline right before each use of the name, but you don’t have to write out that inline import manually in each spot.” This behavior with errors falls out of that simplest-possible description of the semantics.
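As a rough illustration of that description, the manual version of the same semantics is writable today (`json` here is just a stand-in for any import you’d want deferred):

```python
# With PEP 690 lazy imports enabled, a module-level `import json` would
# behave roughly like this hand-written deferral:
def dump(obj):
    import json  # resolved at first call; cached in sys.modules after that
    return json.dumps(obj)

result = dump({"a": 1})
assert result == '{"a": 1}'
```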

The main difference with laziness is that it may be more likely for the import error to be caught and thus surface this behavior. But I think MAL’s suggestion of wrapping with LazyImportError should help a lot with this.

If you have a suggestion of what semantics would be better, we can definitely consider whether it’s possible!

1 Like

That makes sense, no need to do anything special (but wrapping seems nice, I agree).

2 Likes

The PEP mentions that importing inside a try-except block makes the lazy import eager. Is that intended to be recursive, or does it only apply to the imports mentioned in the block?

The latter could be problematic for code trying to deal with missing dependencies of imports, e.g.:

try:
    import curses

    # use curses to define the terminal UI
    ...
except ImportError:
    # use basic print-based UI
    ...

When the eager import is non-recursive this pattern fails because curses is a python module that imports an optional C extension.

2 Likes

The main difference with lazy imports is that the implicit nature of the import causes such imports to potentially be tried inside loops. This is rather uncommon to have in regular Python code, where you may need to put the import inside a function to work around circular import issues, but don’t put such functions inside loops.

Such lazy import errors are going to be a problem for anyone not experienced enough to understand what is happening (and that may well include experienced programmers not familiar with the code base).

We also haven’t yet discussed situations where lazy import errors kick in at critical times, e.g. you’re writing a file and in the middle of processing an import gets triggered (say coming from a plugin used for data conversion), causing processing to fail because the code was not prepared for such errors. The file would then be left half written, likely causing data corruption further down the road.

I believe that such issues can be addressed by making the lazy imports easier to configure and the whole thing more explicit at the code level, e.g.

  • introduce an explicit lazy import statement and

  • additional helpers in sys to manage lazy imports externally for packages which you don’t control, but have tested for compatibility.

And without any command line switches to prevent abuse of the functionality.

Overall, I believe the whole idea needs more time cooking… :slight_smile:

4 Likes

I like to phrase it as import vs. eager import opcodes for my brain as that clarifies what’s going to be special-cased. And I have not read the PEP yet (someone else explained to me the dictionary trick and I obviously have some experience in lazy importing :wink:), but I was assuming you were adding/changing opcodes to implement the try trick, so consider that another vote for looking at that sort of solution. :grin:

4 Likes

It is shallow. If it were transitive, it would be impossible to reason locally about the laziness of any given import. Rather than just looking at the import itself and the global configuration of lazy imports, you would have to understand every possible import chain (all the way to the top) that could ever reach this import, in order to know whether it will be lazy. And it would mean that putting the wrong import inside a try could unexpectedly render most of the codebase eagerly imported.

Yes, this can be a problem. It would not actually impact the curses module, since Lib/curses/__init__.py has from _curses import *, which is always eager since it is a star import. Although this is somewhat accidental in this case, it does reflect the right way to handle a case like this: just ensure the import that might fail is eager, using any of the methods the PEP provides for that.

The debugging of this scenario is reasonable. Assuming it did impact curses: if you enable lazy imports for your application, when you test it on a system without curses, you would get a LazyImportError showing the import of _curses that failed, with a traceback showing the import was triggered at the first place that name is used. Then you add curses to the lazy imports opt-out list for your application and move on.

1 Like

This can of course easily happen today. If you have a function and you put an inline import into it for whatever reason, there’s no particular protection or reason to think that function can’t or won’t be called in a loop somewhere; the caller may not know or care that you added an inline import in it.

And what is the consequence here with lazy imports? Generally: you get a LazyImportError, your program crashes, the import is only tried once, you fix the problem. Worst case: inside your loop you also have some code protected with except Exception:, which you are silently passing with no logging, and within this protected code is the first ever reference to a lazy-imported name, and the import of that name fails. In this unusual edge case you have some debugging to do, sure. But it’s far from undebuggable: you see where the code seems to be stopping, notice you are catching all exceptions there, check what exception is being raised, and the situation clarifies. I’ve dealt with many harder to debug scenarios :slight_smile:

I/O can raise many different kinds of unusual and unexpected exceptions, and KeyboardInterrupt or MemoryError can happen anywhere. So if you are doing I/O that is so sensitive to interruption and you aren’t cleaning up in a finally or catching BaseException you are already exposed to this risk today.
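For what it’s worth, the standard defense works the same regardless of where the interrupting exception comes from; a minimal write-then-rename sketch:

```python
import os
import tempfile

def atomic_write(path, data):
    """Write data so `path` is never left half-written, whatever
    exception (ImportError, KeyboardInterrupt, ...) interrupts us."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp_path, path)  # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)  # discard the partial file, then re-raise
        raise

target = os.path.join(tempfile.mkdtemp(), "out.txt")
atomic_write(target, "hello")
assert open(target).read() == "hello"
```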

There’s no doubt that lazy imports is an advanced feature, which is why it is opt-in. It’s powerful and can have great benefits, but it introduces some new ways that things can fail, and in unusual cases potentially be difficult to debug. As with any new feature, even experienced Python developers will have to learn something new if they are working in a codebase that chooses to use lazy imports.

It makes sense to try to minimize the downsides as much as we can, without giving up the benefits. So let’s consider specific ideas for that.

I don’t understand how this addresses any of the concerns you’ve raised, unless you are also proposing to get rid of the option for global opt-in entirely, in which case this sacrifices too much of the benefits. Syntax for lazy imports may make sense for some use cases (which are not the primary motivating cases for PEP 690); I’m not opposed to it but I don’t think it helps PEP 690. It could be a separate PEP.

I already showed above how PEP 690 already provides this, in a way that is quite flexible and in fact similar to what you suggested in a prior post. Can you be specific about what exactly you think the PEP should provide that it currently doesn’t?

Tbh I don’t have super strong feelings here, but if you are using lazy imports it’s nice to be able to ensure laziness from process startup. And I think the scope for abuse with a CLI flag is much less than with an env var, since most Python applications are not distributed in a way where you run them with an explicit invocation of python. Python already comes with many CLI flags (-S, -s, -E, -I) that can have unexpected and breaking effects if used with some random program.

2 Likes

In this very particular case, curses is not affected because it uses a star import for the C extension, and those are always eager. It could have been affected if it didn’t, and none of the imported names were directly used during the execution of the module.

1 Like

I’m liking this proposal less and less. As a library author I’d already have to reason about the possible effect of lazy imports, and the fact that disabling them is shallow means I have to know the internal details of the libraries I import to do this analysis.

I’m already not a fan of implicit lazy import due to import errors popping up when trying to use the imported module instead of at the import statement itself. That said, I’m not in the target audience of this proposed feature.

I’d probably end up adding a .pth file that automates this for libraries that are affected by this feature…

1 Like

At least until someone notices this, declares that import * is “bad practice”, submits a PR and causes enough commotion that it gets merged (this has happened before).

Now a “purely stylistic” change (we know it’s not, but within the scope of reasoning about the change, it is) has inadvertently broken [non-]users of this module.[1]

Add this to my reasons for preferring either complete transparency (which this is not), or completely explicit (which is already available as local imports).


  1. And it’s a side-issue, but worth considering that type checking is already in place and further discourages people from using import *. So we can’t really add a feature that in any way encourages it again without clearly explaining why the other feature “loses” here, and how users can decide which situation they’re in. ↩︎

1 Like

Over the course of this discussion, the term for who should have control over the -L setting was simplified from:

to just “application developer”, which is, IMO, the wrong one of the two.
As an example, the developers of the application black don’t pin versions of their dependencies, so they have no control over the kind of details that might make lazy imports fail.
It seems to me that many deployments of black don’t have an integrator – that is, someone who pins the versions and runs the tests to ensure everything works (and potentially patches in some set_eager_imports calls).

More generally, I’m worried real-world experience is from a very specific deployment, which affects a lot of the PEP’s claims, from debuggability to the performance numbers.
Sadly, I don’t have a suggestion how to make this better, short of implementing it and letting people play with it.


(I’m speaking for myself only, other members of the Steering Council might disagree with me)

5 Likes

Perhaps accepting the PEP as provisional (it seems everybody agrees that lazy/deferred imports is a good idea) to let people play around with it?

3 Likes

That gets a +1 from me. I am eager to have this available even if there are potential problems, and the syntactic form proposed by Carl is the best I can imagine. We can tweak the API to make things eager, but “lazy by default (if enabled at all)” makes sense to me. It will always be opt-in at some level.

7 Likes

8 posts were split to a new topic: Packaging of projects that are both an app and a library

There’s too much (bordering on off-topic) focus on black. Don’t forget about other development utilities (read: applications) such as pytest, Sphinx (specifically autodoc), etc., which categorically must be in the dev environment, as they need to import the target package, which will be affected by lazy imports.

In this case, I think the proposal covers their use case, as those utilities can choose to not enable lazy imports. I’m not sure about the interaction where the target package (to be used as an app) is designed only to work with lazy imports (and breaks with eager imports, e.g. due to circular imports).

2 Likes

I think it’s time to wrap up this argument-by-analogy (though I agree it’s an appropriate analogy).

It sounds like in Petr’s view (to give the view an arbitrary name), an entry point executable generated by pip should not ever specify -L, and the person doing the install should decide whether to use -L or not.

In Barry’s view, the entry point metadata would specify that it wants -L and would get it unless the person installing somehow overrides it, under similar circumstances to how they’d override strict dependency requirements.[1]

Extending the analogy from “entry point script wrapper” to “however you configure application launch” is left as an exercise to the reader.

It seems to be a question merely of de-facto defaults. If we already had multiple tools that clearly handled each case, this wouldn’t even be a question, and there wouldn’t be such concern about libraries being “forced” to have lazy imports. But because different use cases most likely go through a single tool, they become conflicts.

(Here ends my attempt to sum up the point of the preceding, apparently off-topic, discussion. Here begins my opinions.)

This all actually makes me lean harder towards a command line option or environment variable, and continuing to default to “off”, which is the original proposal. But I would want to see clear messaging around what responsibilities you as the integrator (“end user”) are taking on by enabling the option. And they’re virtually all social responsibilities rather than technical ones:

  • you don’t get to “call out” libraries in public because they don’t work yet
  • you don’t get to demand fixes because they haven’t been made yet
  • you don’t get to abuse maintainers because they haven’t made the fixes yet
  • you do get to stop passing the option you chose to pass in yourself

I wish that was more hyperbolic, but I’ve seen it all happen. Fundamentally, I don’t think I have any concern with the PEP itself, but I hate the thought of adding more ways for people to justify being horrible to the volunteers who are writing the code that they’re using for free.

Possibly the only (bike-shedding) question I’d ask is could this be an -X option rather than its own command line option? Mainly just to make it even more obvious that you’re doing something weird. (Or does that break shebang processing because of multiple arguments? ISTR something about that, possibly in this thread already.)


[Edited to add]
More concretely on messaging, I think the PEP abstract and “how to teach this” could carry the main weight. The latter in particular should have examples of how to determine when an application is not behaving properly under lazy imports with an instruction to turn off the option.

In contrast, if we put out messages like “when things don’t work, you’ll just have to wait for the maintainer to release an update” then I think we’re actively encouraging the kind of behaviour I would rather reduce. Info on how to debug and patch/fix/workaround issues, and a suggestion to contribute the change, I think promotes more responsibility.


[Edited again to add]
And of course, this totally leaves open the possibility for an installation tool to choose to enable it via a shortcut (substitute “pip” and “script wrapper” if you like concrete examples, but I’m trying to stay general).

If a package developer explicitly says “when people run my script, I support lazy imports” then they deserve the feedback that comes from that.

It’s just that when someone goes “my webserver starts up faster when I enable this option, oh it broke, let me go shout at the Django developers,” it should be really obvious that they ought to be shouting at themself first.


  1. And most likely by using a tool other than pip, I would assume. ↩︎

12 Likes