PEP 690: Lazy Imports

steven.daprano · July 3, 2022, 10:42am

__future__ flags are for introducing features that will become normal always-on features in the future (hence the name), not for always optional features.

We don’t have a __future__ flag for (e.g.) the -E and -O command line switches.

koubaa · July 3, 2022, 12:50pm

According to this, __subclasses__() is an optimization, and in the context of another optimization (import time), it may make sense to trade one for another. You’re right that decorators and metaclasses may lead to side effects, but I believe my point still stands: it is possible to determine those at the AST level.

In C++11, constexpr was introduced as a way to do some computation at compile time, and initially its scope was extremely small. Over time, the set of things that were provably correct at compile time grew, so the set of features supported by constexpr also grew. I’m imagining the same sort of story for Python lazy module imports.

carljm · July 3, 2022, 4:36pm

Yes, this is a very good point. In fact we’ve already had to add an “is lazy imports enabled?” API to the implementation for testing reasons; we need to add this to the PEP as well.

EpicWink · July 4, 2022, 5:25am

That’s the main problem others are suggesting: users asking library authors to support lazy imports may significantly increase the support burden on many Python developers, arguably reducing the quality of the ecosystem.

I think we should heavily discourage such requests, and find a way to make the community think of lazy imports as a potentially-available feature that only application developers (not end-users nor library authors) can enable. Maybe they can only enable it if their CI test-suite has 100% path coverage.

In my proposal, both end-users and library developers ideally have no control over lazy-importing. Raising an exception after discovering that lazy importing is in-effect is a form of control (although I don’t see a way to prevent this if sys.lazy exists).

Would it make sense to enable lazy-imports in the entry-point hook, without adding a command-line flag, environment-variable or sys function? For example, a lazy marker in the console_script entry-point definition, which then tells Setuptools to add a #py-pragma: lazy line early in the hook, which then tells the parser to enable lazy-imports.

The main one to look out for is __init_subclass__, which I suspect most users expect to be run on import of the subclass’s module.
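To make the __init_subclass__ concern concrete, here is a minimal sketch. It does not use PEP 690 itself; it only simulates deferring a module body with exec. The names Plugin, REGISTRY, CsvPlugin and run_module_body are hypothetical, chosen for illustration.

```python
import types

REGISTRY = {}


class Plugin:
    """Base class that registers every subclass as a side effect."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        REGISTRY[cls.__name__] = cls


# Stand-in for the source of a separate "plugins" module.
PLUGIN_SOURCE = """
class CsvPlugin(Plugin):
    pass
"""


def run_module_body(name):
    # Simulates what happens when the import of the module is finally resolved.
    module = types.ModuleType(name)
    module.Plugin = Plugin
    exec(PLUGIN_SOURCE, module.__dict__)
    return module


registered_before = sorted(REGISTRY)   # module body has not run yet
plugins = run_module_body("plugins")
registered_after = sorted(REGISTRY)    # subclass is registered only now
```

Code that consults REGISTRY before anything touches the plugins module would see it empty, which is the behavior change users may not expect.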

Another idea: rename Lazy Imports to something more clandestine, e.g. “Dangerous Dragons mode”, “Unsupported Deferring of Imports Mode (UDIM)”, “Potentially Problematic Postponement of Imports (P3I)”

barry · July 4, 2022, 9:21pm

The idea of adding something to the console_script entry point specification has merit. Even if we keep the -L flag, you still need a way to tell the backend to create the script using that option (but also other options might be useful, and I’m not aware of a standard for specifying that for entry points, in either PEP 621 or the packaging guide).

However, even if the thing that triggers lazy imports is a pragma comment, then it’s always going to be possible for programmer misuse. So I don’t see that as an argument for removing the -L option, which will still be useful when the script is not created by an entry point.

brettcannon · July 4, 2022, 10:41pm

Beyond Mercurial, I know of at least one very large Python code base that (at least at one point) used LazyLoader for their entire test suite.
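For context, LazyLoader here is importlib.util.LazyLoader from the standard library. A minimal sketch of the usual recipe follows; the helper name lazy_import and the choice of colorsys as the example module are just for illustration.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose body runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up laziness, does not run the body yet
    return module


colorsys = lazy_import("colorsys")
hsv = colorsys.rgb_to_hsv(1.0, 1.0, 1.0)  # attribute access triggers the load
```

Unlike the PEP 690 proposal, this has to be applied module by module and only defers execution of the module body, not the search for the module.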

Note you will need to update the pyproject.toml specification in the Python Packaging User Guide, as that’s the packaging standard for declaring entry points.

Yep, the shebang will need the flag on Unix. Not sure what would need updating for Windows.

And I’m not a fan of the pragma idea as it now makes this somewhat syntactic and it would be the first comment that changes runtime semantics (I don’t count the encoding cookie). The AST doesn’t even keep comments, so that would make this even more of a change.
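A quick sketch of the point about comments and the AST, using the hypothetical "#py-pragma: lazy" marker suggested earlier in the thread: the tokenizer sees the comment, but ast.parse drops it.

```python
import ast
import io
import tokenize

source = "#py-pragma: lazy\nimport json\n"

# The parsed tree contains only the import statement.
tree = ast.parse(source)
dumped = ast.dump(tree)

# The comment is only visible at the token level.
comments = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.COMMENT
]
```

So a pragma-based switch would have to be handled before or outside the AST, for example in the tokenizer, much like the encoding cookie.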

CAM-Gerlach · July 12, 2022, 5:09am

Presumably, the actual Entry Point Specification would also have to be modified, no?

brettcannon · July 12, 2022, 6:16pm

Probably both. The example in that spec, for instance, is setuptools-specific, so it’s already somewhat outdated.

petersuter · July 16, 2022, 6:25am

You may be aware the Scientific Python community is using lazy loading (via LazyLoader in some cases) e.g. in skimage, tensorflow, SciPy, napari, cuda, … libraries:

Maybe they should be asked about their experience and impact on them if this is deprecated?

carljm · July 18, 2022, 2:24pm

Thanks @petersuter. I posted in the “Lazy loading has landed!” thread (scikit-image category of the Scientific Python forum) to invite attention to PEP 690, and included some thoughts there about how PEP 690 can cover the SPEC 1 use case. I think in general it would be an improvement, but to fully cover it might require that we add the capability for a library module (typically an __init__.py in the SPEC 1 use case) to locally declare all its own imports lazy by default.

markshannon · August 4, 2022, 3:46pm

First of all, I think there is real value in lazy imports, and in adding them to the language rather than relying on various third-party implementations.

However, the proposed PEP just feels too magical. Controlling import by configuration, rather than being explicit, makes it really hard to reason about and test code.

I don’t think that having things that do or don’t exist, depending on how you look at them, is a good idea.
(I’m referring to the hidden thunks that a lazy import introduces).

I feel that explicit approaches have been rejected with insufficient justification.
Using an explicit lazy as a keyword prefix to import makes it very clear what is going on, and allows packages precise control over what they import lazily and what they import eagerly, and the ability to test their packages properly.

lazy import mod

could simply create a lazy module object that would convert itself into a real module (performing the import) when it was used.

I understand the desire to force packages to be loaded lazily in a large application, rather than waiting for package maintainers to implement it themselves, which might take years.
However, if the feature is as valuable as claimed, package maintainers will be eager to adopt it.

The PEP as proposed has too many rules as to when a module is imported lazily or not. It is impossible to know at a glance whether a module will be loaded eagerly or lazily. A keyword makes it very clear.

One thing that this PEP does that an explicit lazy import could not is defer the loading of a module when an attribute is accessed from it without changing the class of that attribute, so this

from mod lazy import foo

can only work if foo is a module.

However, if lazy imports are wanted, then changing the code to

lazy import mod

and changing uses of foo to mod.foo doesn’t seem a big deal.

barry · August 4, 2022, 5:17pm

Really? I think all these issues have been discussed quite extensively, both here and in the PEP. I acknowledge that there are different opinions about it, but I don’t think the PEP itself is lacking in justification. That said, I think the PEP is on target for what it proposes. If there are advocates for the explicit keyword-based approach, then let’s see a competing PEP with all the necessary details, and we can debate the alternatives!

markshannon · August 4, 2022, 5:27pm

I don’t know if they have been discussed here.
The PEP should stand alone; IMO there isn’t sufficient justification in the PEP.
But, as you say, it is a subjective judgement.

EpicWink · August 4, 2022, 10:44pm

With regard to testing, I can’t see how it is much more than adding a lazy-imports toggle to the CI matrix. Perhaps Pytest can learn a command-line flag to run the test suite twice, once with lazy imports.

The point of this PEP is that library authors shouldn’t care about whether lazy imports are enabled, just like right now where libraries don’t care they’re running in a separate thread. The application developer has the authority to enable lazy imports, but also has the responsibility to test the application thoroughly once enabled.

I think preventing application developers from demanding library authors support lazy imports is the social issue that should be addressed.

Kronuz · August 9, 2022, 5:46pm

Let me tell you the story of Lazy Imports regarding that global flag, so we can have some context about why we are proposing it that way.

When I started working on lazy imports at Instagram, my first intention was to be able to flag each import that I wanted to be deferred as lazy, with a comment or with a new keyword such as the lazy keyword you’re suggesting.

The ideal to me was, obviously, that eventually everything was supposed to be lazy to get the full benefits of laziness, but I doubted that was going to be possible because of the large number of third-party packages and huge amount of modules we use (it was too big of a change in the semantics of imports, I thought). I felt it was going to be prohibitively difficult to make that ideal happen. I initially settled for adding a way to toggle lazy imports on and off on a per-module basis and see if that was enough, so I implemented that in Cinder as from __future__ import lazy_imports.

I started enabling the flag in a few modules of our own codebase after analyzing them and figuring out if they were going to work or not. I kept doing this for more and more modules and started adding the future import in whole packages, then on whole directory trees!

When I realized it was working for almost all the internal modules I was trying it on (most of the time without any changes), I added an option to enable it everywhere, and I also added from __future__ import eager_imports, to disable lazy imports on a per-module basis when needed. When I saw things were broken I started digging into what was missing.

I fixed some bugs in Lazy Imports, also fixed some unneeded incompatibilities in the implementation, and I progressively saw fewer and fewer issues. Then I started seeing common things that weren’t working. It mostly became a matter of fixing a few patterns: import side effects related to the registry pattern; bad imports such as importing a module and not its submodules and then expecting the submodule attribute would be there (e.g. import foo; foo.bar.Bar → import foo.bar; foo.bar.Bar); cycles in custom import loaders (when lazy imports were triggered midway through another import); and modules that meddle with sys.path and/or sys.modules in certain ways (such as adding a path to sys.path, doing a lazy import, and removing the path before resolving the lazy import).
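The “bad import” pattern can be reproduced without lazy imports at all. The sketch below builds a throwaway package on disk (demo_pkg_690 and its bar submodule are hypothetical names) and shows that the submodule attribute only exists once something has actually imported the submodule.

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Lay out demo_pkg_690/__init__.py and demo_pkg_690/bar.py.
    pkg = pathlib.Path(tmp, "demo_pkg_690")
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "bar.py").write_text("class Bar:\n    pass\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import demo_pkg_690
        # Importing the package alone does not bind the submodule.
        has_bar_before = hasattr(demo_pkg_690, "bar")

        import demo_pkg_690.bar
        # Only an explicit submodule import sets demo_pkg_690.bar.
        has_bar_after = hasattr(demo_pkg_690, "bar")
        bar_class_name = demo_pkg_690.bar.Bar.__name__
    finally:
        sys.path.remove(tmp)
        for name in ("demo_pkg_690.bar", "demo_pkg_690"):
            sys.modules.pop(name, None)
```

With eager imports, code like import foo; foo.bar.Bar often works by accident because some other module has already imported foo.bar. Under lazy imports that other import may not have been resolved yet, so the explicit import foo.bar is the robust form.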

Finally, we ended up not using any explicit way of enabling lazy imports for our codebases, and in the PEP we removed the from __future__ import lazy_imports because it’s not clear whether lazy imports will ever be the default or not for Python.

There are many different types of uses of Python and some communities have different patterns, but all the evidence we do have is that the percentage of modules we tried that worked without any issues out of the box with lazy imports enabled was high, and that just enabling lazy imports in a few modules doesn’t yield many benefits at all. The true power comes when you enable laziness in whole systems. We’ve saved terabytes of memory in some systems and reduced start times from minutes to just a few seconds, just by making things lazy. When we saw this is when we thought of the insanely great amount of memory, time and other resources this could save us all, if we make this available for everyone!

Having said that, I’m not totally against having a lazy import mod or from mod lazy import foo. However, there are a few considerations:

Perhaps most obvious is that a new syntax for imports would require changes in parsers, linters, debuggers, etc.
One bigger concern I have has to do with implementation details and what the new syntax could require: what happens with lazy import inside a function (inner lazy imports)? They run with optimized fast locals and, in the current implementation of lazy imports, that would mean we’d lose that for lazy imported names, or introduce performance penalties on fast locals. It’s also unlikely that having inner lazy imports supported would bring additional or significant wins, because, many times, inner imported names are immediately used, defeating the purpose of lazy imports there.
Most importantly, needing to explicitly call for a lazy import would mean we won’t see any of the benefits of the feature until the new syntax is generally adopted, and that could take years! Not to mention that very good (but unmaintained) packages could never get the (sometimes) free wins.
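To illustrate the fast-locals point about inner imports, here is a small sketch using dis on current CPython (the function name inner is just for illustration). A module-level import binds its name through the module namespace, while an import inside a function binds a fast local.

```python
import dis


def inner():
    import json
    return json


module_code = compile("import json", "<module>", "exec")

inner_ops = {ins.opname for ins in dis.get_instructions(inner)}
module_ops = {ins.opname for ins in dis.get_instructions(module_code)}

# Inside a function the imported name is stored with a STORE_FAST-style
# opcode; at module level it is stored with STORE_NAME into the namespace dict.
inner_uses_fast_local = any(op.startswith("STORE_FAST") for op in inner_ops)
module_uses_name = "STORE_NAME" in module_ops
```

Fast locals live in a fixed array rather than a dict, so there is no dictionary lookup on which a lazy placeholder could be resolved, which is why inner lazy imports would need special handling.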

Edit: I’m also sharing a link to the blog post I wrote back in June, I don’t think I had shared it here before, and it could give some additional context: Python Lazy Imports With Cinder

joaoe · August 19, 2022, 2:05pm

Hi.

The examples in the PEP imply that the module is loaded when the variable is accessed in the code, so that would indeed require some overloading when accessing global variables.

However, there is an alternative.

In my projects, whenever I need some lazy object, I use a proxy object.
I’ve implemented the lazy module feature myself a couple times.
Whenever a property of my proxy object is accessed, only then is the underlying object loaded.

Copy-paste

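import contextlib
import importlib
import importlib.util
import sys
import types


class LazyModule(types.ModuleType):
    def __init__(self, name):
        super().__init__(name)
        if name not in sys.modules:
            spec = importlib.util.find_spec(self.__name__)
            if spec and spec.origin:
                self.__file__ = spec.origin
            # Register the placeholder so other imports of this name get it too.
            sys.modules[name] = self

    def __getattr__(self, item):
        # Only called after normal attribute lookup on the proxy has failed.
        if item.startswith("__"):
            with contextlib.suppress(AttributeError):
                return object.__getattribute__(self, item)

        # Remove the placeholder so the real import can run. This is done
        # outside an assert so that it still happens under "python -O".
        if sys.modules.get(self.__name__) is self:
            del sys.modules[self.__name__]
        module = importlib.import_module(self.__name__)
        # TODO: use gc.get_referrers() to cleanup last objects referencing this lazy object.
        # Later lookups are forwarded to the real module.
        self.__getattr__ = lambda item: getattr(module, item)
        return getattr(module, item)

# Example to replace "import pandas"
pandas = LazyModule("pandas")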

So, for the given code examples in the PEP, the statement “spam” would be a no-op, but any of “str(spam)” or “spam.property” would load the module.

One of the cool things about this implementation is that it can be done purely in Python, by hooking into sys.meta_path and sys.path_hooks. And you could ship it as your own add-on on pypi.org.
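As a rough sketch of what such a sys.meta_path hook could look like, the finder below wraps the normal loader in the standard library's importlib.util.LazyLoader for a chosen set of module names. This swaps in LazyLoader rather than the proxy class above, and LazyFinder, lazy_demo_mod_690 and sys.lazy_demo_events are hypothetical names used only for the demonstration.

```python
import importlib
import importlib.abc
import importlib.machinery
import importlib.util
import pathlib
import sys
import tempfile


class LazyFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that makes the named top-level modules lazy."""

    def __init__(self, names):
        self.names = set(names)

    def find_spec(self, fullname, path, target=None):
        if fullname not in self.names:
            return None  # let the regular finders handle everything else
        spec = importlib.machinery.PathFinder.find_spec(fullname, path, target)
        if spec is None or spec.loader is None:
            return None
        # Defer execution of the module body until first attribute access.
        spec.loader = importlib.util.LazyLoader(spec.loader)
        return spec


sys.lazy_demo_events = []  # demo-only shared state the module body can reach

with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "lazy_demo_mod_690.py").write_text(
        "import sys\n"
        "sys.lazy_demo_events.append('body ran')\n"
        "VALUE = 42\n"
    )
    finder = LazyFinder({"lazy_demo_mod_690"})
    sys.meta_path.insert(0, finder)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import lazy_demo_mod_690

        events_after_import = list(sys.lazy_demo_events)
        value = lazy_demo_mod_690.VALUE  # first attribute access runs the body
        events_after_access = list(sys.lazy_demo_events)
    finally:
        sys.meta_path.remove(finder)
        sys.path.remove(tmp)
        sys.modules.pop("lazy_demo_mod_690", None)
```

The import statement itself leaves the module body unexecuted; only the first attribute access runs it. A real add-on would also need to decide what to do about from-imports and submodules, which this sketch does not attempt.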

Cheers.

PS: I’m not too much of a fan of future imports, as that implies a feature that is part of a newer Python being backported into an older Python. This feature of lazy imports significantly changes the behavior of the Python runtime, and as such is not a feature that will be shipped and enabled at runtime sometime in the future, I think. Plus, this is the kind of stuff you’d like to enable globally, instead of delegating that to whoever develops your 3rd party dependencies.