I thought this was one of the things that the PEP deliberately tried to avoid. If Python were to add lazy imports, it would create pressure on library maintainers to support them. And if they can’t or don’t want to support it, the pressure may never let up, because new people will keep coming in with the same requests over time, and there’s no syntax for being explicitly eager.
Maybe that’s an approach that could make PEP 690 better: add explicit eager import syntax? Library maintainers could mark their imports which must be eager, but it would no-op under a fully eager regime.
make lazy imports explicit (e.g. use lazy import abc instead of import abc) and use a new keyword to make people aware
The main problem I see in any of these cases is that the imports are located at the usage sites rather than the definition sites. This puts a mental overhead on users (thinking about whether an import should be eager or not every time you type one) and makes things really hard for package developers, since they don’t know how their modules will end up being used; and even if we offer a suboptimal way of manually forcing particular modules to be eager, I think that still makes it hard for the whole ecosystem to migrate to 3.12+.
In a slightly opposite direction, I’d lean toward opt-in at the definition site (instead of the usage site) as the easiest way forward. Each package maintainer knows their own code best (e.g. if a new PR introduces global state that won’t work in lazy mode, they can change that module to be eager from then on), so letting them decide, and lifting this overhead from regular users, would be a great addition. As community adoption increases (which shouldn’t require any syntactic changes; perhaps a per-module/per-package marker so they won’t need to use any hacky functions), more and more packages can support it gradually, with technically no automatic breaking changes.
Two things about that. One, how would you denote that something can be lazily imported?
And two, you still have the issue of a lazy import raising an exception at the place of usage instead of at the import statement, due to e.g. a syntax error. That suggests you may want control at the import statement level if you want those errors upfront instead of later.
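To make that failure mode concrete, here is a minimal sketch (broken_mod.py and its contents are invented for illustration, and it assumes the script’s own directory is on sys.path, as it is when a script is run directly):

```python
import pathlib

# Create a module with a deliberate syntax error (invented for this demo).
pathlib.Path("broken_mod.py").write_text("def f(:\n    pass\n")

def use_it():
    # Manual lazy import: the SyntaxError surfaces here, at first call,
    # long after startup, whereas an eager top-level import would have
    # raised it immediately.
    import broken_mod

print("program started fine...")
use_it()  # SyntaxError is raised only now
```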
The tricky bit with lazy imports as a concept is both the code doing the import and code being imported are affected. Right now there’s no handshake in both directions saying both sides of this “transaction” agree that lazy imports are a good thing. You almost need the code being lazily imported to opt into the possibility, and then the person doing the importing saying they want those semantics.
Otherwise we have to migrate the whole community over to lazy imports with lazy/eager keywords on import statements, potentially pivoting on what the unlabelled import statement does.
Pushing the lazy choice to the library maintainers sounds correct.
So what sort of services should Python provide to allow a library to choose to be lazily imported?
I suppose there could be a gamut of possibilities: libraries with little internal state might just need a switch to flip, while libraries with some little-used functionality might want some stuff eager, other stuff lazy.
But to find out, the library has to be located and something read into memory… and that has some cost! Should the source be read but not compiled past some point? For .pyc, loaded but not all of the initialization done? Isn’t file system access a significant part of the cost of loading a library?
Should a lazy library have a small stub that could be included in an application when it is compiled, and that would then load the rest of itself dynamically when needed?
I think the idea of libraries opting in to being lazily imported is neither useful nor necessary. Modules in Python can never control how or when they are imported; someone can always put an import inline inside a function to make it lazy. The author of the import is always the one who decides when any side effects of the import take effect.
The problems with libraries and global lazy imports are in multi-module libraries, where the library’s own internal imports become lazy and this affects the library author’s assumptions about when import side effects of their own imports happen. A shallow-effect per-import opt-in cannot cause this problem. It’s effectively just syntactic (and maybe performance) sugar for manually inlining the import, which is already possible and not infrequently done. It doesn’t change anything from the perspective of the library module.
So I think opt-in at the import site is all that is necessary; opt-in by the module being imported isn’t useful. (Assuming we are abandoning the idea of globally-enabled lazy imports and switching to per-import-statement opt-in.)
Yes. Per-import-statement opt-in laziness is (basically) syntactic sugar for manually inlining imports.
The main reasons to look beyond manual inlining as a solution, off the top of my head, are:
- Manual inlining invokes the import system every time the function is called, which has a noticeable cost (see the sketch after this list). The PEP 690 approach reduces this overhead to zero after the initial reference that triggers the import.
- Manual inlining is verbose. Sometimes syntactic sugar tastes sweet.
- Manual inlining doesn’t put the name in module scope, which means e.g. it can’t be re-exported, and it can’t be used for module-level uses like type annotations (which has a lot of value when the annotations themselves are lazy via PEP 563 or 649).
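As promised above, a small sketch of what “manual inlining” means and where the per-call cost in the first bullet comes from (json is just a stand-in for any dependency):

```python
import timeit

def loads_inline(s):
    # Manual lazy import: re-enters the import machinery on every call
    # (a sys.modules lookup plus the __import__ checks).
    import json
    return json.loads(s)

import json  # eager module-level import, for comparison

def loads_module_level(s):
    return json.loads(s)

# The inlined version pays the import-statement overhead on each call;
# a PEP 690-style lazy name would cost nothing after first resolution.
print(timeit.timeit(lambda: loads_inline("{}"), number=100_000))
print(timeit.timeit(lambda: loads_module_level("{}"), number=100_000))
```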
Surely we can reduce this to a dict lookup and the checks we (still?) already have for detecting whether __builtins__['__import__'] has changed? No harm in further optimising things that have a noticeable impact.
On the other hand, it doesn’t put the name in module scope, which means it won’t be re-exported.
Not going to argue with the annotations cost, though. Ultimately, typing.TYPE_CHECKING is probably imported by someone else anyway, so checking it in order to skip module-level imports isn’t going to be that expensive. If we eventually figure out cheap, no-evaluation annotations, then the rest of the cost largely goes away too (personally I use string literal annotations for anything perf-sensitive, but I don’t have that many places where this matters).
It will definitely mess with type checkers’ syntax-checking abilities to have the module-level imports present for checking but not at runtime. But I guess we just need additional tests.
The main problem with typing.TYPE_CHECKING is not the cost of importing or checking it, it’s that you are forced to choose up-front whether you want cheap/non-cycle-causing annotations or introspectable annotations. With a lazy import and PEP 649 (now accepted), you can have your cake and eat it, too.
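A small sketch of that forced choice (Decimal stands in for any import that is costly or cycle-prone):

```python
from typing import TYPE_CHECKING, get_type_hints

if TYPE_CHECKING:
    # Cheap and cycle-free: this import never runs at runtime...
    from decimal import Decimal

def total(price: "Decimal") -> "Decimal":
    ...

# ...but the annotations are no longer introspectable: "Decimal" is not
# bound in this module's namespace, so this raises NameError.
get_type_hints(total)
```

With a hypothetical lazy import decimal plus PEP 649’s deferred evaluation, the import would run only if and when something actually introspects the annotations: both cheap and introspectable.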
The PEP does propose a with eager_imports(): context manager. (Which is always a no-op since with itself makes imports eager.)
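For reference, a sketch of what that looks like under the PEP’s draft API (eager_imports is the PEP’s proposed name; it does not exist in today’s importlib):

```python
# Proposed in the PEP 690 draft; not available in current CPython.
from importlib import eager_imports

with eager_imports():
    import costly_module  # invented name; eager either way, since PEP 690
                          # already treats imports inside a with block as eager
```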
But I don’t quite see how it would reduce the pressure. Lazy imports need to be tested, and to be generally useful (outside big apps with rigid dependency chains), they should be tested in individual library test suites. There’ll be demand for testing, maintenance, mental overhead around the fact that your library can be imported in two different ways.
Perhaps. It seems less certain than a syntactic change. What I liked about the PEP was that it (at least attempted) to put the burden on the application developer, which is where I think the majority of the responsibility lies. For example, if I turned on implicit lazy imports in my Python CLI, and I found that one of my dependencies can’t be lazily imported, I think I’d report the issue (or file a PR) to the dependency, but then I’d just eager-ify the import and my CLI would be none the worse off.
I’m just curious, but wouldn’t these be already hard to manage without PEP 690? Many libraries don’t warn you against importing them inside your functions (we can call this a “manual lazy import”). If the various things mentioned in this comment are problematic, aren’t they problematic for manual lazy imports too? Wouldn’t these things already need to be tested or proscribed?
It would be amazing if we could try out lazy imports to see how many things actually break.
Sure, but what if a new patch version of one of the dependencies broke your CLI, while you’re on vacation?
The PEP puts all the burden on applications with pinned dependencies. But it also doesn’t benefit anyone else. For it to do that, the burden needs to shift.
If the burden is on libraries, then the libraries would benefit – assuming it’s good for the library if applications that use it can be faster. (That isn’t always the right trade-off for the library, of course, but that’s a general issue – app authors can pin/fork/patch or find dependencies that align with their interests.)
And if the dependency accepts the PR, it now has tests for the new use case!
That’s a burden.
Manual lazy imports are “shallow”: when the dependency is eventually imported, it’s still loaded at once – presumably, the way its authors load it in their tests.
With PEP 690, loading of individual pieces of the dependency would get delayed, perhaps indefinitely.
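A sketch of the distinction, using an invented package layout:

```python
# pkg/__init__.py -- invented example package
from . import submodule_a  # the library's own internal imports
from . import submodule_b

# application code
def handler():
    import pkg  # "shallow" manual lazy import: deferred until this call, but
                # then pkg/__init__.py runs in full, loading submodule_a and
                # submodule_b at once, exactly as the authors test it

# Under PEP 690's global laziness, pkg's two internal imports above would
# themselves become lazy: submodule_a might load much later, in a different
# order, or never -- states the library's test suite may never exercise.
```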
@carljm
In the proposed explicit lazy imports (lazy import ...), is the resulting object a thunk, as in PEP 690, or a Python object that performs the import on attribute access? (Or something else?)
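For reference, the stdlib already offers the second flavor: importlib.util.LazyLoader defers executing a module until first attribute access. A minimal sketch along the lines of the documented importlib recipe:

```python
import importlib.util
import sys

def lazy_import(name):
    """Deferred import via importlib.util.LazyLoader: the module's code
    runs on first attribute access, not at import time."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module

json = lazy_import("json")  # nothing has executed yet
json.loads("{}")            # the real module loads here, on first access
```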
The problem I see is that lazy imports change the semantics of “being imported”. Currently, the author of a module can write code to be run at import time. That is a part of the defined semantics of the language, and as such, doing so isn’t wrong.
Lazy imports, the way they are being discussed here, allow the library user to alter those semantics, without the library author having any way to know that has happened. That is the real issue here, to me. To make a library “safe for lazy importing” requires the author to not use an existing language feature (code run at import time). And I sort of feel that we should be assuming that library authors know what they are doing, and don’t gratuitously use costly features - so helping lazy imports comes with a trade-off that maybe the author doesn’t want to make (the awkward UI of explicit registration, for example).
Well, that scenario isn’t limited to lazy import side effects!
Yes, most of the visible benefits of lazy imports are seen by the application, but that’s okay. We’re talking about Python start up time here, and that’s an application effect.
I do think that PEP 690 could be stronger by providing guidance to library authors for how to test their libraries under lazy imports. OTOH, there’s no requirement for libraries to guarantee they are lazily importable. That’s just one more piece of the equation.
For example, I receive a bug report that one of my libraries doesn’t work on Windows. Maybe there’s even a PR to fix that. I don’t test on Windows [1] but the patch looks reasonable and doesn’t break anything else, so as a courtesy, I apply the patch. That’s a burden too, but I accept it because it’s helpful to my users. I don’t make any guarantees that my library will continue to work on Windows, but now I’ve got users that should help me keep it working in that environment. Is the lazy import burden different in kind to that?
I wouldn’t make the same assumption! In fact, several years ago at $work I did start-up time analysis on some common Python CLIs that people pointed to as evidence that Python wasn’t appropriate for CLIs. It turned out that a large part of the problem was our own libraries doing a significant amount of work at import time and/or making expensive network calls to create various module globals. Fixing that improved things considerably (although not enough to satisfy the meme that Python can’t be used for performant CLIs).
@markshannon and others, what are our options? I feel like PEP 690 was our best chance to combat this meme. What viable alternatives are worth pursuing? Or do we just give up on Python for performant CLIs?
That’s a fair point, and I agree that my claim that “we should be assuming that library authors know what they are doing” is too strong (in particular, I wouldn’t claim that I know what I’m doing a lot of the time). However, I’d rather see people report that import time of my library is slow, and maybe the following changes might speed things up, than see reports “your library doesn’t work when I lazy-load it” which expect me to rewrite semantically correct logic. Particularly if doing so requires me to use a less convenient UI.
This feels very like the sort of debates over optimisations and static compilation of code. Python’s dynamic nature prohibits certain types of performance improvements. We can fix that by disallowing some types of otherwise valid code, but it’s traditionally been very hard to get acceptance for that. Startup time for Python apps is a genuine problem, and improving it would help a lot, both in real terms, and in terms of killing off the “Python is slow” meme. But I don’t think it’s at all surprising (or unreasonable, if I’m honest) to hold proposed solutions to the same sort of standards around maintaining Python semantics that we hold other optimisations to.
Good question! I think @malemburg touched on the same questions earlier:
I’m more or less convinced that explicit per-import-statement opt-in (rather than global opt-in) is the right approach to make the feature easy to understand and use, even though it creates a harder and longer path to actually being able to write a fully lazy-imported program that uses third-party libraries.
I’m not convinced on the second or third points: that we should switch to “lazy module objects” with __getattr__, and make lazy imports work only for full-module imports.
There are several mostly-orthogonal considerations here: intuitiveness/comprehensibility/usability, invasiveness/risk of implementation, and performance.
As a user of the feature, I think it is somewhat ugly and unintuitive (and surprising) if lazy imports can only be used for full-module imports; much nicer if they work for any kind of import, so that laziness can be orthogonal to the syntactic question of whether I want to repeatedly type the imported module name throughout my module (e.g. even if I’m only using one imported object from it.)
In terms of invasiveness, I understand the reluctance to have support for lazy objects embedded in the core dict implementation. I’m interested in exploring what it would look like to do it in a subclass of dict, so we can keep the essence of the PEP 690 approach but in a less invasive (and possibly even maintainable outside of core) way.
In the Python versions that I’ve worked on (3.8 and 3.10), I think the “lazy module object with __getattr__” approach also imposes an unacceptable level of performance overhead on every attribute access on the lazy module object, compared to the PEP 690 approach, which has amortized-zero overhead. I can imagine that it is possible to reduce this overhead significantly, but I don’t know how close we can get to PEP 690 performance, and I’m hesitant to commit to this approach until we can demonstrate the overhead can be reduced to an acceptable level.
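To illustrate where that recurring cost comes from, here is a toy sketch of the “__getattr__” flavor (an illustration only, not PEP 690’s mechanism or any proposed implementation):

```python
import importlib

class LazyModule:
    """Toy 'import on attribute access' proxy. Every attribute access keeps
    going through this Python-level __getattr__, even after the module has
    loaded -- the recurring cost referred to above. PEP 690's dict-based
    thunks instead replace the namespace entry with the real object, so
    later accesses are free."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

json = LazyModule("json")
json.loads("{}")  # first access triggers the real import
json.loads("{}")  # still pays the __getattr__ indirection on every access
```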