My memory is a bit fuzzy since it’s been so long since I worked on that code. I believe you are right. My original motivation for starting down that path was to have a (IMHO) nicer way to implement __getattr__ compared to what was done in PEP 562. I.e., if LOAD_GLOBAL does __getattr__ on the module object, you can implement something like PEP 562 with an actual __getattr__ method on the module, or you could even define a property. There are some other side benefits, like making the import system a lot cleaner (e.g. passing around and operating on modules, rather than module dicts).
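For concreteness, the two flavors look roughly like this (a sketch; mylib, _heavy, and version are made-up names):

# PEP 562 style: a module-level __getattr__ *function* in mylib/__init__.py.
def __getattr__(name):
    if name == "heavy":
        from mylib import _heavy   # deferred until first attribute access
        return _heavy
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

# The alternative sketched above: a real __getattr__ method or a property on the
# module object itself, which today requires swapping the module's class.
import sys
import types

class _Module(types.ModuleType):
    @property
    def version(self):
        return "1.0"               # computed lazily on first access

sys.modules[__name__].__class__ = _Module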
In my prototype, if you access module.__dict__, you get back a dict instance, so there would be no way to trap access to do the lazy load PEP 690 does. If you did mod.foo, that would trigger the lazy load of foo, but mod.__dict__['foo'] would not trigger it. So, I can imagine your whack-a-mole problems. If you ran into a lot of them in real code, my idea is not viable.
It does to me. I would be really happy if 3.12 had this lazy loading feature, and I would spend time trying to enable it where I could.
I think you could argue it is a case of practicality beats purity. I wonder too if CPython will have something like abstract and concrete types at some point in the not-too-distant future. I’m not sure of the terminology, but I think PyPy does something like that. I.e., there could be multiple concrete (aka hidden) types that satisfy isinstance(obj, dict). It will take some clever engineering to make that work given the C-API, but it could allow many optimizations, e.g. specialized dict implementations used for namespaces.
Edit: BTW, the reason I was originally looking at a nicer way to do PEP 562 was because I was working on a crazy lazy import idea. Barry has some of the prototype code on GitHub; see the comments in lazy_compile.py. My idea from way back then looks pretty similar to some things that Cinder is doing with strict/static modules.
Ah yes, I skipped the rationale section because “faster startup time” was obvious and went straight to the reference implementation section, which only talked about Cinder.
My only “concern” as a run-of-the-mill Python developer is that, for CLIs which are distributed widely (e.g. via PyPI and managed with pipx, not just within a corporate environment, which appears to have been the motivating use case), a user-enabled start-up flag or environment override might prove impractical[1], or might otherwise require e.g. the creation of platform-specific wrapper scripts, which would complicate delivery. If there were some way to opt into lazy imports from within Python itself (not necessarily on a per-module basis), I think that would interest a lot of CLI authors.
I’m not sure that I’d want to suggest my users alter Python import behaviour across their whole system; I don’t know how large an audience I’d be able to reach; and environment overrides are not straightforward on what remains the most popular consumer OS. The flag is a non-starter for console scripts. ↩︎
Perhaps it would be a good time to make another proposal for extending the entry point specification to allow specifying command-line arguments that are passed to the Python interpreter when a console script runs. This could also be useful for specifying a flag like -O that disables asserts.
I am fairly concerned with the effect this would have, given how much code in Python actively causes side effects at import time. Even just subclassing a class from a different module creates non-local side effects. I don’t think changing the behavior of import is going to do anything other than cause significant headaches.
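For example, something as innocuous as this registers a class with another module purely as a side effect of an import (a sketch with made-up names):

# registry_a.py
REGISTERED = {}

class PluginBase:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        REGISTERED[cls.__name__] = cls   # mutates state that lives in registry_a

# plugins_b.py
from registry_a import PluginBase

class CsvExporter(PluginBase):           # merely defining the subclass registers it
    pass

With lazy imports, if nothing ever uses a name from plugins_b, its body never runs and registry_a.REGISTERED stays empty.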
On the other hand I do think that modules could use an upgrade so we could avoid the issue at the source instead of trying to skirt around it.
Is it expected that authors of popular libraries now double their test runs, i.e. run with and without the proposed -L flag, so that users of the library can depend on both behaviors?
For this use case, when users run entry-point scripts like pytest, nox, tox, etc., will they have to switch to python -L -m pytest or otherwise manually specify what the entry point is?
This doesn’t seem like too big of an issue to me. If we’re actually resolving a lazy import, that’s definitely going to be the bottleneck. Compared to that cost, walking over the values of even a large global dictionary is probably close to free.
As a benefit, we don’t pay the cost of the more expensive lookup function once they’re all resolved. We could even attempt to convert it back to a unicode-only dict (if it’s shown to pay off in practice, which I suspect it could).
Not a dealbreaker of course, but something to consider.
I understand it is opt-in. I just don’t think it solves the underlying problem, and it leaves 3rd-party libraries in a bit of a bind. I am really not sure how to define side-effect-free Python. I think we can have code that isn’t reliant on its side effects, but there is a pretty limited set of code that actually has no side effects. I am unsure whether this distinction actually matters. I may just be overthinking this part.
Further, libraries that need these side effects are now unable to use this feature, leaving the performance issues in place. I have a tough time seeing how this would work cleanly with any of the Django-style classes. I think the actual root cause (inherent limitations of packages and/or people’s implementations of packages) is worth addressing in a way that works globally and does not require command-line switches.
@zrothberg, with lazy imports enabled we can still use libraries that need to rely on side effects; that’s not going away, it’s part of the language.
The difference with lazy imports is that merely importing other modules at the top of a module no longer guarantees that all the imported modules will have had the chance to do any registration in time. Django is actually a good example of the kind of registration we’d need for lazy imports to work. Django classes such as models, which register themselves as part of an import side effect while their app’s models.py executes, will still work, because Django has a discovery process that manually (and eagerly) imports all the needed models modules that register the Model classes. It’s also arguably better to have this kind of exhaustive discovery process (or an explicit registry) than to rely on modules doing registration while being imported, at some point, by something, somewhere.
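Roughly, what I mean by an explicit registry plus discovery (a sketch with made-up names, not Django’s actual API):

# registry.py
import importlib

MODELS = {}

def register(cls):
    MODELS[cls.__name__] = cls
    return cls

def discover(app_names):
    # Eagerly import each app's models module so that @register decorators
    # (or __init_subclass__ hooks) run, whether or not lazy imports are enabled.
    for app in app_names:
        importlib.import_module(f"{app}.models")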
I think you’re overthinking it. You really only need to concern yourself with code that explicitly requires that side-effects due to imports happen. For instance, this really only becomes an issue if you import setup_logging to turn on your logging system instead of doing import setup_logging; setup_logging.start().
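In other words (with setup_logging being a hypothetical module whose body configures logging):

# Relies on an import side effect: under lazy imports the module body may not
# run until setup_logging is first used, if ever.
import setup_logging

# Explicit: behaves the same with or without lazy imports, because the attribute
# access forces setup_logging to load before start() is called.
import setup_logging
setup_logging.start()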
True, I suspect users would tell you about that detail quickly. Otherwise, if people really want to be diligent, they could have a test that does nothing but import everything, to see if they have a circular import problem. But my key point is you don’t have to run your entire test suite twice just for this feature.
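Such a test could be as simple as this (a sketch; mypackage is a placeholder):

import importlib
import pkgutil

import mypackage

def test_import_everything():
    # Import every submodule eagerly so that circular imports and other
    # import-time breakage show up regardless of the lazy-imports setting.
    for info in pkgutil.walk_packages(mypackage.__path__, prefix="mypackage."):
        importlib.import_module(info.name)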
I’ll be able to give you a good list of ways to break it at least.
I do think this also has the side effect of being a bit far-reaching, similar to, say, async. Not that there is any issue with that. I think the performance improvements alone sound pretty crazy. It would also be exciting to see how it performs for stuff like AWS Lambda.
I think it may also be worthwhile to have a discussion about expanding packaging metadata to carry some information about compatibility with this and several other features in a similar “may spontaneously combust” vein (multiprocessing, thread safety, async), so that IDEs can make use of that info. I say this as someone who has used many libraries that I later discovered are not fork-friendly.
What if we maintain a count of deferred objects in a dict? Once the count hits 0, the lazy lookup can be unset. We’d only need to modify that count when adding or resolving deferred objects. Not sure if we have the spare bits to manage this counter though.
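In pure-Python terms the idea is roughly this (just an illustration; the real bookkeeping would live inside the C dict implementation, and all names are made up):

class NamespaceDict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._deferred_count = 0        # how many values are still unresolved

    def set_deferred(self, key, thunk):
        super().__setitem__(key, thunk)
        self._deferred_count += 1

    def resolve(self, key):
        value = self[key]()             # force the deferred import
        super().__setitem__(key, value)
        self._deferred_count -= 1
        if self._deferred_count == 0:
            pass                        # here the dict could switch back to the cheap lookup path
        return value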
I’m confused. The only impact here is that in the code
import foo # point 1
#
# A bunch of intervening code
expression_involving(foo.something) # point 2
the top-level code in module foo will run just before “point 2”, rather than at “point 1”. Certainly, changing where that code runs may have an impact. Here’s a trivial example:
import sys
import foo
if "foo" in sys.modules:
    print("No lazy loading here!")
# Use foo here
But if that matters to you, don’t use the opt-in flag, or simply reference foo straight after the import. I assume that
import foo
foo # To trigger immediate loading of foo
would work?
By the way, I dislike having to use a command line option to enable this. As others have said, it can be tricky to ensure a flag gets passed. For example, a module like pip that is run via py -m pip would need to be run as py -L -m pip to get the benefit of this feature. I can’t imagine us going through and changing all of our documentation, and even if we did, I bet large chunks of our user base would forget the -L. Can’t it be enabled via some form of directive specified in the code of the application itself?
So I think the problematic example is code that registers itself with another module during import. Pydantic has exactly this behavior on subclassing, and FastAPI uses it to set up the OpenAPI endpoint. Now, for my own classes it is easy to work around. The concerning ones are actually the ones nested inside other packages.
I have several libraries for AWS-specific stuff, and I have no idea how they create or import anything. I have no real idea what parts of their systems would break, so I would end up just disabling this feature for those libraries. The problem is that this would just cascade back to all-eager imports, because more than likely they import almost every other dependency I am using for FastAPI and Pydantic. All the value is gone. I think any dependency graph with overlap would likely lead to this for libraries that depend on import behavior.
I am going to futz around with it later and see where it breaks for this kind of stuff. I have a LOT of code that uses this type of behavior to test against aggressively.
I wonder if it would be possible to turn on a warning about that, e.g. if you have lazy imports turned on but your code would break, due to import cycles, if they were turned off. It might be hard to do without a performance impact, but it would be nice to have the warning. I would like my apps/libraries to be lazy-import safe but still work for people with lazy imports disabled. A warning would be good enough to ensure I didn’t accidentally introduce an import cycle.
So the best practice with lazy imports for running tests with pytest will then be the command python -L -m pytest instead of pytest?
I get the impression that this feature is going to be far more popular than importlib.util.LazyLoader, which is explicitly called out in the documentation as “heavily discouraged”, and reading through the PEP, it would also apply to more situations than LazyLoader can manage.
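For comparison, opting in with LazyLoader today looks roughly like the recipe in the importlib documentation:

import importlib.util
import sys

def lazy_import(name):
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module

# The module object exists right away, but its code only runs on first attribute access.
lazy_typing = lazy_import("typing")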
What exactly would that look like? A series of tests that import the package’s modules with the eager_imports() context manager?
Wouldn’t it be exciting if this feature was called “heavily encouraged” instead? Sometimes I’ve thought this could be the default for Python imports, but I think we’re a long way away from that (I wouldn’t mind if it happens at some point, but we’ll see). I get that there are a lot of cases where we expect import to immediately load and execute code, but in essence I think it’s thrilling to think about a way to have things imported lazily (only when actually needed) instead of having to pay the sometimes very expensive eager costs.