Bear in mind that, as mentioned before, this is an advanced caveat emptor feature (users are expected to think about what they are doing, since it comes with a set of advantages and disadvantages). But just to be clear: security-sensitive tools like pip can and should disable laziness right at startup with sys.set_lazy_imports("disabled") (or install a filter that forces eager imports). That way the environment variable can’t affect them. The with trick would work too, since those imports are always eager, but it’s potentially suboptimal compared to just turning off the flag once at the beginning.
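To make that concrete, here is a minimal sketch (not from the PEP itself) of what such a tool could do first thing at startup, assuming the sys.set_lazy_imports() API mentioned above:

import sys

def main():
    # Force eager imports for this whole process before anything else runs,
    # so neither PYTHON_LAZY_IMPORTS nor -X lazy_imports can affect the tool.
    if hasattr(sys, "set_lazy_imports"):
        sys.set_lazy_imports("disabled")
    # ... rest of the tool's startup ...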
Edit: The original context for this reply was a now-edited reply, apologies to future readers!
The current section says that ‘There are no known security vulnerabilities introduced by lazy imports.’ I think it might benefit from a short reference to a situation like the one in pip, and a reminder to the reader that sys.set_lazy_imports('disabled') may be used in such a case.
An alternative might be to note the precedence of the -X flag vs. the environment variable vs. the function. Like Damian, I didn’t immediately realize that the function overrides the other two.
I’m assuming you mean, “How large of a win can a user expect when leveraging lazy imports?” I’d suggest reading these three articles on how similar implementations have performed on forks of CPython at scale:
While they each use a different implementation of Lazy Imports than the one we’re proposing in this PEP, the implementation in PEP-810 would deliver similar results.
+1 for this PEP - While there is the mentioned third-party integration via lazy_loader, the hoops you have to jump through to get type checking to work along with it are not ideal.
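For reference, the lazy_loader integration I mean looks roughly like this (a sketch based on lazy_loader’s documented attach() helper; the package/submodule names are made up, and the separate .pyi stub you have to maintain so type checkers can see the names is the annoying part):

# mypackage/__init__.py
import lazy_loader as lazy

# Defers loading of submodules/attributes until first access; a matching
# __init__.pyi stub is still needed for type checkers to resolve the names.
__getattr__, __dir__, __all__ = lazy.attach(
    __name__,
    submodules=["plotting"],
    submod_attrs={"io": ["load", "save"]},
)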
I have a suggestion on import side-effects. While it’s probably not the best idea in general to have import side effects, people do make use of it, and accidentally lazy-loading something like that could cause issues. What about some way to flag a module as not lazy-loadable? e.g.
# not_lazy_loadable.py
import foo
import bar
__lazy_loadable__ = False
...
Then, the following code would raise an error (a SyntaxError? an ImportError?):
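Presumably something along these lines (my reconstruction; the exact snippet isn’t in the original post):

lazy import not_lazy_loadable  # error, because the module sets __lazy_loadable__ = False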
On reading some of these comments and rereading the PEP, the option to make all imports potentially lazy seems like an odd thing. What is the purpose of having -X lazy_imports=enabled? That seems somewhat dangerous because now code that was never written with lazy imports in mind will suddenly have a potentially different execution order.
Is the idea to give people an easy option to try it out and see if it results in a performance win, and not have to wait for maintainers to update their code? If I were designing this I wouldn’t include that option, and would force the explicit use of the lazy keyword (or marking modules in __lazy_modules__) where laziness is potentially desired (unless you globally disabled it).
Now, I’m not recommending this as an actual change, but if an unmarked import could be lazy or eager depending on the context, it might make sense to add another soft keyword, eager import foo, to mark that an import is never lazy even if PYTHON_LAZY_IMPORTS=enabled. Or at least I might intuit that such a keyword could reasonably be available (even though it isn’t).
But that is predicated on the idea that a regular import could be potentially lazy, which IMO is slightly non-intuitive. I would have designed this such that an unmarked import was always eager, with the new lazy import being potentially lazy.
Granted, this PEP is such a huge win that I don’t want to bikeshed about minor details too much. But I think an improvement could be to discuss the rationale for the use cases of the three modes. As they are, “default” and “disabled” make a lot of sense, but a discussion of why you would ever want “enabled” could be useful. It might also be beneficial to think about better values to represent these behaviors: to me, PYTHON_LAZY_IMPORTS=enabled just implies that the lazy keyword is being (potentially) respected. I had to think about it to clearly distinguish why it was different from “disabled”. I think the current values might be slightly difficult to teach. Why should a variable having to do with lazy imports have any impact on non-lazy imports?
But in short: we believe this global mode is important because there is a real and pressing need for it: large organizations are already forking Python or patching their own builds to enable global lazy imports (see the links in the discussion, the PEP and other material). Providing an official, supported mechanism means they can experiment and deploy without diverging from upstream CPython. We don’t want to make the global mode the main point of the PEP, but we do think it’s an important advanced feature: the fact that people are already forking Python to get it shows a strong need, and we believe providing it upstream is the right way to satisfy that need without splitting the community.
It’s intentionally marked and documented as an advanced feature, not something for day-to-day users or libraries. The default remains fully explicit: only lazy import (or __lazy_modules__) changes behavior. But for operators running huge fleets or codebases, having a single switch plus a filter mechanism is critical to get performance and memory improvements quickly, without waiting for every package to adopt the syntax.
So the “enabled” mode exists to serve that specific need. For most developers, nothing changes; for large deployments, they get a supported, standardized way to do what they are already doing today.
To check whether the module has the flag you need to import the module, and that basically removes all the benefit. The same goes for any other way of “marking it”, since modules can technically be anything.
Anything more advanced, such as a way to mark this in packaging metadata, is explicitly left out of scope; we will explore it once we have the basic core in place.
I think even the explicit-only lazy import leaves a subtle hazard for operators of systems where individual request latency matters, especially if they are working with free-threading. For batch processing and interactive use cases this is a definitive win, and dramatically cleaner and easier than moving imports inside the functions where they are used. That effectively moves the latency from “import statement” to “first use”, which is a significant win. Yes, that could have side effects if imports touch global state (e.g. logging configuration), but that’s fine to me, as the PEP addresses it.
What I see as happening for Ops/SRE teams, though, is:
1. A new version of Python with PEP 810 implemented ships.
2. A developer updates to the newer CPython.
3. One or more dependencies are updated to use lazy imports.
4. Latency gets bad for a subset of endpoints in production because there is global locking as the lazy import is “reified” on first use in a request handler (with free-threading, potentially blocking other running threads).
5. Someone eventually notices and debugs it.
6. “Oh, I need to set this global flag to disable it.”
7. Deploy, fixed.
A team that has never read this PEP and is unaware of lazy imports can significantly change runtime behavior, in a problematic and potentially hard-to-debug way, just by updating CPython plus one dependency. Organizations with robust teams/infrastructure to measure, deploy and tune can deploy the flag globally and accept that risk/cost across all their deployments. As it is, though, a team upgrading to the newer Python would need to become PEP 810-aware to be able to upgrade a production system safely, which seems like a bad breakage. Just tweaking the default to disabled, I think, makes it possible for the ecosystem to evolve and for orgs which benefit to enable it, without requiring all operators to learn the PEP.
I do wonder if it would make sense to add an eager soft keyword as well. It would certainly be an advanced feature, but it would give library authors the ability to override the global flag if they know an import needs to happen eagerly. (Similarly, __eager_modules__ could also be added.)
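For illustration, the hypothetical syntax I have in mind (not part of PEP 810, and the module name is made up) would be something like:

# always imported eagerly, even under -X lazy_imports=enabled /
# PYTHON_LAZY_IMPORTS=enabled
eager import logging_setup

# or, mirroring __lazy_modules__:
__eager_modules__ = ["logging_setup"]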
I am a fan of lazy imports as a concept and a +1 on this proposal. (I’d personally like to see a way to enable this at a module level, similar to from __future__ import annotations in its ergonomics, but I could understand why that would be omitted.)
As a minor nit, the PEP sort of implies that PEP 690 was opt-out (by saying that PEP 810 is “explicit, opt-in”, unlike PEP 690), but IIUC, PEP 690 was also designed to be opt-in, but at a global level? I may be missing some of the nuance in the extensive PEP 690 discussion though. I’ll also note that explicit syntax was a rejected idea in PEP 690 so I’m guessing there are some relevant arguments for/against this PEP’s approach buried in those threads.
Is there any way to pass a lazy object to a function without immediately triggering its reification?
Use case: Marshmallow has a module to generate schemas from SQLAlchemy models, e.g. something like this:
from marshmallow_sqlalchemy import SQLAlchemyAutoSchema
from mymodels import User

class UserSchema(SQLAlchemyAutoSchema):
    class Meta:
        model = User
This is very useful, except that, with the schema definition being evaluated at import time, it:
requires a non-lazy import (and even right now there’s no clean way to use a local import for this, short of wrapping the whole schema definition in a function, making it more cumbersome to use)
triggers SQLAlchemy model/relationship initialization (this is a problem in MM though, which could probably be solved already now by deferring initialization until the first time it’s used)
But even w/o the SQLAlchemy-related issue, eagerly importing models for use in schemas is a great and very annoying source of circular import issues.
So if the above snippet worked as expected when using lazy from mymodels import User, this would be amazing. However, based on my current understanding of the PEP any kind of access to the imported name triggers loading, not just doing something with it (such as accessing an attribute on it). So I guess MM would need to support something like model = lambda: User to avoid triggering the reification too soon?
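Something along these lines is what I’d hope for (purely speculative; marshmallow-sqlalchemy does not accept a callable for Meta.model today):

from marshmallow_sqlalchemy import SQLAlchemyAutoSchema
lazy from mymodels import User

class UserSchema(SQLAlchemyAutoSchema):
    class Meta:
        # Hiding the name behind a lambda would defer the access to User,
        # and hence the reification of the lazy import, until the schema
        # is actually used.
        model = lambda: User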
I really like this PEP, but I do feel it is missing the potential usage/issues with optional dependencies. Take a library like pandas, which uses the xlrd library optionally. Normally the import is only done when the dependency is needed:
def run():
    import xlrd
    ...
This PEP would allow a library like pandas to put this at the top level:
lazy import xlrd
def run():
    ...
This is great for ergonomic purposes and prevents having to import xlrd at numerous locations, but it leads to an issue if a user disables lazy loading: now even a user of pandas who doesn’t need xlrd would be forced to install it. This could make the lazy import feature not so easy to disable.
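To spell out the failure mode (hypothetical setup, assuming disabling really does make the top-level lazy import eager):

# run with PYTHON_LAZY_IMPORTS=disabled and without xlrd installed
import pandas   # the top-level `lazy import xlrd` is forced eager here,
                # so this raises ImportError even though run() is never called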
Now I expect pandas probably wouldn’t change, especially for a while, to keep supporting older Pythons. But if other libraries start using this pattern, it could lead to people being edged out of disabling this feature.