PEP 690: Lazy Imports

No-one is suggesting that application users should flip this on. If we need to remove both the env var and -L, leaving only a code-based opt-in, to make this even more clear, IMO that would be fine.

The same way they test any other change to their codebase to make sure their application still works. In practice I have not seen evidence that “turn on lazy imports” is more unsafe than many other invasive changes people make to their Python code all the time, for smaller benefits than lazy imports. Python is very flexible and permissive, thus there are lots of ways that code changes can break unexpectedly in the presence of some edge cases; that’s why Python projects tend to have a lot of tests.

Yep, in practice I think much more so.

Well, the plus side of PEP 690 is that it is actually an extremely effective “go faster” switch :) It also comes with caveats.


Huh, I consider python command line options and environment variables to be very much end user options, not application choices. If that’s not the intent, then 100% I think they should be removed, and only a setting that can be switched on via the application code be offered.

If it’s an application code change, then I guess so. Consider this another consequence of the confusion caused by proposing this be switched on via user-level command line options, if you like :)


This whole discussion makes me think of zip imports: most libraries work completely fine with zip imports without any effort on their part, and most of the ones that don’t rarely bother to put any effort into making them work.

Things like zipapp, pex, etc still tend to work out fine in practice though, because the application author ultimately is in control, and is responsible for ensuring that their deps work when used as a zip app.


Except that PEP 690 is a tool we can provide to application developers so they don’t have to manually hack lazy imports into functions or otherwise. Improving Python startup time is worthy work, but it’s not sufficient to move the needle for users of Python CLIs (and some other cases, such as @carljm describes for developers of their server). People won’t care if Python starts up faster if it still takes a noticeably long time to get to --help or other functionality.
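
For reference, the manual workaround usually looks something like the sketch below. The `demo` CLI and its subcommands are made up for illustration; the point is only that the import sits inside the one code path that needs it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the parser for a hypothetical `demo` CLI."""
    parser = argparse.ArgumentParser(prog="demo")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("version", help="print the version and exit")
    dump = sub.add_parser("dump", help="serialize the given items as JSON")
    dump.add_argument("items", nargs="*")
    return parser


def run(argv: list[str]) -> str:
    args = build_parser().parse_args(argv)
    if args.command == "dump":
        # Deferred import: only this code path pays for loading json.
        import json

        return json.dumps({"items": args.items})
    # Cheap commands never trigger the import above.
    return "demo 0.1"
```

This works, but it has to be repeated by hand at every call site, which is the maintenance burden a global switch would avoid.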

As a Python CLI author, I would much rather enable -L globally, and use the programmatic API or try/except/with hack to specify the eager loading for known problematic modules.


Environment variables maybe, but CLI options seem to be within the domain of the application author. Yes, we need a way to tell entrypoints what their shebang options ought to be, but that’s something the application author wants to specify, not the end user.

Perhaps analogously, think about using __file__ to locate resources, which won’t work in a zipapp. That’s why we have importlib.resources as a better API. But some libraries still use __file__. As an application author distributing CLIs as a zipapp, maybe I need to convince one of my library dependencies to use importlib.resources instead of __file__ to get the full benefit. If not, then I might have to patch the library myself or use something else. OTOH, PEP 690 seems better positioned to work around uncooperative dependencies, because it defines a number of tools I as an application developer can use to force eager imports for known incompatible libraries my CLI will use.
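
To make the analogy concrete, here is a minimal sketch of the importlib.resources style of lookup (Python 3.9+). The package name `demo_pkg` and the file `greeting.txt` are invented, and the package is created in a temporary directory only so the snippet runs on its own.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def read_text_resource(package: str, name: str) -> str:
    """Read a data file shipped with `package` without relying on __file__."""
    resource = importlib.resources.files(package).joinpath(name)
    return resource.read_text(encoding="utf-8")


# Build a throwaway package (hypothetical name) so the sketch runs anywhere.
tmp_root = Path(tempfile.mkdtemp())
pkg_dir = tmp_root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("", encoding="utf-8")
(pkg_dir / "greeting.txt").write_text("hello from a resource\n", encoding="utf-8")
sys.path.insert(0, str(tmp_root))
importlib.invalidate_caches()

greeting = read_text_resource("demo_pkg", "greeting.txt")
```

The same call is meant to keep working when the package is served from a zip archive, which is where a path built from __file__ falls over.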


I’m not sure this is really a problem. I don’t care if pip runs a little faster with lazy imports, but I care a lot about the CLIs I’m building at $work, where I control both the environment they run in and the shebang options they use.


To me that sounds like the sort of thing that has the potential to cause quite a bit of churn for, and generate pushback from, library authors. I don’t want to go around opening issues on the bug trackers of all of my hobby CLI’s dependencies that I’ve tested and know to work when imported lazily so they might declare themselves compatible with PEP 690. Feature “pragmas” also feel out of the norm for Python - has something like this been done before (__future__ imports aside)?


You have the same “problem” with type annotations and there’s the typeshed to work around this.

Furthermore, if you control the CLI code, it wouldn’t be a problem to dynamically mark modules or whole packages as safe for lazy import, after you have tested them.

However, if you are a user and turn on this feature globally in your venv because it makes your code seemingly run faster and things break, you will blame Python and the packages loaded into the venv for being unstable, causing random failures, etc. Why ? Because the errors will not easily be traceable back to the lazy import feature. Not good for Python’s reputation, nor good for the packages you happened to use.

I can sympathize with the argument against -L too: enabling lazy imports is a significant change to an application codebase that requires testing, and should really be done by the application author, not an end user. Both the env var and -L make this feature kind of easy for end-users to abuse with applications that were never tested with it.

When I suggest that we could eliminate those in favor of programmatic opt-in, I still mean a single global opt-in that an application developer can call exactly once in their main module and it will apply to all imports everywhere (unless opted out) from that point forward. (Note that this is also precisely how opt-in works for the existing importlib.util.LazyLoader, with the addition of an easy way to do per-module opt-out.) So I don’t think it makes the feature too much harder to use for CLI authors; maybe just the right amount harder ;)
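
For readers who have not used the existing loader, this is roughly the per-module recipe from the importlib documentation, wrapped in a small demo. The module name `lazy_demo_mod` and the `sys.lazy_demo_mod_ran` marker are invented for the example; a global opt-in would install the loader through a finder rather than calling a helper per module.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def lazy_import(name: str):
    """Return a module object whose body only runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up the lazy module, does not run its body
    return module


# Throwaway module (hypothetical name) whose body records that it ran.
tmp_root = tempfile.mkdtemp()
Path(tmp_root, "lazy_demo_mod.py").write_text(
    "import sys\nsys.lazy_demo_mod_ran = True\nVALUE = 42\n", encoding="utf-8"
)
sys.path.insert(0, tmp_root)
importlib.invalidate_caches()

mod = lazy_import("lazy_demo_mod")
ran_before_access = hasattr(sys, "lazy_demo_mod_ran")
value = mod.VALUE  # first attribute access triggers the real import
ran_after_access = hasattr(sys, "lazy_demo_mod_ran")
```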

OTOH I’m also still fine with keeping -L and clearly documenting how it should / shouldn’t be used.

Can you explain this reasoning in more detail? Specifically, why would -L give the end user receiving a Python application a vector for abuse? To me, -L would firmly be an opt-in for the application author, not end user, and thus testable in the application author’s testing matrix to ensure it functions correctly.

That’s as opposed to the envar, which does give the end user a vector to turn on lazy imports when the application doesn’t claim to support it. I would definitely keep -L but I’d be okay with eliminating the envar, provided that entry points had a way of specifying shebang options. (Otherwise, the envar is a useful convenience, but maybe it can be underscored or undocumented to signify buyer beware?)

The problem is that things like .pth files mess with an application author’s ability to control when their main entry point gets run. Don’t get me started on .pth files, but given that they still exist, I think it makes programmatic enabling of lazy imports more problematic.


I definitely agree it’s less easily/likely abused than the env var. But there are a wide variety of Python applications, distributed to end users in a variety of ways (not always via pip/setuptools entrypoints), and some of them get run with python somescript.py or python -m someapp; in these cases it might be tempting to just slip in a -L.

It’s true that a general issue with programmatic opt-in is that some amount of imports in core startup will happen eagerly before the opt-in, and pth files could potentially cause even more of that.

Agreed, a command line option makes far more sense in an environment where the invocation is tightly controlled. I’m not sure that’s the most common situation, though. And I am sure that an API to “flip the switch” in the application’s main routine is a lot more focussed on being something that’s owned by the application author rather than the end user.

I was just pointed to SPEC 1 – a proposal for lazy loading from the SciPy world. It would be useful to compare and contrast the two proposals.


That uses 3.7’s module-level __getattr__ and __dir__ to implement the laziness (see lazy_loader on PyPI). I’ve used a similar approach in the past to help solve a major performance problem caused by overly-importy generated code. But effectively, either every module that wants to be lazy needs to opt itself in, or the packages to be lazily loaded need to be enumerated explicitly in code that runs early and close to the top level, so that their lazy stubs are set up in advance.
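
As a rough illustration of the mechanism (this is not the lazy_loader package’s actual code), a package can defer its submodules with the PEP 562 hooks like so. The package name `lazy_pkg_demo` and submodule `heavy` are invented, and the files are written to a temporary directory only to keep the sketch self-contained.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Source for a hypothetical package's __init__.py using the PEP 562 hooks.
INIT_SOURCE = textwrap.dedent(
    '''
    import importlib

    _LAZY_SUBMODULES = {"heavy"}

    def __getattr__(name):
        # Called only when normal attribute lookup on the package fails.
        if name in _LAZY_SUBMODULES:
            return importlib.import_module(f"{__name__}.{name}")
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

    def __dir__():
        # Advertise lazy submodules even before they are loaded.
        return sorted(set(globals()) | _LAZY_SUBMODULES)
    '''
)

tmp_root = Path(tempfile.mkdtemp())
pkg_dir = tmp_root / "lazy_pkg_demo"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text(INIT_SOURCE, encoding="utf-8")
(pkg_dir / "heavy.py").write_text("ANSWER = 42\n", encoding="utf-8")
sys.path.insert(0, str(tmp_root))
importlib.invalidate_caches()

import lazy_pkg_demo

loaded_before = "lazy_pkg_demo.heavy" in sys.modules
answer = lazy_pkg_demo.heavy.ANSWER  # triggers __getattr__("heavy")
loaded_after = "lazy_pkg_demo.heavy" in sys.modules
listed = "heavy" in dir(lazy_pkg_demo)
```

Note that the set of lazy names has to be spelled out in the package itself, which is the explicit opt-in described above.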

Entirely doable as a library today (see above), but it requires boilerplate to enumerate and opt transitive deps in, rather than opting specific things out. You can get decent results out of this approach when you time all of your imports and identify a set of largest offenders. It’s a similar, yet different, maintenance story, as it inverts the lists that need to be maintained. The PEP authors’ practical experience seems to suggest that the list of opt-outs has proven to be a lot smaller than the inverse.
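
One way to find the largest offenders is CPython’s `-X importtime` option, which writes per-module timings to stderr. The helper names below are my own, and the parser is a sketch that assumes the usual `import time: self | cumulative | name` line layout.

```python
import subprocess
import sys


def parse_importtime(stderr_text: str) -> list[tuple[str, int]]:
    """Parse `python -X importtime` output into (module, cumulative_us) pairs."""
    rows = []
    for line in stderr_text.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3:
            continue
        try:
            cumulative_us = int(fields[1])
        except ValueError:
            continue  # header row, whose columns are not numbers
        rows.append((fields[2].strip(), cumulative_us))
    return rows


def slowest_imports(stderr_text: str, top: int = 5) -> list[tuple[str, int]]:
    """Return the `top` modules with the highest cumulative import time."""
    ranked = sorted(parse_importtime(stderr_text), key=lambda row: row[1], reverse=True)
    return ranked[:top]


def profile_imports(statement: str, top: int = 5) -> list[tuple[str, int]]:
    """Run `statement` in a fresh interpreter and rank its imports by cost."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    return slowest_imports(proc.stderr, top)
```

Something like `profile_imports("import json")` then gives a short list of candidates to make lazy first.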


The SPEC-1 proposal uses the “explicit is better than implicit” approach, where the library authors have to explicitly declare which of the sub-modules in a package should be lazily loaded.

They apparently used a lazy-by-default mechanism (for sub-modules in the package) in the past, but dropped it:

For a while, SciPy had a lazy loading mechanism called PackageLoader . It was eventually dropped, because it failed frequently and in confusing ways—especially when used with interactive prompts.

I have been using lazy imports in mxDateTime in a similar way for quite a while. The approach works great, if you know what you’re doing and control the code.

Library authors of packages which do require quite a bit of startup time will most likely use such an explicit feature to enhance their users’ experience. Adoption would take longer, but would also be a lot safer and, what’s even more important, Python users would learn about the feature during the adoption process, which will help them debug problems arising from such late imports.

I see no scenario in which slowly gaining adoption will actually happen, as nice as we may find the idea. What is the carrot for library owners? The incentives do not exist for most library maintainers to care about larger application startup time.

PEP 690 seems to be about providing a practical tool to enable application owners who control their dependencies and integration testing to take ownership of reducing their own startup time without waiting for uncountable uncoordinated library owners to do something they have no incentive to care about. I doubt most people will use it, but it seems valuable to those with a need.


Another research datapoint worth investigating: look at what Mercurial has done. They long fought startup time battles due to Python. Have they rewritten their entry point in another language yet? I believe they had their own lazy import mechanism that worked across a wide range of Python versions?


First of all, thanks for working on this! The positive impact would be huge.

The most important thing I think you should consider: at a fundamental level, lazy imports effectively turn the import syntax into a declaration instead of a statement. This realization helped me look at the various semantic changes in a useful way. With this in mind, you could rename the PEP title to “Import as a Declaration”, for example. I think this different point of view could also make the concept easier to communicate and clarify potential pain points.

Of course, import would still act like a statement in some contexts, which would be at least a little confusing to most Python users. Would it be a pain point? Would it complicate learning the language? (One value to the import-as-a-declaration concept is it makes that dichotomy much more pronounced.) This duality deserves more discussion.

I have more observations that I’ll include in a follow-up message.


My other observations…

– Opting In –

From my point of view, the recommended way for an application to opt in should be with sys.enable_lazy_imports() at the top of the script or __main__.py.

-L would be useful to try it out for an app, but we should actively discourage it as the primary way for an application to opt in.

Also, count me as part of the group that would rather not support an env var.

– Focus on Opting Out? –

It may make sense to put the focus on opting out, instead of opting in, especially if we eventually want to make lazy imports the default behavior. I noticed that a number of comments in this thread make a bit more sense in terms of opting out.

For example, instead of asking libraries to opt in, whether globally or per-module, why not recommend that they explicitly disallow use with lazy imports:

# mylib/__init__.py
import sys

if sys.is_lazy_imports_enabled():
    raise ImportError('mylib does not support lazy imports (yet)')
...

We could even provide something like importlib.disallow_lazy_imports() to do that as a helper. It could also be something like importlib.lazy_import_unsafe() to make it sound more ominous.

This would give library authors a simple solution that they can use temporarily, which avoids any hard-to-diagnose failures for their users. We would make this the prominent recommendation for now.

It may also make sense to focus the various proposed APIs on the operation we expect to go away. So:

  • sys.is_lazy_imports_enabled() -> sys.is_eager_imports_enabled()
  • importlib.disallow_lazy_imports() -> importlib.require_eager_imports()
  • sys.enable_lazy_imports() -> sys.disable_eager_imports()
  • (keep importlib.eager_imports(), etc.)

– Changing the Default Behavior –

Perhaps I missed it, but there didn’t seem to be any discussion about if/when lazy imports would become the default. It makes sense to plan for that. At the least the PEP should say “This proposal does not include any considerations for making lazy imports the default behavior.”, though I’d expect it to at least define a basic plan.

– Backward Incompatibilities –

Clearly these are the sticking points of this proposal. :)

Import Side Effects

IMHO, things like registration-as-a-side-effect are a code smell. I’m definitely a big fan of being explicit about composing things together and activating functionality. However, it may not be so clear to everyone, nor clear what to do about it.

One thing that the PEP could do to help library authors (and probably application/script devs) is to give clear guidance (best effort) on how to replace import side-effect patterns with a corresponding side-effect-free alternative. (It wouldn’t hurt to identify the pros and cons of the side-effect patterns.) I can imagine an enumeration of each pattern, with use cases and examples of the before & after. The docs would benefit from a similar treatment.
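
As one possible before/after for the registration case, the sketch below replaces registration in a module body with an explicit call. All names here (`FormatRegistry`, `parse_csv`, `register_plugins`) are invented for illustration and are not taken from the PEP.

```python
from typing import Callable, Dict

Handler = Callable[[str], list]


class FormatRegistry:
    """Maps a format name to the function that parses it."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        self._handlers[name] = handler

    def parse(self, name: str, text: str) -> list:
        return self._handlers[name](text)


# Before (side-effect style), a plugin module's body would run something like
#     REGISTRY.register("csv", parse_csv)
# at import time, which silently never happens if the import stays lazy.


def parse_csv(text: str) -> list:
    return text.split(",")


# After: the plugin exposes a function and the application calls it explicitly.
def register_plugins(registry: FormatRegistry) -> None:
    registry.register("csv", parse_csv)


registry = FormatRegistry()
register_plugins(registry)
fields = registry.parse("csv", "a,b,c")
```

The trade-off is that the application now has to know which plugins to wire up, which is exactly the explicitness being argued for.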

Dynamic Paths

This is definitely a tricky one. It could be really hard to diagnose resulting failures (spooky action at a distance, etc.).

The only solution that comes to mind is to cache (on the module) the full import state relative to each import statement and use that later for the corresponding lazy imports. However, that seems pretty expensive, especially for something that won’t be a problem often.

Perhaps a less expensive solution would be to remember the dict/list version of each part of the import state where the import statement is. Then when the lazy import happens, emit a warning if those versions changed. (A dict watcher would probably reduce the cost, but wouldn’t help with the lists, e.g. sys.path.)
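
A user-level approximation of that idea is sketched below. It assumes a made-up `DeferredImport` wrapper and, since lists expose no version tag to Python code, compares a snapshot of sys.path instead; a real implementation inside the interpreter would presumably be cheaper.

```python
import importlib
import sys
import warnings


class DeferredImport:
    """Hypothetical stand-in for a lazy import that remembers sys.path."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Cheap fingerprint of the import state at the "import statement".
        self._path_snapshot = tuple(sys.path)

    def resolve(self):
        """Import the module now, warning if sys.path changed in between."""
        if tuple(sys.path) != self._path_snapshot:
            warnings.warn(
                f"sys.path changed between declaring and resolving {self.name!r}",
                ImportWarning,
                stacklevel=2,
            )
        return importlib.import_module(self.name)
```

ImportWarning is ignored by default, so this would only surface when warnings are turned on, e.g. with `-W always` during testing.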

Half-lazy imports would mostly resolve the problem, but the perf penalty is too high.

Deferred Exceptions

One thing came to mind that could help with this: preserve a traceback (or similar) relative to the import statement. Then, if an ImportError gets raised for the lazy import, set __cause__ (or would it be __context__?) on that error to a copy with the preserved traceback.
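
In pure Python the idea could look something like this. `LazyImportDeclaration` is an invented helper, and I’ve arbitrarily picked __cause__ over __context__ for the sketch.

```python
import importlib
import traceback


class LazyImportDeclaration:
    """Hypothetical record of where a lazy import was written."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Drop the last frame (this __init__) so the summary ends at the caller.
        self.declared_at = traceback.extract_stack()[:-1]

    def resolve(self):
        """Perform the import, pointing any ImportError back at the declaration."""
        try:
            return importlib.import_module(self.name)
        except ImportError as exc:
            site = "".join(traceback.format_list(self.declared_at))
            origin = ImportError(
                f"lazy import of {self.name!r} was declared at:\n{site}"
            )
            # Chain the declaration site onto the error raised at first use.
            raise exc from origin
```

The traceback printed at first use then includes a “direct cause” section that names the file and line of the original import.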

– Rejected Ideas –

Per-module opt-in

I agree that this isn’t worth the trouble. It would be nothing more than busy work in most cases.

Also, I’d favor opt-out (e.g. my importlib.disallow_lazy_imports() example above).

Explicit syntax for lazy imports

I agree that this wouldn’t pay for itself. The context manager would be more than enough and doesn’t affect readability all that much. IMHO, the only advantage of the syntax would be the little bit of help the compiler could provide to make the operation more efficient.

If we were to have explicit syntax, I’d argue it would make more sense to (again) focus on opt-out, e.g. eager import.

Half-lazy imports

agreed on the perf penalty

I was going to suggest caching the module spec, but just can’t get past the performance hit.

Lazy dynamic imports

agreed

Deep eager-imports override

agreed (plus it’s easier to add later if needed than to remove it if not)

Other Thoughts

  • consider making builtin/frozen modules always eager (probably too messy)