PEP 810: Explicit lazy imports

We could special-case LOAD_FAST, but that can have performance implications and would only cover a very weird case (accessing the object via globals()), so we think this is the right compromise for this edge case. I also suspect @markshannon will not be a fan of us messing with LOAD_FAST.

What do you want the PEP to say? That accessing an arbitrary attribute on a PyLazyImportObject will trigger an error? I think we can say that, but it is a bit of a slippery slope: should we list all the ways that doing things with a PyLazyImportObject can create errors?

Thanks @markshannon for your thoughtful comments. I suspect that everyone wants the same thing here.

Since Discourse is difficult (at least for me), I wonder if it would be more productive to add your specific thoughts as comments on the PEP itself. It might reduce the cognitive load and help avoid misinterpretation of ideas.


It may be worth having one of the examples in the reification section demonstrate that any access of a lazy object through a global variable - not just names bound directly by lazy import statements - will cause that object to reify.

Currently you have the example of resolved where you explicitly call .resolve()[1], although the result of the subsequent print statements would be the same without this call, as the lookup of resolved for the print statement would reify it. A sketch of what I mean follows the footnote below.


  1. I assume that this has replaced .get() but that it hasn’t made it into the reference implementation yet. ↩︎
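
For what it's worth, here is a sketch of the kind of example I mean, using the PEP's proposed syntax and the reification semantics discussed in this thread (so it won't run on today's Python):

```python
lazy import json                 # binds 'json' to a lazy import object

# globals() access is the loophole discussed above: it hands back the
# raw lazy object without reifying it.
alias = globals()['json']

def f():
    # Reading *any* global that holds a lazy object reifies it, not
    # just names bound directly by a lazy import statement:
    return alias.dumps({})       # this line triggers the real import
```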


Thanks for the comment, this is a good suggestion. I will discuss it with the team, since they are already concerned about the size of the PEP (with good reason): the longer the PEP, the harder it is to evaluate and read. I still think this is a net positive, so I will try to make a case for it, but please understand if we decide not to add it in the end.


I think “more complicated” and “creates more edge cases” are both essentially ways of saying the same thing, no? Or at least getting at the same thing: we are trying to estimate the likely “blast radius” of the change, and the more complex something is, the more ways it can go wrong, hence more edge cases.

In this vein, I would like to argue that the global reference swapping approach feels dramatically less complicated to me than the object transformation approach. I am very used to references changing their values; that happens every time I reassign a variable. Objects mutating, on the other hand, always feels dangerous, because there are a bunch of ways it can go wrong: if you don't do it perfectly atomically, you may end up in weird intermediate states. Objects changing class is a different beast altogether and feels unprecedented.

Honestly, though, discussing which approach is more complex or creates more edge cases feels like it buries the lede. If you are going to support lazy from imports (which I think we should), the transformation approach is unworkable on its own, which means it is necessarily more complicated and creates more edge cases, because you end up needing both mechanisms: all the downsides of the transformation approach when importing modules, all the downsides of the reification approach when importing attributes, and on top of that the added complexity and mental load of dealing with both at the same time (plus users will not understand that these two cases use very different mechanisms).


In relation to something as simple as:

```python
from stdlib_new_module import lazy_import, resolve_lazy_import

np = lazy_import('numpy')
print(type(np))    # LazyImport

def foo():
    # Will only use it once here
    a = np.array([])

def bar():
    # Will use it many times, so resolve once up front to avoid the
    # LazyImport (proxy) cost on every access. (Re-assigning the
    # global name 'np' inside the function would make it local and
    # raise UnboundLocalError, so bind the result to a new name.)
    resolved = resolve_lazy_import(np)
    b = resolved.array([])
    c = resolved.array([])
    d = resolved.array([])
    e = resolved.array([])
```

what benefits does this offer?
(Not taking into account things that are not implementation-specific, such as filtering, etc.)

The lazy import demonstrated in the importlib documentation would handle that without the need for your resolve function. You can see the arguments against that here.
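
For reference, the recipe in the importlib documentation is essentially the following (the importlib.util.LazyLoader pattern; the module's code only runs on first attribute access):

```python
import importlib.util
import sys

def lazy_import(name):
    # Build the module from its spec, but wrap the loader so that
    # executing the module's code is deferred until first attribute access.
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module

np = lazy_import("numpy")   # nothing heavy happens yet
np.array([])                # first attribute access actually imports numpy
```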

One obvious difference is if you want to re-export specific objects from a submodule to provide a nicer API without incurring the cost of importing the submodule eagerly. Currently, in order to do this, you have to do significant extra work, especially if you want it to play nicely with static analysis tools.
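
To illustrate the extra work involved: the usual workaround today combines a PEP 562 module-level __getattr__ with a TYPE_CHECKING import so static analyzers still see the name (pkg and HeavyClass are hypothetical):

```python
# pkg/__init__.py
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Visible to type checkers and IDEs, never executed at runtime.
    from pkg._heavy import HeavyClass

__all__ = ["HeavyClass"]

def __getattr__(name):
    # Runtime path: import the submodule only on first access.
    if name == "HeavyClass":
        from pkg._heavy import HeavyClass
        return HeavyClass
    raise AttributeError(name)
```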


I think you should follow the implementation here. I can break your assumptions about the specialization of LOAD_GLOBAL if you allow it in a loop.

I’m curious what the issue here is, because I’d be concerned that if you can break it in a loop, you can break it in a function as well.

One thing that wasn’t immediately obvious was that we need to forcibly clear the dict keys version on reification, because updating a value alone doesn’t update the version number. The alternative would be a check for lazy import objects in the specialized functions, which seemed worse.

So I think a loop which, for example, imports, reifies, and then loops again would probably just keep the version number unassigned, or repeatedly re-assign it, depending on what else was going on.

But this is currently just a consequence of the compiler checking whether there are any fblocks and disallowing it there. When we allow it in the with statement, the check will become more nuanced, and we can make an explicit decision on loops if there’s a strong argument to disallow them.
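
For concreteness, the kind of loop in question would look roughly like this (PEP 810 syntax, currently rejected by the compiler's fblock check, shown only to illustrate the concern):

```python
# Module level: each iteration re-binds 'json' to a fresh lazy object
# and then reifies it, so a specialized LOAD_GLOBAL's cached dict keys
# version would be invalidated (or left unassigned) over and over.
for _ in range(3):
    lazy import json
    print(json.dumps({}))   # reading the global reifies the import
```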

A lot of information here that came rather quickly - thank you for the links.

So 3 points there:

  1. Requires verbose setup code for each lazy import.

Why can’t it be abstracted? I used it initially, and I don’t think I needed anything beyond calling a wrapper function.

(I ended up not using it, as it was rather slow compared to other approaches, but if it were “leveled up”, I would likely come back to it.)

  2. Doesn’t work well with from ... import statements.

Surely some work can be done to address this limitation? It might be involved, but could be worthwhile?

Especially given there would then be no need to resort to things such as “name bindings changing when accessed is new and alien” (@markshannon).

I would scrap “alien”, but it is new and this has never been considered a favourable practice (possibly falling under the umbrella of “black magic”). And for good reasons.

  3. Less clear and standard than dedicated syntax.

Less clear? Yes - dedicated syntax is (almost) always clearer and more convenient.
Less standard? Currently it is the more standard option, and if it were possible to address all its limitations, it could become widely adopted.


One thing, however, is undeniable:

This, I don’t think, can be easily addressed without new syntax.

But I am just unsure whether the costs are worth it.


This currently looks good, given things stay static forever.
All issues for this specific path have been solved to a sufficient degree.
But it has a higher-than-average chance of causing issues / becoming a hindrance for new (likely more game-changing) ideas in the future.

I gotta say that objects changing their __class__ at runtime when you access them seems way weirder and more surprising than name bindings updating? Like, you're calling the current approach “alien”, but having an object literally transform its type under you is even more alien IMO. I cannot imagine how that plays with caches and MRO and garbage collection and all that!

And it doesn’t even solve the whole problem, since you’d still need the replacement mechanism for lazy from X import Y anyway. So now we have two different reification strategies instead of one consistent approach, which seems objectively worse.

On the C API stuff: I am not an expert on this, but most C API calls on modules should just be import + getting an attribute, and that will work. I have never seen import + getting __dict__ + getting the value from it. Also, even if that were done, you can call the thing that resolves the object and you are done. It's not like it's going to do something that will crash the interpreter or anything.

I think the edge cases you're worried about, like globals()['a'] is globals()['b'] and the other ones, are pretty obscure compared to the cognitive load of “sometimes the object changes class, sometimes we replace it, sometimes it does something different depending on which import syntax you used”. That just seems harder to teach and reason about?


If accessing a module’s __dict__ does not trigger reification of lazy imports therein, that means the PyLazyImportObject could become visible to C extensions, as C extensions often (usually for perceived performance reasons) access attributes via dictionaries, not through PyObject_GetAttr().

I think anyone doing this is already broken, though… A module could define __getattr__ (maybe because it wants to lazily load things!) and the C code here would miss values that effectively exist. The failure mode is a little different, but it's still a failure.
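
A minimal sketch of that pre-existing failure mode, independent of lazy imports (mymod is a hypothetical module using PEP 562's module __getattr__):

```python
# mymod.py
import importlib

def __getattr__(name):
    # Lazily load numpy the first time 'mymod.np' is accessed.
    if name == "np":
        return importlib.import_module("numpy")
    raise AttributeError(name)
```

```python
import mymod

getattr(mymod, "np")    # works: attribute lookup falls back to __getattr__
mymod.__dict__["np"]    # KeyError: dict access bypasses __getattr__, just
                        # like PyDict_GetItem on the module dict does in C
```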


A comment way above asked for this, but there wasn’t a response, I think, so let me restate it:

If implemented, and used in my project, how can I unit-test the expected lazy imports?

(That is, given import foo, assert that the reference foo.bar is lazy, and not eager or already resolved.)

I don’t think there is a good way, and not just because referencing something reifies it. We could easily add a function that tries to “resolve” a reference as a string, but the fundamental problem is that modules are interpreter-global state, which is fundamentally problematic for a unit test. I’m not sure if there’s a good way to unit test this that doesn’t inherently rely on specific implementation details (like the tests in the prototype do).

A systems test shouldn’t be too hard, depending on the modules involved: testing for the side effects of the module, checking whether the module exists in sys.modules, or using the audit hook to verify that the module isn’t loaded until later. If this is a common need, we can add something to the unittest module (which, after all, isn’t really just about unit tests anymore).
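
For example, a sketch of the audit-hook variant (mypackage and numpy are hypothetical stand-ins for the module under test and its heavy dependency):

```python
import sys

imported = []

def hook(event, args):
    # The "import" audit event fires for every real module import;
    # args[0] is the module name.
    if event == "import":
        imported.append(args[0])

sys.addaudithook(hook)

import mypackage                  # hypothetical module under test
assert "numpy" not in imported    # its heavy dependency stayed lazy
```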


Would it work to access the __dict__ (won’t reify) and assert that you have a types.LazyImportType?

That will tell you that you have a lazy object, but not that you’ve avoided importing the module. Something else in your code could have caused the module to be reified (through a different lazy object), or could simply have caused the module to be imported eagerly.
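
Putting the two checks together, a sketch under this thread's assumptions (PEP 810 syntax, the hypothetical types.LazyImportType name, and __dict__ access not reifying):

```python
import sys
import types

lazy import json                               # proposed syntax

obj = sys.modules[__name__].__dict__['json']   # peek without reifying
assert type(obj) is types.LazyImportType       # still a lazy object...

# ...but that alone doesn't prove laziness: something else may have
# imported the module eagerly, so also check sys.modules directly.
assert 'json' not in sys.modules
```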


The way to do it is to import the module you’re investigating in a minimal script running in a Python subprocess. You pass a module name to the subprocess; it computes the difference between sys.modules.keys() before and after importing the module, and prints out the list in reply.

There’s a minor issue with debugging prints and warnings that can get mixed up with the output if you’re not careful, but that’s manageable. Or you could communicate the list using files or pipes or whatever to get around it completely; I never bothered.

Once you have the list of modules imported indirectly, then you can check that none of the expected lazy imports are on that list. Or you could just have the subprocess do a check of the names in __lazy_modules__, if that’s what you’re using.
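
A minimal sketch of that approach (mypackage is a hypothetical module under test):

```python
import json
import subprocess
import sys

CHILD = """\
import json, sys
before = set(sys.modules)
import mypackage
after = set(sys.modules)
print(json.dumps(sorted(after - before)))
"""

result = subprocess.run(
    [sys.executable, "-c", CHILD],
    capture_output=True, text=True, check=True,
)
# Take the last line of output in case the import printed debug noise.
imported = json.loads(result.stdout.strip().splitlines()[-1])
assert "numpy" not in imported   # the expected lazy import stayed lazy
```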


I’m wondering how the migration to lazy imports will look in practice.

The proposal mentions __lazy_modules__, which will work for incrementally changing non-lazy imports to lazy ones, and thus profiting from the performance gains at least on newer Python versions.

However, most libraries likely already implement some form of lazy importing for expensive imports (say, by importing inside the methods that actually use the functionality, or by implementing a custom lazy_import function, …). How would the migration look in this case? You cannot really convert those to ordinary imports + __lazy_modules__, because that would introduce a performance penalty on older Python versions.

Maybe it would be helpful to temporarily introduce a lazy_import helper (in some additional Python package) to facilitate such migration scenarios?
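
Something along these lines, perhaps: a version-conditional shim, assuming the __lazy_modules__ mechanism mentioned above and a hypothetical 3.15 release, with a PEP 562 __getattr__ fallback so older versions stay lazy too:

```python
# mypkg/__init__.py (all names here are hypothetical)
import sys

if sys.version_info >= (3, 15):
    __lazy_modules__ = ["numpy"]   # the import below becomes lazy
    import numpy
else:
    def __getattr__(name):
        # Older Pythons: lazy attribute loading via PEP 562.
        if name == "numpy":
            import importlib
            return importlib.import_module("numpy")
        raise AttributeError(name)
```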


The idiomatic way to do that, IMHO, is just to spawn a subprocess with your unit test.


Perhaps they can stick to their own system until the last version without lazy imports has reached EOL. That would generally mean that, if lazy imports were introduced in 3.15, they’d have to wait for 3.14 to reach EOL status and then switch from their own system to the newer one. That would ensure there are no syntax errors at all, which is obviously desirable.
