Question: Understanding imports a bit better - how are cycles avoided?

First off, I don’t know if import falls under the packaging domain, apologies if this is the wrong place to ask this question.

I’m trying to get a better mental model for the way imports resolve, and I’m getting tangled up in the sample code I’m writing.

Given the following files:
test/__init__.py

from foo import Foo
__all__ = ["Foo"]

test/foo.py

class Foo: pass

Something like from test import Foo will work just fine. If I try to list the steps, it looks like there should be a cycle though:

  • run test/__init__.py → encounters import foo
  • try to run test/foo.py, but that needed to run test/__init__.py before → cycle

Or, in words, how can a parent package import something from a child without creating a cycle? Do __init__.py files get a special treatment, somehow, to make things work?

My actual problem at hand was that I wanted to answer the question “which imports trigger when I run this file” for modules in a package I was writing. One assumption was that all imports that happen in an __init__.py would propagate to all sub-modules. The result was that there would be a ton of cycles, which was obviously wrong, but I couldn’t suss out where exactly my assumption was wrong, because in the end, import x.y.z will run x/__init__.py and x/y/__init__.py before x/y/z.py gets its turn.

What I’m afraid of is that the answer to my initial question, e.g. “through some black magic (maybe partial modules?)”, won’t help much in answering the second one.

The sequence of steps you’re looking for is:

  • import test
    • creates module object test
    • sets test.__path__ (among other things)
    • sets sys.modules['test'] to the new module
    • executes test/__init__.py, which encounters …
  • from test.foo import Foo (swapping to fully qualified import here)
    • finds sys.modules['test']
    • searches test.__path__ for search paths
    • finds test/foo.py
    • creates module object foo
    • sets sys.modules['test.foo'] to the new module
    • executes test/foo.py, which encounters …
  • class Foo
    • executes the body pass
    • creates the class object type("Foo", (), {})
    • assigns the class object to Foo
    • returns back to executing test/__init__.py
  • ... import Foo
    • gets Foo attribute of test.foo (or would try another import if it isn’t there)
    • assigns it to Foo in the current module (test)
  • __all__ = ["Foo"]
    • just an assignment of a literal
  • returns back to whoever imported test in the first place
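
A minimal sketch of the sequence above, assuming a throwaway package called demo_pkg (a hypothetical name, used instead of test so it cannot collide with the standard library’s own test package). The flag names are made up for illustration too. Each file records what it can see at the moment it runs:

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package on disk. "demo_pkg" is a hypothetical name.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()

(pkg / "__init__.py").write_text(textwrap.dedent("""\
    import sys
    # Both of these are already true before the first line of this file runs.
    REGISTERED_EARLY = sys.modules.get(__name__) is not None
    PATH_SET_EARLY = "__path__" in globals()
    from demo_pkg.foo import Foo
    __all__ = ["Foo"]
"""))

(pkg / "foo.py").write_text(textwrap.dedent("""\
    import sys
    # The parent is in sys.modules, but only partially initialised:
    # its __init__.py has not reached the line that binds Foo yet.
    PARENT_IN_SYS_MODULES = "demo_pkg" in sys.modules
    PARENT_HAS_FOO_YET = hasattr(sys.modules["demo_pkg"], "Foo")
    class Foo:
        pass
"""))

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_pkg

print(demo_pkg.REGISTERED_EARLY)            # True
print(demo_pkg.PATH_SET_EARLY)              # True
print(demo_pkg.foo.PARENT_IN_SYS_MODULES)   # True
print(demo_pkg.foo.PARENT_HAS_FOO_YET)      # False
print(demo_pkg.Foo is demo_pkg.foo.Foo)     # True
```

The False on the fourth line is the “partial module” in action: foo.py runs while demo_pkg exists in sys.modules but is not finished yet, and that is fine because foo.py never asks the parent for anything it has not defined.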

So the cycles are mainly avoided by looking in sys.modules first when importing a module, and storing new modules in sys.modules before executing their code. So there’s absolutely a partial module in place when you are importing, but as long as anything you need has already been assigned (and __path__ is most common) then you can use it just fine.

I believe the error message you get warning about cyclic imports is actually a heuristic. The actual error will be AttributeError (or potentially NameError? I might be wrong on that one), but the importlib module will see the exception first. It knows that it’s in a recursive import situation, so it replaces the error with one saying that it’s probably due to a cycle, but sometimes it isn’t due to the cycle.

(e.g. pyzmq has/had a situation where a dynamic import would fail, and because it fails at a point where a submodule is importing a different submodule via the top-level module, the importer assumes it’s because the partial module doesn’t have the name yet. But it’s actually totally unrelated!)
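
For comparison, here is a rough sketch of a genuine cycle that does fail, using two hypothetical modules cyc_a and cyc_b that each need a name from the other before defining their own. On the CPython versions I have tried, the from ... import form surfaces as an ImportError, and the exact message wording depends on the Python version, so the sketch only prints it:

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# cyc_a and cyc_b are hypothetical module names used only for this demo.
# Each one asks the other for a name before it has defined its own.
(root / "cyc_a.py").write_text("from cyc_b import B_VALUE\nA_VALUE = 1\n")
(root / "cyc_b.py").write_text("from cyc_a import A_VALUE\nB_VALUE = 2\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

error = None
try:
    import cyc_a
except ImportError as exc:
    # cyc_b found the partial cyc_a in sys.modules, but A_VALUE was not bound yet.
    error = exc

print(type(error).__name__, "-", error)
```

Moving the A_VALUE = 1 line above the import in cyc_a.py makes the same pair of modules import cleanly, which shows that the problem is the order of assignments rather than the cycle as such.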

And yeah, this isn’t really packaging specific, though having to debug import problems normally comes up because you’re doing packaging of some sort. I don’t think it’s off topic here, even though all the relevant implementation lives in the standard library (specifically importlib and its “bootstrap” submodules).

I don’t follow - why should that be necessary? The code inside foo.py doesn’t involve using anything that’s defined in __init__.py. I mean, yes, as it happens Python will be in the middle of running __init__.py when it starts running foo.py, but foo.py isn’t looking back into that module state when the top-level code runs.

Thanks for the detailed run-down @steve.dower, that answered all my questions. The bit I was missing seems to have been “why do modules need to execute a local __init__.py in the first place”. I didn’t know that that creates the __path__ (on which code in the __init__.py may rely as well, if I read that correctly). Makes sense though.

For practical purposes in my second problem, I’m going to pretend that a module always imports its __init__.py file, even if it’s technically not correct. I don’t know how else to express “importing test.foo will execute test/__init__.py”.
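
One rough way to answer “which imports trigger when I import this module” empirically is to compare sys.modules before and after the import. This is only a sketch: trace_pkg is a hypothetical package built for the demo, and import json stands in for whatever a real __init__.py would pull in.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway package; "trace_pkg" is a hypothetical name.
root = Path(tempfile.mkdtemp())
pkg = root / "trace_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("import json  # stand-in for the package's own imports\n")
(pkg / "foo.py").write_text("class Foo:\n    pass\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

before = set(sys.modules)
importlib.import_module("trace_pkg.foo")
triggered = sorted(set(sys.modules) - before)

# The parent package shows up even though only the submodule was requested.
print(triggered)
```

The diff only lists modules that were not loaded already, so anything imported earlier in the same process (json, quite possibly) will be missing from it. Running the check in a fresh interpreter gives a more complete picture.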

Yeah, that’s where my knowledge of imports was off. I thought that importing something like test.foo would require test/__init__.py to have been executed before, because that’s the order that you can observe during runtime if you have print statements (and nothing else) in either file, and run import test.foo from a repl.

Learning that __path__ initialization happens before the first line of code in __init__.py is run, and that that is the only thing that test.foo needs from __init__.py (unless it’s explicitly importing from it, at which point normal import-cycle rules apply) was the kicker.

Also cheers, didn’t know you hang out here as well :wave:

Just one more explicit note here, import test.foo is literally (by specification) executed like:

import test
import test.foo

That is, all parent modules are fully imported before the final one. Which is also why the top-level parent is the name that’s bound in the importing scope. (It’s also how import os.path works despite os.py not being a package - it sets sys.modules['os.path'] while importing os, and then the import os.path finds it cached and doesn’t even search for an actual file.)
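
The os.path case is easy to check from any interpreter. A small sketch (the variable names are just for illustration):

```python
import sys
import os.path  # binds the name "os" in this scope, not "os.path"

# os is a plain module, not a package, so it has no __path__ to search.
os_is_package = hasattr(os, "__path__")

# The entry was put into sys.modules by os.py itself while os was imported.
cached = sys.modules["os.path"]

print(os_is_package)        # False
print(cached is os.path)    # True
print(cached.__name__)      # the real module behind it, e.g. posixpath or ntpath
```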

Your original post mentioned “running test/foo.py”, which can be interpreted as python3 test/foo.py which does not touch test/__init__.py at all, since there’s no import going on (at least until you do import test - and import test.foo will not get the same module as the one you started running). I assumed you meant import rather than run, since it made your later assumptions make more sense.
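
The difference between running and importing can be seen by doing both in child interpreters. A sketch, assuming a hypothetical package script_pkg whose files only print when they execute, and assuming that passing PYTHONPATH to the child is enough to make the package importable there:

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Throwaway package; "script_pkg" is a hypothetical name.
root = Path(tempfile.mkdtemp())
pkg = root / "script_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text('print("__init__ ran")\n')
(pkg / "foo.py").write_text('print("foo ran as", __name__)\n')


def run(*args):
    """Run a fresh interpreter with the package root importable; return its output lines."""
    env = dict(os.environ, PYTHONPATH=str(root))
    result = subprocess.run(
        [sys.executable, *args],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


as_script = run(str(pkg / "foo.py"))            # like: python3 script_pkg/foo.py
as_import = run("-c", "import script_pkg.foo")  # like: import script_pkg.foo

print(as_script)  # ['foo ran as __main__']
print(as_import)  # ['__init__ ran', 'foo ran as script_pkg.foo']
```

Run as a script, the file is executed under the name __main__ and the package’s __init__.py never runs. Imported, it gets its dotted name and the parent runs first.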

Your original post mentioned “running test/foo.py”, […]

You’re right, I meant “test/foo.py is executed because it was imported with import test.foo”.

That is, all parent modules are fully imported before the final one.

Except if a parent imports something from a child module, right?

Once something’s been imported once (even if it’s still being executed), subsequent imports fetch the same module object - it’s just test = sys.modules["test"] without re-executing anything.
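
A small sketch of that caching behaviour, using a hypothetical module once_mod whose body creates a fresh object every time it executes. If a second import re-ran the body, TOKEN would be replaced by a different object:

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
# "once_mod" is a hypothetical module name used only for this demo.
(root / "once_mod.py").write_text(
    "TOKEN = object()  # a new object every time this body executes\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import once_mod as first
token_before = first.TOKEN

import once_mod as second  # served from sys.modules, body is not re-run

same_module = first is second is sys.modules["once_mod"]
body_ran_once = second.TOKEN is token_before

print(same_module)    # True
print(body_ran_once)  # True
```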

I see, thanks. I really appreciate the detailed feedback and all your patience, as far as I’m concerned everything is clear now.

I came here because of trying to sort out the details of Tkinter installation for that one Stack Overflow canonical, and liked it enough to stay :slight_smile:
