First off, I don’t know if import falls under the packaging domain, apologies if this is the wrong place to ask this question.
I’m trying to get a better mental model for the way imports resolve, and I’m getting tangled up in the sample code I’m writing.
Given the following files:

`test/__init__.py`:

```python
from .foo import Foo

__all__ = ["Foo"]
```

`test/foo.py`:

```python
class Foo: pass
```
Something like `from test import Foo` will work just fine. If I try to list the steps, though, it looks like there should be a cycle:
- run `test/__init__.py` → encounters the import of `foo`
- try to run `test/foo.py`, but that needed `test/__init__.py` to have run first → cycle
Or, in words: how can a parent package import something from a child without creating a cycle? Do `__init__.py` files get special treatment, somehow, to make things work?
My actual problem at hand was that I wanted to answer the question "which imports trigger when I run this file?" for modules in a package I was writing. One assumption was that all imports that happen in an `__init__.py` would propagate to all sub-modules. The result was that there would be a ton of cycles, which was obviously wrong, but I couldn't suss out where exactly my assumption was wrong, because in the end, `import x.y.z` will run `x/__init__.py` and `x/y/__init__.py` before `x/y/z.py` gets its turn.
What I’m afraid of is that the answer to my initial question, e.g. “through some black magic (maybe partial modules?)”, won’t help much in answering the second one.
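The execution order described above is easy to observe directly. A throwaway sketch (the package names `x`, `y`, `z` are made up for the demo, built in a temp directory, with a scratch `sys.import_order` list used only to record which files ran):

```python
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "x", "y"))

# Each file appends a marker to a shared list when its body executes.
with open(os.path.join(tmp, "x", "__init__.py"), "w") as f:
    f.write("import sys; sys.import_order.append('x')\n")
with open(os.path.join(tmp, "x", "y", "__init__.py"), "w") as f:
    f.write("import sys; sys.import_order.append('x.y')\n")
with open(os.path.join(tmp, "x", "y", "z.py"), "w") as f:
    f.write("import sys; sys.import_order.append('x.y.z')\n")

sys.import_order = []       # scratch attribute, just for the demo
sys.path.insert(0, tmp)

import x.y.z

print(sys.import_order)     # ['x', 'x.y', 'x.y.z'] — parents first, target last
```

Both parent `__init__.py` files run before `z.py`, exactly as stated.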
`from test.foo import Foo` (swapping to a fully qualified import here):

- finds `sys.modules['test']`
- searches `test.__path__` for search paths
- finds `test/foo.py`
- creates module object `foo`
- sets `sys.modules['test.foo']` to the new module
- executes `test/foo.py`, which encounters…
  - `class Foo`:
    - executes the body `pass`
    - creates the class object `type("Foo", (), {})`
    - assigns the class object to `Foo`
- returns back to executing `test/__init__.py`
- `... import Foo`:
  - gets the `Foo` attribute of `test.foo` (or would try another import if it isn't there)
  - assigns it to `Foo` in the current module (`test`)
- `__all__ = ["Foo"]`:
  - just an assignment of a literal
- returns back to whoever imported `test` in the first place
So the cycles are mainly avoided by looking in sys.modules first when importing a module, and storing new modules in sys.modules before executing their code. So there’s absolutely a partial module in place when you are importing, but as long as anything you need has already been assigned (and __path__ is most common) then you can use it just fine.
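That "partial module already in `sys.modules`" state is easy to observe. A minimal sketch (the module name `partialdemo` and the temp-directory setup are made up for the demo); the module checks, while its own body is still running, that it is already registered but not yet finished:

```python
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "partialdemo.py"), "w") as f:
    f.write(
        "import sys\n"
        # While this body runs, the module is already in sys.modules...
        "me = sys.modules['partialdemo']\n"
        # ...but names assigned later in the body don't exist yet.
        "assert not hasattr(me, 'LATER_NAME')\n"
        "LATER_NAME = 42\n"
    )
sys.path.insert(0, tmp)

import partialdemo

print(partialdemo.LATER_NAME)   # 42, the body has now finished executing
```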
I believe the error message you get warning about cyclic imports is actually a heuristic. The actual error will be `AttributeError` (or potentially `NameError`? I might be wrong on that one), but the importlib machinery sees the exception first. It knows that it's in a recursive import situation, so it replaces the error with one saying that it's probably due to a cycle, but sometimes it isn't due to the cycle at all.
(e.g. pyzmq has/had a situation where a dynamic import would fail, and because it fails at a point where a submodule is importing a different submodule via the top-level module, the importer assumes it’s because the partial module doesn’t have the name yet. But it’s actually totally unrelated!)
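The rewritten message is easy to reproduce with a genuine cycle. A throwaway sketch with made-up module names `mod_a`/`mod_b` in a temp directory; the exact wording of the message comes from CPython 3.8+, where this heuristic lives:

```python
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
# mod_a imports mod_b, which tries to pull a name back out of the
# still-executing mod_a: a genuine import cycle.
with open(os.path.join(tmp, "mod_a.py"), "w") as f:
    f.write("import mod_b\nVALUE = 1\n")
with open(os.path.join(tmp, "mod_b.py"), "w") as f:
    f.write("from mod_a import VALUE\n")
sys.path.insert(0, tmp)

try:
    import mod_a
    message = None
except ImportError as exc:
    message = str(exc)

print(message)
```

On CPython 3.8+ this prints something like `cannot import name 'VALUE' from partially initialized module 'mod_a' (most likely due to a circular import)`: the underlying "name not there yet" failure, dressed up with the cycle hint.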
And yeah, this isn’t really packaging specific, though having to debug import problems normally comes up because you’re doing packaging of some sort. I don’t think it’s off topic here, even though all the relevant implementation lives in the standard library (specifically importlib and its “bootstrap” submodules).
I don't follow - why should that be necessary? The code inside `foo.py` doesn't use anything that's defined in `__init__.py`. I mean, yes, as it happens Python will be in the middle of running `__init__.py` when it starts running `foo.py`, but `foo.py` isn't looking back into that module's state when the top-level code runs.
Thanks for the detailed run-down @steve.dower, that answered all my questions. The bit I was missing seems to have been "why do modules need to execute a parent `__init__.py` in the first place": I didn't know that that creates the `__path__` (on which code in the `__init__.py` may rely as well, if I read that correctly). Makes sense though.
For practical purposes in my second problem, I'm going to pretend that a module always imports its `__init__.py` file, even if it's technically not correct. I don't know how else to express "importing `test.foo` will execute `test/__init__.py`".
Yeah, that’s where my knowledge of imports was off. I thought that importing something like test.foo would require any test.__init__.py to have been executed before, because that’s the order that you can observe during runtime if you have print statements (and nothing else) in either file, and run import test.foo from a repl.
Learning that `__path__` initialization happens before the first line of code in `__init__.py` is run, and that that is the only thing `test.foo` needs from `__init__.py` (unless it's explicitly importing from it, at which point normal import-cycle rules apply), was the kicker.
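That `__path__` timing can be seen directly. A quick sketch (the package name `pkgdemo` is made up, built in a temp directory) where the very first statement of `__init__.py` reads `__path__`:

```python
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "pkgdemo"))
with open(os.path.join(tmp, "pkgdemo", "__init__.py"), "w") as f:
    # The very first statement of __init__.py: __path__ is already set,
    # before any of the package's own code has run.
    f.write("FIRST_PATH = list(__path__)\n")
sys.path.insert(0, tmp)

import pkgdemo

print(pkgdemo.FIRST_PATH)   # a one-element list ending in '.../pkgdemo'
```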
Also cheers, didn’t know you hang out here as well
Just one more explicit note here: `import test.foo` is literally (by specification) executed like:

```python
import test
import test.foo
```
That is, all parent modules are fully imported before the final one. Which is also why it's the one that's bound in the main scope. (It's also how `import os.path` works despite `os.py` not being a package - it sets `sys.modules['os.path']` while importing `os`, and then the `import os.path` finds it cached and doesn't even search for an actual file.)
Your original post mentioned "running `test/foo.py`", which can be interpreted as `python3 test/foo.py`, which does not touch `test/__init__.py` at all, since there's no import going on (at least until you do `import test` - and `import test.foo` will not get the same module as the one you started running). I assumed you meant import rather than run, since it made your later assumptions make more sense.
Once something’s been imported once (even if it’s still being executed), subsequent imports fetch the same module object - it’s just test = sys.modules["test"] without re-executing anything.
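A quick check of that caching behaviour, using `math` as a stand-in:

```python
import sys
import math

again = __import__("math")   # any later import is just a sys.modules lookup

print(again is math)                 # True: no re-execution, same object
print(math is sys.modules["math"])   # True: the cache entry *is* the module
```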