Implicitly packaging scripts to enable more intuitive relative imports

Problem

I believe the current interaction between packages, scripts, and relative imports is somewhat unintuitive.

Examples: (links to several past threads hitting the errors below)

I have an idea that would, IMO, be more intuitive.

In my view these posts all stem from a single misunderstanding. People expect a project folder to be treated the same way as its sub-folders: that a sub-package is just the project folder again, only further down the file tree. Then they try to do a relative import and get an error that confuses them:

ImportError: attempted relative import beyond top-level package

Or

ImportError: attempted relative import with no known parent package

They think to themselves: Wait, aren’t I working on a package? Isn’t that precisely what I’m doing? Why isn’t my project folder a “package”?

This most often seems to come up with writing tests, because the recommended file structure for test files leads to this situation.
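For concreteness, here’s the kind of layout that triggers it (names hypothetical): a test file next to the package tries to relatively import the code under test, and running it directly as a script fails with the second error above.

my_project/
    |-- my_package/
        |-- __init__.py
        |-- core.py
    |-- tests/
        |-- test_core.py  # from ..my_package import core

Running python tests/test_core.py executes test_core.py as __main__ with no parent package, so the .. has nothing to resolve against.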

Examples of intuition

Let’s say we have the following structure:

project_folder/
    |-- __init__.py
    |-- outer_script.py
    |-- module_a.py
    |-- module_b/
        |-- __init__.py
        |-- inner_script.py
        |-- has_sibling_import.py  # from . import module_y
        |-- has_parent_import.py   # from .. import module_a
        |-- module_y.py
        |-- module_z/
            |-- __init__.py
            |-- has_parent_import.py  # from .. import module_y

Let’s look at what’s allowed:

  1. python outer_script.py is allowed to import module_b.has_sibling_import.
  2. cd .. && python -m project_folder.outer_script is allowed to run from .module_b import has_parent_import.
  3. python outer_script.py is allowed to import module_b.module_z.has_parent_import.

And what’s not allowed:

  1. python module_b/inner_script.py can run import has_sibling_import, but it will fail because of the relative import contained within.
  2. python outer_script.py can run import module_b.has_parent_import, but it will fail because of the relative import contained within.
  3. python module_b/inner_script.py can run import module_z.has_parent_import, but it will fail because of the relative import contained within.

This seems pretty clearly unintuitive to me. In my opinion, because python module_b/inner_script.py is allowed to use import has_sibling_import, it sets up an expectation that the script is implicitly, by default, already doing a kind of relative import. But then the from . import module_y inside that file is sometimes allowed and sometimes not, and people get confused.
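To make the disallowed cases concrete, here is roughly what the second one looks like, assuming outer_script.py contains just that import (paths abbreviated):

$ cd project_folder
$ python outer_script.py
Traceback (most recent call last):
  File "outer_script.py", line 1, in <module>
    import module_b.has_parent_import
  File ".../module_b/has_parent_import.py", line 1, in <module>
    from .. import module_a
ImportError: attempted relative import beyond top-level package

Here the top-level package is module_b, not project_folder, because sys.path[0] is the script’s directory and project_folder itself is never imported as a package.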

Proposal

So, my idea is: why don’t we just make it work like people expect it to work in the simple cases? Why don’t we make an assumption and implicitly (or explicitly) treat a particular folder as if it were a package?

A few options I see (in order of what I consider to be most intuitive):

  1. The folder the script lives in.
  2. The folder the script is run from.
  3. Something in sys.path or PYTHONPATH.

Or, for backwards compatibility, it could be an explicit flag when running python: python -p outer_script.py

If at least one of these were implicitly a package, and allowed relative imports, I think it would dramatically reduce the number of people who get confused in the first place. And for the people who try to do something that jumps up one step further, I think talking about “packages” will make it easier to explain why it’s not allowed.

What do you guys think? Is there some reason this is a bad idea? Am I wrong about intuitiveness?


Note: importing the same module under different paths cannot be allowed, so you would have to add some rules to prevent it; if it were allowed, you would get all kinds of confusing errors.

Yep, this will work in simple cases. In fact, I proposed a while back that from . import x become the preferred way to import sibling scripts, which has a similar effect to what you’re saying here. It wouldn’t fully prevent the problem of import random finding a file called random.py in your current directory, since fixing that would break backward compatibility, but at least there could be linter/editor support saying “are you aware that this will not import the stdlib module?”
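For anyone who hasn’t been bitten by that shadowing problem, here’s a minimal reproduction (the file is hypothetical; output abbreviated):

$ echo 'print("this is NOT the stdlib random")' > random.py
$ python -c "import random; random.random()"
this is NOT the stdlib random
Traceback (most recent call last):
  ...
AttributeError: module 'random' has no attribute 'random'

The local random.py wins because the current directory comes first on sys.path.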

But there’s a bit of a problem with the more complex cases. You gave this example: python module_b/inner_script.py. We could go further: it could be python /full/absolute/path/to/module_b/inner_script.py, or it could be that you’re already in module_b’s directory and you just type python inner_script.py. This is a bit of an issue: how does Python know where the top-level folder is?

So the question is: Is it worth having a setup that’s great for the simple cases, but fails badly on the complex ones? Can from .. import module_y be supported? What happens if you say from project_folder import outer_script - is Python going to be able to figure out that this is a package?

Actually, here’s a thought. What if you could create a file inside project_folder saying that this is the top-level package? Something like .package_directory containing the text “project_folder”. Then, any time you attempt a relative import after a regular script was run, Python checks the script’s directory for that file, and then progressively goes up the tree until it finds one (maybe refusing to cross to other filesystems). If it does, it treats the entire tree as though it were a package with that name.
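A rough sketch of that lookup, assuming a marker file literally named .package_directory (both the file name and the whole mechanism are hypothetical):

import os

def find_package_root(script_path):
    """Walk upward from the script's directory looking for the
    hypothetical .package_directory marker. Returns the directory
    containing the marker, or None if we hit the root (or would
    cross onto another filesystem) without finding one."""
    d = os.path.dirname(os.path.abspath(script_path))
    while True:
        if os.path.isfile(os.path.join(d, ".package_directory")):
            return d
        parent = os.path.dirname(d)
        if parent == d or os.stat(parent).st_dev != os.stat(d).st_dev:
            return None
        d = parent

Everything from the returned directory down would then be treated as one package tree.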

I’m not sure how that would work; would the current script have to suddenly become a package module?

Yeah, this is the hardest part. But I think it wouldn’t be too hard to have a rule “if you’re going to use implicit package imports, don’t ALSO use old-style local imports”. So you would simply never write import module_y in any of these scripts. Basically, if you want to use this feature, go all-in on it and do everything as a package.

I don’t quite understand this. I thought we can already import the same module with different paths.

import module_b.module_y            # direct import
import module_b.has_sibling_import  # this module runs from . import module_y internally
assert module_b.module_y == module_b.has_sibling_import.module_y  # same module object

(P.S. Sorry for deleting my previous post; I thought it would disappear; next time I will just edit it)

I fully support this. I don’t see why Python as it is goes out of its way to disallow from . import x when it’s at the top level. Allowing from . import x (or maybe even allowing import .x for short) makes it that much more explicit that x is a sibling module, without having to worry about shadowing or being shadowed by modules of the same name in other paths in sys.path.

I don’t get what you’re trying to say here. from . import x is relative to where the module issuing that import statement is, i.e. os.path.dirname(__file__). How does where the top-level folder is matter?


Well, that’s why I suggested a single implicit folder. We just choose the top-level folder implicitly, plus some way to configure it. I can see an argument for using each of the implicit options I mentioned. The only thing I can’t see is an argument for not using any of them.

There are different import statements that will fetch it, but it’s always the same module, as shown by your assertion. Notably, it gets the same key in sys.modules, and thus it is not simply another module built from the same source; it is the same module object. Failure of this property is extremely annoying to debug (for example, one copy raises an exception and the other copy’s handler fails to catch it).

We already have a bit of this problem. When you run a script directly, it’s imported under the name __main__, and it is NOT loaded into sys.modules under its own name. You can then import it again, and get another copy of it. PEP 499 was put forward to fix this issue, although it hasn’t been done yet.

So ideally, we need to prevent more of the same problem from happening.
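A minimal reproduction of that double-module behaviour (the file name script.py is hypothetical):

# script.py -- run as: python script.py
import sys

if __name__ == "__main__":
    import script  # re-imports this very file under the name "script"
    # One file, two distinct module objects:
    print(sys.modules["__main__"] is sys.modules["script"])  # False

Any module-level state (classes, globals, registered handlers) now exists twice.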


When you from . import x, Python has to translate that into a package import and then look it up in sys.modules. If you were in the module spam.ham and you say from . import eggs, that has to look up sys.modules["spam.eggs"] to find it. Same if you were in sausages.spam.ham - it’d have to look for sausages.spam.eggs.
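You can poke at that machinery directly with importlib, which refuses to resolve a leading dot without an anchoring package (a sketch, reusing the spam/eggs names from above and assuming such a package exists on sys.path):

import importlib

# A relative name plus the package it is relative to...
eggs = importlib.import_module(".eggs", package="spam")
# ...resolves to the same sys.modules entry as the absolute name:
assert eggs is importlib.import_module("spam.eggs")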


Ok. That seems very manageable. Just to check. The consideration that @MegaIng brought up was that we have to make sure that we don’t end up with both sys.modules["module_y"] and sys.modules[".module_y"] as separate names for the same module (and similar), right? And that’s why @Rosuav suggested from . import module_y to be the default.

This makes sense and seems resolvable to me. I like @blhsing’s suggestion of a shorthand like import .module_y, and I would advocate for ".module_y" to be the key in sys.modules in every case that actually does a sibling import (implicitly or explicitly). It seems like this could be detected and standardised.

Ah I see. Makes total sense that the top-level package needs to be determined before it can decide which key in sys.modules to look up.

But isn’t the top-level folder simply the first folder in sys.path where the specified module to be imported is found?

So the resolution of . will follow the usual order of sys.path even at the top level, matching the first folder that contains os.path.dirname(__file__). And the key to look up in sys.modules will be the package path relative to the matching top-level folder, plus the module name.

EDIT: Ah, I now realize why from . import x is disallowed at the top level–because according to the rules above the key of the module in sys.modules would then simply be 'x', which would easily conflict with a regular import x that also uses the same key but may resolve to a different module.

I don’t have a decent solution to this issue myself then.

Yeah, you see the problem.

It’s something that I’d love to see solved, though. Not just for your use-case but also to give some protection against the “oops I named my file the same as stdlib and now weird things are happening” bug.


After some thought, I think that whatever method is used for import x could be equally applicable to from . import x. That is, scan through sys.path directories until the requested thing to import is found. The only difference is that satisfying the latter case requires an __init__.py in the sys.path directory to indicate that it is a package. This would play nicely with using PYTHONPATH or modifying sys.path for more complicated use cases where the script is not in the running folder.

This should work for both simple and complex cases, right?

(EDITED) If we enforce that having __init__.py in the directory means that it is a package, then the unique name would be “.x” when it’s present, and “x” when it is not. This would be determined entirely based on whether there’s an __init__.py. It wouldn’t depend on whether it was imported with relative or normal import statements. This should remove the problem of two keys for the same module and be a step towards dealing with that type of shadowing error.
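A rough sketch of the lookup rule being proposed, to make it concrete (this is hypothetical behaviour, not how CPython resolves imports today):

import os
import sys

def proposed_module_key(name):
    """Hypothetical: scan sys.path for name.py; if the directory that
    contains it also has an __init__.py, key it as ".name" (sibling
    import within a package), otherwise as plain "name"."""
    for directory in sys.path:
        if os.path.isfile(os.path.join(directory, name + ".py")):
            if os.path.isfile(os.path.join(directory, "__init__.py")):
                return "." + name
            return name
    return None  # not found on sys.path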

And now, if I am not missing anything, and there’s a legit solution to be found here, what would be the next steps?

With this solution, the __init__.py could be inside a package itself, meaning that the module would be known both by .x and by a.x.

Additionally, __init__.py files are not required for packages. If they are missing you get a namespace package, but the file could still have been imported as a.x even if your system assigns it the name x.

We should not encourage executing files within a package (without -m), and that is exactly what you are trying to accomplish. Instead of some obscure rules that will bite people we should just improve error messages, warnings and guidance so that people have an easier time figuring out what they should be doing.


I don’t quite follow. Whether it is inside a package or not shouldn’t define its sys.modules name in that way. If __init__.py and x.py are inside a package "a", my proposal wouldn’t let x be known as ".x". I mean, unless you explicitly add the directory of “a” to sys.path. But that currently leads to a module known by two keys, so it’s not like that’s a problem with my proposal.

import sys
from pathlib import Path

# Layout: ./a/__init__.py and ./a/x.py, run from a's parent directory.
sys.path.append(str(Path('./a').resolve()))

import a.x  # found via the current directory, keyed as "a.x"
import x    # found via the appended path, keyed as "x"

assert a.x is x  # Fails: one file, two distinct module objects

If your proposal is to usefully solve this issue, it would need to prevent this by making these two imports yield the same name and therefore the same module object.

To answer this process question: find a core dev who is willing to support you, write a PEP, and be willing to spend a few months, probably a year, arguing for this proposal, to then get it added in 3.15 at the earliest (unless this gets really fast-tracked). Ideally you are also willing to actually implement the changes required for this.

As for finding a core dev, I honestly don’t know the best way to do that. Probably create a new post in this category laying out the exact proposal you want to make and explicitly ask for a core dev to sponsor you.


But… that’s just a problem that exists with how python decided to identify things. So long as subpackages are identified by their sys.modules key, there will be sys.path hacks to make it import the same thing twice. That system looks deep and old. I’m not here to drastically change things.

But if someone really wanted me to, this is how I'd solve that problem

If I were designing the imports, and I wanted to make 100% sure that there was no chance of accidentally importing the same file twice, I wouldn’t use a short string key. I’d use the file’s inode number as the basis for the key. Then for things like zips, I’d append an offset into the zip file. That would work, but it would be far less interpretable than a name. This is the only way I know of that could avoid quasi-malicious sys.path (or filesystem) hacks that might lead to multiple names for the same module.
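A minimal sketch of such a key for plain files on disk (device number included so that equal inode numbers on different filesystems don’t collide):

import os

def file_identity_key(path):
    """Hypothetical module key: (device, inode) identifies the file
    itself, no matter which path or sys.path entry reached it."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)

# Two different spellings of the same file give the same key:
# file_identity_key("a/x.py") == file_identity_key("./a/../a/x.py")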

Relative imports would not use the sys.modules key to identify the thing trying to be imported, instead it would just use the normal file system rules plus explicit extra rules that already exist.

Anyway, I’m currently trying to solve a recurring problem with people getting bitten by relative imports at top-level scripts, and I’m not sure why I would need to fix other problems before we talk about the problem I brought up in the first place.

I’ve not heard that opinion before. Why don’t we want to encourage executing scripts within a package?

If my proposed change adds no new ways to accidentally import the same module twice, how could the rule changes bite someone? Sorry, I’m not trying to be obtuse; I just don’t see what use cases would lead to unexpected results for someone. Can you give an example?

I’m all for better guidance. The problem for me is that I have thought a lot about this and the only conclusion I’ve come to is that python simply doesn’t let you do something that seems obviously correct to me. That’s why I’m here and made the proposals I have.

Because them “being bitten” is actually them doing something wrong, and they should do something else instead.

All you are doing is masking the underlying issue in some situations.

What about relative imports going up a level? Or multiple?

What about packages that also use absolute imports to refer to the package (i.e. a file y.py that is part of the a package using import a.x)?

How are you going to prevent people from doing obviously stupid things like manually executing files in site-packages (which will be on the path)?

Or is your plan to just replace a somewhat unclear error message with an even more confusing footgun that prevents people from learning the correct solution for even longer?

The reason I am asking you to solve the general problem, or at the very least clearly consider it, is because half solutions are the worst of all. Not doing anything and keeping the status quo is IMO preferable to small, somewhat backwards incompatible changes that might end up being a hindrance to larger, more complete changes.

But I don’t think I am going to continue to engage. I said my piece; if this doesn’t convince you, I don’t think I can.