Implicitly packaging scripts to enable more intuitive relative imports

Sorry if I wasn’t clear. I mean that the problem you are talking about (that sys.path hacks can allow someone to import the same module twice) and the problem I’m talking about (that relative imports don’t work at top level) are entirely independent problems. The former isn’t the general problem; even if it were completely solved, the latter would be unaffected. And vice versa.

Of course, I agree with you that if there is a general problem to solve, it’s usually better to solve that general problem rather than settle for half measures. It’s just that that’s not what’s going on here.

In my proposed solution, all the rules for relative imports and their associated errors are identical to today’s. That is, arbitrary relative imports are allowed within a package, but only within that package. In particular, since site-packages is not itself a package, my solution changes nothing about how people interact with it.

Using my proposed solution, suppose I’m working on a package a that has four files: __init__.py, the script, y.py, and x.py. Then y.py can’t do import a.x, because a isn’t the name of an importable package (precisely the same as it currently is). But y.py could do from . import x. This would be added to sys.modules as ".x".
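To make that concrete, here’s a sketch of the layout under the proposal (the script name is illustrative; the other file names are from the example above):

```
a/                 # this directory is the sys.path entry
├── __init__.py    # marks it as an anonymous package under the proposal
├── script.py      # run directly as: python a/script.py
├── x.py
└── y.py
```

```python
# y.py, under the proposed scheme
import a.x        # fails: "a" is not an importable name from inside the directory
from . import x   # works; x would be registered in sys.modules as ".x"
```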

To be clear, I will restate my proposed idea from above: if a sys.path entry has some identifying file (e.g. __init__.py), treat it as an anonymous package for relative imports. You could call what I’m proposing “anonymous top-level packaging”.

I’m not opposed to simply updating documentation/communication, but there seem to be legitimate cases where it makes sense to allow these relative imports.

Let’s take the examples I linked at the start. They had their source code in one folder and their tests in another folder (a typical layout is sketched below). That’s a very common pattern, and it doesn’t work without some sys.path hacks or creative command-line calls. Are they doing something wrong? The answers to those linked questions don’t have anyone saying that what they are trying to do is fundamentally wrong, just that Python doesn’t support it very well.
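For reference, the pattern in question typically looks something like this (names are illustrative):

```
project/
├── src/
│   └── mypkg/
│       ├── __init__.py
│       └── core.py
└── tests/
    └── test_core.py   # wants "import mypkg.core", but src/ is not on sys.path
```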

So, what’s wrong with those cases? What should they do instead?

Use -m from a higher directory, as was the suggested answer in both examples you linked.[1] That is almost always the correct answer.
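Concretely, that looks like this (package and script names are illustrative):

```
project/
└── mypkg/
    ├── __init__.py
    ├── helpers.py
    └── script.py      # contains: from . import helpers

# run from project/, so that mypkg is importable as a package:
$ python -m mypkg.script
```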

Are you sure you actually understand the issues you are talking about here?


  1. Well, ok, the SO post contains it within the question; the answer just tries to explain why it’s necessary ↩︎

I’m not sure I do, honestly. That’s why I’ve followed everything up with questions. Anyway, here’s my best guess at understanding your point here:

  1. Relative imports should only work within a package.
  2. If a script is considered part of the package, then use “-m” from outside the package to run it.
  3. If a script is not considered part of the package, then you shouldn’t be allowed to use relative imports to sibling locations of the script because that would violate rule 1.

Is that right?

Yes. If your code is not a package, then relative imports don’t make sense. Instead you should use plain import other_file style imports.
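That is (assuming the two files sit side by side in a directory that’s on sys.path, as it is when you run a script from its own directory):

```python
# script.py, next to other_file.py in a plain (non-package) directory
import other_file          # plain absolute import: works
# from . import other_file # relative import: fails, there is no enclosing package
```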

I think most of the confusion here stems from people mistakenly thinking that a directory containing .py files (possibly with an __init__.py wrongly thrown in without understanding its purpose) equates to a package. This used to be dead easy to explain: if a directory is in sys.path, then any directories it contains are packages and any .py files it contains are modules. Now, however, we have meta path finders, and I just shrug and write it off as unexplainable…
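That old rule, in picture form (assuming /proj is the entry on sys.path):

```
/proj/             # on sys.path
├── util.py        # importable as the module "util"
└── mypkg/         # importable as the package "mypkg"
    ├── __init__.py
    └── sub.py     # importable as "mypkg.sub"
```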

Thank you for validating my understanding. I think your perspective here is very useful, and contains the key to explaining it to people, which I hadn’t fully grokked before.

A folder does not attain the status of “package” by its contents at all. Instead, “package” is a temporary status granted (or not) if the folder is reachable from sys.path during the execution of a running Python program.

That allows me to describe the issue I’m trying to raise more succinctly. There seem to be two definitions of “package” in use. From an ecosystem perspective, a “package” is a pure folder containing only what should be distributed to others; in particular, it excludes the dev tools and tests that might have gone into creating that distributed result. But from the perspective of executing Python, a “package” is just any folder reachable from sys.path.

The fact that these two definitions exist and conflict with one another is, to me, the central issue I’m getting at here. We are encouraged both to think of the outer folder as a “package” in order to do what we want (use “-m” from above), and to not consider it a package because it contains all the extra stuff. So, using “-m” would be the “wrong” solution according to the ecosystem definition.

Maybe the solution to the overall issue is just to spread awareness of these two definitions of “package”?

Yeah, these are good insights, but they don’t quite apply in this context.

This is not the case for any of the examples you have shown. All of them, in my reading, pretty clearly establish that the topmost folder being shown should be an import package, because it contains an __init__.py file.

However, if you actually have two separate folders, e.g. src and tests, i.e. tests is not a subpackage that is part of the install, then yes, the -m solution is not good. In that case, IMO, you should be modifying PYTHONPATH and importing the installed package in tests with import package instead of relative imports, like any user of your package would.
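As a rough sketch of what that looks like (project layout as in the earlier example, names illustrative; test runners like pytest can often be configured to do the equivalent for you):

```
# from the project root: put src/ on the import path for this run only
$ PYTHONPATH=src python -m pytest tests/
```

Inside tests/test_core.py, the test then does import mypkg.core, exactly as a user of the installed package would.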

I agree that there is no particularly good first-party solution for this, but various tools exist to correctly set up PYTHONPATH semi-automatically.

That is the problem, but I would turn that around and say that it’s Python’s problem for insisting on a notion of package that is often counterintuitive. What would be helpful is a mechanism for treating directories as packages without an installation per se or “environment-level” changes like modifying sys.path.

Or at least not requiring the user to think about such things — it might be possible to do the necessary modifications behind the scenes. In fact I’ve got a little project that does this and lets you run a .py file in such a way that relative imports work even if it’s not installed, but it still needs a bit of polish. I hope to release it at some point.
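For illustration, one plausible way to build such a runner (this is my own sketch, not necessarily how the project mentioned above works): climb up from the script past every directory containing an __init__.py, put the first non-package ancestor on sys.path, and execute the script as a module via runpy:

```python
# run_in_pkg.py -- hypothetical sketch of a "run with working relative imports" tool
import runpy
import sys
from pathlib import Path

def run(script: str) -> None:
    path = Path(script).resolve()
    parts = [path.stem]
    parent = path.parent
    # Climb past every directory that looks like a package (has __init__.py).
    while (parent / "__init__.py").exists():
        parts.insert(0, parent.name)
        parent = parent.parent
    # The first non-package ancestor becomes the import root, so the
    # script's real package structure is visible to the import system.
    sys.path.insert(0, str(parent))
    # Execute the script as a module, e.g. "a.y", so relative imports resolve.
    runpy.run_module(".".join(parts), run_name="__main__")

if __name__ == "__main__":
    run(sys.argv[1])
```

Invoked as python run_in_pkg.py a/y.py, this would run y.py as the module a.y, so a from . import x inside it works without installing anything.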

Thanks for the discussion, everyone!

I no longer think it is a good idea to automatically consider the running directory to be a “package”. I still think it should be possible, but I agree that it would hide an ambiguity from the user. I realised that there were two separate cases I was mixing up and trying to solve at the same time. Thanks to @bwoodsend I recognised that I was conflating two definitions of “package” in my head. And thanks to @MegaIng I realised there are two cases: either it is a package, so use -m from above, or it isn’t a package, so use something that puts '.' in sys.path and don’t use relative imports. Most importantly, I’ve convinced myself that this covers all cases with a satisfying solution, without allowing relative imports at script level.

Now that I feel I understand what’s going on, I feel less inclined to introduce changes. But you’ve got me thinking: is there a reason to avoid things that affect sys.path? The only one I can think of is that you might not want everything else in the folder to become importable?

That is a reason, yes. Apart from that, I’m not necessarily saying sys.path shouldn’t be affected, just that the user shouldn’t have to think about it in those terms. If there is a directory subtree that contains Python code with relative imports and an __init__.py in every subdirectory, it should be possible to run anything in that subtree and have those internal relative imports work, without having to modify the code itself. In other words, it should be possible to run in a mode where relative imports are relative to the directory structure, and not only relative to Python’s notion of “package” (which is often derived from directory structure, but with some extra wrinkles).

I disagree. Python’s import system is all based on the package structure of modules. The filesystem layout of the files in which module code is stored is (logically) only marginally relevant to this. Import hooks and namespace packages are two obvious ways in which a package might be assembled from different filesystem locations. And conversely, it’s entirely possible for one file in a directory to be visible to the import system, but for the rest of the directory not to be (i.e., it’s not on sys.path).
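For example, PEP 420 namespace packages let a single package span multiple filesystem locations (layout illustrative):

```
/site-a/ns/mod1.py    # note: no __init__.py anywhere
/site-b/ns/mod2.py

# with both /site-a and /site-b on sys.path:
#   import ns.mod1 and import ns.mod2 both succeed,
#   even though "ns" is not any single directory on disk
```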

Relative imports are defined in terms of the environment’s package structure, not in terms of the filesystem layout that may or may not match that package structure.

Yes, in the common case, package structure and directory layout match. So I can see that the idea of a “directory-relative import” is tempting. But as soon as you get away from simple cases, the behaviour is likely to be confusing at best and buggy at worst. And that, IMO, means we’re better off not mixing the two concepts in the first place.
