Basically, when you run python path/to/a/script.py, where script.py is a symlink, we have a platform inconsistency in which directory is set as sys.path[0].
On POSIX, the symlink is resolved and the parent directory of the resolved path becomes sys.path[0].
On Windows, the symlink is not resolved, so the parent directory of the user-specified path becomes sys.path[0].
If an earlier segment of the path is a link, I’m not sure what the behaviour is. I’d assume we’re doing a realpath on POSIX, which would resolve the entire path, and nothing on Windows, so there it won’t be resolved.
I’m not clear on which one should be considered the correct or preferable behaviour. (I’m also trusting the OP that these are actually the behaviours; I haven’t checked myself.) Any thoughts?
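A quick way to check the reported behaviour is a small experiment along these lines. It puts a real script in one directory and a symlink to it in another, runs the symlink with the current interpreter and returns the sys.path[0] the script sees. The directory names are arbitrary. Creating symlinks on Windows may require administrator rights or Developer Mode, and the sketch drops PYTHONSAFEPATH from the child's environment because that variable stops the script directory from being added at all.

```python
import os
import subprocess
import sys
import tempfile


def script_dir_in_sys_path(tmp: str) -> str:
    """Run a symlinked script under tmp and return the sys.path[0] it sees."""
    real_dir = os.path.join(tmp, "real")
    link_dir = os.path.join(tmp, "links")
    os.makedirs(real_dir)
    os.makedirs(link_dir)

    script = os.path.join(real_dir, "script.py")
    with open(script, "w") as f:
        f.write("import sys\nprint(sys.path[0])\n")

    link = os.path.join(link_dir, "script.py")
    os.symlink(script, link)  # may need extra privileges on Windows

    # PYTHONSAFEPATH (like -P) would stop the script directory being added.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    result = subprocess.run(
        [sys.executable, link], capture_output=True, text=True, env=env, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)  # the temp dir may itself sit behind a symlink
        print("sys.path[0] =", script_dir_in_sys_path(tmp))
        print("real dir    =", os.path.join(tmp, "real"))
        print("link dir    =", os.path.join(tmp, "links"))
```

If the description above is accurate, POSIX should report the real directory and Windows the link directory.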
Not a core dev, so I don’t think I get a vote, but I would tend to expect abspath() instead of realpath(). I assume that a user or distro has intentionally set up the logical view of the filesystem and that tools should basically respect it unless there are technical reasons why real paths are absolutely required.
I would consider failures that relate to not finding objects relative to the link target to be failures to use links properly, and not Python’s job to fix.
I can sympathise with the idea that the user wants to run python build_i.py and have it do the same as python scripts/i_build.py.
This could be done explicitly via sys.path.insert(0, str(Path(__file__).resolve().parent)), but I’m inclined to think that if we expect scripts to add that line simply so they will work as expected when invoked via a symlink, then we should simply drop the idea of adding “the script directory” to sys.path at all.
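An explicit opt-in of that kind could look roughly like the sketch below, placed at the top of the real script. Two details matter: sys.path only honours string entries (other types are ignored during import), so the Path needs str(); and Path.resolve() is what actually follows the symlink, since __file__ otherwise still points at the link's location. The helper name add_real_script_dir is hypothetical.

```python
import sys
from pathlib import Path


def add_real_script_dir(script_file: str) -> str:
    """Put the directory of the symlink *target* on sys.path.

    Hypothetical helper that mimics the POSIX behaviour explicitly. The
    directory goes at the front unless it is already on sys.path.
    """
    # str(): sys.path ignores non-string entries during import.
    # resolve(): follows symlinks, so we get the target's directory.
    real_dir = str(Path(script_file).resolve().parent)
    if real_dir not in sys.path:
        sys.path.insert(0, real_dir)
    return real_dir


if __name__ == "__main__":
    print(add_real_script_dir(__file__))
```

This makes the result independent of the platform's sys.path[0] rule, at the cost of every such script carrying the boilerplate, which is the trade-off described above.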
So I’m basically in favour of the POSIX behaviour, I guess, although I’m not a heavy user of symlinks so I don’t know whether this symlink-as-alias usage is more or less reasonable than the symlink-as-logical-filesystem-layout view that @effigies describes.
I do think that we should have consistent behaviour across platforms, assuming that we consider symlinks to be the same concept on Windows and POSIX (which seems self-evident, but the existence of similar-but-not-quite things like junctions on Windows muddies the water somewhat).
Junctions luckily don’t impact this case, as they can’t be used for files. So the parent path of a file will be the same directory, whether there’s a junction in there or not.
One potentially relevant precedent that comes to mind is that venv (when using symlinks) relies on not resolving symlinks before searching for pyvenv.cfg. A counter-point is that ._pth files do use the real path (intentionally, to avoid someone bypassing it by just adding a symlink in some other location). I’m not certain whether an import search path should necessarily mirror either of these cases.
Sorry for side-tracking things by mentioning junctions. My main point is that I assume there’s no reason in principle why symlinks should work differently between Windows and POSIX. If there is, someone should explain what that reason is, both here and in the docs somewhere.
Since I’m probably responsible for this mess, I think the reason for the different treatment is that symlinks didn’t exist on Windows when Python was first ported there, and when they were finally introduced, for a long time they weren’t reliably supported (IIRC at one time you needed sysadmin permissions to use or enable them?).
So the symlink-specific feature for the script name wasn’t implementable for a long time, and I think Python can be forgiven for not yet implementing it.
Moreover, I now think that the special-casing of symlinks (on UNIX) for the script argument was a mistake. It was a “cute” feature: you could install an app in its own directory, then create a symlink in (e.g.) /usr/local/bin to its “main” script, and the app would magically be able to import its component modules without any sys.path manipulation, and it would appear on your shell’s $PATH without any editing of your .profile.
For a variety of reasons (not just because it’s not supported on Windows) that’s not the best practice for installing scripts any more – I wouldn’t be surprised if the symlink feature predates package support (which I recall was introduced in the late '90s).
So maybe there’s a way out? Do we really care any more about this symlink behavior? Could we perhaps deprecate it? Are there any other languages, interpreters or tools that have similar behavior?
Nice to know that it was intentional; that makes a difference. And yeah, omitting Windows in the first place makes sense.
I don’t personally know whether this is the case, so I’ll take your word for it.
This is certainly true (the person who reported the bug is clearly relying on it).
As far as a way out, unless it’s actively harmful on POSIX in some way, I don’t think it’s easy to deprecate it. However, we can declare it a platform-specific behaviour/oddity, and then not implement it anywhere else. But if we do that, I feel we should offer an alternative that is cross-platform (even if it’s “write a platform-specific shell script to launch it”).
As I’ve said, I’m not a heavy user of symlinks, so I don’t have good intuitions here - but is there an actual specification which defines the correct answer for a path like /a/b/../c, where /a/b is a symlink to /d/e? I can see arguments for either /a/c or /d/c. I’ve been in discussions where this sort of edge case comes up, and I’d like to have an authoritative reference I could point to and use.
(Sorry, I know this is somewhat off-topic, but I think it does have some relevance to my question about whether symlinks are “the same thing” on POSIX and Windows, as I can imagine the two systems taking different positions on this question).
Edit: I just did a quick experiment. On Windows, gvim a/b/../c creates a file a/c. On Ubuntu, echo hello >a/b/../c says “-bash: a/b/../c: No such file or directory” and vi a/b/../c opens vi with a message "a/b/../c" [New DIRECTORY]. So it looks like Windows and POSIX do disagree on this issue, further confusing the whole question…
So you’re saying that /../ sections get dealt with before symlinks get resolved? That’s certainly a reasonable approach. But isn’t working out the final path step by step also reasonable? That’s how I’d traverse the filesystem, after all - go to a, enter b (which puts me in /d/e), go up one (to /d), find file c.
I’m genuinely unclear as to why one approach is self-evidently better than the other.
For our implementation of abspath, which includes a normpath, this is correct. But I’m inclined to agree with the sentiment that we shouldn’t be doing more than join(os.getcwd(), path), which is reasonably approximated by saying abspath.
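The two readings of a/b/../c can be reproduced with os.path. This sketch assumes a layout where a/b is a symlink to d/e (all names arbitrary). A plain join leaves the '..' in the string for the operating system to interpret, abspath() removes it lexically, and on POSIX realpath() resolves one component at a time, so it ends up under d.

```python
import os
import tempfile


def compare_resolutions(root: str) -> dict:
    """Create root/a/b as a symlink to root/d/e, then resolve root/a/b/../c."""
    os.makedirs(os.path.join(root, "a"))
    os.makedirs(os.path.join(root, "d", "e"))
    os.symlink(os.path.join(root, "d", "e"), os.path.join(root, "a", "b"),
               target_is_directory=True)  # flag only matters on Windows

    path = os.path.join(root, "a", "b", os.pardir, "c")
    return {
        # Plain join: '..' stays in the string for the OS to interpret later.
        "join": path,
        # abspath normalises lexically, dropping 'b' and '..': root/a/c.
        "abspath": os.path.abspath(path),
        # On POSIX, realpath walks one component at a time:
        # a/b is really d/e, '..' from there is d, so the result is root/d/c.
        "realpath": os.path.realpath(path),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for how, result in compare_resolutions(os.path.realpath(tmp)).items():
            print(f"{how:8} {result}")
```

The component-by-component walk is also what the kernel does when the unmodified "join" string is passed to open() on POSIX, which is why the shell experiment above and abspath() can disagree.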
The side discussion on how to handle symlinks can go on for a long time (I know, there are hundreds of posts on various issues/PRs about it!). Maybe split it out?
It may not support best practices, but it can be very convenient, and I can’t imagine a situation in which the opposite behavior (not resolving symlinks before adding to sys.path) would be useful. Does it hurt?
Thanks for tracking down the source. Yes, it looks like on Linux we aren’t resolving the full path, just following the script’s symlink once if it is one. Both will give different results from realpath, which is closer to GetFinalPathNameByHandleW (directly equivalent if the target file exists and is accessible).
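The gap between "follow the script symlink once" and a full realpath only shows up with chained links. The sketch below is a hypothetical model of the follow-once behaviour described here, not a transcription of CPython's C code; the bin/lib/app layout and all function names are made up for illustration, and creating the symlinks on Windows may need extra privileges.

```python
import os
import tempfile


def follow_once_dir(script: str) -> str:
    """Directory obtained by following the script symlink a single time.

    Hypothetical model of the behaviour described above, not CPython's code.
    """
    if os.path.islink(script):
        target = os.readlink(script)
        # os.path.join discards the first part if target is absolute;
        # a relative target is taken relative to the link's own directory.
        script = os.path.join(os.path.dirname(script), target)
    return os.path.dirname(os.path.abspath(script))


def realpath_dir(script: str) -> str:
    """Directory obtained after resolving every symlink in the path."""
    return os.path.dirname(os.path.realpath(script))


def build_chain(root: str) -> str:
    """Create bin/tool -> lib/tool -> app/main.py under root; return bin/tool."""
    for d in ("app", "lib", "bin"):
        os.makedirs(os.path.join(root, d))
    main = os.path.join(root, "app", "main.py")
    with open(main, "w") as f:
        f.write("print('hello')\n")
    os.symlink(main, os.path.join(root, "lib", "tool"))
    os.symlink(os.path.join(root, "lib", "tool"), os.path.join(root, "bin", "tool"))
    return os.path.join(root, "bin", "tool")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        entry = build_chain(os.path.realpath(tmp))
        print("follow once:", follow_once_dir(entry))  # on POSIX: .../lib
        print("realpath:   ", realpath_dir(entry))     # on POSIX: .../app
```

Under this model, a single hop lands in the intermediate lib directory, while realpath goes all the way to app, which is the kind of discrepancy worth settling deliberately when the logic moves.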
We’ve started doing this (you’ll note that most of the rest of path initialization is in Python already), but haven’t finished. However, at this stage of initialization, we don’t have access to the standard library, so it wouldn’t be the real os.path.realpath anyway, and might well end up with the same discrepancy. Which is why it’s important to notice these quirks and then design them properly.