Symlink resolution in starting script path

storchaka · October 19, 2023, 8:17pm

What is set as sys.argv[0]?

On some Unixes, true and false referred to the same executable file. The program looked at argv[0] and chose its behavior accordingly. It seems that this is not possible with Python scripts and symlinks.

guido · October 19, 2023, 9:54pm

What is set as sys.argv[0]?

A little experiment suggests that it is the original first argument, before symlink expansion. So you can play the games you mentioned.
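Because argv[0] keeps the name the user actually typed, the old true/false trick should carry over to a Python script reached through symlinks. Here is a minimal sketch of such a multi-call script. The file name multicall.py, the command table and the helper names are hypothetical:

```python
import os
import sys


def true_main() -> int:
    return 0


def false_main() -> int:
    return 1


# Hypothetical table: the name the program was invoked under selects its behaviour.
COMMANDS = {"true": true_main, "false": false_main}


def dispatch(argv0: str) -> int:
    # Use only the last component of argv[0], and drop a ".py" suffix so that
    # both "true" and "true.py" map to the same entry.
    name, _ext = os.path.splitext(os.path.basename(argv0))
    handler = COMMANDS.get(name)
    if handler is None:
        print(f"unknown command name: {name!r}", file=sys.stderr)
        return 2
    return handler()


if __name__ == "__main__":
    sys.exit(dispatch(sys.argv[0]))
```

Usage would be along the lines of creating true.py and false.py as symlinks to multicall.py and running python true.py. The exit status then depends only on the link name, because sys.argv[0] is not expanded.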

j4n_bur53 · October 20, 2023, 8:52am

While bug hunting, I played a similar game with JavaScript:

process.argv[1] = the original script argument, as given
import.meta.url = seems to be some sort of real path, but converted to a URI

import.meta belongs to the newer ES modules and is context-sensitive.

Maybe this helps inform the decision for Python a little.

Disclaimer: So far only tested on Windows with Node.js v20.7.0.
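For comparison, here is a rough Python counterpart of those two values. It assumes that sys.argv[0] plays the role of process.argv[1] and that a fully resolved file: URI is close to what import.meta.url showed. script_identity is a hypothetical helper, and this has not been checked against Node's exact rules:

```python
import sys
from pathlib import Path


def script_identity(argv0: str) -> tuple[str, str]:
    """Return (path as given, fully resolved file: URI) for a script path.

    Loosely mirrors process.argv[1] and import.meta.url in Node.js.
    Path.resolve() follows every symlink and makes the path absolute,
    which as_uri() requires.
    """
    return argv0, Path(argv0).resolve().as_uri()


if __name__ == "__main__":
    given, url = script_identity(sys.argv[0])
    print("as given:", given)
    print("resolved:", url)
```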

steve.dower · October 20, 2023, 12:09pm

I’m leaning towards properly defining this and making it reliable cross-platform, specifically:

When determining the initial contents of sys.path, and the launched script file is a symlink to a file in another directory, the parent directory of the target is used as the default search path (sys.path[0]) for the script rather than the directory containing the symlink. Links in other parts of the path are not relevant for this check, and only one link is followed (that is, a link A to a link B to a file C will use the directory of B, not C). The contents of argv[0] are not affected, and will contain the path as provided by the user.
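The proposed rule can be sketched in pure Python. This only illustrates the definition above; it is not CPython's getpath code, and proposed_sys_path0 is a made-up name. One assumption to flag: abspath() normalizes the joined path lexically, which may not match what the kernel does when ".." crosses a symlinked directory:

```python
import os


def proposed_sys_path0(script: str) -> str:
    """Sketch of the proposed default sys.path[0] for a launched script.

    Only the script file itself is checked (links in parent directories are
    ignored), and at most one link is followed: for A -> B -> C the
    directory of B is returned, not the directory of C.
    """
    if os.path.islink(script):
        target = os.readlink(script)
        # A relative link target is relative to the directory containing the
        # link; os.path.join() discards the first part if target is absolute.
        target = os.path.join(os.path.dirname(script), target)
        return os.path.dirname(os.path.abspath(target))
    # Not a link: the script's own directory, as today.
    return os.path.dirname(os.path.abspath(script))
```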

It’s only a single check at startup, so I’m not concerned about the perf implications. Security-wise it might be possible to abuse, but it’s likely less vulnerable to privilege escalation than the current (Windows) behaviour.[1] I don’t think there’s a need to backport, so it would be new in 3.13 (though obviously the POSIX behaviour is unchanged; it’s just got a definition now).

Might also be a good opportunity to move the implementation into this part of getpath.py.

Any other thoughts/concerns?

[1] If you create a link to a script you can’t access, only someone with access can actually launch it, and when they do it’ll have its original dependencies and not the attacker’s. Compared to today, where a symlink could also substitute modules at runtime…

ruro · October 20, 2023, 7:13pm

Yes, please. I am a proponent of deprecating symlink resolution and PYTHONPATH magic for scripts. I’ve run into this behaviour a couple of times, and although it’s pretty easy to work around most of the time, it doesn’t really follow Unix conventions or the principle of least astonishment.

While I wouldn’t say that it is actively “harmful”, it is definitely non-POSIX-y. Symbolic links are supposed to be the “soft” counterpart to hard links. Just like a hard link, a symlink is supposed to behave “as if” the file was just copied under most circumstances. If you want to interact with a symlink as a symlink, you always have to do something “extra” (use a different, special syscall to inspect the symlink itself, actively resolve the path to get the target file, etc.).

Notably, symlinks are NOT shortcuts. There is no reason for python ./symlink_to_target.py to act any differently from python ./copy_of_target.py. I think it’s pretty clear that if we (temporarily) ignore the backwards compatibility angle, there is no good reason to have this special case.
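One way to check the copy-vs-symlink point on a POSIX system is to run the same probe through both and compare sys.argv[0] and sys.path[0]. This is a hedged experiment script, not a test from CPython; the directory layout and helper names are made up, and os.symlink may need extra privileges on Windows:

```python
import os
import shutil
import subprocess
import sys
import tempfile

# The probe prints what the interpreter set up before running any user code.
PROBE = "import sys\nprint(sys.argv[0])\nprint(sys.path[0])\n"


def run_probe(path: str) -> tuple[str, str]:
    """Run ``path`` with the current interpreter; return (argv[0], sys.path[0])."""
    env = dict(os.environ)
    env.pop("PYTHONSAFEPATH", None)  # -P / PYTHONSAFEPATH would drop the script dir
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, check=True, env=env
    )
    lines = result.stdout.splitlines()
    return lines[0], lines[1]


def compare_link_and_copy(root: str) -> dict[str, tuple[str, str]]:
    """Put a target in root/lib and a symlink plus a copy of it in root/bin."""
    lib, bin_dir = os.path.join(root, "lib"), os.path.join(root, "bin")
    os.mkdir(lib)
    os.mkdir(bin_dir)
    target = os.path.join(lib, "target.py")
    with open(target, "w") as f:
        f.write(PROBE)
    link = os.path.join(bin_dir, "symlink_to_target.py")
    copy = os.path.join(bin_dir, "copy_of_target.py")
    os.symlink(target, link)
    shutil.copy(target, copy)
    return {"link": run_probe(link), "copy": run_probe(copy)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for name, (argv0, path0) in compare_link_and_copy(root).items():
            print(f"{name}: argv[0]={argv0!r} sys.path[0]={path0!r}")
```

On a POSIX build, the copy should report root/bin while the symlink should report root/lib, even though both receive exactly the argv[0] that was passed in.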

Now, regarding how we could deprecate this behaviour – I’ll admit that it’s not going to be seamless for anyone relying on this behaviour (like any deprecation). Luckily, it should be fairly easy to incrementally deprecate the old behaviour.

1. For starters, keep the old behaviour, but emit a warning. Make the new behaviour opt-in with a __future__ import and/or a CLI flag and/or an environment variable.
2. During the transition period, every script that is launched through a symlink should either opt in to the __future__ (if it didn’t actively rely on the old behaviour), be replaced with a wrapper script (similar to the entry point scripts), or be modified to add its resolved path to sys.path before doing anything else.
3. After the deprecation period passes, make the new behaviour the default.

Optionally, we could include a simple way to explicitly opt-out of the new behaviour with something along the lines of import sys; sys.add_script_dir_to_path(). This would further simplify points (2) and (3).

steve.dower · October 23, 2023, 3:58pm

We can’t make the behaviour opt-in/out with anything in the Python file, because we haven’t looked at it by this stage (unless we’re going to do something really clever with importers… which we could, but I suspect is not worth it).

So we can add a warning in the case where the script is a link, with an environment variable to suppress the warning, and then later remove it entirely.
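Here is a sketch of that startup check, written in Python for readability. The environment variable name, the helper name and the choice of DeprecationWarning are all assumptions for illustration; nothing like this exists in CPython today:

```python
import os
import warnings

# Hypothetical variable name, used only for this sketch.
SUPPRESS_VAR = "PYTHON_SUPPRESS_SCRIPT_SYMLINK_WARNING"


def warn_if_script_is_link(script: str) -> bool:
    """Return True if the launched script is a symlink, warning unless suppressed."""
    if not os.path.islink(script):
        return False
    if not os.environ.get(SUPPRESS_VAR):
        warnings.warn(
            f"{script!r} is a symlink; sys.path[0] currently follows it, "
            "and this may change in a future version",
            DeprecationWarning,
            stacklevel=2,
        )
    return True
```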

Code that wants to include its own directory post-symlink can already sys.path.insert(0, str(Path(sys.argv[0]).resolve().parent)), or for 2-3 lines can more precisely match the current behaviour. Without a bunch of people jumping up and down saying they rely on this functionality and can’t change, I wouldn’t want to promote it to a supported sys function.