Should console_scripts entry points exclude the scripts directory from sys.path?

hroncok · February 23, 2022, 1:08pm

Hello packaging folks.

I assume that a script that is created in /usr/bin or a similar directory (called scripts in the sysconfig installation scheme) from a console_scripts entry point would never need to import Python modules from /usr/bin itself. A pip-installed script looks like this:

#!/usr/bin/python3
# -*- coding: utf-8 -*-
import re
import sys
from yyy import xxx
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(xxx())

Would it make sense to remove the script’s directory from sys.path? E.g. do something like this in each such file:

import os

# Drop the directory that contains this script from the import search path.
script_dir = os.path.dirname(__file__)
if script_dir in sys.path:
    sys.path.remove(script_dir)

That could prevent unexpected shadowing of the imported modules by random files installed by other (even non-Python) tools, such as what happened here: bad magic number in 'six' · Issue #359 · benjaminp/six · GitHub

Or is importing from the scripts dir something that we expect should actually work?
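A minimal sketch of the removal proposed above, written as a standalone helper so it can be tried outside a generated script. The name `without_script_dir` is hypothetical, and comparing entries through `os.path.realpath` is an assumption made here (it covers a scripts directory reached through a symlink). This is an illustration, not what pip currently generates.

```python
import os


def without_script_dir(path_entries, script_file):
    """Return path_entries minus any entry that resolves to script_file's directory."""
    script_dir = os.path.realpath(os.path.dirname(os.path.abspath(script_file)))
    kept = []
    for entry in path_entries:
        # The empty string stands for the current working directory, not the
        # script directory, so it is deliberately left alone here.
        if entry and os.path.realpath(entry) == script_dir:
            continue
        kept.append(entry)
    return kept
```

In a generated wrapper this would presumably be called as `sys.path[:] = without_script_dir(sys.path, __file__)` before the `from yyy import xxx` line, so that the import can no longer pick up stray files from the scripts directory.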

steve.dower · February 23, 2022, 8:06pm

I can’t think of any supported scenarios that would require it to work. Typically the files that are going to go into scripts are generated on install based on metadata anyway, so you don’t really know what’s going to be there.

brettcannon · February 23, 2022, 9:42pm

Same here, but then again there isn’t a standard as to what the exact code for an entry point should be, so individual installers will need to be updated (although adding example code to Entry points specification - Python Packaging User Guide would probably be a good thing).

cameron · February 23, 2022, 9:47pm

Yes.

In fact, I do this with my own wrapper scripts, example code:

import sys
sys.path[:] = [ path for path in sys.path if path ]
from cs.fstags import main
sys.exit(main(sys.argv))

Note I’m stripping empty paths, not a path which just happens to be
where I’m standing.

This “trust where I’m standing (getcwd)” thing in Python’s default
sys.path makes me quite unhappy from a security standpoint, and has done
for years. If I want the modules in the working directory, I’ll add
that directory in full to the path explicitly.

Cheers,
Cameron Simpson cs@cskk.id.au

hroncok · February 24, 2022, 10:40am

See also bpo-13475: Add -P command line option by vstinner · Pull Request #31542 · python/cpython · GitHub which could simplify this significantly by using that flag in the shebang of generated entrypoints. That would also only be possible with Python 3.11+ so this would not break any current deployments of software.
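For illustration, a sketch of how the effect of -P on a script run could be observed. It assumes the option behaves as described in the pull request and is only exercised on Python 3.11+, where -P was released. The helper names `first_path_entry` and `script_dir_is_first` and the throwaway `probe.py` file are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

PROBE = "import sys; print(sys.path[0])\n"


def first_path_entry(script, *interpreter_args):
    """Run script in a child interpreter and return the sys.path[0] it reports."""
    env = dict(os.environ)
    env.pop("PYTHONSAFEPATH", None)  # keep a parent setting from leaking in
    result = subprocess.run(
        [sys.executable, *interpreter_args, script],
        capture_output=True, text=True, check=True, env=env,
    )
    return result.stdout.strip()


def script_dir_is_first(*interpreter_args):
    """Report whether a throwaway script sees its own directory as sys.path[0]."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "probe.py")
        with open(script, "w") as handle:
            handle.write(PROBE)
        reported = first_path_entry(script, *interpreter_args)
        # Compare resolved paths, since the temp directory may sit behind a symlink.
        return bool(reported) and os.path.realpath(reported) == os.path.realpath(tmp)
```

Calling `script_dir_is_first()` shows the default behaviour, and `script_dir_is_first("-P")` shows the behaviour a generated entry point would get if its shebang carried the flag.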

eryksun · March 4, 2022, 12:18pm

When executing a script, the directory of the script is added to sys.path. This generally has nothing to do with the current working directory. Automatically adding the script directory by default is as safe as one’s search PATH and execution habits permit (e.g. not executing files located in “~/Downloads”). Adding the current working directory by default is generally unsafe, but thankfully that doesn’t happen when running scripts.

By default, the current working directory is added for “-c” and “-m” commands and the REPL, since there is no main script in those cases. It gets added as the empty string '', so it varies with whatever the current directory happens to be when an import is executed.
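The distinction drawn above can be checked with a small experiment. The sketch below runs a child interpreter twice from a working directory that differs from the script's directory, once with a script and once with -c, and returns what each run sees as sys.path[0]. The function name `compare_path0` and the file name `show.py` are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

SHOW = "import sys; print(sys.path[0])"


def _run(args, cwd):
    """Run a child interpreter in cwd and return its stripped stdout."""
    env = dict(os.environ)
    env.pop("PYTHONSAFEPATH", None)  # observe the default behaviour
    done = subprocess.run([sys.executable, *args], cwd=cwd, env=env,
                          capture_output=True, text=True, check=True)
    return done.stdout.strip()


def compare_path0():
    """Return sys.path[0] as seen by a script run and by a -c run."""
    with tempfile.TemporaryDirectory() as script_dir:
        with tempfile.TemporaryDirectory() as work_dir:
            script = os.path.join(script_dir, "show.py")
            with open(script, "w") as handle:
                handle.write(SHOW + "\n")
            return {
                "script": _run([script], cwd=work_dir),
                "-c": _run(["-c", SHOW], cwd=work_dir),
                "script_dir": os.path.realpath(script_dir),
                "work_dir": os.path.realpath(work_dir),
            }
```

The "script" entry should resolve to the script's directory rather than the working directory, while the "-c" entry should come back as the empty string.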

cameron · March 4, 2022, 11:06pm

cameron:

This “trust where I’m standing (getcwd)” thing in Python’s default
sys.path makes me quite unhappy from a security standpoint, and has done
for years. If I want the modules in the working directory, I’ll add
that directory in full to the path explicitly.

eryksun:

When executing a script, the directory of the script is added to sys.path. This generally has nothing to do with the current working directory. Automatically adding the script directory by default is as safe as one’s search PATH and execution habits permit (e.g. not executing files located in “~/Downloads”). Adding the current working directory by default is generally unsafe, but thankfully that doesn’t happen when running scripts.

That is nice to know; I’ve perhaps been letter my interactive testing
mislead me about this. I’ll test that. […] Ok, testing shows that it
does indeed add the script’s directory and not the current directory.
Adding it ahead of everything else is pretty iffy, convenience over
caution IMO. But ok, I can keep this in mind.

eryksun:

By default, the current working directory is added for “-c” and “-m” commands and the REPL, since there is no main script in those cases. It gets added as the empty string '', so it varies with whatever the current directory happens to be when an import is executed.

And here we part company. I remain against this (with the possible
exception of the REPL, still with misgivings). If I write some shell
script and invoke:

python -m foo ...

it will very much NOT be my desire that the current working directory
magically get inserted into sys.path - my previously sound shell script
suddenly has a component which can misbehave in a malicious setting.
Such as that of the sysadmin doing some work inside an arbitrary user’s
directory, or inside a malicious software package (generic, not “python
package”). It needn’t be a sysadmin; any user standing somewhere
unfortunate gets this misfeature.

It is a security mine waiting to go off.

Python badly needs some switch to say “do not change sys.path at all”.
The -s and -S options do not provide this. Maybe it is too late to
change the default Python behaviour here, but I remain convinced that
this is a misfeature, and refer again to the maxim Heuer’s Razor:

If it can't be turned off, it's not a feature. - Karl Heuer

Grumblingly,
Cameron Simpson cs@cskk.id.au

cameron · March 5, 2022, 1:09am

Ugh, “letter” → “letting”. - Cameron

CAM-Gerlach · March 5, 2022, 5:59am

As @hroncok mentioned in the message just above this conversation, this has been discussed at quite some length on previous BPOs, and @vstinner has an open PR for Python 3.11 that will add a -P option that will no longer add the cwd to sys.path, and additionally -c will no longer do so by default (only -m). This is certainly quite welcomed by many (including myself), and should hopefully address most of these concerns.

github.com/python/cpython

bpo-13475: Add -P command line option

python:main ← vstinner:add_path0, opened by vstinner at 01:54AM, 24 Feb 22 UTC (+127 -27)

* Add -P command line option to not add sys.path[0].
* Add sys.flags.dont_add_p…ath0 flag.
* Add PyConfig.add_main_path member.
* Programs/_bootstrap_python.c uses config.add_path0=1.
* Update subprocess._optim_args_from_interpreter_flags().
* Modules/getpath.py sets add_path0 to 0 if a "._pth" file is present.

https://bugs.python.org/issue13475

eryksun · March 5, 2022, 9:24am

I consider it to be a reasonable design decision to give a script priority access to importing modules and packages in its directory. That said, I’m used to this. In Windows, the application directory has priority in SearchPathW(), CreateProcessW(), and, by default, LoadLibraryW(). An exception is made for reserved names of known system DLLs and API sets. I can see doing the same for core parts of the standard library. In fact, that’s effectively implemented now by freezing critical modules, including _collections_abc, _sitebuiltins, abc, codecs, importlib, os, os.path, io, site, stat, and zipimport.

If “foo” is a module in the current working directory, then adding this directory to sys.path is required for the import. Where I part ways is with adding "" to sys.path in this case. The working directory should be added as a resolved path when running a -m module or -c command. Only the REPL should add an empty string to sys.path.

pf_moore · March 5, 2022, 10:18am

If “foo” is a package installed into site-packages, adding the current working directory to sys.path is not needed, and shouldn’t happen.

And I’d actually argue that -m should not work for running packages that aren’t installed (i.e., it shouldn’t work for packages in the current working directory). You don’t need -m in that case, as python foo works (although it adds the directory “foo” to sys.path, rather than the directory that contains foo, which I’d argue is wrong).
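The `python foo` behaviour mentioned above can be observed with a short experiment. The sketch below creates a directory named foo containing only a `__main__.py`, runs it by path, and returns the reported sys.path[0] next to the directory itself and its parent. The function name `path0_when_running_directory` is hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def path0_when_running_directory():
    """Run a directory with __main__.py; return (sys.path[0], package dir, parent dir)."""
    with tempfile.TemporaryDirectory() as parent:
        package_dir = os.path.join(parent, "foo")
        os.mkdir(package_dir)
        with open(os.path.join(package_dir, "__main__.py"), "w") as handle:
            handle.write("import sys; print(sys.path[0])\n")
        env = dict(os.environ)
        env.pop("PYTHONSAFEPATH", None)  # observe the default behaviour
        done = subprocess.run([sys.executable, package_dir], cwd=parent, env=env,
                              capture_output=True, text=True, check=True)
        # Resolve all three paths so symlinked temp locations compare equal.
        return (os.path.realpath(done.stdout.strip()),
                os.path.realpath(package_dir),
                os.path.realpath(parent))
```

The first element should match the foo directory itself, not the directory that contains it, which is the behaviour being questioned here.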

eryksun · March 5, 2022, 10:43am

The module search gives priority to the current working directory. That was a design choice in PEP 338. Nick Coghlan is the expert on that subject. I don’t even use this feature broadly speaking, except for two cases: -m pip and -m venv.

My only qualm is with adding the working directory to sys.path generically as "", which remains for the lifetime of the process, affecting all imports according to whatever the working directory happens to be at the time, which can change any number of times. I think it should add the working directory as a resolved path.
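As a rough sketch of what "a resolved path" could mean in practice, the helper below replaces every "" entry in a path list with an absolute directory captured once, so later changes of working directory no longer affect it. The name `pin_cwd_entries` and its `cwd` parameter are hypothetical; this is something a program could apply to its own sys.path, not a description of what the interpreter does.

```python
import os


def pin_cwd_entries(path_entries, cwd=None):
    """Replace each '' entry with an absolute directory captured at call time."""
    resolved = os.path.abspath(cwd if cwd is not None else os.getcwd())
    # Entries other than the empty string are passed through unchanged.
    return [resolved if entry == "" else entry for entry in path_entries]
```

A program started with -m or -c could call `sys.path[:] = pin_cwd_entries(sys.path)` early on, before any `os.chdir`, to freeze the working directory entry at its startup value.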

cameron · March 5, 2022, 10:50am

If I’m running python -m foo I truly don’t care what’s in the current
directory - I want Python to find it in my $PYTHONPATH or the
unrelated-to-the-current-dir default i.e. in an installed place.

If I wanted foo from the current directory I would explicitly add it
to $PYTHONPATH.

In fact, I’ve got a shell alias named dev for precisely this kind of
effect - to run code in the local development environment. So to test
run some dev code my practice is to go:

dev some command here ...

which sets up $PATH, $PYTHONPATH etc suitably to find the development
stuff - the modules here or what have you. Without the dev prefix
command I expect to be uninfluenced by the local dev code, even though
I’m standing in there.

Cheers,
Cameron Simpson cs@cskk.id.au