Proposal: Add utilities to more easily manipulate sys.path

This is taken from the PEP 582 thread, inspired by the discussion around this point.

It seems to me that a lot of the discussion around PEP 582 is that people can’t agree on how an environment should be activated (I’m using these terms very loosely) and where exactly those environments should be placed, but seem to mostly agree that the current virtual environment interface (with its activation scripts and interpreter symlinks/shims) could use some improvements. So I’m wondering whether it would be a good idea to allow people to experiment with various approaches without needing elaborate hacks (a la old virtualenv) or something hooked deep into interpreter startup (a la PEP 405).

The idea is to add utilities somewhere (site? somewhere else?) that people can call to understand what sys.path currently looks like, and to modify it more confidently without breaking the interpreter (entirely or subtly). The problem with manipulating sys.path is that it’s non-trivial to tell how each item ended up there, because for an environment implementation to work, it generally needs to

  1. Keep standard library paths where they are
  2. Identify custom paths (added via PYTHONPATH or manually manipulating sys.path)
  3. Find existing site package paths, which are generally after stdlib, but could be either before or after custom paths, and replace them with new site paths. (If no existing site paths are found, insert new site paths after stdlib.)

To me, the crucial missing piece is a way to identify how an entry in sys.path ended up there; only with that information can the value be manipulated with any reliability. So something like:

from enum import Enum, auto

class PathType(Enum):
    stdlib = auto()
    site = auto()
    pythonpath = auto()
    custom = auto()

def inspect_sys_path() -> list[PathType]:
    """Return a list of the same length of sys.path.

    Each item in the returned list describes how its
    corresponding item was added to sys.path.
    """
    ...

The function can be implemented in two ways. The easy way is to simply use inspection, comparing the actual sys.path value with information from sysconfig, PYTHONPATH, and so on. This is not fool-proof, but should be good enough most of the time. We could also add a mechanism to actually keep track of how each value is added, but to me the additional complexity is arguably not worthwhile.
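
For illustration, here’s a rough sketch of the inspection-based approach, reusing the PathType enum above. It’s a heuristic only – .pth files, the script directory, symlinks and path normalisation are all ignored – and every comparison detail here is an assumption, not a finished design:

import os
import sys
import sysconfig

def inspect_sys_path_by_inspection() -> list[PathType]:
    """Classify each sys.path entry by comparing it against sysconfig
    and PYTHONPATH. Heuristic sketch only; not fool-proof."""
    paths = sysconfig.get_paths()
    stdlib_dirs = {paths["stdlib"], paths["platstdlib"]}
    site_dirs = {paths["purelib"], paths["platlib"]}
    pythonpath_dirs = set(
        filter(None, os.environ.get("PYTHONPATH", "").split(os.pathsep))
    )

    result = []
    for entry in sys.path:
        if entry in stdlib_dirs:
            result.append(PathType.stdlib)
        elif entry in site_dirs:
            result.append(PathType.site)
        elif entry in pythonpath_dirs:
            result.append(PathType.pythonpath)
        else:
            result.append(PathType.custom)
    return result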

Once we have the mechanism in place, it would be quite trivial for things like python -m pep582 to be implemented, and people can start trying out solutions and hopefully iterating on the design.


Put it up for discussion in #ideas?

This seems like a good idea, and very much in the spirit of the way the import system has grown over the years to be more customisable. I’m sure it will run up against issues from the “static analysis” community, for whom runtime path and import system manipulation is a big blocker. So maybe this feature should get input from that group on how to work nicely with their tools (a way to write a static file that declares what the runtime manipulations are expected to do?). Ultimately, though, if it’s just a runtime feature, I’m fine with that.

There is one missing aspect, though, which PEP 582 came up against: custom paths typically come with a requirement to install stuff into the added paths. So maybe this feature also needs to define a mechanism whereby a user can say “here’s a new install scheme” – and tools like pip can get a means of installing to that location. This could be something fairly straightforward, like an API to add a new “scheme” to sysconfig, and pip could have an --install-scheme=<scheme name> option to request use of that scheme.
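
There’s no public API for registering a new scheme today, but just to make the idea concrete, here’s a sketch that pokes at sysconfig’s private _INSTALL_SCHEMES mapping. The scheme name and path templates are made up, and relying on a private attribute is obviously not the proposed design – it only shows what the registered data might look like:

import sysconfig

# Hypothetical scheme registration; today this means touching the
# private _INSTALL_SCHEMES dict, so treat it purely as a sketch.
sysconfig._INSTALL_SCHEMES["local_pypackages"] = {
    "purelib": "{base}/__pypackages__/{py_version_short}/lib",
    "platlib": "{base}/__pypackages__/{py_version_short}/lib",
    "scripts": "{base}/__pypackages__/{py_version_short}/bin",
    "include": "{base}/__pypackages__/{py_version_short}/include",
    "data": "{base}/__pypackages__",
}

# An installer could then resolve the target directories like this:
print(sysconfig.get_paths(scheme="local_pypackages"))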

When I read this, I think maybe the solution is to have more than one PATH – and sys.path would be the joining of them all [*]

In short, rather than keeping track of how an entry was added, have entries added to separate lists.

In particular, a path for the standard library that is separate from the other(s) – then it could always be searched first, and we’d never get accidental shadowing of stdlib modules (of course, there would have to be a way to override a stdlib module on purpose, but it’s OK if that’s a bit cumbersome).

I’m not sure how many different path entries there should be, or if users should be able to add them at runtime, but maybe this would be cleaner.

Just spitballing here, maybe that would just create even more confusion.

[*] maybe as simple as: sys.path = list(itertools.chain(stdlib_path, user_path, pip_path))
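
To make that slightly more concrete, a minimal sketch – the choice of sysconfig as the source for each list is just an assumption; the real split could come from anywhere:

import sys
import sysconfig

paths = sysconfig.get_paths()
stdlib_path = [paths["stdlib"], paths["platstdlib"]]  # searched first, never shadowed
user_path = []                                        # PYTHONPATH / manual additions
pip_path = [paths["purelib"], paths["platlib"]]       # site-packages style entries

def rebuild_sys_path():
    # sys.path must be a real list, so join the pieces eagerly
    sys.path[:] = [*stdlib_path, *user_path, *pip_path]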

I’m fairly sure you could implement this using the facilities currently available in the import system (importlib). Obviously, it could only be a prototype - making it the official way that sys.path gets initialised would need it to be built into the interpreter - but if you think it’s worth considering, prototyping the idea would be a very good way of thrashing out the details.

Technically, sys.path is only used by the default importers. No other importer has any obligation to use it, and if it appears in sys.meta_path before PathFinder, then it will get to resolve modules at a higher priority.

We may be able to introduce some interesting optimisations by using more importers, though only by breaking users’ expectations about how imports work, which we’ve decided in the past isn’t worth it. But when we do decide to break how the default search paths work, I would certainly advocate for more explicit importers rather than the convoluted getpath logic we have to deal with today.
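
As a rough illustration of what a “more explicit importer” could look like, here’s a minimal meta path finder that resolves top-level modules from one extra directory ahead of the default PathFinder. The class and the directory name are made up for the example:

import importlib.abc
import importlib.util
import os.path
import sys

class DirectoryFinder(importlib.abc.MetaPathFinder):
    """Resolve top-level modules from a single extra directory."""

    def __init__(self, directory):
        self.directory = directory

    def find_spec(self, fullname, path=None, target=None):
        if path is not None:
            return None  # only handle top-level imports
        candidate = os.path.join(self.directory, fullname + ".py")
        if os.path.exists(candidate):
            return importlib.util.spec_from_file_location(fullname, candidate)
        return None

# Inserted before PathFinder, so it wins over anything on sys.path.
sys.meta_path.insert(0, DirectoryFinder("/tmp/extra-modules"))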

This is related to something I was thinking about in some of the other packaging threads. For people working in a “projectless” fashion (i.e., just throwing around a bunch of scripts and dinky libraries of convenience functions), it is a constant annoyance that there is no way to use relative imports from within executable scripts without installing the code as a package. This is also an issue for non-developers (e.g., in academia) who want to distribute code in somewhat unofficial ways (like “here is a zip file with everything you need”). In these contexts people often want their scripts and their libraries in the same directory tree and want to use them directly from there.

It would be nice to have a way to tell Python to treat a given directory tree as a package in a “local” manner without having to execute any kind of persistent install. I think this could be done with a custom loader or maybe even some sneaky sys.path manipulations.

Unless you want to also include the interpreter in the zip (a serious problem this won’t solve anyway), there is an official way for this: zipapp. A package (enabling relative imports) does not need to be “formally installed” to work; it just needs to be importable (i.e. not a top-level script). So it seems to me what you are looking for is orthogonal to the install scheme issue.
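
For reference, building such a zip is a one-liner with the stdlib zipapp module (the directory and entry point names here are made up):

import zipapp

# Bundle the "myapp" directory into a runnable archive; "myapp.cli:main"
# is the assumed entry point inside that package.
zipapp.create_archive("myapp", target="myapp.pyz", main="myapp.cli:main")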

Yes, indeed. However, as pointed out by @uranusjr – packages don’t need to be installed, they only need an __init__.py in the dir. But you still can’t relative import from a top-level script.

I struggled with this for ages, but finally realized that a basic package and “develop mode” (now editable mode) is actually a great way to solve this issue.

Frankly, even better than the __pypackages__ idea (if it comes to be), and certainly better than sys.path hacking. One of the key things is that all of those require that you run your code from the dir where the code is – or do some other PATH hacking to make it runnable. But as a rule, I don’t want to put my data and code in the same place. It’s OK if it’s guaranteed that there will be only one set of data I’ll want to manipulate with that code, but that’s actually a rare case for me.

And if you make a package, you can zip it up and share it with others, and simply tell them to do:

pip install -e ./

and away you go.
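
For anyone unsure what “make a package” means in practice, a minimal sketch of the project file that makes the editable install above work – the package name is a placeholder:

# setup.py -- minimal sketch; "mytools" is a made-up package name
from setuptools import setup, find_packages

setup(
    name="mytools",
    version="0.1",
    packages=find_packages(),
)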

What’s missing from the community to support this are two things:

  1. documentation – the docs on packaging are oriented toward proper packages on PyPI – point a “non-developer” to those docs, and they’ll likely respond with ugh! I don’t want to do all that!

  2. installed scripts – back in the day you could put them all in one dir and have your setup.py auto-add them – now you have to do a somewhat cryptic incantation to make an “entry point” (something like the sketch below), and you can’t have a simple script – it HAS to have a “main” function to run. I wish we could get the old way back :frowning:
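
For comparison, the “entry point” incantation being complained about looks roughly like this (package, module and function names are made up):

# setup.py excerpt -- console_scripts entry point sketch
from setuptools import setup

setup(
    name="mytools",
    version="0.1",
    py_modules=["mytools"],
    entry_points={
        "console_scripts": [
            # installs a "mytool" command that calls mytools.main()
            "mytool = mytools:main",
        ],
    },
)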

Anyway, I’ve suggested this on another thread, but I don’t know that we need to do anything to sys.path to help this use case.

import os.path

# Giving the running script a __path__ makes the import system treat it as a
# package, so explicit relative imports resolve against this directory.
__path__ = [os.path.dirname(__file__)]

That will do it. After this, from .spam import eggs will find spam.py alongside the original file and import eggs from it.
