PEP 582 - Python local packages directory

Python 3.11 added the PYTHONSAFEPATH environment variable and the -P option. How would PEP 582 interact with them? As described in gh-57684, I think it’s a real problem.

From the PEP

For example, __pypackages__ will be ignored if the -P option or the PYTHONSAFEPATH environment variable is set.
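A minimal sketch of that rule, assuming a hypothetical helper `local_packages_dir()` that startup code might call. `sys.flags.safe_path` is the real flag set by -P or PYTHONSAFEPATH on 3.11+; everything else here is illustrative and is not the PEP’s reference implementation:

```python
import os
import sys


def local_packages_dir(base_dir, safe_path=None):
    """Return the __pypackages__ directory to add to sys.path, or None.

    Hypothetical helper: it only models the rule that __pypackages__ is
    ignored when -P or PYTHONSAFEPATH is in effect.
    """
    if safe_path is None:
        # sys.flags.safe_path exists on Python 3.11+; older versions have
        # no -P option, so fall back to False there.
        safe_path = bool(getattr(sys.flags, "safe_path", False))
    if safe_path:
        return None
    candidate = os.path.join(base_dir, "__pypackages__")
    return candidate if os.path.isdir(candidate) else None


print(local_packages_dir(os.getcwd()))
```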

If you haven’t yet, please read the updated PEP. It does try to cover a lot of this.


Are you suggesting that people check in __pypackages__?

I’d start by enforcing where it can’t be: not in $HOME, /, /tmp, /[s]bin, /usr/[s]bin, etc.
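As a rough illustration of such a deny-list (the helper name and the exact list are assumptions; the suggestion above ends with “etc.”, so a real list would be longer):

```python
import os

# Hypothetical deny-list based on the suggestion above; not part of PEP 582.
DISALLOWED = ("/", "/tmp", "/bin", "/sbin", "/usr/bin", "/usr/sbin")


def is_disallowed_location(directory):
    """Return True if a __pypackages__ inside `directory` should be refused."""
    # realpath on both sides handles symlinks such as /bin -> /usr/bin
    # or /tmp -> /private/tmp.
    real = os.path.realpath(directory)
    home = os.path.realpath(os.path.expanduser("~"))
    return real == home or real in {os.path.realpath(p) for p in DISALLOWED}


print(is_disallowed_location(os.getcwd()))
```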


I have a partial retraction to make: by saying Node just looks “in the current directory”, I implied that it doesn’t recurse.

I actually just did a little reading/testing and Node does recurse to find node_modules, and npm will generate a loose package.json if you don’t have one already:

indrora@DESKTOP-HTA6J0U:~/src$ mkdir test
indrora@DESKTOP-HTA6J0U:~/src$ cd test
indrora@DESKTOP-HTA6J0U:~/src/test$ npm install colors

added 1 package in 96ms
indrora@DESKTOP-HTA6J0U:~/src/test$ mkdir foo/bar/baz/quux -p
indrora@DESKTOP-HTA6J0U:~/src/test$ cd foo/bar/baz/quux
indrora@DESKTOP-HTA6J0U:~/src/test/foo/bar/baz/quux$ npm root
/home/indrora/src/test/node_modules
indrora@DESKTOP-HTA6J0U:~/src/test/foo/bar/baz/quux$ node
Welcome to Node.js v19.6.0.
Type ".help" for more information.
> var colors = require("colors");
undefined
> console.log(colors.rainbow("Hello, friends!"));
Hello, friends!
undefined
>
indrora@DESKTOP-HTA6J0U:~/src/test/foo/bar/baz/quux$

Note that Node warns people not to check in node_modules (it’s often very large, sometimes gigabytes), and it looks odd in git when you do.
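For comparison, Node’s lookup can be sketched in Python as a walk from the starting directory up to the filesystem root. The helper is hypothetical and only illustrates that strategy, not anything PEP 582 specifies:

```python
import os


def find_upwards(start, name="__pypackages__"):
    """Return the first `name` directory found walking from `start` to the root.

    Illustrates Node-style lookup; returns None if nothing is found.
    """
    current = os.path.abspath(start)
    while True:
        candidate = os.path.join(current, name)
        if os.path.isdir(candidate):
            return candidate
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent


print(find_upwards(os.getcwd()))
```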


As for startup cost: any startup cost this would incur is negligible. I guarantee more time goes into selecting the right locale than into checking for a few directories. Currently, python3.10 on my VM makes 146 calls to stat() at startup before it shows the banner, and 35 of those calls fail with ENOENT.
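One way to sanity-check that claim is to time the check itself. This is a rough, machine-dependent measurement of `os.path.isdir()` on a directory name, not a measurement of interpreter startup:

```python
import os
import timeit

N = 10_000
# Time N existence checks for __pypackages__ in the current directory.
# Results vary with OS, filesystem and cache state.
seconds = timeit.timeit(lambda: os.path.isdir("__pypackages__"), number=N)
print(f"{seconds / N * 1e6:.2f} microseconds per isdir() check")
```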


I don’t know whether it is appropriate to post review comments here. The PEP gives the following code to retrieve the install scheme:

scheme = sysconfig.get_preferred_scheme("prefix")
purelib = sysconfig.get_path("purelib", scheme, vars={"base": "__pypackages__", "platbase": "__pypackages__"})
platlib = sysconfig.get_path("platlib", scheme, vars={"base": "__pypackages__", "platbase": "__pypackages__"})

But since the preferred scheme on Windows is nt, packages will be installed under __pypackages__/Lib/site-packages, which has no version component, so packages for different Python versions would end up mixed together.
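To see the difference, the PEP’s snippet can be run against both schemes explicitly. sysconfig can expand any scheme’s template on any OS, so this sketch works on Linux too; picking the schemes by name instead of via get_preferred_scheme() is an assumption made only for the comparison:

```python
import sysconfig

# Same variables as in the PEP's snippet.
local_vars = {"base": "__pypackages__", "platbase": "__pypackages__"}

for scheme in ("nt", "posix_prefix"):
    # Pass a copy: sysconfig fills missing config vars into the dict it is given.
    purelib = sysconfig.get_path("purelib", scheme, vars=dict(local_vars))
    print(f"{scheme:13} {purelib}")
```

On a typical CPython 3.12 build on Linux, the nt path comes out as __pypackages__/Lib/site-packages, while the posix_prefix path includes the version (__pypackages__/lib/python3.12/site-packages).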

cc @kushaldas

Yes, that is the standard behavior of current pip too, and CPython follows the same path. Instead of defining a new path, we are reusing what is already expected.

That is different. The current install scheme is associated with a specific Python interpreter, so there is no possibility of mixing packages from different Python versions. Think of __pypackages__ as a venv without an interpreter: installers can install packages for different Python versions into the same __pypackages__ directory. The user scheme is in a similar situation, which is why, even on Windows, its site-packages is isolated by the Python{py_version_nodot_plat} path component.

First, remember that __pypackages__ isn’t “venv without an interpreter”. It’s just a sys.path entry. I get the analogy, but don’t push it too far. (Sorry, I know I’m starting to sound like a stuck record on this)

Second, there’s no suggestion in the PEP (or in my personal view of the PEP, for what that’s worth) that being able to install packages for multiple Python versions in the same __pypackages__ is a goal. The key use case is for beginners, who are extremely unlikely to have multiple Python versions in the first place. If it’s something you view as a key requirement for the PEP, then you need to argue for it as a feature in its own right, not query details of the PEP based on the assumption that it’s useful and “should work”.

As a practical issue with handling this differently, there’s no existing install scheme that has versioned directories on Windows, so we’d need a new scheme. That idea was mentioned in the “rejected ideas” section, so see there for the reasons for not going down that route.

Edit: Whoops, I forgot the user scheme on Windows is versioned. Sorry, I’d forgotten about that one, but I don’t think the “user” scheme is in general an appropriate choice here (we’d be using it because it’s convenient, not because its intended use matches our needs).
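The difference is visible in the raw templates. With expand=False, sysconfig.get_path() returns the unexpanded template, so the version placeholders show up (or don’t). Exact templates vary slightly between CPython versions:

```python
import sysconfig

# expand=False returns the scheme's template string instead of a real path.
for scheme in ("nt", "nt_user", "posix_prefix"):
    template = sysconfig.get_path("purelib", scheme, expand=False)
    print(f"{scheme:13} {template}")
```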


I suspect that not having versioned directories on Windows will be problematic.

The paths on Windows aren’t versioned because the expectation is that the base path will be, so the individual parts don’t need to be. The opposite is true of both *nix and __pypackages__. People are going to get very weird, confusing errors if they have multiple versions of Python installed on their Windows machines and they’re using __pypackages__.

Rejecting a dedicated scheme feels shortsighted: it adds a long-term cost to avoid a short-term cost.


Adding one is a valid choice to make. The downside is that until pip adds support for installing into that dedicated scheme, which won’t happen until some time after Python introduces that scheme[1], the PEP is largely unhelpful to its intended target audience. Whereas with the current approach, pip install --prefix is a good enough short term option.

It’s not me you need to persuade, of course, it’s @kushaldas. (And I guess as a co-author, you have some direct influence on the PEP :wink:)

Just to make my position here clear - I helped @kushaldas with the latest re-work of the PEP, ensuring that the lack of clarity I’d complained about was addressed and that the intent and scope of the PEP were clear. I did not, however, try to change his mind on the content of the PEP, nor does the resulting text necessarily reflect my views. Personally, I’m neutral on the PEP itself; in particular, I’m not interested enough in it to try to get the details changed. The only things that mattered to me (that the PEP not state that pip would change its default install location, and that any tool changes be described as what the PEP hopes will happen rather than as requirements) have been addressed, so I’m good.

I am trying to channel @kushaldas and explain what I believe his position is on people’s questions. But that’s mainly because I’m frustrated with the way this discussion struggled to keep focus in the past, and I’m hoping that by doing this, I’m helping people express their concerns in a way that’s actionable. But I’m not an author of the PEP (or even a sponsor), just an interested bystander.


  1. How long after largely depends on whether anyone steps up to do the work of implementing it.


Right, that’s the short-term cost: adding one means it will take longer to roll out, but once it’s rolled out, that cost is paid and there’s no ongoing pain.

The flip side is the long-term cost to people on Windows who have multiple versions of Python installed (or who switch from one version to another), which will never go away.


I agree, so speaking with my SC hat on, my first piece of feedback would be wanting versioned directories on all platforms.


I will push another update to the PEP tonight. We will have a function in sysconfig, which will return a partial scheme, that then can be used by the interested installers. And the paths will be versioned as you all suggested.
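To make “partial” and “versioned” concrete, here is a hypothetical sketch of what such a function could return. The name `local_packages_paths` and the `__pypackages__/X.Y/...` layout are assumptions for illustration, not what the PEP text or sysconfig define:

```python
import os
import sys


def local_packages_paths(base_directory):
    """Hypothetical partial scheme: only the keys needed to install wheels."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    # Assumed layout: one subdirectory per Python version, so installs for
    # different versions do not mix.
    root = os.path.join(base_directory, "__pypackages__", version)
    return {
        "purelib": os.path.join(root, "lib"),
        "platlib": os.path.join(root, "lib"),
        "scripts": os.path.join(root, "bin"),
        "data": root,
    }


print(local_packages_paths(os.getcwd()))
```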


Why not just use a normal scheme? IMO sysconfig is already difficult enough to work with, without adding the idea of a “partial” scheme.

The “partial” scheme here would include all the necessary keys to install wheels. It would provide purelib, platlib, scripts, and data.

If we have a full scheme, things will leak to the system, which is already a mistake we made with virtual environments, and none of the missing keys (stdlib, platstdlib, include, platinclude) should be installed to anyway[1].

IMO it is unwise to introduce a new API that is flawed by design.

You have two options:

  1. Build a full scheme yourself

    scheme = sysconfig.get_paths() | sysconfig.get_local_packages_paths()
    

    You will have to write new code for PEP 582 anyway, so this won’t be what prevents you from using old versions of installers (e.g. pip), and I think it’s a reasonable enough ask.

  2. Handle missing keys in installers

    This would personally be my preference, and something installers should be doing anyway, as PEP 427 does not specify a canonical list of .data keys.

    The only breakage this may cause is not installing the headers data, which I think beats installing it on the system, a completely unrelated location. Realistically, this should basically only affect package building, which is recommended to be done in isolation anyway.
    But if you think it is too much of a risk, go with 1), while acknowledging its issues. Once the headers issue has been dealt with, you can go back to using the “partial” scheme as a full scheme.


  1. stdlib/platstdlib should definitely not be installed to; I think it’s clear enough why. include/platinclude are trickier. I would strongly discourage people from using them, as they are system directories and hence not isolated in virtual environments (!!), but installers do use them to calculate the path for the headers key, which is needed to keep backwards compatibility with the distutils install paths. I would also strongly discourage people from using that key, and recommend that installers raise a warning. Most projects have already moved away from using it in favor of installing headers as module data. A few still remain, and IMO we should encourage them to move away from it.
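Option 1) can be sketched with a stand-in partial scheme, since the proposed sysconfig.get_local_packages_paths() does not exist today. The directory layout below is a hypothetical assumption; the sketch shows that the merged dict has every key, but the keys the partial scheme lacks still point at the system installation:

```python
import os
import sysconfig

# Hypothetical partial scheme standing in for the proposed
# sysconfig.get_local_packages_paths().
root = os.path.join("project", "__pypackages__", sysconfig.get_python_version())
local = {
    "purelib": os.path.join(root, "lib"),
    "platlib": os.path.join(root, "lib"),
    "scripts": os.path.join(root, "bin"),
    "data": root,
}

# Option 1: overlay the partial scheme on the interpreter's full scheme.
# With dict union (Python 3.9+), the right-hand operand wins on shared keys.
scheme = sysconfig.get_paths() | local

for key in sorted(scheme):
    origin = "local" if key in local else "system"  # "system" entries are the leak
    print(f"{key:12} {origin:7} {scheme[key]}")
```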


That said, sysconfig suffers from being stuck with an old design from the distutils days. I think it needs some work to better match today’s model and to be easier for installers and similar users to use, but I am still unsure how exactly to do that in a way that both makes sense and doesn’t break things. The documentation can definitely be improved, though; I have struggled with it too, and I should take another look.


If we can use the existing APIs, and handle KeyError if it gets raised, I don’t see the problem. Why wouldn’t we? It’s the documented interface.

I’m -1 on creating a new interface for no better reason than to protect people who don’t support the existing API properly. (And I’m happy to consider it our bug, and fix it, if pip currently has an issue because of this).

I completely support this, and I would love to see improvements like you suggest. It’s a bit off-topic for this thread, but thanks for confirming that this is your intention :slightly_smiling_face:


Hum… I interpret the current documentation as clearly stating which keys we should expect :sweat_smile:

From sysconfig — Provide access to Python’s configuration information — Python 3.12.1 documentation

Each scheme is itself composed of a series of paths and each path has a unique identifier. Python currently uses eight paths:

stdlib: directory containing the standard Python library files that are not platform-specific.
platstdlib: directory containing the standard Python library files that are platform-specific.
platlib: directory for site-specific, platform-specific files.
purelib: directory for site-specific, non-platform-specific files.
include: directory for non-platform-specific header files for the Python C-API.
platinclude: directory for platform-specific header files for the Python C-API.
scripts: directory for script files.
data: directory for data files.
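A quick way to check which of these a given interpreter’s default scheme actually provides (the set below is just the documented list copied into code):

```python
import sysconfig

# The eight path names listed in the documentation above.
DOCUMENTED = {"stdlib", "platstdlib", "platlib", "purelib",
              "include", "platinclude", "scripts", "data"}

paths = sysconfig.get_paths()  # default scheme for this interpreter
missing = DOCUMENTED - set(paths)
print("missing keys:", sorted(missing) or "none")
```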

From sysconfig — Provide access to Python’s configuration information — Python 3.12.1 documentation

sysconfig.get_paths([scheme [, vars [, expand ]]])

Return a dictionary containing all installation paths corresponding to an installation scheme. See get_path() for more information.

If scheme is not provided, will use the default scheme for the current platform.

If vars is provided, it must be a dictionary of variables that will update the dictionary used to expand the paths.

If expand is set to false, the paths will not be expanded.

If scheme is not an existing scheme, get_paths() will raise a KeyError.
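The documented KeyError can be handled with an ordinary try/except. A path name that a scheme lacks raises KeyError as well, because get_path() indexes the dictionary that get_paths() returns. `try_get_path()` is a hypothetical helper:

```python
import sysconfig


def try_get_path(name, scheme):
    """Return the expanded path, or None if the scheme or path name is unknown."""
    try:
        return sysconfig.get_path(name, scheme)
    except KeyError:
        return None


print(try_get_path("purelib", "no_such_scheme"))     # None: unknown scheme
print(try_get_path("no_such_path", "posix_prefix"))  # None: unknown path name
```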

But I guess maybe it is not that explicit? Though, reading that, I personally wouldn’t expect any of the listed keys to be missing. So, I wouldn’t really blame any users for interpreting it the same way.

All in all, I understand your opinion regarding adding a new interface, but I think that in this specific case it would be the most beneficial choice :face_with_diagonal_mouth:

It does not have any major drawbacks that I can see, and solves the issue in a reasonably clean way, considering the current model and interface weren’t really designed to handle such use-cases. We just shouldn’t make it common practice.

I wouldn’t want to use it in pip, especially if it’s considered “not common practice”. As I’ve said a few times, we’re trying to move pip to work consistently with (normal) sysconfig schemes. I don’t see a problem for pip if a normal sysconfig scheme omits a path (I’d just say that if the user tries to install to such a scheme, any files in the wheel that would go in that path are ignored with a warning). But if you feel that sysconfig schemes must have all of the listed paths, I’m fine with that.
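A minimal sketch of that behaviour, assuming a hypothetical installer helper `plan_data_install()` that maps a wheel’s .data subdirectory names to target directories and skips any key the scheme lacks (here, headers) with a warning:

```python
import warnings


def plan_data_install(data_keys, scheme):
    """Map wheel .data keys to target directories, skipping missing ones.

    Hypothetical helper: `scheme` is any mapping of key -> directory,
    including a scheme that omits some paths.
    """
    plan = {}
    for key in data_keys:
        target = scheme.get(key)
        if target is None:
            # The scheme has no such path: warn and ignore those files.
            warnings.warn(f"scheme has no {key!r} path; ignoring .data/{key} files")
            continue
        plan[key] = target
    return plan


partial = {"purelib": "pp/lib", "platlib": "pp/lib", "scripts": "pp/bin", "data": "pp"}
# Prints {'purelib': 'pp/lib', 'scripts': 'pp/bin'} and warns about headers.
print(plan_data_install(["purelib", "scripts", "headers"], partial))
```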

I honestly don’t see why __pypackages__ can’t just have a normal scheme. There’s no reason I can understand why it has to be partial in any case. I guess that’s something @kushaldas needs to answer.

So my position is that I want the __pypackages__ scheme to be a normal scheme, which means that sysconfig doesn’t have to do anything special for it. If PEP 582 requires changes to sysconfig, my objection is with PEP 582, not with sysconfig.

I mean, we shouldn’t make adding new API a common practice. We should try to fit new use-cases into the existing API and very carefully consider when a new API does and doesn’t make sense.

This is essentially a normal config scheme for pip, and you will need special handling for the local packages scheme even if you use the existing API anyway.

There are two things to consider here.

  1. It will leak

    This is something I’d really like to avoid, but I suppose that is already the case with virtual environments, so it wouldn’t really be a blocker.

  2. The current API isn’t ergonomic for this use-case

    The API isn’t designed for schemes that require extra variables: you need to copy the variables’ dictionary, update it, and pass it to sysconfig.get_path/sysconfig.get_paths.

    sysconfig.get_paths('local_packages', vars=sysconfig.get_config_vars() | {'local_packages_base': ...})
    

    As with the missing scheme paths, this will result in some API breakage, because we do not document that some schemes might require extra keys and raise an error if they are missing. I don’t think it’s as bad as the missing scheme paths, but I do expect it to break some users.
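Since there is no local_packages scheme to call today, here is the same calling pattern against the existing posix_prefix scheme, with a hypothetical base directory. It also happens to illustrate point 1): in posix_prefix, paths such as stdlib are built from installed_base rather than base, so they still point at the interpreter’s own installation:

```python
import sysconfig

base = "/tmp/example-project/__pypackages__"  # hypothetical location

# Copy the config vars, override the ones the scheme template uses, pass them in.
local_vars = sysconfig.get_config_vars() | {"base": base, "platbase": base}
paths = sysconfig.get_paths("posix_prefix", vars=local_vars)

print(paths["purelib"])  # under base
print(paths["stdlib"])   # still the interpreter's own stdlib directory
```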

I guess this mostly depends on how much weight you give to 1). Neither 1) nor 2) is a blocker, and I think most people will weigh 2) about the same.

Personally, I think the separate API is worth it, but it is not required. I’ve opened Deprecating the `headers` wheel data key to try to mitigate my worries with 1).

@kushaldas sorry for putting more weight on you, but I think you’ll have to make a call for what to put in the PEP. Or is it possible to give both options to the SC and have them choose?

To summarize, the two options are:

  • Normal sysconfig scheme
  • New API (sysconfig.get_local_packages_paths(base_directory))
    • The main downside is it being new API IMO
    • Easier/better to backport (no patching required)

OK. You’re the expert on sysconfig, I don’t want to dictate to you on what API you think makes sense.

But as you say, it does put the responsibility back on @kushaldas to define (in the PEP) the exact layout of the __pypackages__ directory, and how installers should behave when installing wheels into that location - if we’re not going to say “just treat it like any other scheme”, then the details need to be given explicitly.

I’ll reserve comment on whether I’m comfortable that the proposal is reasonable to implement in pip until I see what @kushaldas proposes.

For the record, it was my suggestion that we use a sysconfig scheme, in response to the fact that the original PEP was too vague to be implementable. Sorry if by suggesting that, I’ve made more work for you.