Linux distro patches to `sysconfig` are changing `pip install --prefix` outside virtual environments

But should pip use the posix_prefix when --prefix is used? And should it also use it with --root? The semantics of all this are blurry indeed.

I don’t think there’s much “should” involved here. I’m pretty sure pip’s current behaviour is inherited from the original distutils (subsequently setuptools) --prefix and --root options for the install command. Ideally, sysconfig should have reflected that (as @steve.dower noted, that was the intention behind the module) but it doesn’t do so[1], because if nothing else, there’s no obvious equivalent of --root. Previously, I would have said that sysconfig should document how it maps onto the distutils install options. Unfortunately, now that distutils has been removed from the stdlib, it’s possible to claim that such a mapping no longer belongs in the stdlib docs :frowning:

I do think that as things stand, having --prefix use get_preferred_scheme('prefix') is the best way to follow the likely intent here. And maybe sysconfig needs to get a root preferred scheme?

Or, plan B, we could just decide what options we want from scratch, and agree on mappings to the available information in sysconfig, treating the existing options/behaviour as legacy. I was planning on doing something similar anyway, here, so this would just be a slightly more extreme version of that exercise.

Whatever we do, though, I’d strongly prefer that we don’t look at it as just about pip, but rather as how any installer should offer options to customise the install location. And that’s why I think this needs a PEP, because of the implications for all tools.


  1. Or if it does, it doesn’t document things well enough.


I think there are fundamentally two points that deserve a PEP here:

  • a standards track PEP to define what paths are on sys.path by default (and implicitly, where importable modules should be installed)
  • a packaging PEP to define where non-importable files get put

Without anything to do with distribution or compilation in the standard library anymore, there’s no need for the latter PEP to involve the core runtime. If the packaging community agrees to install headers in a certain location and to search that location when compiling extension modules, that’s great, but it’s all external to core… unless we decide to depend on sysconfig to figure it out.

I’d rather see it move outside the stdlib, since it’s not at all tied to the language version (let’s say it goes to packaging.locations or something). That also offers the fresh start needed for distros to be able to patch cleanly, and the only ones who’ll be upset are pip, because they won’t be able to rely on a vendored copy of that library :wink: (but Paul already said to not only design for pip, so I’m sure this will be fine :wink: ). Or maybe there’s some way to patch a per-install config file that even the vendored copy will find?

In any case, my main point is that it can be designed fresh and applied forwards and backwards across all versions, provided we don’t make it dependent on sysconfig. Let sysconfig remain for things very specific to the runtime and how it was built, and start fresh for package installations.

I think that’s likely to be a very bad idea. Apart from the fact that I don’t see how distros and other Python redistributors can reliably patch a 3rd party module to set up their layout, it seems far too easy for users to (accidentally or deliberately) end up with the wrong version. The stdlib is much harder to override, which is an important point in this situation. Plus, if pip isn’t going to vendor whatever this library is, how would it get installed into the user’s system in the first place? And if pip is going to vendor it, we’re back to how do distros patch it?


Indeed. We must patch the stdlib for this, as users will run sudo pip install --upgrade pip (or whichever other package this lives in).

If the whole library is just looking for a distro-provided config file in some fixed location, and when that file is missing assumes platform-specific defaults, it’s no worse off than where we are today.
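A minimal sketch of that idea, assuming a JSON file whose keys match sysconfig's path names. The file location, the format, and the load_layout name are all hypothetical; nothing like this is standardised today.

```python
import json
import sysconfig
from pathlib import Path

# Hypothetical location; no such file is standardised anywhere.
DEFAULT_CONFIG = Path("/etc/python-install-layout.json")


def load_layout(config_path=DEFAULT_CONFIG):
    """Return install paths from a distro-provided file, else platform defaults."""
    defaults = sysconfig.get_paths()
    try:
        overrides = json.loads(Path(config_path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        # No distro file: behave exactly as an unpatched install would.
        return defaults
    # Only accept keys sysconfig already knows about; ignore anything else.
    return {key: overrides.get(key, value) for key, value in defaults.items()}


if __name__ == "__main__":
    for name, path in load_layout().items():
        print(f"{name}: {path}")
```

A distro would ship the file; everyone else gets the defaults, so the failure mode is the status quo.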

By contrast, if you want to do it via sysconfig, it’s all going to need a standards-track PEP, which is only available for 3.12 at a minimum (and given the lack of “obvious” answers here, I’ll be surprised if it’s ready by then), and you’ll still have five years before it can be relied upon.

A packaging track PEP only has to be an agreement between those involved in packaging, and it carries just as much weight when getting distros/consumers to do the right thing. The back-compat constraints are also different, and I think easier to handle (very easy to backport a config file to older releases in a distro - not so for actual layout or sysconfig modifications).

(In case it’s not clear, I’m suggesting this because I want this to happen sooner, not because I’m trying to sink the idea :slight_smile: We’re going to spend a lot of energy convincing the core team over where both first and third-party files should go, when we’re only really worried about defining it for third-parties.)


@FFY00 pinging you on this since this likely needs a sysconfig level change.

I think it’s not only sysconfig, but the site initialization too. We are aware of the issue, and working on it. Here are some of the relevant CPython issues:

Some related good news, I got some time from $dayjob to work on the new sysconfig API proposal (rough initial draft: sysconfig-build-compilation.md · GitHub). While that doesn’t necessarily fix the issue at hand, I think it will make that much easier, by taking this use-case into account in the API design.

Note: I didn’t read the whole thread, only skimmed it, so it is possible I have missed something.


Things are still very up in the air right now, but if we manage to fix this issue in a newer sysconfig API, we can provide a backport on PyPI so that projects can adopt the changes right away and get this kind of issue fixed. Doing this in only a handful of packages, like pip, should mitigate the issues for 99% of users, so I think it might be a viable way to handle it.

I’m pretty sure that fixing it in sysconfig is going to require storing more information at CPython compile time. We can’t really backport that, so what you’d be doing is shipping a library that encodes the known information about existing releases, but then uses the new API when it’s available, so that tools using it don’t have to figure out the version check themselves.

That is something vendors can do, though. One thing I have considered, for example, but am still very unsure about, would be making it so that this kind of patching could be moved to the sitecustomize module. That would allow us to effectively backport the changes without much issue, as long as the vendors are on board. However, I don’t want to be driving the API design for this kind of stuff, which is only temporarily helpful, so we’ll see.

In a backport module we can also detect the known vendor patches, which aren’t that many, and adapt the behavior based on that, so I don’t think this kind of thing is out of reach.
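Something along these lines, as a sketch only. The marker scheme names below are my assumptions about what the distro patches add to sysconfig and would need checking against the actual patches; detect_vendor_patches is a made-up name.

```python
import sysconfig

# Scheme names believed to be added by distro patches. These are assumptions
# for illustration; verify them against the real vendor patches before use.
_VENDOR_MARKERS = {
    "debian": {"deb_system", "posix_local"},
    "fedora": {"rpm_prefix"},
}


def detect_vendor_patches(scheme_names=None):
    """Return the vendors whose marker schemes are all present."""
    if scheme_names is None:
        scheme_names = sysconfig.get_scheme_names()
    present = set(scheme_names)
    return sorted(
        vendor
        for vendor, markers in _VENDOR_MARKERS.items()
        if markers <= present
    )


if __name__ == "__main__":
    print(detect_vendor_patches() or "no known vendor patches detected")
```

The backport could then branch on the result and pick the layout it knows that vendor intends.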

I’m wary of relying on vendors doing something, but it is likely reasonable to account for the fact that they might do so within the implementations we end up having for the tools.

Would it perhaps make more sense to spec out a data file to contain the info instead? Vendors who want to support it now can generate the file for any downlevel release they want, and we can make CPython’s build generate it. Then the library to read it could be totally outside of the stdlib (possibly in packaging).

I don’t know if patching a static data file is easier or harder than code, but I’d have to assume it’s easier.


That can be a solution, but the hard question is what do we allow them to change? My main worry is that it is likely something that can’t be expressed in a static data file.

Fedora, for example, patches sysconfig to only change the default scheme when RPM_BUILD_ROOT is defined, meaning it only happens when building an RPM package.


Anyway, I think the reason these patches break expected behavior is because IMO they are conceptually wrong, as nothing related to the /usr Python should be installed to /usr/local [1], breaking the core design of how prefixes are supposed to work. IMO this is a vendor bug – because the design is not compatible with such a thing, they simply add /local after the prefix placeholder in the sysconfig scheme templates, which is objectively incorrect. I think pip has a very strong case to simply close these bugs as a vendor bug and tell people to report them to the vendor.
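To make the objection concrete, here is a sketch of what happens when an extra path component is inserted after the prefix placeholder in a scheme template. The templates are simplified versions of the posix_prefix purelib entry, and the patched variant is my assumption for illustration (using local, since the target is /usr/local), not a copy of any distro's patch.

```python
# Simplified scheme templates modelled on sysconfig's posix_prefix "purelib"
# entry. PATCHED is an assumption for illustration, not a real vendor patch.
UPSTREAM = "{base}/lib/python{py_version_short}/site-packages"
PATCHED = "{base}/local/lib/python{py_version_short}/site-packages"


def expand(template, base, py_version_short="3.11"):
    """Substitute the placeholders the way a scheme expansion would."""
    return template.format(base=base, py_version_short=py_version_short)


if __name__ == "__main__":
    for base in ("/usr", "/opt/example"):
        print(f"base={base}")
        print("  upstream:", expand(UPSTREAM, base))
        print("  patched: ", expand(PATCHED, base))
```

With base set to /usr the patched template gives the /usr/local path the vendor wanted, but any other prefix also gains a local component that nobody asked for.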

We can standardize a data file, but without fixing these conceptual divergences, I don’t think it’d change much; it’d only serve to let tooling more easily detect and understand vendor patching, so they can maybe undo it.

The solution I have been proposing for a while now is to add an officially CPython-supported way to perform patching that can achieve the high-level outcome that vendors want, in hope that it may deter them from patching Python in broken ways.


  1. The FHS 3.0 says that the /usr/local prefix “may be used for programs and data that are shareable amongst a group of hosts, but not found in /usr”.


Include environment markers in the spec? If we know the rough scope of things that may vary it, we can spec it out.

One really valuable aspect of having this as a data file would be for cross-compilation. It’s often easier to get the runtime for the targeted architecture and read a data file than to actually run it.

Actually, there’s a thing worth mentioning in this context: Various package-index filtering flags do not affect the environment markers · Issue #11664 · pypa/pip · GitHub

We’ve got some discussion of making pip easier to use in a cross compilation context, with a file containing a bunch of data. It’d be a useful complementary piece to a data file as is being discussed here.
