Python Packaging Strategy Discussion - Part 1

I think that pip growing extra features is probably the least controversial way of arriving at a unified tool [1], since it already occupies a special place in the Python ecosystem. I also think that pip’s practice of implementing new features by incorporating reusable libraries, and splitting old features out into reusable libraries, is the best model for a hypothetical blessed tool, since it still enables alternatives in cases where the pip workflow doesn’t work for people.

The three biggest challenges I see in doing so (outside of tech debt, which I think is a problem for any project other than a greenfield one) are:

  • Environment management is one of the big things that people would want from a unified tool, but pip’s current architecture is pretty ill-suited to handle it.
  • Publishing workflows are the other big thing people would want, but those can easily get confusing alongside end-user-targeted versions of the same tooling (for instance, if there’s a command to build a wheel, how does that differ from the existing pip wheel command, and would the existence of both confuse people?).
  • It puts more burden on the already burdened pip team.

The environment management one is probably the thorniest technically, but I think it actually has a fairly tractable solution. The problem is roughly that pip needs to get information from the Python that it is installing into, and the way it does that now is to call a bunch of Python APIs to fetch the various bits of data it needs.

Historically pip had a -E flag, which allowed pip to “target” another environment and install into it. It was removed because the implementation of that feature was awful (it would just shell out and execute the pip that was installed into that environment). However, I now think that flag had the right idea; we just need to implement it in a better, more sensible way. Presumably this would mean having pip subprocess out to the target Python with some tiny script that just executes the Python APIs it needs and then serializes that data onto stdout for pip, running under some other Python, to read [2].
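
To make that concrete, here is a minimal sketch of the subprocess-and-serialize idea. The script contents and the set of fields queried are illustrative assumptions, not pip’s actual implementation; the real tool would need far more data than this.

```python
import json
import subprocess
import sys

# A tiny script to run *inside* the target interpreter: it calls the
# environment-specific APIs and serializes the results as JSON on stdout.
# (Illustrative only; the real set of fields pip needs is much larger.)
_QUERY_SCRIPT = """\
import json, sys, sysconfig
json.dump({
    "version": list(sys.version_info[:3]),
    "paths": sysconfig.get_paths(),
    "platform": sysconfig.get_platform(),
}, sys.stdout)
"""

def query_environment(python_executable: str) -> dict:
    """Run the query script under the target Python and parse its output."""
    result = subprocess.run(
        [python_executable, "-c", _QUERY_SCRIPT],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

if __name__ == "__main__":
    # For demonstration, "target" the interpreter we are already running under.
    info = query_environment(sys.executable)
    print(info["platform"], info["paths"]["purelib"])
```

The key property is that the controlling process never imports anything from the target environment; all it needs is a path to the target’s interpreter.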

At that point pip no longer needs to run in the environment it is installing into, which means it’s able to manage the environments itself as well. This has the additional benefit that we no longer need to proliferate a thousand copies of pip throughout a working system, and you can end up in a situation where pip is just installed once, but can install into many different environments.
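
A rough sketch of what “pip manages the environments itself” could look like, using the standard library’s venv module: the single installed pip creates an environment without its own copy of pip, and then targets it by interpreter path. The function name and shape are my own illustration, not a proposed pip API.

```python
import os
import venv

def create_environment(env_dir: str) -> str:
    """Create a virtual environment *without* installing pip into it,
    and return the path to its interpreter, which the one system-wide
    pip would then use to target the environment."""
    venv.EnvBuilder(with_pip=False).create(env_dir)
    bindir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(env_dir, bindir, exe)
```

Because the environment carries no pip of its own, there is exactly one copy of pip on the system, yet it can install into any number of environments created this way.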


  1. Somewhat at least. I don’t think we’re ever going to get to a place like Rust with Cargo where there is a singular tool that just everyone uses. The genie is already out of the bottle on that in Python and I think the use cases in Python are varied enough, coupled with semantic differences between Python and Rust, that it’s not really possible to get to that end state BUT I do think we can get there for a subset of users. ↩︎

  2. The various sysconfig and such APIs are pretty easy to handle in this way; the hardest part is going to be things like reading installed packages, which the libraries for doing so don’t support targeting another Python, but in theory they should be able to be extended to support handing them a set of paths to look at rather than sys.path. ↩︎
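
For what it’s worth, importlib.metadata already accepts an explicit path keyword for discovery, so the “hand it a set of paths instead of sys.path” extension is at least plausible. A sketch (the helper name is mine):

```python
import importlib.metadata

def installed_distributions(paths: list[str]) -> dict[str, str]:
    """Map distribution name -> version for packages found under the
    given search paths, rather than implicitly scanning sys.path.
    The paths would come from querying the target interpreter."""
    return {
        dist.metadata["Name"]: dist.version
        for dist in importlib.metadata.distributions(path=list(paths))
    }
```

Given the purelib/platlib paths reported by the target Python, this lets the controlling pip enumerate what is installed there without importing anything from that environment.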
