Python Packaging Strategy Discussion - Part 1

I don’t think it’s been explicitly mentioned in this mega thread, but perhaps one way to “break the deadlock” or decision fatigue is to choose none of the options! :laughing:

What I mean by this: what’s the downside of merging much of the functionality from the community of tools into pip? So pip becomes the “one true tool”?

This might actually be what the casual user expects, to be honest. They’re wondering why pip hasn’t improved (from their perspective; I am well aware it has improved A LOT) and what this “conda” thing is that people keep talking about. They’re looking and waiting for leadership.

I personally see a lot of pros.

2 cons I thought of:

  • pip maintainers might not want to do this
  • pip will need to commit, and be seen, to move in a direction towards a unified tool, perhaps moving at a faster pace than has historically happened. (This isn’t really a con, but it definitely changes the game. It’s a great opportunity and responsibility. I think you can do it though.)
4 Likes

I was literally writing this yesterday. :slight_smile:

We’ve got a “privileged”/default tool in the ecosystem already, and that’s a big part of why it serves as the common denominator.

I don’t think we can have pip combine all the workflows/innovations into one thing – that isn’t tractable or reasonable – but (1) pip build, (2) pip publish, (3) pip lock/pip resolve, and (4) pip sync would get us ~90% of the way there, and it isn’t the intractable task that I’ve seen some people claim earlier in the thread.

14 Likes

IMO pip has a lot of technical debt which we would need to pay off before we could add a lot of extra complexity. So while “just add the functionality to pip” is plausible, it’s not a quick option.

Also, I don’t know if the wider workflow management aspects that I believe people are asking for would fit as well into pip. The debate over a pip run command suggests that this might be a tough path to follow.

Basically, and I think someone mentioned this previously, I think a lot of the actual user complaints are around environment management, and that’s the area that is least likely to fit into pip.

2 Likes

There’s certainly unfinished business in pip. It’s a software tool; it’s always going to have things to fix or improve.

Can you elaborate on what aspect of environment management is challenging to fit into pip?

In my opinion, if pip did the following:

  1. added support for PEP 582 (or venv-by-default)
  2. settled on a lockfile format
  3. supported a publish flow

Those items (along with existing functionality and ongoing improvements) would address ~90% of the requirements that I’ve encountered in projects as a Python developer over the last 10 years.

Yes, it doesn’t solve everything. There are still lots of challenges around native extensions, static metadata, etc., but I would be very surprised if the community’s reception to this simplification of the ecosystem weren’t warm. It simplifies the docs and concepts that users need to understand, it reduces the number of tools people need to install, and it centralises the efforts of many of the leading experts in Python packaging in the same project.

Yes, it won’t be easy. Yes, it’s a 180-degree pivot from the previous direction. However, I personally think it could be transformational if there were a consolidation of efforts behind a single tool after so many years of trying to increase the diversity of tools. To be clear, I think this ebb and flow is natural. But it’s clear from the survey, and from most general Python developers I speak to, that we have possibly overcorrected towards diversity: there are now too many choices, with a dizzying number of PEP numbers that most users honestly don’t care about or understand.

3 Likes

If I said “PDM is the official packaging workflow tool; poetry, hatch, conda and any other tools operating in this area are now obsolete”, no one would listen.

As others have said, the power to influence this is by bundling the tool in the default Python installer. Rust does this with Cargo and Node does this with NPM, both of which are successful.

As to what gets bundled, it seems to me like a combination of what people need from Poetry/hatch/PDM, plus extension/plugin support for building native code.

I certainly appreciate the concerns about existing tech debt and the hesitancy about taking on such a big task, which is why I think the PSF would need to help raise funding for this.

3 Likes

I don’t think funding alone is enough. The PSF has been keeping a list of fundable projects for a while, and I don’t know how effective that approach has been…

Maybe it would be important to actively find people to “hire” (or something similar), and maybe even to “project manage” it.

1 Like

I think that pip growing extra features is probably the least controversial way of arriving at a unified tool [1], since it already occupies a special place in the Python ecosystem. I also think that pip’s pattern of implementing new features by incorporating reusable libraries, and splitting old features out into reusable libraries, is the best way for a hypothetical blessed tool to function, since it still enables alternatives in cases where the pip workflow doesn’t work for people.

The three biggest challenges I see in doing so (outside of tech debt, which I think is a problem for any project other than a greenfield one) are:

  • Environment management is one of the big things that people would want from a unified tool, but pip’s current architecture is pretty ill-suited to handle it.
  • Publishing workflows are the other big thing people would want, but those can easily get confusing alongside the end-user-targeted versions of the same tooling (for instance, if there’s a command to build a wheel, how does that differ from the existing pip wheel command, and would the existence of both confuse people?).
  • It puts more burden on the already burdened pip team.

The environment management one is probably the thorniest technically, but I think it actually has a fairly tractable solution. The problem is roughly that pip needs to get information from the Python that it is installing into, and the way it does that now is to call a bunch of Python APIs to fetch the various bits of data it needs.

Historically pip had a -E flag, which allowed pip to “target” another environment and install into it. It was removed because the implementation was awful (it would just shell out and execute the pip that was installed into that environment). However, I now think that flag had the right idea; we just need to implement it in a better, more sensible way. Presumably this would be to have pip subprocess out to the target Python with some tiny script that just executes the Python APIs it needs and then serializes that data onto stdout, for the pip process, running under some other Python, to read [2].

At that point pip no longer needs to run in the environment it is installing into, which means it’s able to manage the environments itself as well. This has the additional benefit that we no longer need to proliferate a thousand copies of pip throughout a working system, and you can end up in a situation where pip is just installed once, but can install into many different environments.
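As a rough illustration of the “tiny script” idea (a minimal sketch; the helper name and the exact fields queried here are invented for the example, not pip internals):

```python
import json
import subprocess

# This snippet runs inside the *target* interpreter and serializes the data
# the installer would otherwise read via in-process API calls.
_QUERY_SCRIPT = """
import json, sys, sysconfig
print(json.dumps({
    "version": list(sys.version_info[:3]),
    "platform": sys.platform,
    "paths": sysconfig.get_paths(),
}))
"""

def query_environment(python_executable: str) -> dict:
    """Interrogate another Python without importing anything from it."""
    result = subprocess.run(
        [python_executable, "-c", _QUERY_SCRIPT],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

pip could then feed that data into its resolver and installer logic while itself running under a completely different interpreter.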


  1. Somewhat at least. I don’t think we’re ever going to get to a place like Rust with Cargo, where there is a singular tool that just everyone uses. The genie is already out of the bottle on that in Python, and I think the use cases in Python are varied enough, coupled with the semantic differences between Python and Rust, that it’s not really possible to get to that end state. BUT I do think we can get there for a subset of users. ↩︎

  2. The various sysconfig and similar APIs are pretty easy to handle in this way; the hardest thing is going to be reading installed packages, since the libraries for doing that don’t support targeting another Python. In theory, though, they could be extended to accept a set of paths to look at rather than sys.path. ↩︎

8 Likes

Step one would be to add the proposal to that list. Step two (likely needed before anyone would fund the work) would be to precisely define the deliverables. From that point, deciding on what resources were needed and how to find them should follow fairly naturally, alongside the process of finding someone to provide the money.

1 Like

For a long time now, I’ve wanted to do something like this. I wasn’t aware that it had previously existed. I certainly think this is a useful direction for pip to take, although I’m not sure how close that gets us to what people mean when they say “environment management” - as usual, the difficulty is likely to be in agreeing what we actually want, in sufficient detail to deliver it.

We already have something like that with pip --python, which runs a second copy of pip using the target environment’s interpreter (basically re-using the isolated build environment code). So installing pip in every environment is no longer needed. It’s still done because (a) inertia and legacy expectations from people, and (b) tools that expect to be able to run pip in a subprocess don’t have a reliable way to find pip if it’s not installed in the environment.
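(For example, assuming a venv at .venv, something like pip --python .venv install requests should install into that environment even though pip itself isn’t present there.)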

2 Likes

Hmm, is it just running the whole of pip inside that target environment? That feels more fragile than just interrogating the environment and running pip outside of it… but in any case, as long as it doesn’t depend on pip being installed in that environment, that’s excellent. I had missed that this feature had already been implemented.

I don’t think the -E flag, which had a UX similar to that of --python [1], is what people mean when they say environment management. I think that the ability to install into a Python you’re not running on is, or rather was, the main technical blocker to doing so. With that ability, the problem then shifts to defining what it is we want pip to actually do.

Just as an example, if we decide that something like PEP 704 is the way forward, we could modify it so that instead of erroring out, pip locates where the environment should be, creates it, and then runs as if --python .venv had been passed.
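A rough sketch of that hypothetical flow (the helper names here are invented; only the --python option is existing pip behaviour):

```python
import os
import subprocess
import sys
import venv

def ensure_default_env(path: str = ".venv") -> str:
    """Create the expected environment instead of erroring out."""
    if not os.path.isdir(path):
        venv.EnvBuilder(with_pip=False).create(path)
    return path

def run_pip_in_env(pip_args: list[str]) -> int:
    env = ensure_default_env()
    # Equivalent to the user having typed: pip --python .venv <args>
    return subprocess.call([sys.executable, "-m", "pip", "--python", env, *pip_args])
```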

Given the existence of --python, I think the hardest parts then become:

  • Getting agreement that evolving pip to be that unified tool is a good path forward, or at least could be a good path forward.
  • Getting agreement on what our desired end state actually looks like.

For the first of those, I think we can make a rough consensus decision in this thread. It wouldn’t block the ability of other tools to continue to exist, iterate, and compete. It would just be declaring that we view a future where pip is the primary tool for interacting with Python packaging for the 80% use case as a reasonable outcome. It wouldn’t be any sort of mandate; it would really just be keeping pip where it is now, but extending it so that people don’t have to also learn twine, virtualenv, etc.

If/once we had that rough consensus, then for the second of those, I think we’d be best served by taking proposals for what exactly our end-state goal should look like: what commands exist, what they do, etc. Then we weigh them against each other and figure out what path we can take to get from where we are now to where we want to go.

Then we would just need someone(s) to look at those proposals, and pick one as the roadmap for unification. That could be through the PEP process, or it could just be treated as a pip issue and let the pip maintainers select one. Then we “just” work towards that end goal [2].


  1. Except implemented in a bad way: it executed the pip that was installed in the target environment. ↩︎

  2. Of course, we can mutate that end goal as needed if we change our mind or something becomes more obvious as implementation happens. ↩︎

3 Likes

No more so than setting up a build environment (which does exactly the same as --python).

We have this again today, with --python.

We’re at this point already. :wink:

The new pip build and pip publish can be “clearly” end-user facing, and pip wheel might benefit from getting an alias to pip wheelhouse to reflect that it’s intended for creating a directory full of wheels.

I think we can get a long way with changes to help text and documentation that separate the use cases/workflows clearly, plus communicating about this. :slight_smile:

3 Likes

Right now the scenario that I have in mind from a UX point of view would be something like this:

1. Some kind of “bootstrap” tool:

  • easy to distribute, install, update, and use
    • possibly a single file binary (not necessarily written 100% in Python *)
    • release schedule independent from Python versions
  • feature scope and use cases:
    • (make sure to cater to the simplest cases only, no feature creep)
    • install and manage Python interpreters à la pyenv
      • possibly others than just regular CPython
    • install and manage (lightweight) Python applications à la pipx
      • consumer of lock files
    • run Python scripts and commands à la py launcher
      • this could be the place to include support for __pypackages__ without changes to the Python interpreter itself

* Most features would require a Python interpreter to be installed anyway, so once that is done the work can be delegated to some Python code executed by that interpreter (see the sketch below)
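A minimal sketch of that delegation idea (everything here is hypothetical; the function names and the pip invocation are just stand-ins for whatever the bootstrap tool would actually do):

```python
import shutil
import subprocess
import sys

def find_interpreter() -> str:
    # Stand-in for the real logic: discovery first, pyenv-style install on a miss.
    exe = shutil.which("python3") or shutil.which("python")
    if exe is None:
        sys.exit("no interpreter found; the real tool would install one here")
    return exe

def delegate(module: str, *args: str) -> int:
    # Hand the actual work off to Python code running on the managed interpreter.
    return subprocess.call([find_interpreter(), "-m", module, *args])

# e.g. installing an application pipx-style (illustrative only):
# delegate("pip", "install", "--target", "__pypackages__/lib", "httpie")
```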

2. A “developer” tool

From my point of view this could be hatch, pdm, or poetry. I think I would prefer this tool not to be pip (but if it is, it is fine by me). Maybe pip should stay focused on what it already does best, installing things, and maybe even unlearn things like pip wheel.

I think PyPA should pick one (hatch, pdm, pip, poetry, or whatever), not overthink it too much, and then slowly build it up into the thing that covers “the opinionated PyPA workflow(s)™”.

PyPA should choose, document, and recommend things like the project directory structure (src-layout, .venv at the root, tests?), handling of janitorial tasks, and so on. This step is important because then when a project does not adhere to this workflow, PyPA can say “sorry not supported by us” and move on (focus on the hard things: lock files, hard to compile dependencies, metadata override).

On the other hand, PyPA should still work on writing the standards and libraries that are less opinionated, in order to nurture a healthy, competitive ecosystem for the things that PyPA does not want to (or cannot) support.


I guess a good rule to decide if a feature should belong in the bootstrap tool or in the developer tool would be whether or not the task requires writing Python code. For example if I want to use httpie, I should be able to do it without using the developer tool. If I want to create a library I should not be able to do it with the bootstrap tool.

I think the bootstrap tool was already described earlier in this thread (was it under the pyup name?). I would like to see such a tool, I would most likely use it every day.


Yes to finding a technical project manager (to write down specs and requirements), then get funding, and finally hire developers.

3 Likes

Ok, so packaging does some of what I want. But not all, and the platform tags it generates differ from what cibuildwheel creates.

It seems to me that this situation is crying out for standardisation in the standard library. It’s the only way that things like cibuildwheel and packaging can be made consistent with each other.

Can I ask where this is being worked on? Is it in packaging or somewhere else?

Thanks for explaining. This shouldn’t preclude providing the basic low-level packaging functionality in the standard library, though, and I suspect this would be of enormous benefit.

Disclaimer: I am the creator of Poetry and one of its current maintainers.

I would like first to ask a question: is it the role of the PyPA to endorse or promote a single tool? Is this even needed?

As far as I know, other languages do not have a packaging authority, and that did not prevent their communities from rallying around a unified tool.

Poetry is the perfect example of this: it was never endorsed by the PyPA, and was even in direct “competition” with the PyPA-backed tool at the time (pipenv), yet that did not prevent it from thriving and gaining traction, making it the second most downloaded “packaging” tool today (behind pip, obviously, and ahead of pip-tools and pipenv).

Poetry now has a presence, a community, and a great team of contributors, and is popular enough to be seen as a potential unified tool. And now that Poetry supports plugins, it can be extended to support use cases that are not part of the base workflow Poetry provides (monorepos with workspaces – even though that might ultimately make it into Poetry at some point – or npm-style scripts support).

I know some of its detractors have been vocal about Poetry doing its own thing and not supporting standards, but bear in mind that Poetry was started before some of these standards even existed. However, PEP 621 support is coming, along with the deprecation/removal of the non-standard dependency specification operators. With the user base that Poetry now has, these kinds of migrations take time to plan and do well, to ensure nothing breaks.

Regarding extension building, Poetry gives you free rein to build extensions however you see fit while using its own build backend (poetry-core). You can specify a build.py script in which you can use pretty much any tool you like. Here is an example with Meson: pendulum/build.py at master · sdispater/pendulum · GitHub. You just have to add your build requirements to the build-system section in addition to poetry-core, and that’s it.
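For illustration only, a build.py along those lines might look something like this (a hedged sketch, not pendulum’s actual script; the Meson invocation and file paths are invented):

```python
# build.py - executed by poetry-core during the build (illustrative sketch).
import shutil
import subprocess

def build() -> None:
    # Drive any external build tool you like; Meson here, per the example above.
    subprocess.run(["meson", "setup", "build"], check=True)
    subprocess.run(["meson", "compile", "-C", "build"], check=True)
    # Copy the compiled extension into the package so it ends up in the wheel
    # (the paths are hypothetical).
    shutil.copy("build/_native.so", "mypackage/_native.so")

if __name__ == "__main__":
    build()
```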

I am obviously biased, but Poetry covers a lot of the needs most users have, and its popularity shows that. The fact that similar tools released after it did not gain as much traction is proof enough that the differences, even standards support, were not incentive enough to make the switch. This is especially true for Hatch, which does not support lock files, something users want and need, so switching to it would actually be a regression.

Do I advocate for Poetry to be this unified tool? In part, yes, but in the end I can’t make this decision for the users. What I know is that we are trying to build the best experience to make building and managing Python projects as intuitive and fun as possible. That’s what matters, the rest is secondary.

6 Likes

It’s one of the most common things that people ask for. Do we need it? I mean, obviously the status quo works, but a recommendation would solve a common problem people have with the tooling.

Rust has a packaging team, Go’s core team invented Go Mod, there’s probably other examples.

I don’t think anyone is suggesting preventing there to be options. The question is whether there should be a recommended or “default” option, not whether we should provide an only option. Obviously users want it, and they don’t feel well served by the status quo.

3 Likes

Taking a step back, I think right now is not a good time to even choose a tool as others have mentioned. I would like to strongly emphasize that the Python packaging community is fundamentally missing features that the future tool would be expected to have/improve upon.

Concretely, on the user-facing consumer side we don’t have a standardized lock file, and that is a hard blocker. On the building side we don’t have a standardized way for build backends to build extension modules and, as the Conda folks have pointed out, the situation is super complex (my assumption is that we will never support every use case with ease, but we can come up with ways to easily solve the majority of these cases).

I think that we should really focus on these fundamentals first. If we do not, I think we’re not going to make any progress (kind of like how we aren’t making any in this thread).

3 Likes

Consultation and a lot of talking. :wink: Otherwise the SC could help set up a packaging advisory committee or something if you/Paul felt more comfortable with that.

I don’t want to side track on this topic, but:

  1. If by “spearhead” you mean “invent”, that’s not going to happen because I personally don’t want that, but I am willing to help push anything this group rallies around.
  2. If people don’t already trust me to do the right thing for this community, then I don’t know what more I can do to convince them to trust that I always have the community’s best interest at heart.
  3. Haters gonna hate.

The building is separate from the rest since that comes down to what you put into [build-system] in your pyproject.toml, so there’s nothing to distribute (especially since you have to download your build dependencies anyway).

For me, I think when people say “environment management” it covers:

  1. Creating
  2. Selecting (as in how to specify which environment to use when there are multiple options)
  3. Deleting (if its location is non-obvious)

environments. After that, because pip has historically been installed into environments (and conda/mamba handle environment management already), I don’t think flags on pip to point at specific environments have been what people have thought about.
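For concreteness, those three operations map onto roughly this much stdlib code today (a minimal sketch; real tooling layers discovery, naming, and selection logic on top):

```python
import shutil
import venv

venv.create(".venv", with_pip=True)  # 1. creating
python = ".venv/bin/python"          # 2. selecting (".venv\\Scripts\\python.exe" on Windows)
shutil.rmtree(".venv")               # 3. deleting (trivial only because we know where it lives)
```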

And after diving into the topic of virtual environments, I will say there is no 90% answer, so there will be some gnashing of teeth regardless of the decision.

That’s already the plan if PEP 582 happens: Support PEP 582: __pypackages__ · brettcannon/python-launcher · Discussion #70 · GitHub

That’s going to come down to your build tool then since cibuildwheel isn’t directly creating anything, but instead driving the build tools for your project. But at this point, packaging is as close as you get to a standard library for packaging specs.

I understand why you think this, but this isn’t going to happen to the extent you’re thinking. We are removing distutils from the stdlib for a reason already, so going too far into pushing things into the stdlib would be taking a step back.

Now, having said that, somehow making it so interpreters provide details about their wheel tags directly has been discussed, but that’s off-topic.


This is getting off-topic for end user UX, so I’m going to say that I personally do not support moving large chunks of packaging library code into the stdlib for a myriad of reasons and ask that other questions on this topic be done in a new thread.

I don’t think getting these features is going to change the situation. People who prefer X tool are always going to advocate that tool, and people who prefer Y tool are going to be grumpy if X tool is selected as the recommendation. Adding more check marks on the feature matrix isn’t going to change that.

I also don’t think it’s critical. It’s not like we’re choosing one tool and freezing it in time, with everyone required to use that frozen-in-time tool forever. We’re choosing what tool to recommend, by default, and that tool will continue to evolve and get new features, bug fixes, etc.

All of those things are technically possible to add to pip, even in a way that doesn’t break the usage of pip from within another environment manager like conda.

Here’s my argument for why it makes sense to just evolve pip:

We – both the PyPA and the community at large – have already chosen it. The community chose it as the default way to install things before the PyPA was anything more than a tongue-in-cheek joke about the lack of an authority. Python core chose it as the default when they accepted PEP 453 and bundled pip with Python. The PyPA chose it when documentation was written on packaging.python.org and on pypi.org.

So, IMO, the decision has already been made. However, users want more, and we’re trying to retread that ground while ignoring the tool that already has the vast bulk of community consensus around it. On top of that, by reopening the discussion about which tool should be the recommended one, we put ourselves in an unwinnable position.

Our only real choice here is whether we want to meet users where they are already, or whether we want to try and convince everyone to abandon the tools they’re already using to use something completely new.

In other words, we can more easily move the ecosystem by incremental improvements to pip than we can by boiling the ocean and trying to get everyone to migrate to something new.

13 Likes

I think we should also consider the practicality of adding all of the required extra features to pip. It would be a massive undertaking and would likely deter contributions (at least I would not want to contribute to that). Having a tool with the right UX that calls out to tools that specialize (like pip for dependencies) is, in my mind, orders of magnitude easier and won’t take several years.