Python Packaging Strategy Discussion - Part 1

Summary of discussions:

Discussion cue:

One of the common themes from the Packaging Survey was that the Python Packaging landscape is complex.

Survey Respondent 1 said: “Unify the multiple tools. It’s good to have new ideas and new implementations, but it has to converge after a while. If my package has a compiled part, I’m stuck with setuptools, but all other tools are pushed forward while not covering this feature.”

Survey Respondent 2 said: “There should be one– and preferably only one –obvious way to do it. Get rid of the fragmentation.”

Survey Respondent 3 said: “I definitely want Python to introduce the One True packaging tool, ideally both as easy as Rust’s cargo (where building, adding dependencies, running tests, code checking and deploying are all subcommands) and extensible (support for different build regimes, extensions in foreign languages etc). Package installing is easy, package building is a wild west at the moment, and no one tool is good enough for all use cases.”

Survey respondent 4 said: “I would blow it all away and replace it with one damn thing. Too much choice leads to chaos and that chaos spills out to the user, rather than being confined to the folks choosing to revel in it. The dynamic library problem is very bad, and Python’s struggles with it are no worse, really, than those of other environments, but the choices made along the way, and the lack of consistent stern guidance from the top, have made Python’s build/deploy story a joke.”

In a nutshell, there are too many tools and users are not sure which ones to use.

Can we reduce the number of tools and bring about some form of unification? Can we do anything else to reduce the complexity?

If we do reduce the complexity, are there any obvious or not so obvious disadvantages in going down that route? If we do decide to reduce the complexity, how do we go about doing it?

Rules:

  • The cues listed above are only suggestions. Please add your thoughts or questions below.
  • This discussion is open to any active/passive PyPA/non PyPA tool maintainer/contributor
  • When posting for the first time, please indicate which tool(s) you are representing. This is for my benefit as I would like to gauge how many tools have been represented in the discussion.
  • The discussion for this post will be open until Jan 20
8 Likes

(I’m not necessarily representing any project, but I am a maintainer of pypa/build, meson-python, and sysconfig)

I think so, but it’d require a great effort, which means it probably won’t happen without any funding.

It may be obvious to some, but I just wanted to highlight this: just unifying tools won’t do any good, we need to do a full UX analysis and then design something that fills the needs of users more naturally.

IMO, a unified frontend for all the different tasks we have is something worth pursuing, but it needs to be led by someone or a team with a strong UI/UX background.
It also shouldn’t replace existing tooling, but rather just re-export the functionality in a more coherent manner. This lets development be driven by area experts, in projects meant to target each of the required tasks specifically, leaving the unified tool as essentially just a UI project.

pip already tries to do this, but it is burdened by historic decisions and, AFAICT, a lack of maintainer time. I am not sure if a clean slate would be worth it, but it’s probably something worth considering – if we can find a big enough workforce to work on it, that is, which is the main problem.

TLDR: Lack of maintainer time is the main issue in the packaging ecosystem right now, but if we can fix that, my proposal would be to split each technical challenge into its own standalone project, and then have a unified tool re-exporting the functionality, with a team that is solely focusing on UI/UX.
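To make the “unified tool that only re-exports functionality” idea concrete, here is a minimal sketch (the tool name, subcommand set, and mapping are all invented for illustration, not an actual proposal): each subcommand simply delegates to the existing specialist project instead of reimplementing anything.

```python
# Hypothetical sketch of a thin unified frontend: the UI/UX team owns only
# the command names and help text; each subcommand hands off to the existing
# tool that already does the work.
import subprocess
import sys

# Map of (invented) frontend subcommands to the underlying specialist tools.
DELEGATES = {
    "build": [sys.executable, "-m", "build"],             # pypa/build
    "install": [sys.executable, "-m", "pip", "install"],  # pip
}

def main(argv):
    if not argv or argv[0] not in DELEGATES:
        print("usage: pytool {%s} ..." % ", ".join(sorted(DELEGATES)))
        return 2
    # Pass the remaining arguments straight through to the underlying tool.
    return subprocess.call(DELEGATES[argv[0]] + argv[1:])
```

The point of the sketch is that the frontend carries no packaging logic of its own, so area experts keep driving the underlying projects while the wrapper focuses purely on a coherent user experience.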

IMO, there are two main factors that drive complexity in the Python packaging ecosystem. The first is historic decisions, which cannot be fixed without breaking things for lots of users. The second is that lots of things are inherently complex, and we are already doing our best to reduce that complexity, so I don’t think there’s much we can do there other than providing more resources to projects to help tackle them.

5 Likes

(I’m a maintainer of pip).

I agree wholeheartedly with this. And I’d go further and say that for such a tool to be successful, it will be at least as important to decide what workflows we won’t support, as what we will. The lesson we have learned from pip is that if you try to support everything that people come up with, you’ll end up in a mess. And of course, the big problem here will be that many of the respondents wanting a unified tool will have been making an implicit assumption that their preferred workflow will be supported…

Again, I agree. I do not think pip is a good base for such a tool. It’s too low level, and it tries to support everyone.

+1 again. Historical complexity can be removed, but do we have the stomach (or the resources) for deliberately breaking how a lot of users work? As I said, I doubt any of the survey respondents were asking for a unified tool that didn’t support their workflow…

Inherent complexity is a different matter. It would be a mistake to assume Python is a “special snowflake” and has nothing to learn from other languages and ecosystems, but we do need to be careful. For example, any lessons we learn from cargo need to consider that Rust has no deployment complexity - you build a binary and ship it[1]. I don’t know npm as well, but I’m sure there are aspects of the Javascript lifecycle that don’t match Python. For example, does Javascript have the concept of compiled extension libraries?

I’m frankly not optimistic about doing something like this. It’s a worthwhile goal, and I completely agree it would be a substantial improvement for Python users. But it could suck up a huge amount of resource, which might be better spent on smaller, more incremental improvements in other areas. Even as a funded project, with its own hired resources, it would consume a big chunk of attention from the packaging community (assuming we wanted at least some say in what the successor to all the tools we’ve built would look like :wink:).

One alternative that I think we should consider is continuing to work on the goal of splitting out the various “components” of packaging into reusable libraries[2]. Projects like installer, build, packaging and resolvelib are good examples of this. Using those libraries, and more like them, it will be a lot easier to build new workflow tools, any one of which has the potential to become the unified solution people want. It’s definitely going to result in more complexity on the way to something simpler, and I’m pretty sure it’s not what the survey respondents were imagining, but I feel that it might be a better trade off for the long term.


  1. That’s an over-simplification, of course. You can build applications with Rust with dependencies on runtime DLLs, etc. But it’s not common, and the ecosystem doesn’t really offer any support for deploying applications built like that. So I guess it’s a “workflow that the tool chooses to discourage”. ↩︎

  2. In particular, I’d like to see an ecosystem where “pip is the only viable approach to installing packages” is no longer the self-evident truth that it is today. ↩︎
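As a small illustration of the kind of reuse those component libraries enable (this example assumes the third-party packaging project is installed): a new workflow tool can lean on packaging’s PEP 440/508 parsing instead of reimplementing it.

```python
# Reuse pypa/packaging's version and requirement handling rather than
# reimplementing version-specifier logic in every workflow tool.
from packaging.requirements import Requirement
from packaging.version import Version

req = Requirement("numpy>=1.21,<2.0")
print(req.name)                            # -> numpy
print(Version("1.24.0") in req.specifier)  # -> True
print(Version("2.0.1") in req.specifier)   # -> False
```

Multiply this across installer, build, and resolvelib, and most of a new frontend becomes glue code over well-tested components.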

5 Likes

As I mentioned here, I’m fine with Hatch providing this unified UX since it really is almost there already. The only thing that is out of my control is the lock file standardization which Brett discusses here.

As far as resources go, this is the only side project I am working on nowadays, and we use it at work, so time allocation shouldn’t be an issue. We also have some great contributors now; for example, they just added complete type hinting to Hatchling. Another boon is that the design is based on plug-ins, so maintenance is distributed.

6 Likes

(I’m a maintainer of packaging and VS Code.)

I see this all the time from the perspective of the Python experience in VS Code. People often don’t have exposure to other workflows, so they innately think their workflow is “normal”, and if something doesn’t work for them then it must obviously be broken for everyone, right? And when you point out that they are not necessarily typical, not everyone is up for changing to match a more common workflow.

Yes, but they are not hosted on npm as built libraries; think of them like sdists.

This is the approach I also support. If we can get everything backed by standards and then make using those standards easy by making sure they have backing libraries, then we can make it easier for people to experiment a bit as to what the proper UX should be for any front-end packaging tool. But it also gives people an escape hatch (no pun intended) to use a different workflow if the preferred/default tool doesn’t meet their needs.

More complexity for us or users? I can see for us as we have to put in the effort to write the code for everyone to use (see the ongoing journey of packaging.metadata as an example :sweat_smile:), but I personally can’t think of how users are impacted by these things.

Probably a page on packaging.python.org that lists the tools and why one might choose them. This came up just today in a Mastodon thread by Antoine Beyeler. I think part of the issue with the perceived complexity is that people simply don’t know where to turn for current knowledge, and lack guidance unless they find the right blog post or article explaining how to do something.

Only if it becomes too restrictive (which I don’t see us allowing to happen). While people bemoan Python’s packaging story as being too complex, its flexibility is what helped it become the glue language of the programming world. While I would always prefer people wrote more Python code than wrapping existing code, we also have to accept, and to an extent embrace, that folks simply won’t do that.

I’ve previously said I’m fine with that viewpoint if we wanted to push a tool we have today.

I haven’t forgotten. :grin: The brettcannon/mousebender repo on GitHub (“Create reproducible installations for a virtual environment from a lock file”) tracks the high-level steps left to work towards an MVP lock file format.

4 Likes

I was thinking that while we are experimenting with new tools, there would be more tools and hence more choice/confusion, rather than less. But this is something that’s in our control - we could manage that to limit the impact. The question then is whether the packaging community would co-operate on a single solution, in that way, or whether everyone would want to “do their own thing” once they get that ability.

1 Like

(I’m a regular contributor to conda-forge, a large(ly) parallel ecosystem of python packages that’s used widely in data-, ML- and science-heavy python projects. This does not make me a maintainer of the actual tools – conda, mamba, etc. – that are the UI for this ecosystem)

@smm, have you seen this thread by @rgommers? It introduces a resource page that’s intentionally solution-free, to establish a baseline understanding of the various problems/needs/constraints in the python packaging ecosystem, particularly for projects involving “native” code (i.e. code wrapped by, but not written in, Python).

With great respect for Hatch, “almost there already” is a big stretch IMO, given all the problems pointed out with (e.g.) native code. This is not Hatch’s fault (nor responsibility), but we’re (IMHO) emphatically not close to declaring success.

As long as the ecosystem currently being served by conda cannot be folded back into “the One True Way”, we have not actually solved the schisms in python packaging (i.e. reached a point where everyone can just use the same tool). Note that this is not some zealous attachment to conda as a tool or philosophy, but about not regressing the capabilities that are necessary to solve large classes of problems for the “data science persona” at scale. My impression is that this pragmatism is shared by many if not most in conda-land.

Indeed, it is a blessing and a curse, but now we have to deal with it.

To this point, from my POV, the uncomfortable “math” here is to either:

  1. solve most of the problems outlined in https://pypackaging-native.github.io/ (a gigantic undertaking)
  2. define large parts of the data science ecosystem as out of scope (…)

Almost certainly, 2. won’t fly for the SC (who would want to define ~half their user base out of existence?), and the wider PyPA community has consistently declared 1. as out of scope (unsurprisingly, given the monumental complexity relative to the available resources).

Both points are understandable for the respective stakeholders, but they are at odds with each other, and (IMO) the fundamental tension underlying the lack of tooling homogeneity.

As painful as 2. looks from a language governance POV, this is effectively what’s happening in various pockets of the ecosystem most affected by these problems (e.g. the geospatial stack), where installation instructions often uniformly recommend an alternate (non-PyPA) installer, and wheels etc. are not provided.

Hopefully this can be mitigated with things like PEP 668 (which would make it less “all-or-nothing” to use other package managers, and could more or less gracefully hand off installation of too-complicated packages from pip/wheels/PyPA to another package manager where necessary), but even achieving that is still a far cry from the “unification” that the survey comments cited in the OP are calling for.

3 Likes

Yes, to be specific, I was mostly talking about the command-line interface, similar to Cargo/npm, being almost there already. For that, lock files are the missing feature, and for Hatchling, extension modules are the missing feature; I can’t do anything about the former without Brett, nor about the latter without Henry. Both of them are aware of this and are helping on their own time.

4 Likes

Regarding the seemingly impossible task of providing wheels for some native libraries that are difficult to compile… I read more and more headlines about WASM/WASI, is it something that could help us here?

I have only very surface knowledge of those topics, but my impression was that WASM/WASI could provide some kind of “compile once, run anywhere” type of workflow. Maybe the way there is still too long for it to be a solution in our case here, or maybe there would be some performance loss, I do not know…

That’s not what I was saying. I was specifically talking about support of workflows, not of use cases. So, in particular, I am absolutely not in favour of declaring any groups of users or parts of the ecosystem as “out of scope”. I do expect that we may need to ask users to do things in certain ways in order to address their use cases, but I don’t see why that should be unacceptable - there needs to be compromise on both sides if we’re to create a single solution that works for everyone (which is the topic of this discussion).

I don’t think this is a fair assessment. The PyPA focus is on supporting the “standard” builds of Python (python.org, Linux distro builds, Windows Store, self-built interpreters, Homebrew, …) Solutions that require the user to switch to a different Python build don’t fit that remit. I don’t think that “declare all of those Python builds as out of scope” has any more chance of being acceptable to the SC than “declare a big chunk of the user base” does.

Maybe that does mean that we need two independent (but hopefully co-operating!) tool stacks. That’s something that could come out of this discussion. But even then, why does the UI have to be completely different? Could we not have a unified UI with different “backends” somehow? At the moment, we’re focusing on the technical challenges of the backends, but this discussion is supposed to be about the user experience - so maybe a standard command structure that both conda and “the PyPA tool” follow is an acceptable compromise here.

Overall, your comments sound pretty pessimistic. I’d much rather we approached the discussion with a more positive attitude - we’ve been given a pretty clear indication from the user survey that a “standard frontend” is something that our users want, let’s see how we can approach that goal, even if the technical challenges behind the scenes make it difficult to do everything we believe the users want.

https://pypackaging-native.github.io/ is a great example of this[1]. It sets out the technical challenges without presuming a particular solution. Maybe we could do something similar for the front end - document what users want to do, as opposed to what tools currently let them do, and clarify the challenges involved in delivering that without focusing on the limitations or constraints of particular tools.


  1. I believe you were involved in producing this document, so many thanks for your efforts in that. ↩︎

5 Likes

EDIT: my first post here, so adding projects I work on: meson-python, pypackaging-native, NumPy, SciPy

This would be very useful. One challenging part of this discussion is that it’s not even all that clear what the frontend exactly is. It’s certainly not a “build frontend”, but it gets confused by there not being another such term as well as by Poetry, Hatch, PDM & co supplying various other pieces of the puzzle (build frontend, build backend, and/or a resolver). From the perspective of dealing with native code as a packager or library author, Poetry, Hatch and PDM are basically all the same, and not really in the picture (if it works for Pip, it works for those workflow tools as well, modulo some details).

2 Likes

It’s a problem grown over ~30 years (longer if counting the problems of C/C++ underlying many python packages), with many deep rabbit holes. I’d prefer to call it realism to think that we’re not a PEP away from fixing this.

So despite my not-exactly-rosy outlook, I’d like for this problem to be solved, which is why I’m trying to contribute something on that journey. I also happen to think that pypackaging-native is a step in that direction. :upside_down_face:

1 Like

Those other python builds exist primarily (IMO) because the existing tools were underserving a substantial number of packages and users. The root causes of that schism should absolutely be in the remit of PyPA, even more so in a discussion that starts with the premise of unification. And previous efforts at such unification have run into that “out of scope” stance pretty verbatim.

Regardless of the fact that this stance is understandable, the goal should IMO be to come up with something that obviates using those other Python builds in the first place, or at the very least gets them to a level of constructive coexistence.

To my mind (and though it’s still too early to tell what other ideas people come up with), the task at hand will be to analyse the set of challenges & constraints (partly summarized in pypackaging-native), and then decide how to solve them and under what umbrella.

For brainstorming about those solutions, I’d really like us not to think in terms of “python.org installers” or “Windows store builds” or “conda”, but in terms of anything that can satisfy the same relevant set of requirements. After that we can iterate on the solution[1] until we have something that gets enough consensus for implementation, but a priori anything should be fair game for change.


  1. needless to say, such a solution must include a sane migration path from here to there ↩︎

Fair. Individuals have differing opinions. But I don’t think there’s any PyPA policy saying this is out of scope.

Again fair. But I think that “how do people get Python” is part of the question about a unified solution. In much the same way that the “rust experience” isn’t just cargo, it’s also rustup. But just as we can’t assume the SC would be OK with ignoring a chunk of the user base, I don’t think we can assume the SC will accept dropping those ways of getting Python. We can ask, but we can’t assume.

This is where the boundary between packaging (the PyPA) and the SC blurs. I personally think that the SC’s “hands off” approach to packaging puts us in a bad place as soon as we get close to areas where the SC does have authority. Distutils was dropped from the stdlib, and the packaging community had to pick up the slack. We have to address binary compatibility, but we don’t have control over the distribution channels for the interpreter. We provide tools for managing virtual environments, but we don’t control the venv mechanism. We install libraries, but we don’t control the way import hooks work. Etc.

Assuming we don’t want to involve the SC (which is not a foregone conclusion in my view, but doing so would be a much bigger change even than the topic of this discussion), we have to accept that people get Python by means that are out of our control, and declaring such users “out of scope” simply marginalises our impact, and fails to achieve our goals[1].


  1. Or at the bare minimum, my goals :slightly_smiling_face: ↩︎

3 Likes

Fully agreed with that and everything below. I honestly don’t think the problems can be solved comprehensively without language level (read SC) involvement, but in any case, I think manageable changes in the distribution channels should not be off-limits in the context of this discussion.

2 Likes

I have a lot of thoughts to share but I’ll do that separately[1]. Before that, I feel some urgency to say…

Everyone is thinking of different things when they see “unification” in this question – which is why this discussion is all over the place.[2] We ought to start by setting up shared vocabulary+understanding of what we’re even talking about in the context of unification.


I can see the following dimensions/aspects to the unification story:

  1. Unification of PyPI/conda models [3]
  2. Unification of the consumer-facing tooling[4]
  3. Unification of the publisher-facing tooling[5]
  4. Unification of the workflow setups/tooling[6]
  5. Unification/Consistency in the deployment processes[7]
  6. Unification/Consistency in “Python” installation/management experience[8]

Can anyone think of any other dimension/aspect contributing to the “tooling complexity” problem?


  1. Listen, at this point, I’ve been typing for the last 3 hours and I need to eat dinner now. That I moved to VS Code, after spending nearly 2 hours on discuss.python.org’s text editor, is a really good indicator that this isn’t the right medium for what I have written – so, I’ll put it up on my blog (with some polishing to make it readable without this thread). Sorry for the cliffhanger-ish opening sentence. :sweat_smile: ↩︎

  2. This sentence should have started with “I feel that”. I’ve omitted that since it’s more assertive this way. :stuck_out_tongue:
    Also, sorry, but it was a bit of a roller coaster reading the discussion so far. ↩︎

  3. i.e. the non-Python code dependency problem ↩︎

  4. i.e. consuming libraries ↩︎

  5. i.e. publishing libraries ↩︎

  6. i.e. organising files, running tests, linters, etc ↩︎

  7. i.e. going from source code → working application somewhere ↩︎

  8. i.e. the rustup/pyenv aspects of this, which is absolutely a thing that affects users’ “Python Packaging” experience (think pip != python -m pip, or python -m pip vs py -m pip, or python being on PATH but not pip etc) ↩︎

7 Likes

I may have missed how it fits into one of the other categories, but unification of the interface of tools?

By which I mean, common subcommand names, common options and terms (“index” vs “channel”), common configuration files (so that you can set your options in one place and have all tools respect them), etc.

1 Like

I didn’t think of that and it doesn’t fit into any of the existing buckets as-is. And, yea… it is definitely another aspect here:

  7. Unification of similar configuration/info across different tools (similar to what .pypirc does for auth credentials, PEP 621 did for project metadata, .editorconfig does for linters etc).
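PEP 621 is a good precedent for that kind of sharing: any PEP 621-aware build backend reads the same `[project]` table from `pyproject.toml`, so the metadata is written once. A minimal illustrative example (the project name and dependencies here are made up):

```toml
[project]
name = "example-pkg"        # read by any PEP 621-aware backend
version = "0.1.0"
requires-python = ">=3.8"
dependencies = ["requests>=2.28"]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```

Extending that pattern to options like index/channel configuration would let users set things in one place and have all tools respect them.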

Not right now for WASI, maybe for WASM if you mean Pyodide. See WebAssembly and its platform targets for an explanation of the differences and what it means for extension modules.

I don’t think we are under the illusion that any of this is going to be fixed quickly. But trying to be positive while we tackle the problem is at least appreciated, because at least for me, if we start out as doom-and-gloom it just isn’t motivating to try and tackle such a hard problem when things continue to function as-is, well or not.

It’s simply a matter of asking.

I’m on the SC for one more term before I step down (5 years is enough :sweat_smile:), so if you want to ask the SC for something packaging-related and want me to help explain it while I’m still on the SC, 2023 is your chance to do that (not that I wouldn’t be happy to provide info to future SCs I’m not on, but it’s obviously easier when I’m already sitting in the meetings).

1 Like

(I maintain a custom packaging tool at work and have made minor contributions to several PyPA projects)

On the point of being positive, just wanted to thank people for work already done and to +1 points already made!

One alternative that I think we should consider is continuing to work on the goal of splitting out the various “components” of packaging into reusable libraries. Projects like installer , build , packaging and resolvelib are good examples of this.

This is a great approach. I maintain a custom packaging tool at work that addresses our specific needs and the libraries you mention have made this increasingly easy.

It may be obvious to some, but I just wanted to highlight this: just unifying tools won’t do any good, we need to do a full UX analysis and then design something that fills the needs of users more naturally.

Agree, the hard thing here is the hard thing. To a first approximation, we did have a single unified “frontend” that did everything: python setup.py <command>, and for various reasons that led to the ecosystem not meeting the needs of users. poetry and hatch have won usage by solving users’ problems, which IMO is usually the best way to solve the xkcd “Standards” problem.

2 Likes