Python Packaging Strategy Discussion - Part 1

EDIT: my first post here, so adding projects I work on: meson-python, pypackaging-native, NumPy, SciPy

This would be very useful. One challenging part of this discussion is that it’s not even all that clear what the frontend exactly is. It’s certainly not a “build frontend”, but the picture gets muddied both by the lack of another established term and by Poetry, Hatch, PDM & co supplying various other pieces of the puzzle (build frontend, build backend, and/or a resolver). From the perspective of dealing with native code as a packager or library author, Poetry, Hatch and PDM are basically all the same, and not really in the picture (if it works for pip, it works for those workflow tools as well, modulo some details).

It’s a problem grown over ~30 years (longer if counting the problems of C/C++ underlying many Python packages), with many deep rabbit holes. I’d prefer to call it realism to think that we’re not a PEP away from fixing this.

So despite my not-exactly-rosy outlook, I’d like for this problem to be solved, which is why I’m trying to contribute something on that journey. I also happen to think that pypackaging-native is a step in that direction. :upside_down_face:

Those other Python builds exist primarily (IMO) because the existing tools were underserving a substantial amount of packages and users. The root causes of that schism should absolutely be in the remit of PyPA, even more so in a discussion that starts with the premise of unification. And previous efforts at such unification have run into that “out of scope” stance pretty verbatim.

Regardless of the fact that this stance is understandable, the goal should IMO be to come up with something that obviates using those other Python builds in the first place, or at the very least gets them to a level of constructive coexistence.

To my mind (and though it’s still too early to tell what other ideas people come up with), the task at hand will be to analyse the set of challenges & constraints (partly summarized in pypackaging-native), and then decide how to solve them and under what umbrella.

For brainstorming about those solutions, I’d really like us not to think in terms of “python.org installers” or “Windows store builds” or “conda”, but in terms of anything that can satisfy the same relevant set of requirements. After that we can iterate on the solution[1] until we have something that gets enough consensus for implementation, but a priori anything should be fair game for change.


  1. needless to say, such a solution must include a sane migration path from here to there ↩︎

Fair. Individuals have differing opinions. But I don’t think there’s any PyPA policy saying this is out of scope.

Again fair. But I think that “how do people get Python” is part of the question about a unified solution. In much the same way that the “rust experience” isn’t just cargo, it’s also rustup. But just as we can’t assume the SC would be OK with ignoring a chunk of the user base, I don’t think we can assume the SC will accept dropping those ways of getting Python. We can ask, but we can’t assume.

This is where the boundary between packaging (the PyPA) and the SC blurs. I personally think that the SC’s “hands off” approach to packaging puts us in a bad place as soon as we get close to areas where the SC does have authority. Distutils was dropped from the stdlib, and the packaging community had to pick up the slack. We have to address binary compatibility, but we don’t have control over the distribution channels for the interpreter. We provide tools for managing virtual environments, but we don’t control the venv mechanism. We install libraries, but we don’t control the way import hooks work. Etc.

Assuming we don’t want to involve the SC (which is not a foregone conclusion in my view, but doing so would be a much bigger change even than the topic of this discussion), we have to accept that people get Python by means that are out of our control, and declaring such users “out of scope” simply marginalises our impact, and fails to achieve our goals[1].


  1. Or at the bare minimum, my goals :slightly_smiling_face: ↩︎

Fully agreed with that and everything below. I honestly don’t think the problems can be solved comprehensively without language level (read SC) involvement, but in any case, I think manageable changes in the distribution channels should not be off-limits in the context of this discussion.

I have a lot of thoughts to share but I’ll do that separately[1]. Before that, I feel some urgency to say…

Everyone is thinking of different things when they see “unification” in this question – which is why this discussion is all over the place.[2] We ought to start by setting up shared vocabulary+understanding of what we’re even talking about in the context of unification.


I can see the following dimensions/aspects to the unification story:

  1. Unification of PyPI/conda models [3]
  2. Unification of the consumer-facing tooling[4]
  3. Unification of the publisher-facing tooling[5]
  4. Unification of the workflow setups/tooling[6]
  5. Unification/Consistency in the deployment processes[7]
  6. Unification/Consistency in “Python” installation/management experience[8]

Can anyone think of any other dimension/aspect contributing to the “tooling complexity” problem?


  1. Listen, at this point, I’ve been typing for the last 3 hours and I need to eat dinner now. That I moved to VS Code, after spending nearly 2 hours on discuss.python.org’s text editor, is a really good indicator that this isn’t the right medium for what I have written – so, I’ll put it up on my blog (with some polishing to make it readable without this thread). Sorry for the cliffhanger-ish opening sentence. :sweat_smile: ↩︎

  2. This sentence should have started with “I feel that”. I’ve omitted that since it’s more assertive this way. :stuck_out_tongue:
    Also, sorry, but it was a bit of a roller coaster reading the discussion so far. ↩︎

  3. i.e. the non-Python code dependency problem ↩︎

  4. i.e. consuming libraries ↩︎

  5. i.e. publishing libraries ↩︎

  6. i.e. organising files, running tests, linters, etc ↩︎

  7. i.e. going from source code → working application somewhere ↩︎

  8. i.e. the rustup/pyenv aspects of this, which is absolutely a thing that affects users’ “Python Packaging” experience (think pip != python -m pip, or python -m pip vs py -m pip, or python being on PATH but not pip etc) ↩︎
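
To make the `pip != python -m pip` point in the last footnote concrete, here is a minimal sketch of how one might check whether the `pip` found on PATH belongs to the interpreter that is currently running. The helper names are made up for illustration, and comparing the scripts directory is only a heuristic (it is an assumption that `pip` lives in the interpreter's scripts directory, which is common but not guaranteed).

```python
import os
import shutil
import sys
import sysconfig


def pip_command():
    """Return a pip invocation that is tied to the running interpreter."""
    return [sys.executable, "-m", "pip"]


def pip_on_path_matches_interpreter(path=None):
    """Heuristic check: does the `pip` on PATH sit next to this interpreter?

    Returns None if no `pip` executable is found on the search path at all,
    otherwise True or False.
    """
    pip_exe = shutil.which("pip", path=path)
    if pip_exe is None:
        return None

    def norm(p):
        # Resolve symlinks and normalise case so the comparison is fair.
        return os.path.normcase(os.path.realpath(p))

    scripts_dir = sysconfig.get_path("scripts")
    return norm(os.path.dirname(pip_exe)) == norm(scripts_dir)


if __name__ == "__main__":
    print("interpreter-bound pip command:", pip_command())
    print("pip on PATH matches interpreter:", pip_on_path_matches_interpreter())
```

Running `python -m pip` (or the list returned by `pip_command()` via `subprocess`) sidesteps the question entirely, which is why it is often recommended over a bare `pip`.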

I may have missed how it fits into one of the other categories, but unification of the interface of tools?

By which I mean, common subcommand names, common options and terms (“index” vs “channel”), common configuration files (so that you can set your options in one place and have all tools respect them), etc.

I didn’t think of that and it doesn’t fit into any of the existing buckets as-is. And, yea… it is definitely another aspect here:

  1. Unification of similar configuration/info across different tools (similar to what .pypirc does for auth credentials, PEP 621 did for project metadata, .editorconfig does for linters etc).

Not right now for WASI, maybe for WASM if you mean Pyodide. See WebAssembly and its platform targets for an explanation of the differences and what it means for extension modules.

I don’t think we are under the illusion that any of this is going to be fixed quickly. But trying to be positive while we tackle the problem is at least appreciated, because at least for me, if we start out as doom-and-gloom it just isn’t motivating to try and tackle such a hard problem when things continue to function as-is, well or not.

It’s simply a matter of asking.

I’m on the SC for one more term before I step down (5 years is enough :sweat_smile:), so if you want to ask the SC for something packaging-related and want me to help explain it while I’m still on the SC, 2023 is your chance to do that (not that I wouldn’t be happy to provide info to future SCs I’m not on, but it’s obviously easier when I’m already sitting in the meetings).

(I maintain a custom packaging tool at work and have made minor contributions to several PyPA projects)

On the point of being positive, just wanted to thank people for work already done and to +1 points already made!

One alternative that I think we should consider is continuing to work on the goal of splitting out the various “components” of packaging into reusable libraries. Projects like installer , build , packaging and resolvelib are good examples of this.

This is a great approach. I maintain a custom packaging tool at work that addresses our specific needs and the libraries you mention have made this increasingly easy.

It may be obvious to some, but I just wanted to highlight this: just unifying tools won’t do any good, we need to do a full UX analysis and then design something that fills the needs of users more naturally.

Agree, the hard thing here is the hard thing. To a first approximation, we did have a unified single “frontend” that did everything: python setup.py <command>, and for various reasons that led to the ecosystem not meeting the needs of users. poetry and hatch have won usage by solving users’ problems, which IMO is usually the best way to solve the xkcd: Standards problem.

I didn’t want to split hairs about “pessimism”, but it seems the tone of my message came across badly, because it wasn’t pessimism, and certainly wasn’t doom-and-gloom on my part.

While one could say I’m putting the finger in an old wound, I wanted to surface:

… because this is IMO the decision that eventually needs to be made on a language level. Invest resources into fixing some very thorny problems, or explicitly declare them out of scope[1]. I think the world would be able to adapt to either decision, but obviously I’m in favour of doing the effort and achieving some degree of unification (I also don’t doubt that it can be done technologically), so if anything, I’d say I’m verging on optimism. :slight_smile:


  1. Otherwise we continue limping along on volunteer enthusiasm, which – while substantial – is dispersed into various pockets & niches, and is unlikely to organically coalesce around a unified goal by itself. ↩︎

I considered distinguishing the various degrees of unification, but just left the target at “maximum unification”, along the lines of the survey comments à la I would blow it all away and replace it with one damn thing – perhaps that was naïve. :sweat_smile:

This is a great list, and will be very helpful I think. I can additionally think of:

  1. Unification on an environment model/workflow (e.g. don’t touch the base python install, put everything into an environment)

I’d also split 3. into “build tooling” (how do I turn this source code into a publishable artefact?) and “publisher-facing tooling” (how do I actually publish that artefact?).

We’ve partly “solved” that with Marking Python base environments as “externally managed” I believe (it’s still in need-implementation state, rather than implemented state).

Convincing people that there should be no “base” site-packages (i.e. put everything in a venv, no user-site etc) is… not a fight I wanna pick up myself, even though I do agree it would be nice to have! :slight_smile:

I’m not sure I understand how you mean that solves the standards problem, except perhaps as in reducing the number of standards simply by winning through competition as measured through user numbers?

Because while that may be able to take us from 15 to 4-5 standards, the user base of each of these tools is fairly invested in the respective particularities.

The “last mile” to get to one standard is I think only possible by some centralized action – think how mobile phone manufacturers (and especially Apple) had to be more or less forced[1] to agree on a standard[2].

What I was trying to get at is that right now, a user has to ask themself the questions: “should I create an environment?” “do I need to?” “how do I do that?” “is it venv, virtualenv, poetry, conda, …?”

And what I meant by unification of that aspect is that these questions should disappear (by being implicitly “always create an environment, or if you have one[3] already, activate that”).
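
With today's standard library, the “always create an environment, or reuse the one you have” rule could look roughly like the sketch below. It uses the existing `venv` module purely as a stand-in (the unified environment I have in mind is not necessarily a venv), and the function names and the `pyvenv.cfg` existence check are illustrative assumptions.

```python
import sys
import venv
from pathlib import Path


def in_virtual_environment():
    """True if the running interpreter is already inside a virtual environment."""
    return sys.prefix != sys.base_prefix


def ensure_environment(env_dir):
    """Create a virtual environment at env_dir unless one already exists.

    Returns True if a new environment was created, False if an existing
    one was found and left untouched.
    """
    env_dir = Path(env_dir)
    if (env_dir / "pyvenv.cfg").is_file():
        return False
    # with_pip=False keeps the sketch fast and independent of ensurepip.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    return True


if __name__ == "__main__":
    if in_virtual_environment():
        print("already in an environment:", sys.prefix)
    else:
        created = ensure_environment(".venv")  # ".venv" is just a common convention
        print("created new environment" if created else "reusing existing environment")
```

The point is that a tool could make this decision on the user's behalf, so that none of the questions above need to be asked.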


  1. no-one likes to be forced, but I think the situation would benefit a lot from channeling people’s effort into much fewer projects that can still tackle all desired improvements, but with much less duplication of effort. ↩︎

  2. Sidenote: USB-C is finally coming to the iPhone! ↩︎

  3. speaking of an environment in such a unified paradigm, not one of the currently existing ones ↩︎

I think R is the language we can learn most from, given how much overlap there is in terms of the issues encountered at https://pypackaging-native.github.io/. Notably, all the R users I’ve encountered (plus what is used by https://carpentries.org/) do not use conda (unlike Python, where conda is used by the Carpentries). I suspect this is down to how R and R packages are distributed/installed on the various OSes:

  • On Windows, the R project seems to provide a build service https://win-builder.r-project.org/, which includes Rtools (Rtools42 for Windows, see Rtools - what, why, when? - General - RStudio Community for more context), which provides both a msys2 build environment (i.e. GCC/GFortran plus other GNU build tools) and common non-R libraries that packages may depend on (I’m not familiar enough with the geospatial stack to know how complete the packaged list is, but there’s a few astronomy packages there that are core, and Python has practically won the astronomy ecosystem).
  • On macOS, the project instructs users to install the Mac build tools via xcode-select (which is fairly standard), and provides binaries of the other remaining libraries at https://mac.r-project.org/.
  • On Linux, the choices seem to be to use the distro-provided R ecosystem (and installing on top whatever is missing) where possible, and building from source where that is not an option.
  • On “Unix” (the docs don’t seem to provide any guidance as to which specific OSes are included here, but I’d assume BSDs are in-scope), build from source like Linux.

The majority of R users I know are Windows users (non-astronomers in this case), and I haven’t seen any installation issues there (unlike Python, where I know no-one who has used a non-conda, non-cygwin Python setup successfully).

To me, the biggest takeaway from this is there needs to be an easy way to get the required compilers on Windows (these are pre-loaded on CI systems, but I personally have no idea what’s the correct way to do this on a desktop). There used to be a MSVC installer for Python 2.7; could something like that be created for newer Python versions (or even better, come with the Python installer as an install option)? I’m not sure if it’s worth pointing users at the Rtools installer, or trying to create our own Python specific version, but having something which provides a basis on which to build scientific packages from source would I think help with non-conda, non-cygwin Python installs (really, non-we’ve-done-the-integration-work-for-you Python installs, which is what system package installers do).

Caveat on the above: I’m not an R-user, so the above is from helping others install/teach/work with R and from reading the docs, so I’m likely missing out issues and subtleties that someone using R in anger would know.

This is more-or-less what Conda is, but the prevailing opinion (or perhaps just the loudest in this forum?) is that it needs to be torn apart to become more like the current “figure-it-out-yourself” ecosystem.

Worth expanding this survey - perhaps the easiest way is to ask Anaconda, who clearly saw enough value in distributing R packages to start doing it, but will also know why they haven’t seen the same success as they have with Python.

I’m working on this, just as I made the installer for 2.7 happen, but this time I’m trying to work with the compiler team rather than playing chicken :wink: Right now, the easiest way to get the compilers is through CI, which is usually free enough for most open source projects, and supported projects (i.e. anyone backed by a group like NumFOCUS) can easily arrange for more.

<sarcasm begins>Of course, the easier way to do this is to force everyone to switch to the same compiler. If we pick one that we can redistribute on all platforms, it’ll make things even easier for users, as nobody will have to use their system compiler or libraries anymore - they’ll get an entire toolchain as part of CPython that’s only useful for CPython and doesn’t integrate with anyone else’s libraries! (I hope the sarcasm is coming through, but just in case it’s not, this sounds like a massively user-hostile idea. But if you want to try this approach… well… it’s any of the Linux distros or Conda/Nix/etc.)


What I think is really the issue is that we haven’t defined our audiences well, so we keep trying to build solutions that Just Work™ for groups of people who aren’t even trying to do the same thing, let alone using the same workflow or environment/platform. @pradyunsg’s list of possible unifications above is heading in a much more useful direction, IMHO, and the overall move towards smaller, independent libraries sets us up to recombine the functionality in ways that will serve users better, but we still need to properly define who the users are supposed to be.

One concrete example that some of us have been throwing around for years: the difference between a “system integrator” and an “end user”. That is, the person who chooses the set of package versions and assembles an environment, and the person who runs the Python command. They don’t have to be the same person, and they’re pretty clearly totally distinct jobs, but we always seem to act as if end users must act like system integrators (i.e. know how to install compilers, define constraints, resolve conflicts, execute shell commands) whether they want to or not. And then we rail against system integrators who are trying to offer their users a usable experience (e.g. Red Hat, Debian, Anaconda) for not forcing their users to also be system integrators.[1]

But maybe we don’t have to solve that end user problem - maybe we can solve one step further upstream and do things that help the system integrators, and then be even clearer that “normal” users[2] should use someone else’s tools/distro (or we start our own distro, which is generally what system integrators end up doing anyway, because “random binary wheel off the internet” isn’t actually useful to them).

To end with a single, discussion-worthy question, and bearing in mind that we don’t just set the technology but also the culture of Python packaging: should we be trying to make each Python user be their own system integrator, supporting the existing integrators, or become the sole integrator ourselves?


  1. If they want to be, they can go ahead and build stuff from source. But that’s totally optional/totally unavoidable, depending on what problem you need to solve. Still, non-integrator users are generally within their rights to say “I can’t solve that problem without this software, can you provide it for me” to their boss/supplier/etc. ↩︎

  2. Those who just want to run Python, and not also be system integrators ↩︎

I’m glad to hear that you’re working on this as it would be useful. While it is easy to get the compilers in CI my recent experience of trying to build wheels in CI is that you absolutely have to get things working locally before attempting anything in CI. One of the difficulties I had was just locating the right version of MSVC. I didn’t necessarily need a bespoke “MSVC for Python” but just a clear “here is the link to the installer file for the one that you want” would have been very helpful. Ideally it should just be the compilers and not the full visual studio with many gigabytes of stuff that I don’t want (the MSVC installer GUI seems to be almost deliberately designed to make it difficult to do this). If some “unified build tool” could just download the right version of MSVC as needed then that would be amazing…

Another point of difficulty is that in my case the base of the dependency stack is GMP which cannot be compiled correctly with MSVC (this is similar to SciPy needing Fortran compilers etc). Being able to standardise on a single toolchain would be great if there was one that handled all cases. I think I could potentially build dependencies with mingw64 (linking against ucrt) and then use MSVC to build the final Cython-based extension module. I just haven’t managed to get that to work yet though because mixing toolchains like that is not something that the existing tooling makes easy (neither is using mingw64 in general but at least I have that part working now).

The Visual Studio installer is what you want, then select “Python” and then “Python Native Development”. I can’t make it any more straightforward than that, and even when I tell people that’s how to do it, they insist on making their own instructions for various reasons (e.g. by “needing” to use the Build Tool installer, which doesn’t have any explicit Python options because it doesn’t have any explicit Python features).

MSVC is the many gigabytes of what you need, not Visual Studio. Since none of the tooling or system libraries or headers are in the base OS, you’re going to get all of them. It’s not a small ask, and even in a simplified model, it won’t be a small download. Hence why I’d rather get publishers putting out compatible binaries, so that users don’t have to go through this themselves.

I thought this was an interesting approach to install a C compiler more easily / reliably. But I guess it doesn’t work for everything (e.g. C++)?

It is, but it doesn’t work anywhere you have a specific requirement for a specific compiler.

CPython has historically always been about integrating with the underlying system. This is why we don’t mandate a particular compiler (beyond what’s necessary to generate a compatible ABI) - our users will have their needs, and we want to meet them where they are rather than forcing them to rewrite their entire system in order to fit Python into it. This was the point of my sarcastic paragraph above about not forcing the compiler choice.[1]


  1. You could argue that we “force” the choice by what distutils detects, and I’d argue this is what I meant by we define the culture as well as the technology. Even distutils (which is now removed) allowed you to override the compiler, but the defaults often win with users who don’t have a strong preference (yet). Post-distutils, if someone were to create a build backend that defaults to a different compiler and it gains popularity, that compiler will eventually win. All the tooling is there, it’s just up to someone to build it and evangelise it until it wins by popularity. ↩︎
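
As a small illustration of “defaults versus overrides”, the sketch below reports which compiler the running CPython was built with, what `sysconfig` records as the configured C compiler, and whether a `CC` environment variable is set. Treating `CC` as the override is an assumption based on common Unix build conventions (many build tools honour it); the function name and dictionary keys are made up for the example.

```python
import os
import platform
import sysconfig


def compiler_info(environ=None):
    """Collect the default and overridden compiler information.

    `environ` defaults to os.environ; it is a parameter only so the
    sketch can be exercised without touching the real environment.
    """
    env = os.environ if environ is None else environ
    return {
        # Compiler string baked into the interpreter, e.g. a GCC or MSC version.
        "built_with": platform.python_compiler(),
        # Compiler recorded at build time; typically None on Windows builds.
        "configured_cc": sysconfig.get_config_var("CC"),
        # Conventional user override, if any.
        "cc_override": env.get("CC"),
    }


if __name__ == "__main__":
    for key, value in compiler_info().items():
        print(f"{key}: {value}")
```

A build backend that wanted a different default would simply choose differently here, which is the sense in which the default, rather than any hard requirement, shapes what most users end up with.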