Python Packaging Strategy Discussion - Part 1

The alternative is getting wheels that don’t work, either in obvious ways (fails to import) or subtle ways (crashes at runtime, silent data corruption, etc.).

Trust me, if there were a way to solve all of these problems within the current system, we would have found it. We’ve been trying, and we’ve gotten close enough that a lot of people share your concern, believing that wheels are sufficient - but the remaining gap appears to be uncloseable.

Uncloseable, that is, unless you have a complete stack that has been built in a known, consistent environment. Which either means you built everything on the target machine, or you built everything on another machine that is compatible with the target machine.[2]

What it excludes is the possibility of building individual parts on different machines and trying to bring them together later on. But this is the social model of PyPI - people publish their own packages - and we don’t want to change that. It’s actually really important for distributors that PyPI exists and operates the way that it does… for sdists.

But the job that distributors are taking on is to provide a coherent set of binaries so that when their users install things, they are going to work. Mixing in binaries built without that coherence is what hurts users.

That is one possible way to make this work. I proposed other ways earlier to achieve the same result.

I think this thread is the place to agree that when a distributor has prebuilt binaries for their build of Python, those should be preferred over unmatched binaries. If we agree that the default workflows should somehow support that, we can do details in new threads.

  1. I substituted “a distro” in the quote, because I know that’s what you mean, but I really want to avoid framing this as “PyPI vs Conda” when it’s fundamentally “individual pieces built separately vs everything built all together”. ↩︎

  2. Initially, only Windows wheels were allowed on PyPI, because it was the only consistent enough environment. There’s been a huge amount of work to define consistent-enough environments for other platforms in order to allow them on, because fundamentally, there’s no way to predict how the build needs to be done when we don’t control the entire stack. ↩︎


That’s a nice and concise way of expressing a design principle on how to better support system integrators. It matches what I had in mind with the “new opt-in installer mode” - which is indeed one of a few possible ways to implement that principle.


No, the alternative is that people currently building wheels which work for “most” users stop building them (because it’s hard to even do an imperfect job) and direct people to get their binaries from a distributor (or build them themselves, but see above re “hard”…)

However, “get your binaries from a distributor”, as has been mentioned a number of times here, tends to include “get your Python from a distributor”. And that means “switch to a different stack”, which for many users is simply not acceptable (especially if they don’t know this is going to happen in advance, which they typically don’t).

If we were talking about distributors/integrators shipping binaries which worked with whatever Python installation the user had, there would be a lot less heat in this discussion. If I could install the Windows Store Python, and then find I needed to get (say) scipy from somewhere other than PyPI, and there was somewhere that shipped binaries that worked with the Windows Store Python, I wouldn’t have the slightest problem (well, the UI of “pip fails, find the right command” needs to be worked out but that’s a detail). But I’ve seen no sign of anyone offering that.

Maybe I’m being pessimistic, and we should assume the existing situation will continue (with most packages having binary wheels on PyPI, and only specialist outliers opting out). But maintainers are human, and volunteers, and having the “Python Packaging Strategy” explicitly normalise the idea of expecting people to get hard-to-build packages from a distribution seems to me like a huge temptation to just say “sorry, we don’t support binaries from PyPI”.

What I’d like to see at a strategy level is a statement that Python packaging is for everyone, and our goal is to make it as easy as possible for every user, no matter where they got their Python installation from, to have access to all published packages (on PyPI - privately published packages are a different matter). And yes, there will always be exceptions - the point is it’s a goal to aspire to.


That’s because this is the bit that doesn’t work :slight_smile: Windows is a poor example here, because the ABI is much more reliable than other platforms (and the nature of DLL Hell on Windows is somewhat more amenable to these problems than on other platforms).

The reason that binaries don’t work with “whatever installation of Python the user has” is because you need to know about the installation in order to produce the binaries.

To fix that, we’d need to define a complete ABI (down to “how many bits are in an int, in which order, and how it gets written to memory when calling another function”) and then require every single library to use it as well. And that only fixes things because “whatever installation” now has a choice of exactly one - the one that perfectly follows our ABI. This approach is never going to work out.

On many modern desktop and server machines, the ABI is usually close enough that you can squint and get away with it (“wheels which work for “most” users”). Once you get into the territory where that doesn’t work, you do need to know all the details of the ABI in order to build a compatible binary for that interpreter. The easiest way to do this is to get the interpreter from the same builder you get the binary from, because they’ll have ensured it matches.

If it helps take some of the heat out of the discussion, I’m not in any way saying that package developers are doing things wrong, or doing a bad job. They’re trying to do a job that is literally impossible, because their own scope is too restricted to be able to do it. And many of them know it, or at least sense it, which is why they’ll feel frustrated. But it’s not their fault, and it’s not on them to solve it themselves. They’re doing an incredible job of making the most of the impossible situation they’re in, and all we want is to make that situation less stressful, firstly by acknowledging that they don’t have to carry the entire burden of solving it (and indeed, they can’t), and then by better connecting those who can solve it with those who need the solution (and who might be the ones applying the pressure to the original developers).


No one is offering this because it’s not possible, for very fundamental reasons (ABI, toolchains, etc.). It really is not one of the options. It’s the status quo, or some version of what Steve and I suggested, or something more radically different like actually trying to merge the conda-forge and PyPI approaches into a single new thing.

I think that has been the strategy of the crowd here until now, and it’s not working. I agree with Steve’s assessment that the remaining gap is uncloseable.

To rephrase your terminology: Python usage should be for everyone - with a more unified experience.


It’s worked incredibly well, actually :slight_smile: But every strategy has its limits, and this is it.


I’d like to make two points in response to this:

  1. I am not worried that there will be a major drop in packages providing wheels, at least popular packages providing wheels for popular platforms. There’s enough maintainers who care, and enough users who care. So if it works today, it is highly unlikely that folks will pull the plug tomorrow (or in 2 or 5 years from now).
  2. There are lots of OSes, platforms, Python interpreters, etc. for which PyPI allows wheels but no one is building them today anyway. A few examples: PyPy, PowerPC, armv7 (lots of Raspberry Pi users), musllinux (Alpine Linux users, very popular in Docker), etc. Making things better for system integrators will make things a lot better for any of these user groups.

OK. If that’s true, and if the expectation is that there will be little practical change in the availability of binaries on Windows, then I have no problem. I don’t know enough about the other platforms we’re discussing to say anything beyond broad generalisations. So I’m happy to leave the “not-Windows” side of the problem to the experts in those environments.

Quote from @zooba in a footnote[1].

I substituted “a distro” in the quote, because I know that’s what you mean, but I really want to avoid framing this as “PyPI vs Conda” when it’s fundamentally “individual pieces built separately vs everything built all together”.

The problem is that for Windows users, there simply isn’t another example of a “distro” apart from Conda. So in that context, at least, it genuinely is about “PyPI vs Conda”. I understand, and agree with, the principle of not making this confrontational, but I think we should be very explicit about what options actually exist right now, in a practical sense. I’ll refrain from doing any sort of detailed “PyPI vs Conda” comparison, because that’s what we’re trying to avoid here, but IMO it’s really important to understand that when people with a purely Windows background see “you must use a distro”, they really only have “being forced to use conda” as a mental model to inform their views.

But that’s not the difficult part of the question here. (FWIW, I’d be fine with agreeing to that statement). The difficult question is when a distributor doesn’t have prebuilt binaries, what should happen? For example, a library that the distributor doesn’t package. Or a newer version of a package that the distributor has, but hasn’t upgraded yet. Or an older version of a package (because my script was written for the v1.0 API).

And does the answer change depending on whether the code is pure Python or not? Or C with no dependencies? Where do we draw the line?

  1. As an aside, it’s really hard to quote footnote text in Discourse… ↩︎


I mostly care about Windows. What is a distro on Windows?

The only problem I run into regularly is missing wheels. If they exist they work; maybe I’m lucky, and/or relying on heroic efforts of the people providing them.

Or maybe that explains it.

Does that mean “of course on Windows everything will keep working as it does now” is silently omitted / implied?


What should I think of when I read “system package manager” here? Is it something that already exists (if yes, what are examples) or something that still needs to be created? Are we talking about things like apt on Debian, winget on Windows, homebrew on Mac?

Perhaps part of the problem with the perception that the ABI is “more compatible” on Windows is that there aren’t hundreds of different companies and communities providing their own versions of Windows. There is one: Microsoft.

The ABI incompatibility across different Linux distributions and different UNIX derivatives is only an incompatibility if you think of, say, “Linux” as an operating system. Debian GNU/Linux is an operating system, Red Hat Enterprise Linux is another operating system; maybe sometimes you can run the same binaries on versions of both, but expecting them to be “consistent” is like expecting to run Mac OSX/Darwin binaries on Windows. If anything, Linux distributions have a more compatible ABI with each other than Windows does with any other operating system (emulation layers aside, of course).


My hope is that the change will be greater availability on Windows, because we’ll solve the common-native-dependency problem (e.g. making sure two separate packages can agree on a single copy of libpng, or libblas, or whatever they can’t agree on), and/or because we’ll have more people involved in actually building stuff who can help support the upstream projects too (like Christoph Gohlke did).

But yeah, I doubt everyone who’s invested massive effort into getting their stuff to build will suddenly drop what they’re doing. The main problem is initial availability (getting it to build in the first place) rather than it breaking randomly once it’s (successfully) installed.

ActiveState, WinPython, Python(x,y) (maybe still) are fairly significant distros. Arguably Cygwin too. Blender distributes its own build of CPython as part of the app (as do some other apps), and may benefit from deliberately compatible package builds.

But yeah, they’re not as prominent as Anaconda, who more-or-less started specifically to solve this problem on Windows. I’d love to see more distros available on Windows, but then the challenge with that kind of messaging is that basically nobody wants to see more distributors for Linux :smiley: It’s tough to walk the line.

The way this currently works for me is that I go to the distributor, remind them about all the money we paid them, and ask them politely to add the package :slight_smile: Failing that (and tbh, it hasn’t failed yet), we build from source, using the distributor’s environment as the target, so that everything is compatible with it.

The key part is using the distributor’s environment. Right now, PyPI implicitly requires builders to use the downloads as the base environment - it’s no good building against Cygwin’s Python and publishing that to PyPI, because it won’t work when someone installs it. But it would be totally fine to set up a separate index specifically for Cygwin and require all packages published there to use that as their target.[1]
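Whether a binary matches a given target environment is summarised by the platform and ABI tags a wheel carries, which are derived from the environment it was built in. As a stdlib-only sketch, you can inspect the identifiers your own interpreter reports (exact values vary by platform and Python version):

```python
# Inspect the identifiers that feed into wheel platform/ABI tags for the
# interpreter running this code (standard library only).
import sysconfig

# Basis of the wheel platform tag, e.g. "win-amd64" or "linux-x86_64"
print(sysconfig.get_platform())

# Filename suffix a compiled extension module must carry for this
# interpreter, e.g. ".cp312-win_amd64.pyd" or
# ".cpython-312-x86_64-linux-gnu.so"
print(sysconfig.get_config_var("EXT_SUFFIX"))
```

If two environments disagree on either value, a binary built against one can’t be assumed to load in the other - which is exactly the “use the distributor’s environment as the target” point above.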

I don’t think the answer actually changes based on native/pure packages at all, because a distributor’s repository is valuable for more reasons than “is it a wheel or an sdist”. But I agree with your point, that users will reach for a thing that seems to work no matter the source, and won’t buy the “if you want support you need to get it through the supported channels” answer (and we don’t have to sell them that line either - distributors need to self-promote). Same deal as if you download any other app for free - you get what you get.

But any way we make it easier for distributors to bring in new versions of packages makes it easier for their users to get the newer versions, and they’ll be more likely to ask for them in future, and the whole process gets better for everyone.

Definitely not. winget is an installer, not a package repository.

  1. Platform tags are basically a way of doing this within a single index. ↩︎

This is getting into the details, but if you say pkg>1.0 then it’s probably a lot better to get the distributor’s 1.1 binary rather than build from source on 1.2. And building from source by default is something we should disable anyway, and only let the user opt into. But that’s for another thread.

I agree that it will likely improve things on Windows. Even for regular MSVC builds, we have to think about 3 architectures:

  • x86-64: the easy case, everyone has wheels for this
  • x86 (32-bit): this is already a problem today; effectively the scientific stack (SciPy, scikit-learn, and anything that depends on those packages) has already dropped support for it. Not because we don’t care, but because we don’t have a compiler toolchain that works in CI (Fortran is the issue). When users come and ask, we tell them “sorry, please upgrade to 64-bit Windows. You could build from source, but we strongly suggest you don’t even try - it’s too hard”. On the other hand, one person like Christoph Gohlke, or an organization, could fairly easily build these packages with MSVC + Intel Fortran and host a set of wheels.
  • Windows on Arm: there are no wheels. For initial adoption it would make total sense for Microsoft or a third party to provide an MSVC-compatible set of wheels for the most popular packages.

Maybe distributor is the better word. Homebrew, any Linux distro, Conda, Spack, ActiveState Python, Nix, etc. Please see Build & package management concepts and terminology - pypackaging-native.


In this world is there still any benefit to PEP 517 at all? Once the build isn’t happening automatically, I as the package user have to go do something manually and that may as well be running meson or cargo or whatever the package README says.

Are there any specific examples that you know of where this happens? Or notable past discussions? Not doubting that this is an issue, but having not hit these types of problems with the Python ecosystem in the past, it’s difficult to relate. Granted, I’ve generally used mainstream OSes and architectures, and don’t heavily use the scientific side of the ecosystem, which seems to cause the most problems.

I’m not sure precisely what you are referring to that will solve the common-native-dependency problem. From the rest of your message it seems like you’re suggesting that this problem will be solved by users getting Python from a “distributor” rather than from python.org. The impression I had of Paul’s comment is that it is specifically about wanting binary PyPI packages for installing into any Python on Windows.

As an example of the common-native-dependency problem, I have been working on providing wheels for a project called python_flint which, like another project gmpy2, depends on the C libraries gmp and mpfr. Currently both projects need to carry all of the code to build both dependencies on all platforms. Both projects need to patch GMP for OSX arm64. Both projects need to bundle name-mangled DLLs. Both need to wrestle with toolchains on Windows, etc.

The maintainer of gmpy2 suggested making a project that just ships gmp and mpfr as DLLs. I don’t know if that idea can be made to work easily but if it were possible to have one project on PyPI whose sole purpose was to provide those C libraries and then to be able to build downstream projects against that then it would be a saving in terms of overall development work, CI resources, disk space, memory usage etc. Those are the benefits of sharing dependencies and that’s how conda does it.

Making that work though requires acknowledging the possibility of PyPI hosting wheels that are purely there to provide non-Python dependencies. Is that not how to solve the common native dependency problem? With that can we not think of Python for Windows along with the ABI compatible PyPI wheels as a “distribution”?
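As a hypothetical sketch of how downstream builds might consume such a library-only wheel (gmp_libs is an invented name here, not a real PyPI package): the wheel would ship the shared libraries and headers inside its package directory, and a build script could locate that directory with the standard library alone:

```python
# Hypothetical pattern: a "library-only" wheel ships shared libraries and
# headers under its package directory; downstream build scripts locate
# that directory at build time.
import importlib.util
from pathlib import Path


def bundled_lib_dir(package_name: str) -> Path:
    """Return the installed directory of a package that exists only to
    carry native libraries (e.g. an imagined "gmp_libs" wheel)."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"{package_name} is not installed")
    return Path(spec.origin).parent


# A build backend could then pass e.g. bundled_lib_dir("gmp_libs") / "lib"
# to the compiler as a library search path.
```

The hard part, as discussed below, is not this lookup but the coordination: every consumer must have been built against an ABI-compatible version of what that directory contains.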



@rgommers On the point of “make metadata separately accessible from wheels” / “implement post-release metadata editing capabilities for PyPI”, how do you imagine this actually operating?

You’ve referenced the PEP about serving distribution metadata for package installers, but notice that the PEP uses the language (emphasis mine):

repository MUST serve the distribution’s Core Metadata file alongside the distribution

In other words, the mechanism as designed operates on the assumption that a file as uploaded by the end user won’t be modified and will be served as-is. Having some sort of different metadata here would mean that it differs from the metadata within the distribution file itself, which can lead to all sorts of confusing behaviour differences.

Further, the expectation that nothing modifies the distributions after the upload (as uploaded by the package author) is fundamental to any sort of artifact signing/protection as well (e.g. PEP 458 – Secure PyPI downloads with signed repository metadata).

Yes there are, lots - little difference from today. This change is a lot less invasive than it sounds. You can get your from-source build back with a simple config setting or standard installer flag. This is probably not the best thread to go into detail, but see Unsuspecting users getting failing from-source builds - pypackaging-native for context and Speculative: --only-binary by default? · Issue #9140 · pypa/pip · GitHub for agreement in principle from the pip team.
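For reference, the opt-in already exists in current pip: `--only-binary :all:` on the command line, or persistently via pip’s configuration file (this mirrors the documented CLI option; nothing here is hypothetical):

```ini
# pip.conf (pip.ini on Windows): never build from sdists;
# equivalent to passing --only-binary :all: to every pip install
[install]
only-binary = :all:
```

The proposal above is essentially to flip this from opt-in to the default, with from-source builds becoming the thing you opt into.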

One way could be to have a separate “metadata patches” source of truth; that’s how conda-forge does it (see conda-forge’s guidelines on fixing broken packages, and PRs at conda-forge-repodata-patches-feedstock). I’ll also note that these patches are time-bound. So they allow cheaply fixing install issues to help your users, but after your next regular release cycle has passed and the disruption is over, the patch can be removed or expire automatically.
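To make the conda-forge comparison concrete: repodata patches live as a separate overlay that the installer merges over the unmodified artifacts at index time. Schematically - this is an illustrative shape with invented package names, not the exact conda schema - a patch correcting an over-strict dependency might look like:

```json
{
  "patch_instructions_version": 1,
  "packages": {
    "somepkg-1.0-py310_0.tar.bz2": {
      "depends": ["python >=3.10", "libfoo >=2.1,<3"]
    }
  }
}
```

The uploaded artifact itself is never touched; only the served metadata overlay changes, which is what would keep such a scheme from conflicting with artifact signing.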

I have to admit that I know a lot less about this part of how PyPI, conda-forge and other repositories work than about build-related stuff. So there may be other/better ways. But it doesn’t seem impossible or in conflict with what you’re saying about artifact signing, if there’s a separate set of patches.

This was forbidden by auditwheel until very recently. It is now finally possible, but it should still be used with extreme reluctance.

You can do that hosting. And no, it’s not the way to solve things. The problem is not technical, it’s social. Once you start doing things like this at scale, you need a build farm and a common team / decision making capability. For your problem, say you go ahead with making these wheels. And then a couple more projects start relying on these gmp and mpfr wheels. Now you, caring about your two projects and being the owner of the PyPI names here, want to upgrade. So you do. And break those other users in the process. PyPI’s social model simply has no way to deal with the required coordination.

I’ll also note that we will probably end up doing exactly what you suggest here for OpenBLAS, needed by both NumPy and SciPy. But we’re going to be very explicit that it’s just to solve a specific problem for NumPy and SciPy, and anyone else relying on our OpenBLAS wheels is doing so at their own risk.

Yeah, totally. As someone who builds libraries for an internal distro, I love that I already know what command to run to get a “default” build. I’d love to have something similar for tests as well. We’ve heard the same thing from distros, too. But it’s more about making the distribution builder’s lives easier so that they can make user lives easier, rather than going direct to users. (Indeed, if “normal” users have to care about PEP 517, you could argue that it failed :wink: )

If you manage to find the early manylinux discussions (possibly it was captured in the first PEP?), you’ll find plenty of examples. Depending on packages for which an ABI matters - pypackaging-native also covers a lot of the concerns, and since its first two examples are libc and libstdc++ (also their equivalents on Windows), I think it’s fairly clear that it isn’t just a scientific computing problem.

I see Ralf just replied with exactly what I was going to say and more, so +1 to that reply.