Hi all, I’m happy to be able to share this with you: https://pypackaging-native.github.io/. It’s a new website that attempts to comprehensively describe the key problems that scientific, data science, ML/AI and other native-code-using projects & authors have with PyPI, wheels and Python packaging. It is meant to serve as an up-to-date reference, which will hopefully help facilitate discussions around potential solutions and design changes in Python packaging related to any of these topics.
I have deliberately not included more than hints at solutions (at the bottom of the pages on key issues). Some things are likely going to stay the way they are; for others, a solution or mitigation would require a large effort or a PEP-worthy proposal. I have some ideas of course (and I’m sure others do as well), but thought it’d be better to first share the current content and listen to feedback, see what other topics anyone would like to add, etc.
In this thread I pre-announced the release of this site, and there was quite a bit of relevant discussion already. It was a massive thread though, so I thought I’d start a more focused one here.
Last but not least: a big thank you to everyone who already provided feedback on their particular pain points and on content on the site. And in particular to @h-vetinari for a thorough review and adding a good amount of content over the past week.
This is awesome. I’m still working my way through the document, so I don’t have much to say yet (in fact, I doubt I’ll have much useful to say even once I’m finished - personally, I’m squarely in the “end user who just wants to install things and write programs” camp). But one particular thing did strike me.
You speak a lot about managing native dependencies with the system package manager. As a Windows user, I don’t tend to think of Windows as having a “system package manager”. So I’m not sure where that leaves Windows users in this context. Quoting from the document:
(A) System package managers:
Single version of each package (so may not need a solver for dependency resolution)
All libraries needed for a functioning system (sometimes modulo libc)
Multi-language
Single-platform (often, not always)
Examples: Linux package managers (APT, RPM, Pacman, YUM, etc.), Homebrew, Nix, Guix, Chocolatey
Of these, I believe only Chocolatey is for Windows, and I don’t believe it has any form of library management - it simply consumes existing Windows installers, which inherit the overall lack of any OS management of shared libraries.
I don’t expect answers here - the document is setting out the problems, not proposing solutions. But I do think that it would be worth pointing out that the idea of a “system package manager” is not a well-known concept for the majority of Windows users, at least.
Good point, thanks Paul. I think I put Chocolatey in the wrong place (I’ll update that soon). I was going by the little experience I have with it - it does provide compilers like Mingw-w64 and MSVC runtime redistributables, and we use it in SciPy CI. But looking at it more, yes those are self-contained.
For Windows, the two most mature options are Conda and using WSL (where you can use apt, or anything else that runs on Ubuntu). Spack Windows support is in development.
But I do think that it would be worth pointing out that the idea of a “system package manager” is not a well-known concept for the majority of Windows users, at least.
On macOS there is also no built-in system package manager; users are familiar with Homebrew only because it provides so much. The “system” part doesn’t mean “has to be provided by the OS”.
That said, yes, the solutions for native Windows support are sparse at the moment - I believe it’s only Conda right now. Conan and vcpkg probably also have some capabilities that could put them in the system package manager bucket, but they’re focused on C/C++ and will be pretty foreign to Python users. Then again, anything to do with native code tends to be pretty foreign to our average Python-on-Windows user.
Thanks. One other distinction that might be worth making is whether the (system) package manager integrates with tools not supplied by the package manager. But maybe that’s my Windows background speaking - I don’t know enough about “system package managers”. Do they by definition expect to be working with a Python that they installed? If so, then the other big-picture question that should be mentioned is “what do we do about users who don’t install their Python with a system package manager”? That likely includes python.org installers, Windows Store Python, pyenv, people who build their own Python, and maybe others. Which is a non-trivial sector of Python users.
Anyhow, I said I wouldn’t comment more before finishing reading the document, so I’ll stop here for now.
Edit: I should have waited. The ABI section has some discussion of this.
Thanks for this Ralf. It gives an excellent summary of many of the problems.
Having recently been working on making wheels for a project that does not have them, I have found it to be much more work than it ideally should be:
That project is, I think, much simpler than many others: it just builds four standalone shared libraries and then a Cython extension module that uses them. By far the most time-consuming part has been making Windows wheels that bundle up C dependencies which cannot be built (correctly) with MSVC. The cibuildwheel etc. infrastructure is good, but getting it to use mingw-w64 in CI has been very difficult and I still don’t have it right (I think it’s incorrectly using the Chocolatey mingw that is already in the GitHub Actions Windows image and linking against msvcrt.dll), although for now the wheels seem to work. Even when I do get it working, I’ll have to rewrite everything to not use distutils by the time Python 3.12 comes around, and it’s not clear what the replacement should be.
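For what it’s worth, cibuildwheel can be configured via `pyproject.toml` to install and pin a specific toolchain rather than relying on whatever is already on the CI image. A rough sketch only - the package name, version, install path, and `mypkg` import are illustrative assumptions, not a verified working setup:

```toml
[tool.cibuildwheel]
build = "cp39-* cp310-* cp311-*"
# Smoke-test each built wheel; replace mypkg with your actual import name.
test-command = 'python -c "import mypkg"'

[tool.cibuildwheel.windows]
# Install a pinned mingw-w64 toolchain instead of using the one that
# happens to be preinstalled on the GitHub Actions image. The install
# path below is an assumption - check where your chosen package lands.
before-all = "choco install -y mingw --version=11.2.0"
environment = { PATH = 'C:\ProgramData\mingw64\bin;$PATH' }
```

Putting the pinned toolchain first on `PATH` is what guards against the “silently picked up the wrong mingw” failure mode described above; it doesn’t solve the msvcrt.dll-vs-UCRT linking question, which needs to be handled in the toolchain build itself.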
It seems like anyone who embarks on providing wheels has to relearn the same lessons that every other project had to learn. This includes a load of stuff for various OSes that I don’t normally use or routinely have access to. As you say, every project needs someone to spend the time to become a packaging expert, which is a serious drain on resources.
For my part, I would be a lot happier if there was just a guide that clearly explained one sane way of getting this to work, in full detail. Looking at the build/CI scripts of various projects, I see that they all had to learn the same lessons, but those lessons are not clearly written down anywhere. I could document what I found, but I haven’t yet arrived at a complete solution, and I’m not really expert enough to say exactly what is best or why - just that what I have so far seems to work “okayish” for now.
First off, this is awesome. Thank you for doing this! I’ve made my way through most of the site and it’s really well written overall (there are a few things here and there, but I’ll take those to the issue tracker).
Also note that it’s recommended to upload wheels even for projects that are pure Python, because installs are faster (metadata in a wheel is static, no need to run `setup.py`) - TODO: find reference
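The “static metadata” point is that a wheel is just a zip archive containing a `*.dist-info/METADATA` file, so installers and resolvers can read name, version, and dependencies without executing any project code. A small self-contained sketch (the `example` package here is made up) that builds a minimal wheel-like archive in memory and reads its metadata back:

```python
import io
import zipfile

# Build a minimal wheel-like zip in memory. A real wheel also contains
# WHEEL and RECORD files; this sketch keeps only what the point needs.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("example/__init__.py", "")
    zf.writestr(
        "example-1.0.dist-info/METADATA",
        "Metadata-Version: 2.1\nName: example\nVersion: 1.0\n",
    )

# An installer can read the metadata without running any code at all -
# unlike an sdist, where `setup.py` may have to execute to produce it.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    metadata = zf.read("example-1.0.dist-info/METADATA").decode()

print(metadata.splitlines()[1])
```

This is exactly why pure-Python wheels install faster than sdists: nothing needs to be built or executed, only unpacked.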
This looks really useful! I don’t have much time to contribute, but if you think it’s helpful, I can link to the most common issues users have with Poetry and its interactions with the wider ecosystem. Some are further delving into the implications of the issues already outlined (GPUs, the inaccessibility of metadata without fetching distfiles, the implications of dynamic metadata), and some are novel/specific to Poetry (e.g. the global package namespace).
https://pythonwheels.com also has a nice succinct listing of the advantages of wheels. I often send people there when I want to explain this topic to them:
Faster installation for pure Python and native C extension packages.
Avoids arbitrary code execution for installation. (Avoids setup.py)
Installation of a C extension does not require a compiler on Linux, Windows or macOS.
Allows better caching for testing and continuous integration.
Creates .pyc files as part of installation to ensure they match the Python interpreter used.
More consistent installs across platforms and machines.
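Several of those advantages follow from the fact that a wheel’s filename encodes its compatibility, so an installer can pick the right prebuilt binary without a compiler. A simplified parser to illustrate (the filenames are made up, and real wheel names may also carry an optional build tag between version and Python tag, which this sketch ignores):

```python
# Wheel filename convention: {name}-{version}-{python tag}-{abi tag}-{platform tag}.whl
def parse_wheel_filename(filename: str) -> dict:
    stem = filename[: -len(".whl")]
    name, version, py_tag, abi_tag, plat_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": py_tag,
        "abi": abi_tag,
        "platform": plat_tag,
    }

# A pure-Python wheel is usable on any interpreter and platform...
print(parse_wheel_filename("example-1.0-py3-none-any.whl")["platform"])
# ...while a compiled extension is prebuilt for one interpreter and OS/arch,
# which is why no compiler is needed at install time.
print(parse_wheel_filename("example-1.0-cp311-cp311-win_amd64.whl")["platform"])
```

The tags are what pip matches against the running interpreter when choosing which wheel to download.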
There is some discussion on that page indeed. More could be added elsewhere. I’d say that in general package managers are, with a few exceptions, not aware of tools/libraries/applications outside of their own field of view. It’s usually implicit, like a statement in the docs that things are expected to work with the OS libc. One example where there’s explicit knowledge of external tools is how Spack treats compilers - they come from outside its own package repository, but are then explicitly modelled and kept track of.
That would be super useful @neersighted, thank you.
That is great, I opened an issue to keep track of this and will link to that within the next couple of days. Thanks Pradyun!