Python Packaging Strategy Discussion - Part 1

I doubt a “default” cargo invocation gets it all right, so you have some Python code in the build backend to pass the right flags in.

Does hatch know where to get the file from? And if it’s the mediator from the frontend, how does it know to trigger the cargo invocation? Again, it has some Python code in the backend to get this right.

The backend must know that it’s building an extension module and how to include it in the wheel. If it uses some other tool to do the build, that’s great, but it still has to know how to invoke it (which could be trivial like conda or complex like setuptools).
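
To make that concrete, here is a rough sketch of the hook such a backend would expose if it drove cargo itself; the flags, the artifact path, and the `_pack_wheel` helper are all made up, and a real backend would do considerably more:

```python
import subprocess


def _pack_wheel(wheel_directory, artifacts):
    # Hypothetical helper: a real backend would write METADATA/WHEEL/RECORD,
    # zip everything up, and return the wheel's filename.
    raise NotImplementedError


def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    # The backend, not the frontend, knows which cargo flags are right for
    # building a Python extension module.
    subprocess.run(["cargo", "build", "--release"], check=True)
    # It also knows where cargo left the artifact and how it must be renamed
    # and placed inside the wheel.  PEP 517 says to return the wheel's basename.
    return _pack_wheel(
        wheel_directory,
        artifacts=["target/release/libmypkg_native.so"],  # made-up path
    )
```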

What I would expect a Pythoneer to ask for would be a way for a PEP 517 thing to accept a list of source files and produce an extension module - a replacement for setuptools’ Extension class; but a lot of tools don’t work that way.

Sure, but pymsbuild works that way (there’s an example right near the top of the page), so if you choose that as your PEP 517 backend then it works.

And it’s not the only one, nor does it have to be, but it works. PEP 517 isn’t supposed to replace all other configuration files (nor is 621), so backends can still require you to use their own file to specify it. All PEP 517 means is that frontends can query the sources themselves to figure out what command to run. That’s the point.
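
For illustration, the frontend side can be as generic as this, using the build library (exact API details may vary between versions, and build requirements are assumed to be installed already since no isolation is used here):

```python
from build import ProjectBuilder

# Reads [build-system] from pyproject.toml and loads whatever backend the
# project declares - the frontend never needs to know which one it is.
builder = ProjectBuilder(".")
wheel = builder.build("wheel", "dist")  # calls the backend's build_wheel hook
print("built", wheel)
```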

I think the right paradigm is this: GitHub - ofek/extensionlib: The toolkit for building extension modules

So there would be a specific builder for each thing that can compile e.g. CMake, Meson, Rust/maturin, Cython, etc. The build backend would then trigger each configured extension module builder (config section/table would be standardized) and the builder would just implement methods for inputs and outputs which would be shipped in the source distribution and wheel, respectively.
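
Something along these lines, purely as an illustration of the shape of that interface (this is not extensionlib's actual API):

```python
from typing import Protocol


class ExtensionBuilder(Protocol):
    """One implementation per build system: CMake, Meson, maturin, Cython, ..."""

    def inputs(self) -> list[str]:
        """Files (sources, build scripts) to ship in the source distribution."""
        ...

    def build(self, output_dir: str, config_settings: dict | None = None) -> list[str]:
        """Compile and return the artifact paths to include in the wheel."""
        ...
```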

There may be additional steps required to make a binary package (wheel, conda, …?) that is suitable to be deployed to a distribution network (PyPI, anaconda.org, …?). For instance, Linux wheels uploaded to PyPI must comply with the manylinux standard, and macOS wheels need to adhere to MACOSX_DEPLOYMENT_TARGET. The various licenses must be collected, and packages that are the basis for other packages need to be able to find headers and support libraries.
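
In practice those extra steps are often separate tools run after the build; roughly like this (the wheel filenames are made up, and the exact flags depend on tool versions):

```python
import subprocess

# Linux: vendor external shared libraries and retag to a manylinux policy.
subprocess.run(
    ["auditwheel", "repair", "-w", "wheelhouse",
     "dist/mypkg-1.0-cp311-cp311-linux_x86_64.whl"],
    check=True,
)

# macOS: the delocate equivalent, for a wheel built against an appropriate
# MACOSX_DEPLOYMENT_TARGET.
subprocess.run(
    ["delocate-wheel", "-w", "wheelhouse",
     "dist/mypkg-1.0-cp311-cp311-macosx_11_0_arm64.whl"],
    check=True,
)
```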

1 Like

A library like that would be quite useful, either to depend on or as a reference. It'd be great not to have to reinvent some of those wheels. Writing a build backend is still more work than it should be.

For the rest: it all seems a little bit misguided. You seem to be under the impression that it’s a matter of finding some compiler, pointing it at a few source files, and producing a .so/.dll. In reality, dealing with compiled code is (at least) an order of magnitude more complex than anything that any pure Python build backend does. That’s why it can’t be just an afterthought - the details actually matter.

I'm fairly sure it's not. It may be a reasonable way of addressing the use case of "user uses my pure Python backend already, and now wants to add a single Cython file - how do I help them out?". But that's about it. You have pretty much misunderstood PEP 517's intent (no one had anything like this in mind - the point is to have multiple build backends), and as @dholth pointed out, many tools won't work like that (out of the ones you listed, probably only Cython does). I suspect you'll find out as soon as you try to implement a plugin like that on a non-trivial project. There are many more issues, from UX ones (which of hatchling and the plugged-in build system uses which --config-settings?) to more fundamental ones like passing source files being wrong in general (tools need their own config files, where sources are listed) and editable installs for an out-of-tree build tool needing unified Python/native-code support. And then we haven't even started on non-PyPI dependencies, code generation, cross-compilation, etc.

If you want Hatch to be general then it should work with multiple build backends. For this discussion, that’s probably the key thing. Making build backends easier to write by providing a library is great too.

1 Like

I think I should point out that GitHub - scikit-build/scikit-build-core: A next generation Python CMake adaptor and Python API for plugins has the following in the README:

Other backends are also planned:

  • Setuptools integration highly experimental
  • The extensionlib integration is missing
  • No hatchling plugin yet

I don’t think I have :wink:

My interpretation is that the point is to have multiple build backends with full functionality. If other build backends don’t have a standardized way to build extension modules then I think we have failed.

Some of us are working on the concept of a lock file, albeit slowly (the last attempt got rejected just shy of a year ago).

Sure, if you view it as seeing whether we can agree on what a single front-end UX should look like.

That's getting ahead of the conversation. :wink: We first have to see if the topic of this thread ever reaches a consensus/conclusion to even communicate out.

If we can reach that level of recommendation, I agree.

Yes? :wink:

I think once we have those recommendations and intentions, we should figure out how to communicate those things out (even if it's as simple as we all blog and post about it in as many places as possible). But honestly, up to this point a lot of the work has been on stuff the vast majority of folks don't care about (e.g. who cares about the fact that the simple repository API now has a JSON representation?). The move to pyproject.toml was probably the biggest shift in general UX in a while, and that's only two years old since PEP 621 was accepted, and I would argue only a year since enough tooling was around to actually suggest people use it.

Otherwise I think the only unified message we have going is, “we are working on putting stuff behind standards instead of convention”.

2 Likes

There's no point having multiple backends if they have exactly the same functionality; we'd be better off building exactly one tool with all of it. And yes, this is what a lot of people want, but we've already figured out that it isn't feasible given the range of different functionality people actually need - distutils2 was supposed to be the "does it all" tool.

The approach we’ve been doing for a few years now is to have multiple backends with different functionality but a consistent interface so that the user can invoke them all[1] without having to learn how to get or use each one.


  1. For basic scenarios, in this case, converting a source repository/directory into an sdist and wheels. ↩︎

I'm talking about standards allowing full functionality, which is different from the multiple carbon copies you're talking about.

That is precisely what I am advocating for here with extension modules.

I would agree it is infeasible for a single extension module builder to support all these cases which is why my proposal (and the one Henry is in favor of and will help with) simply provides the interface for such builders and assumes that there will be multiple.

I’m actually quite confused as to why many here are saying that this is an intractable problem. It is possible I am just not elucidating this idea adequately, which I apologize for if so.

I’m also quite confused why you think PEP 517 hasn’t already solved the problem. What else does the interface require besides what that already provides?

(Maintainer of: PyPA: build, cibuildwheel; Scikit-build: scikit-build-core, scikit-build, cmake, ninja, a few others; also pybind11 and its examples, a bunch of Scikit-HEP stuff, also plumbum, CLI11 (C++), and other stuff, also a frequent contributor to nox, also some conda-forge and Homebrew recipes).

First point: I don't think the current situation is terrible - I think it's a great step forward from the past setuptools/distutils monopoly, especially for compiled backends[1]. Making extension modules with setuptools was/is really painful, requires up to thousands of lines (14K in numpy.distutils, IIRC) to work, and is very hard to maintain. Setuptools/distutils supports extension builds more out of necessity and its original use building CPython than because it was designed to build arbitrary[2] user extensions. We are just now starting to see good options for extension building backends built for PEP 517 (scikit-build-core & meson-python are recent additions that wrap two of the most popular existing build tools, cmake and meson). I don't think finally seeing multiple usable options for build backends is bad!

On unification: I think unifying interfaces and providing small, modular libraries to help in that goal is a fantastic step forward. Certainly, in the compiled space, many/most users will want a build system like CMake or Meson - building a compiled extension from scratch is really hard, and not something I think we want to compete on. Reusing the years of work and thousands of pre-existing library integrations is valuable. I’d love to see more helper libraries though - the public API for wheel would be really useful, for example. packaging and pyproject-metadata are great; I’d like to see a bit more of this sort of thing, it would make building custom backends easier. I’d also love to see more usage unification; config-settings in pip matching build for example (at least for -C and lists, --config-setting vs. --config-settings unification might be too far gone).
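
As a small taste of what's already reusable here (APIs roughly as in current releases; check each project's docs for exact signatures, and tomllib needs Python 3.11+):

```python
import tomllib

from packaging.tags import sys_tags
from pyproject_metadata import StandardMetadata

with open("pyproject.toml", "rb") as f:
    pyproject = tomllib.load(f)

# Validated core metadata straight from [project] - no hand-rolled parsing.
metadata = StandardMetadata.from_pyproject(pyproject)
print(metadata.name, metadata.version)

# The most specific wheel tag for the running interpreter and platform.
print(next(iter(sys_tags())))
```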

On extensionlib: In my opinion, this must be an "extensions" PEP. I want both meson-python and scikit-build-core to work as PEP 517 builders first, so we have a good idea of everything required to make an "extensions" PEP. I also think we ideally should have a proof of concept (in extensionlib or as a hatch plugin) of the idea.

Also, for some projects a native PEP 517 builder will probably remain ideal even after this. If your code is mostly (or in some cases, entirely) a compiled extension/library/app, then it likely would be best to just use the PEP 517 backend provided by your tool of choice. However, if you do have a mixed project, especially one that mixes compiled extensions (Rust compiled with cargo and C++ compiled with cmake or meson, for example), then being able to use these tools per extension would be highly valuable. It also allows the author to take advantage of things like Hatch's pretty readme plugin or vcs plugins, etc. Source file collection is not unified, so if someone already knows hatchling, reusing hatchling and just adding a compiled extension via the extensions system would be nice.

The key issue is handling config-settings - this would probably be the bulk of the PEP; for the toml settings, this is pretty easy, but we'd need a good way to pass through extension settings. You'd not pass in a list of files; you'd get out a list of produced artifacts and maybe a list of consumed files (for SDists). Things like cross-compiles are handled by the extension backend; it's no different than cross-compiling as it is today. Another important thing to handle is get_requires_for_build_*, which is very important for compiled extension building, as such tools often have command-line app requirements that optionally can be pulled from PyPI.
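
For instance, this is roughly what that hook looks like for a compiled-extension backend (simplified; real backends do more careful detection, and the version pin is illustrative):

```python
import shutil


def get_requires_for_build_wheel(config_settings=None):
    # Pull command-line build tools from PyPI only if they aren't already
    # available on the system.
    requires = []
    if shutil.which("cmake") is None:
        requires.append("cmake>=3.15")
    if shutil.which("ninja") is None:
        requires.append("ninja")
    return requires
```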

On conda vs. PyPI: I think both approaches have merits, and I don't think one should be jettisoned in favor of the other, but we should do what we can to help these work together, and maybe learn from each other. Giving the library author the ability to produce their own wheels has benefits, such as better control over your library, and rapid releases - sometimes conda releases get stuck for a while waiting for someone. Providing good tools to do it (like cibuildwheel & CI) has been huge, and I think the situation is better than Conda's layers of tooling that makes tooling that injects tooling that duplicates tooling into tens of thousands of repositories. This has been patched so many times that it's really hard to fix things that are clearly broken, like CMAKE_GENERATOR, which is set to "Makefiles" even if make is not installed and Ninja is, etc.

Also, I spent several days trying to get the size of a clang-format install under some amount (500 MB, I think?) so it could be run with pre-commit.ci's limits - and then I found the other pybind11 maintainers had deleted conda a year or two ago and had no intention of reinstalling it. Then someone produced a scikit-build/cibuildwheel binary for clang-tidy for PyPI - it was 2 MB and installed & ran pretty much instantly, and didn't require conda preinstalled. The CMake file was less than a page, and the CI file was less than a page.

Also, due to the custom compiler toolchain, if a user wants to compile something locally, conda's a mess. We get a pretty regular stream of users opening issues on pybind11 just because they are using the conda Python and don't know why they can't compile their own code. Conda's designed to be pre-built via conda-build, and not built on the user's system via standard tools.

On the flip side, Conda can package things that can't be done as wheels (at least as easily), it can handle shared libraries without name mangling, and it has a uniform compiling environment (mostly). And the central nature does allow central maintainers to help out with recipes a bit more easily. (Though, I should mention that many of the "thousands" of maintainers are really just the original package submitters, just like PyPI).


  1. Even for non-compiled backends, we wouldn’t have things like hatchling if the playing field hadn’t been opened up to multiple backends so the best could win out. And there’s a clear use case for flit-core, too, for building things that hatchling itself depends on, for example. ↩︎

  2. It was “able” to because it had to be - there was no way to compete - but it wasn’t intended to be full featured. Things like selecting a C++ standard are missing. ↩︎

10 Likes

Thanks for clarifying a few things @henryiii. Regarding this particular point, I suspect it'll be pretty niche - only a handful of users, probably. You'd have at least two more solid options that avoid mixing multiple build systems together: build the Rust and C++ parts as separate wheels (one with Maturin, one with scikit-build-core/meson-python), or just use Meson for everything; it supports Rust too.

Your main use case / audience for this is probably still "was pure Python, now wants to add a little bit of Cython, don't want to move build systems". Either way, it'd be good to see a prototype at some point. A PEP feels quite premature at this point; you can just build it if you want and find some early adopters.

5 Likes

(Mostly off-topic bit of history here, intended for context rather than contributing anything concrete to the discussion. Also, this is from my memory of events, so I may be misremembering things - if the details matter, please check the mailing list history directly).

That's not actually true. Distutils was originally developed specifically to replace the various (non-portable) custom makefiles and build scripts that were previously used to build C extensions for Python. Being able to build and install pure Python libraries in a standard way was a side benefit, but I suspect that if compiling C hadn't been involved, people would have been pretty happy with "just put your code on sys.path" for a lot longer.

In fact, distutils wasn’t used by Python itself to build core C extensions initially - that was added later, I think because it seemed silly to have an extension-building library and still build the stdlib extensions by hand.

In addition, distutils was developed at a point when most people did build for themselves from sources. That probably alleviated a lot of the complications we have now, as the same build stack gets used for everything (and distutils handled the basic details of how to find the compatible C compiler, and pass it the right settings). That got us quite a long way, but when we add publishing binary builds, and far more complex C extensions, we now start to see the cracks showing :slightly_frowning_face:

6 Likes

I wouldn’t say that PEP 517 solved “the problem”. It is a key enabler of future solutions to many problems by removing the necessity for project authors to use setuptools just so that users/downstream can install/build from source. PEP 517 makes it possible to use alternatives to setuptools but doesn’t actually provide those alternatives and does not directly solve any of the problems that were difficult to solve while still using setuptools. The backend side of the PEP 517 interface was deliberately left as a Wild West at the time but that doesn’t mean that there isn’t any potential benefit from future standards and interoperability in the things that backends do.

I think it's important to recognise the limited (although not unimportant!) nature of the problem that PEP 517 did solve. It specifically concerns the way that a tool "like pip" will interact with source packages in a future where projects might use build systems that pip doesn't know about. While that is crucial for projects on PyPI and elsewhere to be able to use different build systems, it also doesn't really address the other contexts where we might want to do different things, especially on the development (rather than distribution) side, or perhaps the more "manual" rather than "automatic" interaction with source code.

The premise in many comments above seems to be that the PEP 517 interface makes it possible to have a unified frontend that is completely agnostic about backends. The purpose of PEP 517 was precisely to enable backend-agnostic frontends but specifically for automated consumers of source code. When I imagine my ideal frontend for development use it absolutely needs to have better knowledge of what’s going on in the backend than PEP 517 affords. I would probably want it to understand the relationship between my extension modules and source code, to make something like editable installs, to have some support for managing C dependencies, choosing between different toolchains and so on. I can see why that’s all out of scope for PEP 517 but many of the problems to be solved are still there.

1 Like

Agreed, but I don’t think there’s any benefit from making this a unified frontend. Once you’re in this level of development mode, it’s totally fine to use the backend directly (AFAIK, all the major/active ones have their own interfaces).

They’re all going to have their own configuration formats, or even just their own quirks, which means you can’t develop the project without knowing about your particular backend. Trying to optimise this away feels like an unnecessary unification project.

I’m somewhat more sympathetic to the “I had a complex pure-Python project already defined in a backend that can’t do native modules and I don’t want to rewrite it into another backend just to add a single native module”, but I’m not convinced it outweighs the ability of backends to innovate in this space.

Basically, I think unifying the definition of builds is a distraction and we shouldn’t invest in that yet. Let’s flesh out the functionality that’s actually needed in a range of backends, then let usage gravitate towards the “best” option and eventually that one will expand to handle all the things that matter.[1] Trying to design that interface preemptively really isn’t possible yet.


  1. I’m aware as I say this that it means we’ll likely converge to a thin wrapper around an existing tool, and I’d personally bet on CMake. ↩︎

1 Like

I agree with both Oscar’s comment and Steve’s reply. With the minor note that I don’t think build system usage will ever converge. It’s conceivable build backends do though, since they’re a pretty thin layer in between a couple of pyproject.toml hooks and invoking the actual build system. So it’s not inconceivable that, for example, scikit-build-core and meson-python would merge in the future and have a configuration option for whether to use CMake or Meson.

Overall we’re in decent shape here - there’s work to do on build backends and build systems, but nothing in the overall Python packaging design for that is currently blocking or in clear need of changes.

Getting back to the big picture strategy discussion, here is that blog post: Python packaging & workflows - where to next? | Labs. It's an attempt at a comprehensive set of design choices and changes to say yes or no to. There's a long version, and a short version with only the key points. I'll post that short version below.

The most important design changes for Python packaging to address native code
issues are:

  1. Allow declaring external dependencies in a complete enough fashion in
    pyproject.toml: compilers, external libraries, virtual dependencies.
  2. Split the three purposes of PyPI: installers (Pip in particular) must not
    install from source by default, and must avoid mixing from source and binary
    features in general. PyPI itself should allow uploading sdists intended for
    redistribution only, rather than for direct end-user use.
  3. Implement a new mode for installers: only pure Python (or -any) packages
    from PyPI, and everything else from a system package manager.
  4. To enable both (1) and (3): name mapping from canonical PyPI names to other names (a toy sketch follows this list).
  5. Implement post-release metadata editing capabilities for PyPI.
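
As a toy illustration of (1) and (4) together - nothing like an [external] table is standardized today, and the names and mapping below are entirely made up (tomllib needs Python 3.11+):

```python
import tomllib

pyproject_snippet = """
[external]
build-requires = ["virtual:compiler/c", "pkg:generic/openblas"]
"""

# Hypothetical mapping from canonical names to one distro's package names.
APT_NAMES = {
    "virtual:compiler/c": "gcc",
    "pkg:generic/openblas": "libopenblas-dev",
}

external = tomllib.loads(pyproject_snippet)["external"]
print("apt packages needed:", [APT_NAMES[dep] for dep in external["build-requires"]])
```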

Equally important, here are the non-changes and assumptions:

  • Users are not, and don’t want to become, system integrators,
  • One way of building packages for everything for all Python users is not feasible,
  • Major backwards compatibility breaks will be too painful and hard to pull
    off, and hence should be avoided,
  • Don’t add GPU or SIMD wheel tags,
  • Accept that some of the hardest cases (complex C++ dependencies, hairy native
    dependencies like in the geospatial stack) are not a good fit for PyPI’s
    social model and require a package manager which builds everything in a
    coherent fashion,
  • No conda-abi wheels on PyPI, or any other such mixed model.

On the topic of what needs to be unified:

  • Aim for uniform concepts (e.g., build backend, environment manager, installer) and a multitude of implementations,

  • Align the UX between implementations of the same concept to the extent possible,

  • Build a single layered workflow tool on top (à la Cargo) that,

    • allows dropping down into the underlying tools as needed,
    • is independent of any of the tools, including what package and environment
      managers to use. Importantly, it should handle working with wheels, conda
      packages, and other packaging formats/systems that provide the needed
      concepts and tools.
5 Likes

The discussions about “integrator” always seem a bit vague and make me worry it means in the future there will be even fewer binary wheels on PyPI, and you will be forced to use Conda to use pytorch etc. Is that really what it means? That sounds very undesirable to me.

Users that are happy with PyPI as-is don’t have to change a thing, and are unlikely to be affected by the hardest to build packages no longer providing wheels.

Doesn't this propose that they will now be forced to build the hardest-to-build packages themselves, or forced to use Conda?

I'm not sure if I've understood what is intended by this or not. Concretely, would this mean that if I'm on Ubuntu and I do pip install stuff, then pip might install some things using apt-get and some things from PyPI?

(If the answer is yes then I have many more questions about how that would work in general but perhaps that’s for another thread somewhere.)