Python Packaging Strategy Discussion - Part 1

Just to dump a list, my build backend needs to support:

  • code signing at multiple stages during build
  • SBOM generation during build (pre-packaging)
  • data files that are generated during build
  • arbitrary specification of wheel tags
  • arbitrary environmental overrides of version numbers (during sdist creation)
  • cross-compiling
  • Cython
  • in-place incremental rebuilds
  • integrate existing MSBuild extension packages
  • … more that I don’t remember right now

I know this is specialist stuff, but I can’t be the only one. So I built my own and use it regularly, because it fits my needs. The previous iteration was a setup.py that was longer than a custom build script would have been, and without PEP 517 I would have just ended up with a custom script. There’s no way I would’ve been able to get all these feature requests into an existing backend, or any that have been built since then, but I didn’t have to.

So yeah, I think there’s a ton of innovation out there, and we should keep encouraging it. It’s totally fine to have a recommendation or a requirement within a specific BigCorp context, but the current setup is as good as we can get for the ecosystem at large.

Let’s not mess with this bit - it’s front-ends and environment management that people don’t like, the backends are fine.

4 Likes

To be clear, I’m not advocating for disallowing custom backends, and even if a common backend were chosen (and possibly made the default), it should have an API that would take care of much of the dirty work of building your own backend, and/or expose hooks where you could customize some step(s) as you’ve described.

I agree that we’ll never get to the point where one blackbox backend handles everyone’s needs, but I do think we can get to a place where a default backend could serve 80% and provide hooks for another 15% of users.

1 Like

I agree 100%. IMO, PEP 517 was a huge success, and we should build on that, not undo it. And I’d go further - we should do more of the same. Encourage innovation and flexibility, and write standards that allow people to tailor packaging to their needs, not force people into a model that doesn’t work for them.

Yes, I know, we’re hearing strong demands that packaging is too confusing, and needs to be simplified. Fine, let’s make it less confusing. I’m all for that. But let’s not do that by making it less capable. We should be offering clear, easily accessible defaults that meet the needs of the majority of users. We should be providing consistent messages about what a “normal Python project” should look like. Users should be able to ask (via Google and on forums and mailing lists) how to set up a Python project, and get the same answer from everyone. And documentation should support that.

But we can do all of that, and still have a flexible system that supports other options. We just don’t push them in people’s faces. Have them there for people who know they need them, and have the understanding of their own use case to look for them and understand what the trade-offs are. Make them available for people who want to innovate - either just for the heck of it, or because they want to be the next standard solution. But do that without confusing the basic message that applies to the vast majority of users.

The trouble is, we’re good at the second part (look at PEP 517!) but really bad at the first part (consistent messaging and leadership). We need help there. And I still hope that @smm can find some way of getting that help for us, without stopping us doing what we do best.

5 Likes

The hooks are there, in PEP 517. We could have one backend that meets the 80% - I’m not sure it exists yet (in spite of @ofek’s strong advocacy for hatchling :slightly_smiling_face:) but it’s certainly possible. But instead of forcing that same backend to handle everything else as well, via its own (non-standard) plugin mechanism, it would be better (IMO) to have libraries that make building your own backend as simple as possible. Then you have PEP 517 as the interface, library code to do all the grunt work of assembling wheels, etc, and you just write your own business logic - just like you’d do with a backend-specific plugin, but without the lock-in.

So, would you be in favor of my extension module builder idea if I simply renamed it to something about build hooks? Because you basically just described that, yet many folks seem to be either against it, indifferent to it, or convinced it's too hard a challenge.

That indeed seems fairly logical. It's touched on in PEP 621, as a rejected idea - but only rejected because the authors thought it was out of scope for that PEP, not because the idea was bad.

I’d imagine this would use test, doc and/or dev groups under [project.optional-dependencies]. The PEP 621 text is a little fuzzy, but given that it lists synonyms like tool.poetry.extras, I interpret it as being intended for exactly this kind of purpose - standardizing tool-specific dependency groups.

Maybe, yes…? I haven’t looked into it, to be honest. Is it structured as a library, so that I write my backend and import and call functions from the “backend builder” library? If not, then I think we might be misunderstanding each other.

For example, if the backend builder library included functions that would let me replicate this bit of old distutils-specific code I had, that would be interesting.

from distutils.ccompiler import new_compiler
import distutils.sysconfig
import sys
import os
from pathlib import Path

# Note: this is Windows-specific - sys.base_exec_prefix\'libs' holds the
# pythonXY.lib import libraries, and the trailing 'w' names the GUI variant.
def compile(src):
    src = Path(src)
    cc = new_compiler()
    exe = src.stem
    cc.add_include_dir(distutils.sysconfig.get_python_inc())
    cc.add_library_dir(os.path.join(sys.base_exec_prefix, 'libs'))
    # First the CLI executable
    objs = cc.compile([str(src)])
    cc.link_executable(objs, exe)
    # Now the GUI executable, rebuilt with the WINDOWS macro defined
    cc.define_macro('WINDOWS')
    objs = cc.compile([str(src)])
    cc.link_executable(objs, exe + 'w')
I didn’t get the impression that was the sort of thing you had in mind, though.

It wouldn’t replicate the logic you have there but rather expose an API build backends can call to execute the logic you have there. Essentially, a way for things to happen before wheels and source distributions are built.

You already know about where lock files stand (and if I’m wrong and you don’t, message me). As for more standardization, it might be interesting to look at what Poetry, Hatch, pipenv, PDM, and even Flit offer out-of-the-box and see where their feature sets overlap. That’s probably the most telling as to what we might be able to get the community to rally around standardizing.

This was proposed at Adding a non-metadata installer-only `dev-dependencies` table to pyproject.toml.

That stems from Core metadata specifications - Python Packaging User Guide which explicitly reserves test and doc for this sort of thing (and which I believe @barry has said he wished were plural :wink:). That makes it the closest we have to a standard around specifying development dependencies.

1 Like

OK, then it sounds like it might be useful. I agree that the consensus seems to be that something general is only likely to help in simpler cases, so maybe people are simply pointing out that you’re over-selling how broadly applicable it might be. I never expect to write a complex backend, so limited scope would be fine for anything I’d ever need, but I’m not typical. And honestly, the number of people interested in writing a build backend that needs to compile native code, and who don’t have specialist needs, is likely extremely small (most people with simple needs will just use setuptools). So I’m not surprised there’s limited interest. But that’s no reason to not build it - niche libraries are still of use.

1 Like

It’s very tempting to double down like that, because it broadly validates the previous direction and choices made in this space, rather than face the increasingly loud feedback that the outcomes[1] of those choices have produced a state of affairs that users are very unhappy with.

No-one is talking about making anything less capable. But I argue that overall we do want fewer tools doing the same thing (like build backends), rather than more. Tools want to be used, and in the case of build backends, those users are library authors[2], who pass on their choices to all downstream users[3], which just perpetuates the situation that is being criticised.

I’d say that people with roles like @steve.dower and @barry (absolutely requiring their own backends for internal BigCorp distributions) are an absolutely minuscule portion of users[4]. Those use cases are also the ones that will make things work regardless, because BigCorp pays its employees to solve the (internal) problems at hand, but it’s not a great stand-in for the needs of the average user.

It’s been cited a lot of times in this thread already, but “write standards that allow people to tailor packaging to their needs, not force people into a model that doesn’t work for them” is just going to lead to more of the “XKCD: standards” type outcomes, rather than do any halfway meaningful consolidation[5].

It means that, despite people’s good intentions, the overwhelmingly dominant inertia[6] will all-but-ensure that everyone continues in their respective groove / niche / bubble, continues optimising for their own use cases and user base, and leaves the people farthest removed from these discussions with a wild zoo of tools to deal with (at least I strongly doubt that any new default without substantial weight behind it would make a dent in this).

If that is the outcome of a 250+ post “strategy discussion” with the explicit aim to reduce this proliferation of tools, I’d be pretty disappointed.


  1. I’m distinguishing this because while choices are intentional, their outcomes are often not. Also, if we postulate that it’s still the right path to a “promised land” solution, then the speed at which we’re (not) getting there is one of those outcomes. ↩︎

  2. the most opinionated inhabitants of the ecosystem, who will have no qualms about going with something that’s not the newly-defined default if it doesn’t solve their problem ↩︎

  3. and all roles in between, from authors of reverse dependencies, to distributors, to administrators ↩︎

  4. which is not to downplay the outsized role they play for their user base, much less in the ecosystem ↩︎

  5. If we accept that we cannot solve 100% of all use-cases with a single tool, then some degree of forcing people to make changes is unavoidable, even just for enforcing a new default. I’d say the case would be pretty clear that this would be a good trade-off if we could cover 99% and have the remaining 1% jump through hoops. It’s less clear at 90%:10%, and even less so at 80%:20%. ↩︎

  6. of the “it would be nice to interoperate, but I need to get this to work now” kind ↩︎

5 Likes

Yes, that’s the “other way 'round” approach I was implying [1] and it works for me. You need to get the right division of labor and functionality in those libraries to make it possible to write your business logic efficiently, but the same is effectively true for a single backend with hooks and APIs.


  1. or at least had in my mind when I wrote my reply ↩︎

Yes, although of course I wouldn’t want to hard code test, doc and dev. Despite my initial cognitive mismatch connecting Hatch’s features and [project.optional-dependencies] settings, the way Hatch does it is exactly what I was thinking about here.

Yes, and I’m confident we’ll [1] get there!


  1. well really, you’ll :smile: ↩︎

Yes, exactly [1]. Hatch’s [project.optional-dependencies] + [tool.hatch.envs.<blah>.features] is functionally what I was thinking about, but I agree that pulling these out of optional-dependencies into a dev-dependencies section is a better way to go long-term.


  1. he says without reading the whole 58 message long thread ↩︎

Perhaps, and of course I do also wear my “random guy who maintains some open source Python stuffs” hat, for which the current situation (regardless of the specific tool choice) is miles and miles better than the setuptools way. But then, wearing that hat I don’t do a lot of complicated stuff anyway so I’m just trying to make things a little more resilient and standardized.

That said, the needs of BigCorp users are underrepresented here, and probably in public forums because they just deal with what they’ve got, and throw resources ($/time/people) at the problem, grumbling to themselves in Slack/Teams/Zoom/Breakrooms. It’s not made easier by the really diverse choices made by upstreams either, all of which require their own one-off hacks that have to be redone every time something upstream changes. I do think that some of the issues encountered internally by enterprises contribute to the sentiment that “Python packaging is hard” because there can be a lot of Python consumers inside those organizations, and their only encounter with Python is within a corporate environment.

2 Likes

Just in case it needs to be said: I’m not proposing to go back to that[1], but forward (in a direction that’s more cohesive than the current state of affairs; however we’d end up shaping that).

I cut the quote short but I agree with that whole paragraph – I think corporate users (not just Big ones) are both underrepresented in these discussions, and also highly exposed to the many different ways that Python packaging works just within all the internal uses (it’s something I see all the time in my $dayjob). Though in contrast to BigCorp, not everyone has the means to set up their own distribution much less build backend, so the grumbling is very loud also in SMEs.


  1. at least, people will have a lot of associations with that term ↩︎

2 Likes

I want to thank everyone who has participated in this thread. It is quite encouraging to see an engaging discussion such as this one.

I am in the process of writing a blog post summarizing this thread and setting out a way forward. While there has not been a clear consensus on what unification will look like, I do feel there is enough consensus that this should be taken further to flesh out the details.

As a heads-up, the remaining strategy discussions will be about:

  • Better support for Packaging users
  • Phasing out legacy systems
  • Supporting each other in the community
  • Encouraging long-term contribution

The second part of the strategy discussion is now live. I would like to invite everyone who has engaged in this thread to continue the conversation there.

5 Likes

Amen to that. I have to say though that after following this thread I have some amount of pessimism. Unless people from all sides of the problem are willing to consider everything on the table — changes to pip, changes to Python, changes to PyPI, everything — I’m doubtful that anything will come out of it that really resolves the issue.

2 Likes

It was indicated by the author that the discussion would close on January 20.