Python Packaging Strategy Discussion - Part 1

I’m wary of stipulating timelines that volunteers would then be held to.

I’d much rather we focus on (1) a direction for where we want to go and (2) a bunch of things/initial steps that get us a meaningful distance along that way. In other words, set up a roadmap. :slight_smile:

It’s a clearer way of signalling progress, which is what we want here IMO, and it doesn’t lock us out of mechanisms we could use to expedite the effort (e.g. fundraising toward specific stages/features in the roadmap, attracting interested contributors, etc.).

3 Likes

Agreed. One of the problems here is that we have a clear request for a better, more unified packaging ecosystem. And yet, in a very real way, that’s exactly what we’re already working towards, and have been for a number of years[1]. It’s just that progress is terribly slow, for a lot of reasons - including, but not limited to, the fact that we’re basically all volunteers and work on whatever we enjoy working on.

So the survey message is, in an important sense, saying “hurry up!” - which unfortunately has the unintended effect of making the people doing the work less enthusiastic about it, because no-one likes to be nagged…

And in our enthusiasm to try to deliver what the users are asking for, let’s be very, very careful not to make that problem worse :slightly_frowning_face:


  1. The mechanisms we’ve put in place to allow multiple tools to interoperate add choice, yes, but the expectation was always (at least in my mind) to give tools the chance to innovate so that “best of breed” solutions can emerge naturally. ↩︎

3 Likes

I’ve finally caught up on this thread [1]. There’s no way I can or will comment on everything said here [2].

I will add one perspective that I haven’t seen addressed. I work for BigCorp, and we are currently in the process of revamping our Python build system in order to ditch some painful legacy wrappers and adopt native Python tooling. We have perhaps a handful of senior Python experts with varying biases and experience, so while we are converging on some decisions, we each prefer the tools we’re familiar with.

And that’s okay because frankly, any of pdm/hatch/poetry would be just fine [3]. One distinction made in this thread that I agree with is between “package manager” and “environment manager”. E.g. we could adopt, say, hatch and not need tox [4], whereas with pdm or poetry we’d probably have to pair the tool with tox. We’re angsting over the importance of features like lock files and the UX (or lack thereof) for managing dependencies; they are important, but eventually all the tools will have them, so how critical are they right now?

Essentially we act as a “distribution” (in Steve’s words) because we manage our own builds of CPython, and we internally mirror all PyPI packages we depend on, with an import process that includes license and security scanning. We build extension modules (both internal and external) against our CPython builds precisely to ensure they all work well together. Managing system dependencies is a challenge, but our plans for containers should hopefully make that workable for now. Even so, some external packages are (currently?) just too difficult to build from source, but we have a process for that handful of exceptions [5].

Yes, I would prefer to have a single tool that the entire community can help improve, but I’m very skeptical that will happen any time soon, if ever. In the absence of that, what I really want is more standardization around pyproject.toml settings and lock file formats. I’m okay with pdm/hatch/poetry/pip innovating on the UX, but what is really painful is having to modify pyproject.toml if we (or one of our teams) chooses a different project manager. Those kinds of migrations and divergences are costly and time consuming. It’s bad enough for the open source projects I personally maintain.

So in practice what does this mean?

  • Please standardize the settings for dynamically calculating the version. I need to be able to call some custom Python code to calculate this at build time (which hatch supports), or grab it from the file system, or query the SCM. This seems like a perfect candidate setting to converge on (see the sketch after this list).
  • Please standardize how to specify what goes in my sdist and wheel. More generally, please at least standardize on the common settings for the build backends. Again, just start with things they all have in common today and we can deal with the things they innovate on later, as that all shakes out organically.
  • We desperately need a lock file standard format.
  • Please add some support for pip to read some settings out of pyproject.toml, most urgently PIP_INDEX_URL, PIP_EXTRA_INDEX_URL, PIP_TRUSTED_HOST and PIP_CERT (the sketch after this list shows one hypothetical shape). Yes, I can use pip.conf, but the more configuration files I have to manage, the more complex migrations and the like become. PDM is nice here; hatch is trickier.
  • This one’s harder because of the philosophical differences (see my comment about hatch vs pdm+tox), but standardizing on dependency specifications would be a big plus.
  • Another difficult one, but having standard plugin settings and APIs would go a long way to improving things.
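
To make the first and fourth bullets concrete, here is a rough sketch of what converged settings could look like. Nothing below exists today: the [build-system.version] table and the [tool.pip] table are both hypothetical, invented purely to illustrate the kind of standardization being asked for.

[project]
name = "bigcorp-package"   # hypothetical project
dynamic = ["version"]

# Hypothetical: one standard way to say "compute the version at build
# time", whichever backend is in use.
[build-system.version]
source = "scm"   # or "file", or "code"

# Hypothetical: index settings that pip (and other installers) could
# read instead of pip.conf or PIP_* environment variables.
[tool.pip]
index-url = "https://pypi.internal.example/simple"
trusted-host = ["pypi.internal.example"]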

TL;DR - concentrate less on converging the tools and more on standardizing the settings and APIs. Ironically, the PEP process was originally designed to be very lightweight [6], but it hasn’t turned out that way, so come up with a really lightweight way to evolve the standards. It’s okay to innovate around the edges and on the UX. Make it less painful to switch tools.


  1. not only was it completely exhausting, Discourse’s “241 replies with an estimated read time of 139 minutes” underestimates the actual time required by orders of magnitude ↩︎

  2. and apologies for adding to the length ↩︎

  3. and light years better than the “G” build system we currently use ↩︎

  4. fewer things to manage, configure, mirror internally, and bootstrap ↩︎

  5. discussions about supporting/encouraging/allowing more binary-wheel-only packages on PyPI make me nervous ↩︎

  6. compared at the time to its model, IETF RFCs ↩︎

9 Likes

May as well just standardise on a single backend at this point. The whole point of being able to choose a backend is getting to choose how these things are specified, so that they can be appropriate for the project you’re building.

Certainly in the BigCorp scenario, we have clear recommendations (which on my side of our corp is currently “flit if you can, else setuptools”), and people just want to get their jobs done, so they go with it. But we also need to keep the ability to write our own backend, precisely because the “standards” for these options are insufficient.


In agreement with the rest of Barry’s post. I too would like some way to specify the index URL in the source tree (rather than just in environment/user settings), though I’m currently leaning towards simpleindex for the teams I support.

2 Likes

Great post, thanks @barry. The dependency-specification point is the one I didn’t quite understand. We already have dependencies/optional-dependencies for runtime dependencies, and build dependencies under [build-system]. And, given your hatch/pdm/tox reference, I don’t think you’re referring to external (non-PyPI) dependencies. Can you elaborate?

:point_up_2: this

How much innovation do we need on the build backend after all? I’m not trolling, I’m genuinely interested in opinions. I’m thinking for our BigCorp needs, we may end up writing our own build backend too (perhaps leveraging and calling into something like flit to do the dirty work).

Why, as opposed to for example extending Hatchling with your own plugins?

Whatever we decide to do, please carefully consider copying exactly what Hatchling does for file inclusion. I created it, after using setuptools, flit-core, and poetry-core for years, to be easier to configure and to have better defaults.
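
For reference, a minimal sketch of Hatchling-style file inclusion (the paths are placeholders for a typical src layout, not a recommendation):

[tool.hatch.build.targets.sdist]
include = ["src", "tests", "README.md"]

[tool.hatch.build.targets.wheel]
packages = ["src/mypkg"]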

1 Like

Yep, sorry for not being clear. I wrote this with PDM’s [tool.pdm.dev-dependencies] and Hatch’s [tool.hatch.envs...dependencies] in mind. This is why I mentioned it might be difficult: the various front-ends think about environments, tasks, and their dependencies in very different ways.

Let’s take test dependencies as an example. I’ve experimented with a few different ways of specifying them, with pdm+tox and alternatively with hatch. For the former, you can define your test dependencies in tox.ini or (as I currently prefer) in a testing list under [tool.pdm.dev-dependencies], installed from tox.ini via pdm install -G testing. With hatch, I’ve landed on creating a [tool.hatch.envs.test] section with a dependencies list inside it. Side by side, the two spellings look something like the sketch below.
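
(A sketch only; pytest stands in for the real dependency list, and the table names follow each tool’s documented configuration.)

# PDM: a named dev-dependency group, installed via `pdm install -G testing`
[tool.pdm.dev-dependencies]
testing = ["pytest"]

# Hatch: an environment with its own dependency list
[tool.hatch.envs.test]
dependencies = ["pytest"]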

Either way works fine. But either solution more or less locks you into the front-end choice. Because this is a philosophical difference between the front-end tools, maybe it’s okay that they are innovating here. I could imagine, however, standardizing on something like PDM’s approach to dependency groups and hooking it into Hatch easily. Something like this (by way of example, not proposal):

[project.dependencies]
testing = [
    # all my test dependencies here
]

# PDM and others would map `pdm install -G testing` to use that group.
# Hatch might want something like
[tool.hatch.envs.test]
dependency_group = 'testing'

That’s a possibility too.

Hatch already supports that with the features option of environments.
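
For instance, a minimal sketch (assuming a testing extra defined under [project.optional-dependencies], which the environment’s features option then installs):

[project.optional-dependencies]
testing = ["pytest"]   # illustrative

[tool.hatch.envs.test]
features = ["testing"]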

1 Like

Ah, I see. Thanks for that hint. So maybe this idea isn’t really that far off.

I created my own backend because none of the others would let me generate an entire directory structure as part of the sdist->wheel conversion and then include it using wildcards. I assume that because nobody else supports it, nobody else wants it, but I needed it. A range of backends allows me to use that myself, while everyone else gets stuck with packaging exactly what they checked into git (or whatever variation their backend uses).

In short, I’m -100 on standardising on a single backend. If that were to happen, I would ignore it and encourage others to ignore it as well. The setup we have now is by far the best it’s ever been. It doesn’t affect my users, and my contributors have to follow the tool selection for my projects in any case.

The most I would accept is to have a recommendation for a default that is preinstalled.

1 Like

Just to dump a list, my build backend needs to support:

  • code signing at multiple stages during build
  • SBOM generation during build (pre packaging)
  • data files that are generated during build
  • arbitrary specification of wheel tags
  • arbitrary environmental overrides of version numbers (during sdist creation)
  • cross-compiling
  • Cython
  • in-place incremental rebuilds
  • integrate existing MSBuild extension packages
  • … more that I don’t remember right now

I know this is specialist stuff, but I can’t be the only one. So I built my own and use it regularly, because it fits my needs. The previous iteration was a setup.py that was longer than a custom build script would have been, and without PEP 517 I would have just ended up with a custom script. There’s no way I would’ve been able to get all these feature requests into an existing backend, or any that have been built since then, but I didn’t have to.

So yeah, I think there’s a ton of innovation out there, and we should keep encouraging it. It’s totally fine to have a recommendation or a requirement within a specific BigCorp context, but the current setup is as good as we can get for the ecosystem at large.

Let’s not mess with this bit - it’s front-ends and environment management that people don’t like; the backends are fine.

3 Likes

To be clear, I’m not advocating for disallowing custom backends. Even if a common backend were chosen (and possibly made the default), it should have an API that takes care of much of the dirty work of building your own backend, and/or expose hooks where you could customize some step(s), as you’ve described.

I agree that we’ll never get to the point where one black-box backend handles everyone’s needs, but I do think we can get to a place where a default backend serves 80% of users and provides hooks for another 15%.

1 Like

I agree 100%. IMO, PEP 517 was a huge success, and we should build on that, not undo it. And I’d go further - we should do more of the same. Encourage innovation and flexibility, and write standards that allow people to tailor packaging to their needs, not force people into a model that doesn’t work for them.

Yes, I know, we’re hearing strong demands that packaging is too confusing, and needs to be simplified. Fine, let’s make it less confusing. I’m all for that. But let’s not do that by making it less capable. We should be offering clear, easily accessible defaults that meet the needs of the majority of users. We should be providing consistent messages about what a “normal Python project” should look like. Users should be able to ask (via Google and on forums and mailing lists) how to set up a Python project, and get the same answer from everyone. And documentation should support that.

But we can do all of that, and still have a flexible system that supports other options. We just don’t push them in people’s faces. Have them there for people who know they need them, and have the understanding of their own use case to look for them and understand what the trade-offs are. Make them available for people who want to innovate - either just for the heck of it, or because they want to be the next standard solution. But do that without confusing the basic message that applies to the vast majority of users.

The trouble is, we’re good at the second part (look at PEP 517!) but really bad at the first part (consistent messaging and leadership). We need help there. And I still hope that @smm can find some way of getting that help for us, without stopping us doing what we do best.

4 Likes

The hooks are there, in PEP 517. We could have one backend that meets the 80% - I’m not sure it exists yet (in spite of @ofek’s strong advocacy for hatchling :slightly_smiling_face:) but it’s certainly possible. But instead of forcing that same backend to handle everything else as well, via its own (non-standard) plugin mechanism, it would be better (IMO) to have libraries that make building your own backend as simple as possible. Then you have PEP 517 as the interface, library code to do all the grunt work of assembling wheels, etc., and you just write your own business logic - just like you’d do with a backend-specific plugin, but without the lock-in. Something like the sketch below.
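
To illustrate the shape of that idea: only the hook names and signatures below come from PEP 517; backend_helpers and my_custom_step are hypothetical, standing in for the kind of “grunt work” library being proposed.

# A hypothetical PEP 517 backend. Only build_wheel/build_sdist and
# their signatures are standard; backend_helpers is an imagined
# library that does the mechanical work of assembling archives.
import backend_helpers  # hypothetical

def my_custom_step(staging_dir):
    # Project-specific business logic, e.g. generating data files
    # into the staged tree before it is packed into the wheel.
    ...

def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    staging = backend_helpers.stage_project(".")  # hypothetical
    my_custom_step(staging)
    # PEP 517 says the hook returns the basename of the wheel it wrote.
    return backend_helpers.make_wheel(staging, wheel_directory)

def build_sdist(sdist_directory, config_settings=None):
    # Likewise returns the basename of the created sdist.
    return backend_helpers.make_sdist(".", sdist_directory)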

So, would you be in favor of my extension module builder idea if I simply renamed it to something about build hooks? You basically just described that, yet many folks seem to be either against it, indifferent to it, or convinced it’s too hard a challenge.

That indeed seems fairly logical. It’s touched on in PEP 621 as a rejected idea - but only rejected because the authors thought it was out of scope for that PEP, not because the idea is bad.

I’d imagine this would use test, doc and/or dev groups under [project.optional-dependencies]. The PEP 621 text is a little fuzzy, but given that it describes tool-specific synonyms like tool.poetry.extras, I read it as being intended for exactly this kind of purpose - standardizing tool-specific dependency groups. Concretely, that might look like the sketch below.
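
(The group names and their contents are illustrative; only the [project.optional-dependencies] table itself comes from PEP 621.)

[project.optional-dependencies]
test = ["pytest"]   # test-only dependencies
doc = ["sphinx"]    # documentation build dependencies
dev = ["black"]     # general development tooling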

Maybe, yes…? I haven’t looked into it, to be honest. Is it structured as a library, so that I write my backend and import and call functions from the “backend builder” library? If not, then I think we might be misunderstanding each other.

For example, if the backend builder library included functions that would let me replicate this bit of old distutils-specific code I had, that would be interesting.

from distutils.ccompiler import new_compiler
import distutils.sysconfig
import sys
import os
from pathlib import Path

def compile(src):
    src = Path(src)
    cc = new_compiler()
    exe = src.stem
    cc.add_include_dir(distutils.sysconfig.get_python_inc())
    cc.add_library_dir(os.path.join(sys.base_exec_prefix, 'libs'))
    # First the CLI executable
    objs = cc.compile([str(src)])
    cc.link_executable(objs, exe)
    # Now the GUI executable
    cc.define_macro('WINDOWS')
    objs = cc.compile([str(src)])
    cc.link_executable(objs, exe + 'w')

I didn’t get the impression that was the sort of thing you had in mind, though.