Versioning `pyproject.toml`

What would be the real benefit of having versions? I’m guessing
the only reason would be to support new features, but this will
create chaos: users declaring a higher version than they need, or
a lower version than the features they use require (so it will add
the overhead of throwing errors in some scenarios, or leave people
unable to use the pyproject.toml because the maintainer is used to
always declaring the same version for all their projects).

Versioning pyproject.toml can mean a couple of different things
depending on how you approach it. One possibility, which I think a
lot of people in this thread are focusing on, is declaring a
version number inside the pyproject.toml file so that tooling can
error if it’s too old to support the newer features implied by that
version. The other possible meaning (and where I, as a tool
maintainer, see more utility) is to version the pyproject.toml
specification itself. That would let me document, for each release
of my tools, which versions of the specification users should
expect to be supported, helping them make an informed decision
about whether to use those tools, or about which features of the
specification are too new and should be avoided with a particular
vintage of tool.

Embedding minimum versions in files and then having tools emit
errors to end users, while effective, is behavior I see as somewhat
user-hostile and at best a backstop. If I’m packaging a project and
want a particular version of a specific build backend, I need to
make sure that the build backend I’ve chosen supports the features
of the pyproject.toml specification I’m trying to package with.
Anything less is a disservice to my users. If the pyproject.toml
specification has a versioning scheme noting at which versions
certain features were added, then build backend maintainers can
document the latest implemented version of the specification for a
given release of their backend, without having to enumerate all of
the individual pyproject.toml features that release supports (or
predict which future features it will turn out to lack).

5 Likes

Well, yes; but the flip side of this is something I’ve been trying to get at as well: I’m not sure I like the idea that the distributor should be the one choosing the build backend, when others might work just as well. Right now that assumption is encoded in the fact that the backend choice lives in pyproject.toml, that sdists preserve that file, and that pip automatically installs from sdists by default, without user intervention, when a wheel isn’t available.

1 Like

I don’t understand how the build backend could be chosen by the user–two different backends would only be interchangeable if they support the same features and configuration, in which case why are they two separate projects?

Flexibility in build backend is specifically for the developers, a user shouldn’t need to care about changing it.

1 Like

It’s strictly a necessity. This is all a bit OT, but…

Discussion of specifying backends: if my project has its metadata declared in setuptools `setup.cfg`, then the build backend must be setuptools; flit-core, hatchling, etc. won’t work. If my package uses `[tool.hatch.build]` configuration, then only hatchling works. As for specifying the backend version, the general consensus is to specify only a lower bound, or not to specify a version at all.
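
As a concrete sketch of that consensus, a [build-system] table pinned with only a lower bound might look like the following (hatchling and the particular version number are illustrative choices of mine, not something prescribed in this thread):

[build-system]
# Lower bound only: any newer backend release is acceptable; no upper bound is pinned.
requires = ["hatchling>=1.18"]
build-backend = "hatchling.build"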

There’s actually a bit of an issue in that different build frontends may behave differently even on the same backend. So, for example, `pip install .` and `flit build; pip install ./dist/*.tar.gz` may give you different results. But in practice pip is the only frontend which matters for sdists, so it’s not a huge issue most of the time. It can show up when building from source.


This thread kicked off around versioning of the [project] table, and specifically with respect to a feature discussed in the thread about Dependency Groups. It was regarding how we could ever safely introduce new syntactic forms into [project.dependencies] and [project.optional-dependencies].

I’ll show the syntax here, which was suggested as a way of including a dependency group in [project.dependencies]:

[project]
dependencies = [{include-group = "foo"}]
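
For context, here is a fuller (and heavily hedged) sketch: the `include-group` entry inside [project.dependencies] is exactly the hypothetical syntax under discussion, not something any current spec defines, and the group name, project metadata, and requirement are placeholders.

[dependency-groups]
# A named group of requirements, as described in the Dependency Groups thread.
foo = ["requests>=2.31"]

[project]
name = "example"
version = "1.0"
# Hypothetical: pull the "foo" group into the runtime dependencies.
dependencies = [{include-group = "foo"}]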

For the purposes of this thread, and to re-center this a bit, let’s assume that we’re all agreed that we want such syntax.

I see four ways for such a new syntax to be added:

  1. the whole pyproject.toml file is versioned, and this is introduced in v2
  2. the [project] table is versioned, and this is introduced in v2
  3. the syntax is added with no explicit version number, and its presence implies that [project] is on v2
  4. the syntax is added with no explicit version number, and its presence does not imply anything additional

I much prefer (4) to (3) here, since it’s not clear what “implies v2” would mean.[1] I have no clear preference between (1) and (2) but they each seem worse than (4), at least at present, since their meanings are still unclear.
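
To make (1) and (2) concrete, they would presumably involve an explicit marker along these lines. Both key names below are invented purely for illustration; no such fields exist in any current spec, and a real proposal would pick one of the two, not both:

# Hypothetical option (1): a version declared for the whole file
pyproject-version = "2"

[project]
# Hypothetical option (2): a version declared only for this table
spec-version = "2"
dependencies = [{include-group = "foo"}]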

I think the choices at present for a new spec are (4) – add new syntax with an awareness of what the fallout may be – or don’t add it.

Not adding syntax of any new kind to [project] may be the right choice most of the time. But I doubt it will be the right choice all of the time. Sooner or later, there will be a pressing need to make changes, possibly very subtle ones, to that table.


In certain network protocols – I’m thinking of SSL/TLS – two parties, a server and a client, need to send information back and forth to detect what behaviors are and are not supported. It’s not just a protocol version (although TLS has that too). Often they’re sending feature negotiation messages, information about their supported behaviors (e.g. ciphers). I think our situation with pyproject.toml is really much more akin to this kind of feature negotiation.

Of course, computers are very good at this sort of thing and pyproject.toml is human-edited. I don’t want to write my files with

features = "dependency-groups,path-specifiers,external,..."
[project]
...

that sounds very unpleasant and easy to get wrong.

To @fungi’s point about version numbers being good for documentation, I agree. And I really like that there’s a clear use-case there for the version number, with a specific user-facing impact. But we don’t have a spec which evolves in discrete linear steps. Assume that [external] and [dependency-groups] are both accepted at around the same time. What version number should be assigned to communicate that a tool supports [external] but not [dependency-groups]? And vice versa?

We can’t put a version number on this multi-dimensional thing, which is the pyproject.toml spec, and call it a day. Even version-numbering the spec requires that changes are in some way serialized[2]. Which might be a good thing, but then we’ll probably need to discuss which features are mandatory vs optional to support in different versions. The worst-case scenario here is that we reproduce much of the current state, with a small core spec considered “mandatory” and the rest considered “optional”.

It’s not at all version negotiation that tool authors want. They want to be able to negotiate their supported features with the user. And specifically, it’s tool<->user communication! Not tool<->file! pyproject.toml is just in the middle.

I don’t know how that communication should be handled and mediated between tools and users. Perhaps if there were a single library which converted pyproject.toml contents into a list of feature flags, that would help the tool authors?


  1. To indulge in an old joke, a Ruby engineer is enthusiastically expounding on the virtues of the language, and how “adjacency implies function application, it’s so clean!” His senior steps in to correct him: “No. Adjacency implies adjacency.” Implicit logic is hard, y’all! ↩︎

  2. put in series, not encoded in a wire-format :wink: ↩︎

5 Likes

I don’t understand how the build backend could be chosen by the
user–two different backends would only be interchangeable if they
support the same features and configuration, in which case why are
they two separate projects?

Flexibility in build backend is specifically for the developers, a
user shouldn’t need to care about changing it.

I don’t think I said that users should choose the build backend? But
also the terms “user” and “developer” are somewhat relative. I see
“users” of a build backend as the people writing the pyproject.toml
file (what you might call the “developers” or “maintainers” of the
project being packaged). Others may think of “users” of the packaged
software, i.e. those installing the packages rather than those doing
the packaging. Who is a user and who is a developer is a matter of
context.

To @fungi’s point about version numbers being good for
documentation, I agree. And I really like that there’s a clear
use-case there for the version number, with a specific user-facing
impact. But we don’t have a spec which evolves in discrete linear
steps. Assume that [external] and [dependency-groups] are both
accepted at around the same time. What version number should be
assigned to communicate that a tool supports [external] but not
[dependency-groups]? And vice versa?

We can’t put a version number on this multi-dimensional thing,
which is the pyproject.toml spec, and call it a day. Even
version-numbering the spec requires that changes are in some way
serialized[2]. Which might be a good thing, but then we’ll
probably need to discuss which features are mandatory vs optional
to support in different versions. The worst-case scenario here is
that we reproduce much of the current state, with a small core
spec considered “mandatory” and the rest considered “optional”.

When developing support for network protocols, my projects’
documentation lists the set of IETF RFCs for which support has been
implemented. If only the Python community had a similar concept.

Once upon a time, I listed the packaging PEPs for which support was
implemented, but have been told more recently in no uncertain terms
that mentioning PEPs in documentation is very out of fashion, and I
should refer to (entirely unversioned) Python packaging
specification documents instead. This still seems like a massive
step backwards to me from the standpoint of clarity, but I have
come to realize that I hail from a different generation of
engineers, one for whom versions actually matter, rather than one
that just tells everyone to “install the latest of everything and
hope for the best” even when there’s no real guarantee they can
actually do so.

6 Likes

I was responding to Karl’s post, not yours.

I share some of your discomfort with this, since the PEPs are frozen but the packaging docs are not. It’s a double-edged sword: the ideal is that it improves clarity for users, since those docs are more actively maintained and can get (non-typo) fixes and improvements. But it also means that as an external consumer of the docs, you’re no longer referring to something precise.

I’ll note, since I’m currently working on a PEP, that the PEPs themselves now refer out to the packaging docs, so the demand for those to remain stable over time is relatively high.
It may be that the major harms we’re concerned about, namely the potential for those docs to change in significant ways, are mostly theoretical.

To look at another angle on this issue of how to document features, I think the consensus to describe features by name, rather than by PEP number, is a significant, if recent, improvement. I call the PEP “Dependency Groups”, and I’ll give you the number, 735, if you want it. You may note that the hypothetical “feature flag” list I shared above did not contain PEP numbers. All of this is an improvement to clarity at no cost to our precision, if we do it right.
But where should the link behind “Dependency Groups” lead you? So for a doc author, the naming decision isn’t entirely independent of the choice between PEPs and packaging docs.

2 Likes

As of a few days ago, all PyPA specs have a changelog at the bottom. It would be trivial to write a PR to add version numbers as well.