Subtle backward incompatibility in PEP 725?

I couldn’t find a specific topic about PEP 725 – Specifying external dependencies in pyproject.toml on DPO [1], but in reading it over recently, I wonder if there is a latent backward incompatibility.

From the specification:

The lack of an [external] table means the package either does not have any external dependencies, or the ones it does have are assumed to be present on the system already.

Isn’t it true that today, if there is no [external] table, then it is undefined whether the package has external dependencies? If so, then the PEP as currently written assigns incompatible semantics to the missing table. There’s a difference between “we don’t know what the external dependencies are” and “we know there are no external dependencies beyond those assumed to be de facto present”.

Why this is relevant: if you wanted to say explicitly that you have a pure Python package, you would, under the definition in the PEP, leave out the [external] table altogether.

But you’re probably already doing that. The ecosystem doesn’t know that you are positively asserting that you have a pure Python package, and thus can’t know that your package is probably “safe” to build. Hence, the backward incompatibility.

The way out would be for the PEP to ascribe undefined behavior for the lack of an [external] table (maintaining backward compatibility), and to add a key or value in the table to positively assert “I have no external dependencies”. While that might imply pure-Python, it still leaves room for ambiguity; are you a pure-Python package or a package with dependencies “assumed to be present on the system already”?

If my analysis is correct, and we want the PEP to be both backward compatible and unambiguous, then one possible way out would be to define a PURL that means “there are no external dependencies” and allow those in (optional-)build-requires and (optional-)host-requires. I don’t see any relevant PURL type with the meaning of “nothing”, so possibly pkg:generic/:none: might be used. Or the explicit shorthand :none: would be allowed.


  1. if I missed it, then DPO search might be lacking ↩︎

In theory, I would say yes.

In practice, “assume they are present, if any” is the only practical way for an installer to respond.

But this would (if I’m thinking and reading straight) also make my proposed enhancements non-conforming, unless the prompting options claimed that legacy projects definitely don’t have any missing external build dependencies. Which seems misleading, and counter to the intended purpose of prompting the user.

So I lean in favour of what you’re saying.

The blithe answer is that anyone putting out an sdist for a pure Python package should be getting pressured to publish wheels anyway.

But then, the same problem could also apply to a project that needs to be built, but where the build system is expected to bundle all the needed build-time dependencies. (Although the examples I can think of for this sound… rather strained.)

A package that wants to advertise that there are no external build dependencies, can explicitly put empty lists for those keys in [external]. Or at least put an empty [external] table in the pyproject.toml. I don’t think that a separate mechanism is required to communicate this. “Explicit is better than implicit”, but “special cases aren’t special enough to break the rules”.

A project that assumes a dependency is present, and explicitly declares not having that dependency on the strength of that assumption, is doing so at its own risk - and it should be considered a bug in the project’s packaging.

I don’t think the “pure-Python package” distinction is particularly useful to the install process. Such projects are only a subset of those which “cause no problems”. (I’m not even sure it’s well defined. May a “pure-Python” have runtime dependencies that themselves use C extensions, for example?)

In general, if we’re going to have backwards-compatibility mechanisms surrounding pyproject.toml tables, then “the table is completely missing” really needs to mean “this project predates the standardization of the table, or hasn’t been updated to that standard” - and thus the values that such a table would have are necessarily unknown.

By my understanding of the PEP, PURLs only get used in lists (TOML arrays), so the natural value to signal this is an empty list.

Sure, but as (I think) Paul pointed out, a significant number of packages only publish sdists.

I think it could be, for two reasons. One is that often, even if a package containing native code extension modules is possible to build, it still may not be desired. Compiling native code is slower and often mysteriously so to the end user. I’ve worked at places where the (cold cache) installation process just sits there for a while, and while it eventually succeeds, we still got complaints that installing package X “took too long”.

Second, I could imagine that PyPI could implement a build farm solely for pure Python packages. No need to manage a plethora of platforms to cover every conceivable ABI, just something that compiles .py files and you’re good to go. You could even do that on demand for older distributions. That seems much more manageable and still useful to the community, as the number of sdist-only published packages seems to support [1].


  1. although, we’d probably want a deeper analysis looking at the number of pure-Python sdist-only uploads across all distributions, ranked perhaps by package popularity ↩︎

… Actually, I’ve proposed this before, too. It seems like there aren’t resources available for it, though. (It’s not even necessary to precompile to .pyc, although some might find it a nice service.)

… yet? :thinking: :smile:

This can probably be a standalone thread, but I’ll just point out: there is a link to the Discourse thread in the Discussions-To part of the PEP header. :joy:

Yes, that is correct. That is also what “the ones it does have are assumed to be present on the system already” was meant to say. It’s the status quo, or “if it has any, they are assumed by frontends to be present, and backends will try to use them and they will error out if a required one is missing”.

There’s no incompatibility here, perhaps just a minor change in wording needed to make it clearer that it means what you think it should mean.

That is an orthogonal concept. It is very well possible for pure Python packages to have external dependencies. The PEP even has an example of that, namely Spyder. It only has non-Python runtime dependencies, but is itself pure Python. There is no example of a pure Python package with a build-time dependency, however even that is possible (e.g., it uses a code-generation tool not packaged on PyPI).

If we want an “is pure Python” field in pyproject.toml, then we should just add one - and I agree that that would be quite useful. I’ll also note that it’s kinda the same as (but not identical to) Root-Is-Purelib in the WHEEL spec. A build backend now has to apply annoying heuristics to determine whether Root-Is-Purelib should be true or false.

I think it is a good idea to allow stating this explicitly. I agree that an empty list is the way to go - that’s how dependencies in the [project] table work as well. It isn’t explicitly stated yet in the PEP, so I’d say let’s add it.

I’ll make sure to cross-link from that thread to this one. And a few other threads/comments where PEP 725 has come up recently. I’ve been very short on bandwidth, but it looks like it’s high time for one more push.

Relevant to this part of the discussion, I was only referring to build dependencies.

That feels like a different dimension to me, and perhaps one that should be called out directly. If all I’m doing is lumping the .py files into a zip and adding some metadata, I wouldn’t need those code-generation tools. If, however, I need to do some pre-processing to regenerate those .py files, then yes, I’d need them. I’m not sure I’d lump them into build-requires, though; maybe optional-build-requires, since regenerating .py source files might be considered an optional build step[1].

:+1: to the other PEP clarifications suggested here. Thanks!


  1. likely project specific ↩︎

If examples are needed, anything that uses versioneer or setuptools_scm as a build-time dependency implicitly depends on git being available.

Not if building from an sdist (which is by definition not a git checkout).

I think we might need to distinguish between building from a source tree (which may be a VCS checkout, or have files that get preprocessed) and building from an sdist, where the build backend can be assumed to have done a bunch of preparatory work to “freeze” the distribution (storing the version number extracted from VCS, building C sources using Cython, etc.)

Is it the build backend’s responsibility to determine, for example, that when pyproject.toml in a source tree says “depends on git”, the sdist built from that source tree no longer depends on git and so that dependency should not be present in the sdist metadata? If not, isn’t the sdist metadata wrong?

Just the other day I saw a case of an sdist that seemed to be broken along those lines. As I understood it, the build backend was expecting to use a version from a file that cached a setuptools-scm result, but the sdist tarball didn’t actually appear to contain the file.

I would say the metadata is wrong, and further that the tooling is working wrongly. In my view, because pyproject.toml is included in an sdist, it’s about explaining how a wheel is produced from an sdist. If there’s something interesting about producing an sdist from the repo, then that should be handled separately - and it’s fine if that produces an sdist containing a pyproject.toml that differs from the repo’s pyproject.toml, as long as the packaged version correctly describes the sdist and allows for building a correct wheel.
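
As a rough illustration of that last point, here is a sketch of the filtering step a backend could apply when writing the sdist's copy of pyproject.toml. Everything here is an assumption made for the example: the helper name `sdist_build_requires`, the `SDIST_ONLY` set, and the `pkg:generic/...` identifiers are illustrative, and no existing backend is claimed to work this way.

```python
# Hypothetical: external dependencies needed only to turn a VCS checkout into an sdist.
SDIST_ONLY = frozenset({"pkg:generic/git"})


def sdist_build_requires(source_tree_requires: list[str]) -> list[str]:
    """Return the external build-requires that still apply once the sdist exists.

    Entries in SDIST_ONLY are dropped, because the sdist already contains
    the frozen results (for example, the version extracted from VCS).
    """
    return [purl for purl in source_tree_requires if purl not in SDIST_ONLY]
```

For a project whose only external build dependency is git, the result is an empty list, so the sdist's metadata would explicitly say that nothing external is needed to build a wheel from it.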
