Supporting sdists and source trees in PEP 665

OTOH if we don’t include them in the lock file that might lead to gaps in the dependency graph in the lock file… If we include them the installer would need to protect against path traversal issues / attacks.

Wheels are inherently specific to the platform they are built for, and so the dependencies they list are known to work for that same platform; a wheel built for Windows will list what it needs to run on Windows and no other platform. This is why we list dependencies on a per-wheel basis.

Sdists and source trees, though, inherently have no specific platform. That means when you use PEP 517 to gather metadata on what an sdist or source tree will require to be installed, you must somehow know what platform the information applies to, because PEP 517 has no concept of cross-platform compilation. So if you get the metadata for an sdist on Windows, then you must either be told how generic the metadata is via WHEEL or assume that the metadata is Windows-specific and does not apply to other platforms. This means that unless the sdist produces a pure Python wheel, your lock file will inherently be locked to the platform you are building on to some degree, based either on what WHEEL says or on the strictest wheel tag for the current platform.
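
As a rough illustration of the "pure Python wheel" distinction, here is a minimal sketch that reads the tags out of a wheel filename using the standard `name-version[-build]-python-abi-platform.whl` layout. The helper names and the example filenames are made up for illustration, and a real tool would more likely lean on an existing tag-parsing library.

```python
from __future__ import annotations


def wheel_tags(filename: str) -> tuple[str, str, str]:
    """Return (python tag, ABI tag, platform tag) from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # name-version-python-abi-platform, with an optional build tag after version
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def is_platform_independent(filename: str) -> bool:
    """True when the wheel claims no ABI and no platform (e.g. py3-none-any)."""
    _, abi_tag, platform_tag = wheel_tags(filename)
    return abi_tag == "none" and platform_tag == "any"


# Illustrative filenames only.
print(is_platform_independent("example_pkg-1.0-py3-none-any.whl"))
print(is_platform_independent("example_pkg-1.0-cp310-cp310-win_amd64.whl"))
```

Metadata gathered while building the first kind of wheel can plausibly be reused elsewhere; metadata from the second kind is only known to hold for that tag.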

Does anyone think they can make a concrete proposal to add sdists to PEP 665 by Thursday, Nov 25? It’s okay if nobody can, as adding support for sdists in a subsequent PEP is totally fine and I think somewhat expected at this point.

Sorry for the late reply.

Is that not also true for the root project for which the lockfile is generated?

In practice that might not be a prevalent issue. Many projects use environment markers in their dependencies, so they have the same dependencies in their metadata, irrespective of the platform for which the wheel is built.

I personally will not have time for specification writing before the end of the year.

To sum up my suggestions above, I think the changes required to the current PEP 665 to make it sdist and source tree friendly are these:

  • removing filename from package entries
  • adding instead environment markers and/or tags to package entries
  • incorporating VCS urls with commit id using the PEP 610 fields
  • adding build dependencies scoped to a package entry
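
To make the list above a bit more concrete, here is a sketch of what such a package entry could look like, written as a Python dict. Apart from `url`, `vcs_info`, `vcs` and `commit_id`, which are PEP 610 field names, every key and value here is hypothetical and not taken from any PEP 665 draft. The check also assumes Git with a full 40-character commit id.

```python
# Hypothetical package entry: only url / vcs_info / vcs / commit_id are
# PEP 610 field names; "marker" and "build_requires" are invented here.
entry = {
    "name": "example-pkg",
    "version": "1.0",
    "marker": 'sys_platform == "win32"',  # instead of a wheel filename
    "url": "https://example.com/example-pkg.git",
    "vcs_info": {"vcs": "git", "commit_id": "0" * 40},
    "build_requires": ["example-backend==1.0"],  # scoped to this entry
}


def is_pinned_vcs_entry(entry: dict) -> bool:
    """True if the entry points at a Git URL pinned to a full commit id."""
    info = entry.get("vcs_info")
    if not info or "url" not in entry:
        return False
    commit = info.get("commit_id", "")
    is_full_hex = len(commit) == 40 and all(c in "0123456789abcdef" for c in commit)
    return info.get("vcs") == "git" and is_full_hex


print(is_pinned_vcs_entry(entry))
```

A branch name such as `main` in place of the commit id would fail the check, which is the point of pinning by commit.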

If these don’t make it, well… 80% of my projects will not be able to use PEP 665, and I’ll stick to requirements.txt, which works fine in practice. Maybe the use cases I see in my practice are outliers, after all, I don’t know.

Can I ask what you mean by “sdist and source tree friendly”? If you mean “lockers will be able to specify something that an installer can install, but only if that installer supports installing from sdists” then I’d consider that a problem - if the user can’t use a PEP-compliant locker in confidence that a PEP-compliant installer will be able to do the install, that doesn’t seem like a usable interoperability standard, to me.

Having said that, the current draft of PEP 665 has sdist support as an open question. If that question is resolved as “sdists are not supported” I would fully expect the main PEP text to be clarified to state that only wheel files can be specified - i.e., to state that url must point to a file conforming to the wheel spec.

If the PEP allows url to be a source tree/sdist, it would have to mandate that installers support source installs (and how to do that is what this whole thread is about, surely?)

Thinking through it, seems like supporting sdists in a locker would
require they be able to (call out to something which can) at least
build wheels from those as part of the resolution process.
Otherwise, there’s no guarantee they’ll have access to sufficient
package metadata to even know what versions will work in a given
environment. Wheels provide this, but the resolver may have to try
multiple versions of sdists until it can find one which builds on
the target platform.

For source trees, it’s potentially more complicated since you can’t
necessarily even know what the local version number is without first
calling its build backend to create the necessary metadata (most of
my projects infer their versions at build time from Git tags and
similar information, after all).

My understanding is as follows. For the locker to generate wheels, that means (a) potentially constraining the lock file to the locker’s build environment/platform, (b) distributing built wheels with the lockfile and (c) being able to encode the wheel’s location relative to the lockfile. The first two are not trivial limitations to impose on the user, but the lockfile will be kept “pure” and the locker and installer will not share responsibilities. On the other hand, for the installer to be able to install sdists, it has to double up as a build frontend. The lockfile as a format will be diluted: it won’t enumerate strictly installable artefacts and the package metadata in the lockfile will be incomplete. In essence the installer will have to perform a second pass over the lockfile and “re-lock” it. This is more practical but also a lot more complex to implement.

Could a lockfile perhaps be marked as “not finalised” (or “impure” or whatever) and in its finalised state, it must not contain sdists? When distributing a lockfile with sdist links it will be finalised = false and will require the use of a locker (not an installer) to finalise it for installation in the user’s environment. The responsibility will remain with the locker to re-lock the lockfile and we can keep the two components, locker and installer, separate. Lockers could also choose not to support (or produce) lockfiles which aren’t finalised.
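
A minimal sketch of how an installer might act on such a flag, assuming the lock file has already been parsed into a dict. The `finalised` key, the `metadata`/`packages` layout and the exception name are all hypothetical, and treating "ends with `.whl`" as "is a wheel" is a simplification.

```python
class LockFileNotFinalised(Exception):
    """Raised when an installer is handed a lock file that still needs a locker."""


def check_installable(lock: dict) -> None:
    """Refuse lock files that are not finalised or that still list non-wheels."""
    # Hypothetical key: a missing flag is treated as "not finalised".
    finalised = lock.get("metadata", {}).get("finalised", False)
    if not finalised:
        raise LockFileNotFinalised("run a locker to finalise this lock file first")
    sources = [
        pkg["url"] for pkg in lock.get("packages", [])
        if not pkg["url"].endswith(".whl")
    ]
    if sources:
        raise ValueError(f"finalised lock file lists non-wheel files: {sources}")


draft = {
    "metadata": {"finalised": False},
    "packages": [{"url": "https://example.com/example_pkg-1.0.tar.gz"}],
}
try:
    check_installable(draft)
except LockFileNotFinalised as exc:
    print(f"refusing to install: {exc}")
```

The installer never builds anything here; re-locking stays the locker's job, which is the separation being proposed.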

This is echoing Brett’s assumption above that sdist support is going to be optional for installers.

As for how to make that predictable for users, this is part of the UI/UX questions that will probably have to be resolved during a testing period. For instance, the locker could warn the users if it meets sdists or source trees, or it could have a --binary-only mode that errors out if any dependency is not available as a wheel.

That said, independently of sdist support, UI/UX for multi-platform support is also unclear to me. For instance, a lock file may well not work on some platforms because some dependencies are not available as wheels for that platform. Should the locker attempt to warn the user about that? Or can it be left to the installer to error out?

I think the use-case the PEP is targeting is when the locker has a known set of platforms which it can produce a dependency set for, not every platform.
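
The warn-or-error behaviour could be sketched roughly as follows. The function name, the `binary_only` parameter and the URLs are hypothetical, and the sketch only looks at file extensions rather than doing real resolution.

```python
import warnings


def choose_files(name: str, urls: list, binary_only: bool = False) -> list:
    """Prefer wheels; otherwise warn about, or reject, source distributions."""
    wheels = [u for u in urls if u.endswith(".whl")]
    if wheels:
        return wheels
    if binary_only:
        # Mirrors a hypothetical --binary-only switch on the locker.
        raise RuntimeError(f"{name}: no wheel available and binary-only mode is on")
    warnings.warn(f"{name}: only source distributions are available", stacklevel=2)
    return list(urls)


try:
    choose_files(
        "example-pkg",
        ["https://example.com/example_pkg-1.0.tar.gz"],
        binary_only=True,
    )
except RuntimeError as exc:
    print(exc)
```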

OK. Brett’s comment doesn’t expand on how tools would communicate whether a particular lockfile needed sdists, which I’d consider part of “sdist support”.

To be clear, if PEP 665 makes sdist support out of scope, I’d expect it to say that all URLs must point to wheels. If it puts sdist support in scope but optional, I’d expect some sort of metadata in the lockfile that explicitly says if consumers need to support sdists (so that consumers can fail early if they don’t have that optional support, and producers can clearly state what features are needed).

I’m saying “expect” here with my PEP delegate hat on, by the way :wink: In particular, I’m saying it without any sort of judgement as to whether sdist support should go in or not - if anyone wants my personal feelings on that I can provide them, but I’ll do so in a separate message.

If we ignore sdists, no, because since wheels provide all of their data statically you could theoretically create a multi-platform lock file (which is by design based on feedback from PDM and Poetry).

I think that’s a fair assessment of where things stand. It’s a question of whether sdist support should be in this PEP or considered an out-of-band thing (at least for now).

But then what’s the initial point of the lock file? If you have to recompute things then that waters down the supply chain security by dynamically adding things in an unsupervised manner. Even if you make it only additive to the initial lock file, what the sdists suddenly require as dependencies will lead to supply chain attacks.

To me, this suggests that sdist support is very much a development process issue and not something to deploy to production with.

Probably both. If you ask a locker to support a platform and it’s not possible then the locker should tell the user that. And the PEP already says that if a dependency graph resolution doesn’t work out then it’s an error, and so the latter case is already covered.

It’s a little tricky as even if an sdist is listed in the lock file it isn’t necessarily going to be required for a successful install. You could have an sdist listed as a fallback only and manage to make a complete install w/o an sdist ever coming into play.

So if you required tools to opt into using sdists then that could be the mechanism as to whether to even include the sdists in the dependency graph or not. Otherwise we could have an unsafe = true key in the [metadata] table to signify that there are sdists to take into consideration and act as a marker for wheel-only installers to quickly error out.

I think the spec runs the risk of collapsing from too much flexibility if you take that argument to its logical conclusion.

A locker could, by the same logic you describe here, include any one of the following:

  1. A distribution specific file like a .deb or a .rpm package.
  2. A conda package.
  3. A raw .py file or shared library.

It’s no more obvious to me that lockers should be allowed to include sdists than any of these (or indeed, anything else they feel would deliver the same environment).

If PEP 665 wants to allow lockers to specify non-wheel sources “for future expansion” or “as an installer-specific fallback”, then I think it needs to define that mechanism¹. Otherwise I think that it should require that only wheels are specified, and make the whole mechanism including letting lockers specify non-wheels be the non-standard extension.

¹ It doesn’t have to be complex. A single key, “extensions” which holds a list of extended features required to install the lockfile, with no valid values being defined by default, would be sufficient. The sdist support proposal could then simply define a “sdist” extension, with appropriate changes to the rules. You could even reserve all extensions starting with “X-” for experimental use, saying that no standard is allowed to use such an extension name, if you want to.
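
A sketch of how a consumer might use such a key to fail early, assuming the lock file is already parsed into a dict. Putting `extensions` under a `metadata` table is an assumption made here, and the function names are hypothetical.

```python
# A wheel-only consumer supports no extensions at all.
SUPPORTED_EXTENSIONS = frozenset()


def unsupported_extensions(lock: dict, supported=SUPPORTED_EXTENSIONS) -> list:
    """List the extensions the lock file requires that this consumer lacks."""
    required = lock.get("metadata", {}).get("extensions", [])
    return sorted(set(required) - set(supported))


def is_experimental(extension: str) -> bool:
    """Names starting with "X-" would be reserved for experimental use."""
    return extension.startswith("X-")


lock = {"metadata": {"extensions": ["sdist"]}}
missing = unsupported_extensions(lock)
if missing:
    print(f"cannot install: unsupported extensions {missing}")
```

The check runs before any download or install work starts, which is the early rejection the footnote is after.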

Couldn’t you use the [tool] table for that, though? Whatever extension you come up with is going to require specific tool support anyway since it isn’t a standard.

No, because a consumer that only supports wheels (i.e., it supports what the PEP defines and nothing more) should be able to reject lockfiles that it can’t handle before doing a bunch of work trying to install stuff. Such a consumer can’t check the [tool] section for that, because the PEP explicitly assigns no semantics to that section.

To be honest, I don’t see why you’re trying so hard to avoid just saying “the PEP requires all sources to be wheels”. Doing so doesn’t make it any harder to write a follow-up PEP adding sdist support, and people wanting to experiment before developing a sdist support PEP can do whatever they want, they just can’t claim that the lockfiles they use follow the standard.

I think this is the most reasonable position to take at this point. If no-one offers a proposal by tomorrow (25th), then I assume you’ll declare sdist support out of scope for PEP 665, and update the PEP accordingly, and this discussion can move back to the main PEP 665 topic (where I’ll be happy to make the case that the PEP should explicitly require sources to be wheels, on the grounds that the PEP has deliberately rejected supporting non-wheel sources :wink:).

I’ve asked @pradyunsg and @uranusjr privately if they are okay with this, but I have not heard back from them. But yes, what you outlined is my assumption of what will happen if no one steps forward.

Oh hai.

I’m back to being subscribed on all things in the Packaging category now. :sweat_smile:


TL;DR: I think it’s a good idea to exclude sdists from this PEP. There are complicated tradeoffs here, and I’d prefer that those get discussed in a separate standardisation effort.


Adding sdists to a lockfile exposes us to a fairly large set of complications, that I don’t think we should even try to solve, certainly not in the first iteration of this format.

Not only does it require wrangling with build environments on a per-sdist basis (which is somewhat tractable but not really, as discussed earlier in this thread and later in this post), it also quickly diverges to no longer be merely a Python environment management concern – the build system configuration and possibly even system configuration matter for package builds (e.g. packages do compile differently based on what’s available on the system, or based on the platform, or can even be doing random.choice(["foo", "bar"]) for the built artifact’s contents). I don’t think there’s any way we’re solving this in general with the existing structure of our ecosystem, not without adding additional constraints or making additional assumptions here.

As far as I can tell, the only way to solve this in general would be to have some mechanism to validate that a wheel built from an sdist has “the right contents” [1] and having some way to communicate validation information about this between the locker/installer in a platform-agnostic manner. I consider this to be a non-tractable problem.


At the level that our existing tools and standards operate on, the only thing that we can reliably control for determinism in a cross-platform manner is the incoming sdist artifact (i.e. URL + hash; same for VCS). As for everything else:

  • We can’t reliably pin the build-time dependencies of a package (thanks to get_requires_for_build_wheel).
  • We can’t reliably check that the build system behaved/will behave the exact same way, especially across platforms.
  • We can’t reliably check the generated wheel matches what the locker “intended”.
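
To illustrate the first point, here is a sketch of a toy build backend whose `get_requires_for_build_wheel` hook (the optional PEP 517 hook) decides its answer at build time. The requirement names are invented; the point is only that the result depends on where the hook runs, so a locker on one platform cannot know what it returns on another.

```python
import sys


def get_requires_for_build_wheel(config_settings=None):
    """PEP 517 optional hook: extra build requirements, computed when called."""
    requires = ["example-build-helper"]  # hypothetical requirement names
    if sys.platform == "win32":
        # Only requested on Windows, so a Linux locker would never see it.
        requires.append("example-windows-toolchain")
    return requires


print(get_requires_for_build_wheel())
```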

That said, there are relaxations/assumptions that we can make here, which expand which sdists can be considered acceptable.

  • Assuming that there won’t be any dynamic behaviours in get_requires_for_build_wheel enables pinning the build-time dependencies.
  • Making the lockfile limited to a certain platform makes it feasible to require a specific wheel filename from an sdist.
  • Requiring the build systems to respect reproducible builds enables adding in hashes for the generated wheels.

FWIW, I think we definitely have projects that are available as sdists and that fit all of these assumptions [2]. Those would effectively be usable without diluting any of the installation determinism and reproducibility guarantees of this lockfile (you still lose the no-Python-code-executed semantics, but… that’s a lost cause once you’re using sdists anyway).

The thing is, I don’t think these are safe assumptions to make in general. There are lots of tradeoffs to be considered here, since we’re exchanging {security, determinism, reproducibility} for {workflow flexibility, compatibility with more packages}. I don’t think this PEP locks out the potential for exploring these later, especially since we all agree that any sdist-consumption semantics should be opt-in anyway.

Figuring out the security and usability implications of such assumptions, and making an opinionated choice of which set of tradeoffs to go with here… that is the problem I’d like to punt over to a follow up effort. :slight_smile:


FWIW, I think it’s also self-evident that we have users who are already happy with workflows where the only pinning happening on sdists is the file’s hash/VCS hash, and figuring out the entire build story is deferred to the installer. This is provided today by “locked requirements.txt” [3] files and is also what Poetry does in its lockfile format.

So… the answer here might just be “we pin only what we can pin for every sdist (URL + hash), and all other bets are off”. If that’s really what we want, that PEP might have fewer words than this post. :stuck_out_tongue:
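
For reference, "URL + hash" pinning amounts to something like the following sketch. The function name is hypothetical and SHA-256 is assumed as the hash algorithm; everything that happens after the artifact is verified (build dependencies, the build itself) is deliberately left unchecked.

```python
import hashlib


def matches_pin(data: bytes, expected_sha256: str) -> bool:
    """Check downloaded artifact bytes against the SHA-256 recorded at lock time."""
    return hashlib.sha256(data).hexdigest() == expected_sha256.lower()


# Stand-in for the bytes of a downloaded sdist.
artifact = b"example sdist contents"
pin = hashlib.sha256(artifact).hexdigest()  # what the locker would record

print(matches_pin(artifact, pin))
print(matches_pin(b"tampered contents", pin))
```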

I’m happy that we’re all on the same page here. :grin:


  1. Ideally, this is something that can’t be “spoofed” easily. Expecting reproducible builds would allow using the hash of the final artifact for this. ↩︎

  2. For example: an sdist that has no dynamic build dependencies, respects reproducible builds and generates a single platform-agnostic wheel is basically something that we can “lock” to generate the exact same wheel each time. This sdist will basically always generate the same wheel with reproducible builds, which is true for all flit-based projects and all (pure-Python) setuptools-based projects without custom build steps. ↩︎

  3. As generated by pip-compile – pinned with hashes. ↩︎

I just updated the PEP with sdist support as a rejected idea. I’m going to lock this topic and have it point back to the “take 2” topic so that any future discussion can be done from the perspective of PEP 665 accepted/rejected.

Thanks everyone for discussing this!

sdists in PEP 665 have been rejected. See PEP 665, take 2 -- A file format to list Python dependencies for reproducibility of an application - #93 by brettcannon.