Supporting sdists and source trees in PEP 665

stewartmiles · November 15, 2021, 11:18pm

I really don’t see how you can hard disagree here. When using an environment manager like pipenv, an editable package is simply a reference to your source tree. For example, if I check out the following tree at commit 1337B33F:

Pipfile # References -e ./mypackage
Pipfile.lock # References -e ./mypackage
./mypackage # Only refers to locked packages

This yields a reproducible environment on any machine where mypackage is referenced via a link under .venv/lib/python3*/site-packages/easy-install.pth etc. So the editable package in this context is a source package where the code is already trusted, since the developer curates it themselves.

A potential implementation of a locked package is name+version+hash, but when you consider what you’re actually trying to achieve (i.e., validate that a package is exactly what you asked for), a trusted source tree fits the definition of a locked package.
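To make the name+version+hash idea concrete, here is a minimal sketch of the check an installer might perform on a downloaded artifact. The entry layout and the `matches_lock` helper are hypothetical, chosen for illustration; they are not taken from PEP 665 or pipenv.

```python
import hashlib


def matches_lock(data: bytes, expected_sha256: str) -> bool:
    """Return True if the artifact bytes hash to the locked digest."""
    return hashlib.sha256(data).hexdigest() == expected_sha256.lower()


# Stand-in for the bytes of a downloaded file (hypothetical content).
artifact = b"pretend these are the bytes of a mypackage 1.0 wheel"

# Hypothetical lock entry: the field names are illustrative only.
entry = {
    "name": "mypackage",
    "version": "1.0",
    "sha256": hashlib.sha256(artifact).hexdigest(),
}

ok = matches_lock(artifact, entry["sha256"])
tampered = matches_lock(artifact + b"!", entry["sha256"])
```

A source tree has no single file to hash, which is why the argument above leans on the commit ID of the checkout instead.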

brettcannon · November 16, 2021, 1:04am

Lockers will also have to make sure that the lock file is scoped appropriately for the platform that the requirements returned are for (I guess by reading WHEEL, otherwise whatever the current platform is). Otherwise you can’t assume the requirements will work on other platforms since you are executing code that could operate differently on other platforms.

Correct, I’m concerned about “running code during installation steps” both from a security perspective and simplicity for installers.

You would have to build all sdists that get pulled in as dependencies to get a full lock file.

I’m used to that; I have never come up with something that pleased all the people all the time.

If you specify a source tree down to the commit and the build is reproducible, then yes. But “editable” to, I think, most of us is not a source tree at a specific commit, but what PEP 660 supports, which can very much be simply a directory that you ran pip install -e . on and edited a bunch compared to whatever copy of that code someone else may have. If what you mean by “editable install” is “source tree”, then that clarifies a lot, but if you mean pip install -e ., then that does mean the door is open to the code differing from what the locker saw when it created the lock file.

uranusjr · November 16, 2021, 5:17am

I think there are two separate things here. Keeping build dependencies matched with runtime dependencies does not necessarily mean they must exist in the same resulting graph; they just need to have matching results. So keeping build dependencies separate from runtime ones is the more flexible approach since it fits both use cases, and the consistency can be handled by the locker.

Disabling build isolation can also be handled even if build dependencies have their own dedicated section, since it is, from dependency management’s perspective, merging build and runtime dependencies into the same graph. So this does not need to be handled specially by the lock file format: the locker implementation can either just merge the dependencies at lock time at the user’s discretion (and force the user to also disable build isolation at install time to maintain that build-runtime dependency unification), or simply keep build and runtime dependencies separated but matching (which allows the lock file to always work whether the user enables or disables build isolation). The only combination where this wouldn’t work is if the locker generates the lock file without disabling build isolation, but the resulting lock file is used by the installer with build isolation disabled, which is pretty clearly a user error, so the installer can choose to exit quickly when the merged dependency graph is not satisfiable.

So the bottom line is, I believe build isolation can entirely be a concern handled by locker and installer implementations, and the lock file format does not need to contain special knowledge about it.
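As a rough illustration of “separated but matching”, a locker could compare the two sets of pins and report any project that is pinned to different versions in each. The dict layout, the example pins, and the `conflicting_pins` name are assumptions made for this sketch, not part of any proposed format.

```python
from __future__ import annotations


def conflicting_pins(
    runtime: dict[str, str], build: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Map each project pinned in both graphs to (runtime, build) when they differ."""
    shared = runtime.keys() & build.keys()
    return {
        name: (runtime[name], build[name])
        for name in sorted(shared)
        if runtime[name] != build[name]
    }


# Illustrative pins only.
runtime_pins = {"numpy": "1.21.4", "requests": "2.26.0"}
matching_build = {"numpy": "1.21.4", "setuptools": "59.1.1"}
clashing_build = {"numpy": "1.20.0", "setuptools": "59.1.1"}

no_conflicts = conflicting_pins(runtime_pins, matching_build)
conflicts = conflicting_pins(runtime_pins, clashing_build)
```

An empty result means the two sections can be merged safely when build isolation is disabled; a non-empty one is the unsatisfiable case where the installer can exit early.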

sbidoul · November 16, 2021, 8:18pm

About local directories, and editables.

To me, editables are just a variant of local directories. Historically they have been linked to VCS references in pip, but that is not true anymore, since PEP 610. So the question we need to address is whether it makes sense to lock local directories. Then editables will follow naturally.

The first kind of local directory is the main project directory (as in pip install -e .). That one is naturally not part of the lock file.

Other local directories would make sense in a monorepo organization where several projects are closely related. But if they are in the same repo, they are naturally locked together by the git commit of which the lock file is part. I.e. when you check out the project, you get the lock file and associated projects in the same checkout.

So with this use case, local directories would only help in the lock file as a convenience. But I’d be inclined to not include them in the lock file, as they kind of stand on the same footing as the main project directory (.).

Thoughts ?

sbidoul · November 16, 2021, 8:20pm

Can you elaborate on this or give an example ? I’m not sure I understand whether this is something that differs between wheel, sdist, and VCS reference dependencies.

That said, the whole UI/UX for generating a multiplatform lock file is unclear to me. But I suppose this can be left as an open research topic for tool authors to explore and does not necessarily influence the design of the lock file format.

sbidoul · November 16, 2021, 8:50pm

OTOH if we don’t include them in the lock file that might lead to gaps in the dependency graph in the lock file… If we include them the installer would need to protect against path traversal issues / attacks.
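One way an installer could guard against path traversal is to resolve each local directory entry against the directory containing the lock file and reject anything that lands outside it. This is only a sketch: `resolve_local_dir` and the example paths are hypothetical, and a real installer would need to decide how to treat symlinks.

```python
from pathlib import Path


def resolve_local_dir(lock_dir: Path, relative: str) -> Path:
    """Resolve a local directory entry, rejecting paths outside lock_dir."""
    root = lock_dir.resolve()
    # Joining with an absolute path discards root, so absolute entries
    # are caught by the containment check below as well.
    candidate = (root / relative).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"{relative!r} escapes {root}")
    return candidate


project = Path("project")  # hypothetical directory holding the lock file
inside = resolve_local_dir(project, "libs/mypackage")

try:
    resolve_local_dir(project, "../elsewhere")
    escaped = True
except ValueError:
    escaped = False
```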

brettcannon · November 16, 2021, 11:24pm

Wheels are inherently specific to the platform they are built for, and so the dependencies they list are known to work for that same platform; a wheel built for Windows will list what it needs to run on Windows and no other platform. This is why we list dependencies on a per-wheel basis.

Sdists and source trees, though, inherently have no specific platform. That means when you use PEP 517 to gather metadata on what an sdist or source tree will require to be installed, you must somehow know what platform the information applies to, because PEP 517 has no concept of cross-platform compilation. So if you get the metadata for an sdist on Windows then you must either be told how generic the metadata is via WHEEL or assume that the metadata is Windows-specific and does not apply to other platforms. This means that unless the sdist produces a pure Python wheel, your lock file will inherently be locked to the platform you are building on to some degree, based either on what WHEEL says or on the strictest wheel tag for the current platform.
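To illustrate what “being told how generic the metadata is via WHEEL” could look like, the sketch below reads the Tag headers from the text of a WHEEL file and treats the result as platform-independent only when every tag ends in none-any. The helper names and the two sample WHEEL bodies are made up for the example; a locker would read the real file from the built wheel’s .dist-info directory.

```python
from __future__ import annotations

from email.parser import Parser


def wheel_tags(wheel_text: str) -> list[str]:
    """Return every Tag header from the text of a WHEEL file."""
    return Parser().parsestr(wheel_text).get_all("Tag") or []


def is_platform_independent(wheel_text: str) -> bool:
    """True when every tag has ABI 'none' and platform 'any'."""
    tags = wheel_tags(wheel_text)
    return bool(tags) and all(
        tag.split("-")[1:] == ["none", "any"] for tag in tags
    )


# Hypothetical WHEEL contents for a pure Python build and a Windows build.
PURE = """\
Wheel-Version: 1.0
Root-Is-Purelib: true
Tag: py3-none-any
"""

WINDOWS = """\
Wheel-Version: 1.0
Root-Is-Purelib: false
Tag: cp39-cp39-win_amd64
"""

pure_ok = is_platform_independent(PURE)
windows_ok = is_platform_independent(WINDOWS)
```

When the check fails, the locker would have to scope the resulting requirements to the tags it found rather than assume they apply everywhere.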

brettcannon · November 20, 2021, 12:39am

Does anyone think they can make a concrete proposal to add sdists to PEP 665 by Thursday, Nov 25? It’s okay if nobody can, as adding support for sdists in a subsequent PEP is totally fine and I think somewhat expected at this point.

sbidoul · November 20, 2021, 11:07pm

Sorry for the late reply.

Is that not true also for the root project for which the lockfile is generated ?

In practice that might not be a prevalent issue. Many projects use environment markers in their dependencies, so they have the same dependencies in their metadata, irrespective of the platform for which the wheel is built.

sbidoul · November 20, 2021, 11:39pm

I personally will not have time for specification writing before the end of the year.

To sum up my suggestions above, I think the changes required to the current PEP 665 to make it sdist and source tree friendly are these:

removing filename from package entries
adding instead environment markers and/or tags to package entries
incorporating VCS urls with commit id using the PEP 610 fields
adding build dependencies scoped to a package entry
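For illustration only, a package entry shaped by the four suggestions above might look roughly like the following, written here as a Python dict. Every key name except url and vcs_info (with vcs and commit_id, which are the PEP 610 fields) is an assumption, as are the URL, the pins, and the `has_immutable_vcs_ref` helper.

```python
import re

# Hypothetical package entry: no "filename" key, a marker instead,
# PEP 610 style VCS information, and build dependencies scoped to the entry.
entry = {
    "name": "mypackage",
    "version": "1.0",
    "marker": 'sys_platform == "win32"',
    "url": "https://example.com/mypackage.git",
    "vcs_info": {
        "vcs": "git",
        "commit_id": "0123456789abcdef0123456789abcdef01234567",
    },
    "build_requires": ["setuptools==59.1.1", "wheel==0.37.0"],
}


def has_immutable_vcs_ref(package: dict) -> bool:
    """True when the entry pins a full 40-character hexadecimal commit ID."""
    commit = package.get("vcs_info", {}).get("commit_id", "")
    return re.fullmatch(r"[0-9a-f]{40}", commit) is not None


pinned = has_immutable_vcs_ref(entry)
floating = has_immutable_vcs_ref({"vcs_info": {"vcs": "git", "commit_id": "main"}})
```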

If these don’t make it, well… 80% of my projects will not be able to use PEP 665, and I’ll stick to requirements.txt, which works fine in practice. Maybe the use cases I see in my practice are outliers, after all, I don’t know.

pf_moore · November 21, 2021, 11:30am

Can I ask what you mean by “sdist and source tree friendly”? If you mean “lockers will be able to specify something that an installer can install, but only if that installer supports installing from sdists” then I’d consider that a problem - if the user can’t use a PEP-compliant locker in confidence that a PEP-compliant installer will be able to do the install, that doesn’t seem like a usable interoperability standard, to me.

Having said that, the current draft of PEP 665 has sdist support as an open question. If that question is resolved as “sdists are not supported” I would fully expect the main PEP text to be clarified to state that only wheel files can be specified - i.e., to state that url must point to a file conforming to the wheel spec.

If the PEP allows url to be a source tree/sdist, it would have to mandate that installers support source installs (and how to do that is what this whole thread is about, surely?)

fungi · November 21, 2021, 1:37pm

[…]

Can I ask what you mean by “sdist and source tree friendly”? If
you mean “lockers will be able to specify something that an
installer can install, but only if that installer supports
installing from sdists” then I’d consider that a problem - if the
user can’t use a PEP-compliant locker in confidence that a
PEP-compliant installer will be able to do the install, that
doesn’t seem like a usable interoperability standard, to me.
[…]

Thinking through it, seems like supporting sdists in a locker would
require they be able to (call out to something which can) at least
build wheels from those as part of the resolution process.
Otherwise, there’s no guarantee they’ll have access to sufficient
package metadata to even know what versions will work in a given
environment. Wheels provide this, but the resolver may have to try
multiple versions of sdists until it can find one which builds on
the target platform.
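A minimal sketch of that "try versions until one builds" loop follows.
The `first_buildable` helper and the fake build callable are
hypothetical; a real locker would call out to a PEP 517 build
frontend at the point where `try_build` is invoked.

```python
from typing import Callable, Iterable, Optional


def first_buildable(
    versions: Iterable[str], try_build: Callable[[str], bool]
) -> Optional[str]:
    """Return the first version whose sdist builds, or None if none do."""
    for version in versions:
        if try_build(version):
            return version
    return None


attempted = []


def fake_build(version: str) -> bool:
    """Stand-in for a real build: pretend only 1.x builds here."""
    attempted.append(version)
    return version.startswith("1.")


# Candidates are assumed to be ordered from most to least preferred.
chosen = first_buildable(["2.1", "2.0", "1.9", "1.8"], fake_build)
```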

For source trees, it’s potentially more complicated since you can’t
necessarily even know what the local version number is without first
calling its build backend to create the necessary metadata (most of
my projects infer their versions at build time from Git tags and
similar information, after all).

layday · November 21, 2021, 3:04pm

My understanding is as follows. For the locker to generate wheels that means (a) to potentially constrain the lock file to the locker’s build environment/platform, (b) to distribute built wheels with the lockfile and (c) to be able to encode the wheel’s location relative to the lockfile. The first two are not trivial limitations to impose on the user but the lockfile will be kept “pure” and the locker and installer will not share responsibilities. On the other hand, for the installer to be able to install wheels that means it has to double up as a build frontend. The lockfile as a format will be diluted - it won’t enumerate strictly installable artefacts and the package metadata in the lockfile will be incomplete. In essence the installer will have to perform a second pass over the lockfile and “re-lock” it. This is more practical but also a lot more complex to implement.

Could a lockfile perhaps be marked as “not finalised” (or “impure” or whatever) and in its finalised state, it must not contain sdists? When distributing a lockfile with sdist links it will be finalised = false and will require the use of a locker (not an installer) to finalise it for installation in the user’s environment. The responsibility will remain with the locker to re-lock the lockfile and we can keep the two components, locker and installer, separate. Lockers could also choose not to support (or produce) lockfiles which aren’t finalised.

sbidoul · November 21, 2021, 6:46pm

This is echoing Brett’s assumption above that sdist support is going to be optional for installers.

As for how to make that predictable for users, this is part of the UI/UX questions that will probably have to be resolved during a testing period. For instance, the locker could warn users if it encounters sdists or source trees, or it could have a --binary-only mode that errors out if any dependency is not available as a wheel.

That said, independently of sdist support, UI/UX for multi-platform support is also unclear to me. For instance, a lock file may well not work on some platforms because some dependencies are not available as wheels for that platform. Should the locker attempt to warn the user about that ? Or can it be left to the installer to error out ?

EpicWink · November 21, 2021, 10:39pm

I think the use case the PEP is targeting is when the locker has a known set of platforms for which it can produce a dependency set, not every platform.

pf_moore · November 22, 2021, 8:42am

OK. Brett’s comment doesn’t expand on how tools would communicate whether a particular lockfile needed sdists, which I’d consider part of “sdist support”.

To be clear, if PEP 665 makes sdist support out of scope, I’d expect it to say that all URLs must point to wheels. If it puts sdist support in scope but optional, I’d expect some sort of metadata in the lockfile that explicitly says if consumers need to support sdists (so that consumers can fail early if they don’t have that optional support, and producers can clearly state what features are needed).

I’m saying “expect” here with my PEP delegate hat on, by the way. In particular, I’m saying it without any sort of judgement as to whether sdist support should go in or not - if anyone wants my personal feelings on that I can provide them, but I’ll do so in a separate message.

brettcannon · November 23, 2021, 12:00am

If we ignore sdists, no, because since wheels provide all of their data statically you could theoretically create a multi-platform lock file (which is by design based on feedback from PDM and Poetry).

brettcannon · November 23, 2021, 12:25am

layday:

My understanding is as follows. For the locker to generate wheels that means (a) to potentially constrain the lock file to the locker’s build environment/platform, (b) to distribute built wheels with the lockfile and (c) to be able to encode the wheel’s location relative to the lockfile. The first two are not trivial limitations to impose on the user but the lockfile will be kept “pure” and the locker and installer will not share responsibilities. On the other hand, for the installer to be able to install wheels that means it has to double up as a build frontend. The lockfile as a format will be diluted - it won’t enumerate strictly installable artefacts and the package metadata in the lockfile will be incomplete. In essence the installer will have to perform a second pass over the lockfile and “re-lock” it. This is more practical but also a lot more complex to implement.

I think that’s a fair assessment of where things stand. It’s a question of whether sdist support should be in this PEP or considered an out-of-band thing (at least for now).

But then what’s the initial point of the lock file? If you have to recompute things then that waters down the supply chain security by dynamically adding things in an unsupervised manner. Even if you make it only additive to the initial lock file, what the sdists suddenly require as dependencies will lead to supply chain attacks.

To me, this suggests that sdist support is very much a development process issue and not something to deploy to production with.

Probably both. If you ask a locker to support a platform and it’s not possible then the locker should tell the user that. And the PEP already says that if a dependency graph resolution doesn’t work out then it’s an error, and so the latter case is already covered.

It’s a little tricky as even if an sdist is listed in the lock file it isn’t necessarily going to be required for a successful install. You could have an sdist listed as a fallback only and manage to make a complete install w/o an sdist ever coming into play.

So if you required tools to opt into using sdists then that could be the mechanism as to whether to even include the sdists in the dependency graph or not. Otherwise we could have an unsafe = true key in the [metadata] table to signify that there are sdists to take into consideration and act as a marker for wheel-only installers to quickly error out.
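A wheel-only installer could use such a key roughly as sketched below. The unsafe key under [metadata] is the idea floated above; the parsed lock file is shown as a plain dict, and the exception and function names are hypothetical.

```python
class UnsupportedLockFile(Exception):
    """Raised when a lock file needs features this installer lacks."""


def check_wheel_only(lock: dict) -> None:
    """Fail fast if the lock file says sdists must be taken into account."""
    if lock.get("metadata", {}).get("unsafe", False):
        raise UnsupportedLockFile(
            "lock file lists sdists; this installer only handles wheels"
        )


# Parsed lock files, reduced to the one table that matters here.
wheel_only_lock = {"metadata": {}}
sdist_lock = {"metadata": {"unsafe": True}}

check_wheel_only(wheel_only_lock)  # passes silently

try:
    check_wheel_only(sdist_lock)
    rejected = False
except UnsupportedLockFile:
    rejected = True
```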

pf_moore · November 23, 2021, 9:13am

I think the spec runs the risk of collapsing from too much flexibility if you take that argument to its logical conclusion.

A locker could, by the same logic you describe here, include any one of the following:

A distribution specific file like a .deb or a .rpm package.
A conda package.
A raw .py file or shared library.

It’s no more obvious to me that lockers should be allowed to include sdists than any of these (or indeed, anything else they feel would deliver the same environment).

If PEP 665 wants to allow lockers to specify non-wheel sources “for future expansion” or “as an installer-specific fallback”, then I think it needs to define that mechanism¹. Otherwise I think that it should require that only wheels are specified, and make the whole mechanism including letting lockers specify non-wheels be the non-standard extension.

¹ It doesn’t have to be complex. A single key, “extensions” which holds a list of extended features required to install the lockfile, with no valid values being defined by default, would be sufficient. The sdist support proposal could then simply define a “sdist” extension, with appropriate changes to the rules. You could even reserve all extensions starting with “X-” for experimental use, saying that no standard is allowed to use such an extension name, if you want to.
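A sketch of how an installer might consume such a key: compare the extensions a lock file requires with the ones the installer implements, and refuse to continue if any are missing. Placing “extensions” at the top level of the parsed lock file, and the helper names, are assumptions made for this example.

```python
from __future__ import annotations


def missing_extensions(lock: dict, supported: frozenset[str]) -> list[str]:
    """Return the required extensions that this installer does not implement."""
    required = lock.get("extensions", [])
    return sorted(set(required) - supported)


def is_experimental(name: str) -> bool:
    """Names starting with 'X-' are reserved for experimental use."""
    return name.startswith("X-")


# A hypothetical installer that implements only the "sdist" extension.
SUPPORTED = frozenset({"sdist"})

plain_lock = {}  # no extensions key: nothing beyond the base spec is needed
sdist_lock = {"extensions": ["sdist"]}
custom_lock = {"extensions": ["sdist", "X-conda"]}

plain_missing = missing_extensions(plain_lock, SUPPORTED)
sdist_missing = missing_extensions(sdist_lock, SUPPORTED)
custom_missing = missing_extensions(custom_lock, SUPPORTED)
```

An installer would error out whenever the returned list is non-empty, which gives producers a clear way to state what features a lock file needs.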

brettcannon · November 23, 2021, 10:44pm

Couldn’t you use the [tool] table for that, though? Whatever extension you come up with is going to require specific tool support anyway since it isn’t a standard.