PEP 751: now with graphs!

I agree - I consider it acceptable in the context of pip as well. But the PEP describes the operation as “syncing a pre-existing environment to match the lock file” and I’m not comfortable with claiming that pip’s behaviour corresponds to that description.

So it sounds like neither pip nor uv (in its uv pip install form) will implement either of the SHOULD requirements of the PEP. @brettcannon does this affect your thinking on those two points?

I’ve had a detailed read through and see two issues detailed below as
well as an odd duck I feel uniquely qualified to call out as such.

Side note: I only speak for Pex. Pants is a big user of Pex lock files, but I can’t speak for Pants.

Specification - File Format

packages.source-tree

IIUC both of these can be locked:

  1. “foo @ file:///this/loose/project/here”
  2. “foo @ git+https://github.com/org/repo@<branch|tag|sha>”

Right now 2 is ~directly addressed with mention of the
@<branch|tag|sha> being required to be an immutable commit (no tags
or branches).

How does one hash source trees like 1? This is how Pex does it, and
it does the same for VCS requirements which lifts the immutable commit
restriction, allowing tags and branches as well:

source-tree -> build sdist -> unpack -> standard for hashing a directory

*.url

As far as I can tell, there is no facility for dealing with
file:///right/here/a.whl right now. A common real world case I’ve run
into:
A --find-links repo is part of a lock and that find-links repo is an
NFS mount on dev machines in their home dir. Each dev has a different
home dir and each may have the repo using the lock checked out at an
arbitrary location in relation to their home dir.

Pex uses “path mappings” to handle this. When creating the lock you say
... --find-links /right/here --path-mapping "FL|/right/here" ... and
“file://${FL}” is then used as a placeholder in the lock file in place
of all occurrences of “file:///right/here”. To use the lock file later,
you must specify --path-mapping "FL|/over/there" where /over/there
is the appropriate path on the install machine hosting the find links
repo.
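A minimal sketch of the placeholder-substitution idea described above (the function names and the `${NAME}` placeholder format are illustrative; this is not Pex's actual implementation):

```python
# Lock time: replace machine-specific absolute paths with named placeholders.
# Install time: expand the placeholders to the local mount point.
# Mapping format is a name -> absolute-path dict, e.g. {"FL": "/right/here"}.

def apply_mappings(url: str, mappings: dict) -> str:
    """Rewrite a file:// URL so the lock file stores a placeholder (lock time)."""
    for name, path in mappings.items():
        url = url.replace(f"file://{path}", f"file://${{{name}}}")
    return url

def expand_mappings(url: str, mappings: dict) -> str:
    """Rewrite a placeholder URL back to a machine-local path (install time)."""
    for name, path in mappings.items():
        url = url.replace(f"${{{name}}}", path)
    return url
```

So the locker would store `file://${FL}/a.whl`, and an installer given `--path-mapping "FL|/over/there"` would resolve it to `file:///over/there/a.whl`.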

Expectations for Lockers

Lockers MAY want to provide a way to let users provide the information
necessary to lock for other environments, e.g., supporting a JSON file
format which specifies wheel tags and marker values.

This feels like it comes out of left field in this spec. Perhaps worse,
it goes so far as to soft-specify things including format and fields!

Sorry - forgot about this one:

Specification - File Format

hash-algorithm

This seems like big trouble for a universal lock with multiple indexes where the indexes use different hash algorithms. Say the lock gets all the big boys from the index with the un-desired hash algorithm. This means the locker must download 10s of wheels per locked big boy just to re-hash them to the single hash-algorithm. I’m thinking torch + nvidia-* here. This is a lot of downloading!


This is my concern as well. It might be mitigated by the fact that I should be able to make a lock file where everything in [[packages]] is installed, provided:

  • only one entry in [[groups]]
  • this group entry has requirements, not project (as the latter must support all extras)
  • no requirement specifies marker

Is this correct? If so, can that be stated explicitly in the PEP as an invariant? It would simplify my lock file reader.
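The three conditions above could be checked mechanically. A hypothetical helper, operating on a parsed lock file; the field names (`groups`, `requirements`, `project`, `marker`) follow my reading of the PEP draft and are assumptions, not a normative schema:

```python
# Hypothetical check: does every entry in [[packages]] get installed?
# Field names are assumptions based on the PEP draft, not a settled schema.

def everything_installs(lock: dict) -> bool:
    groups = lock.get("groups", [])
    # Condition 1: only one entry in [[groups]].
    if len(groups) != 1:
        return False
    group = groups[0]
    # Condition 2: the group uses `requirements`, not `project`
    # (the latter must support all extras).
    if "project" in group or "requirements" not in group:
        return False
    # Condition 3: no requirement specifies a `marker`.
    return all("marker" not in req for req in group["requirements"])
```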

In addition, this boils down to the file-lock from the previous proposal if there’s only an sdist or one entry in wheels for each package. This is my primary use case for using lock files.


The PEP isn’t too explicit about this, but the requirement to list each package’s requirements from Requires-Dist metadata makes the package entry very verbose as it includes requirements for extras which aren’t even referenced. This is obvious in the PEP’s example lock file.

Is it possible to allow lockers to drop requirements for unused (especially unreachable) extras? This would make the lock easier to visually audit.


Having version in requirements be a version specifier means that installers and lock file readers need functionality to parse and compare version specifiers. It makes sense to support multiple package entries across different platforms with the one requirement entry, but it does mean I will have to use packaging to read the lock file.
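Concretely, a lock file reader ends up needing something like packaging’s `SpecifierSet` to decide whether a locked package satisfies a requirement’s version field (the specifier and version values below are made up for illustration):

```python
# `packaging` is a third-party library, but it is the reference
# implementation of version-specifier semantics in the Python ecosystem.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

spec = SpecifierSet(">=2.0,<3")      # as a `version` field might read
assert Version("2.31.0") in spec     # locked version satisfies the specifier
assert Version("3.0.0") not in spec  # out of range, reader must reject
```

This is exactly the dependency the quoted text is pointing at: correct containment checks (pre-releases, post-releases, `==X.Y.*` matching) are subtle enough that reimplementing them in a reader is unattractive.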

(Aside: in uv we ship tooling to make this easier. You can run uv tree or even uv tree --python-version 3.11 to view the packages that would be installed for a given Python version or platform, etc.)

If this is allowed, then we need some way for the lockfile to encode the platforms that it supports. Otherwise, users can no longer assume that a successful install from a lockfile gives them the intended set of packages for the current Python platform. (Aside: we record this in uv with an environments field, which we validate against at install time.)


OK, so we seem to be aligned in general on that then. That’s actually why I had the difference in project and dependency groups in [[groups]]; to signal which packages had all of their extras resolved.

But here’s a question: I have only ever seen a single uv.lock, which suggests everything goes into a single file. So, how do you encode potentially different results when using uv lock --resolution? E.g., if I want to have my CI test both the newest and oldest versions of my dependencies, is that all in one file, and if so how are the packages separated from each other?

If I remember correctly you and @pf_moore had a back-and-forth on that since it’s a bit different than how the rest of packaging does things. I think to make such an approach work you would need the dependencies/edges to embed the source part as well as the version (the name is inherently taken care of).

Very quickly thinking about this, you could either create a unique key per item in [[packages]], have a bunch of different keys in the inline table to represent the possible source types, have some single key that specifies the type plus optional details in other keys to help differentiate, or drop the inline table for dependencies and instead use an inline table for the source type details.

I’m not sure how uv does this or what Poetry/@radoering had in mind when they mentioned they like this idea.

The extra that’s used.

Yes, I believe so.

Yep!

That’s actually been in the PEP since the start to avoid using what the newly built wheel lists as dependencies and thus going outside the lock file or getting unexpected results (as well as pushing wheels more, having fewer surprises when an sdist build fails, etc.).

Anyway, it’s a “SHOULD” for a reason. But if it truly bothers people that much I’m open to taking it out as long as the PEP somehow says that tools should not ask the sdist at install-time what to install and installers fully rely on the lock file.

To be clear, it’s in the “deferred ideas” section, not rejected. And it’s because I think groups can handle a similar use-case as multiple files. But we can bring it back if others want it.

It’s a direct copy-and-paste from my proof-of-concept, so it seems to work. :grin:

Sure, although that came from you. :wink: I think you’re reading “sync” differently now than when you suggested I put that in. You originally meant for “sync” to mean “delete everything in the environment and then install from the lock file”, i.e. it’s like a new environment, just without the environment creation step. I can clarify that’s what the statement is meant to mean. I believe uv calls this uv sync.

See above.

You don’t as there’s no standard way on how to hash a directory of files.

PEP 751 – A file format to record Python dependencies for installation reproducibility | peps.python.org covers that case.

Suggestions are allowed in PEPs and it clearly says it’s an idea, so there is no soft “spec” here. It might seem slightly odd because it came from when there was more concern about locking for other platforms. But I also don’t want to take it out as I don’t want to start people worrying about it either. Plus some people liked the idea, so I can’t win here.

If you go back a few hundred messages on a previous version of this PEP you will find the discussion on this one; it’s deliberate and consensus was reached on it.

As in you want a bullet list of the expected steps an installer would go through? I can add that in once I think we aren’t going to be making massive changes. I primarily skipped it because I needed the pseudo-code to make the PoC work anyway, so it was easier to reuse that than write the same concept in two different forms.

Yes, as long as you didn’t artificially make that be the case (i.e. you didn’t strip out markers).

I guess, but I don’t know how that helps much.

I went back and forth on this one. I think I’m okay with having the PEP say dependencies for unused extras in the lock file MAY be left out for brevity.

Yeah, you can’t strip out markers you don’t use, else we are back to listing upfront all known platforms that a lock file supports (which was a contentious bit when it came up due to how to appropriately represent the support, handling the boolean logic, etc.).

Ok … but the spec doesn’t say you can’t have those sorts of source trees in the lock fwict; so you must either outlaw those sorts of source trees or define the standard for hashing a directory. Pex chooses the latter.

It refers to:

packages.sdist.path
  • Required if url is not set
  • String
  • A path to the file, which may be absolute or relative.
  • If the path is relative it MUST be relative to the lock file.

Which decidedly does not cover that case. An absolute path makes no sense in a lock file in the general case, and yet the --find-links example I explained is very real and used by users of Pex locks. Recording the absolute path of the find-links repo as it happens to be on the locker’s machine leaves folks who want to use the lock with a broken lock.

Well, I clearly like the idea - it’s ~“mine”! But … ok. I fundamentally have a problem with the PyPA being a standards-creation-mill for front ends instead of working towards 1 front end, and I think that frustration is just niggled by this seemingly gratuitous spec sprawl.


It doesn’t bother me as such, it just seems odd to say that installers should do something when there’s a clear indication that none of them will…

I don’t know what that statement means. Installing a sdist means running the build backend to generate a wheel and then installing that wheel. Are you trying to say that the installer should ignore the dependencies of the wheel generated by the sdist? That seems obvious to me (in the sense that it’s what we do for wheels as well) but sure, that’s OK (assuming it’s worded a bit more clearly in the PEP).

Ah, yes, I’m OK with that - but I definitely wouldn’t describe it as “syncing” (which I feel has too many other implications).

We’ve moved a long way from a lockfile being a way to replicate an environment at this point, so I’m not sure how useful it would be to have a specific option to uninstall everything not in the lockfile[1]. I feel like some form of pip uninstall-all command might be better. Also, there are some tricky design questions - would syncing[2] remove pip and/or uv from the target environment?

Basically, yes. But feel free to leave it until things are settled down. Ultimately, my point is that I don’t think we should rely on implementation-defined behaviour, even if the (reference) implementation is in the specification :slightly_smiling_face:


  1. and I’m not sure how easy it would be to implement in pip - scanning a potentially large environment just to uninstall everything is potentially costly ↩︎

  2. I’ll use that term for now, until we have something better ↩︎


Hmm. I think I would be free to create a tool which filters a lock file for a single platform (and record that platform in the [tool.my-tool] table, for documentation). It would remove all platform (maybe more?) markers and then dangling packages, and unmatching wheels. This would allow me to have a lock file with the precise files to be installed.

This too would have to ensure the same guarantees as a locker, but also guarantee that the lock file can be scanned linearly when visually inspected (by a human).


While this tool sounds fantastic and can help with quickly reviewing dependencies during development, I don’t want to rely on any tool (including any I write!) for security auditing, as the tool may have bugs (eg omitting a package which would be installed).

I suppose cat and diff may have bugs but that’s less likely. I also suppose using the same tool which outputs what would be installed, and which would actually perform the install, makes sense.

Yeah we don’t support this right now. You’d have to, like, create two separate files and rename them back to uv.lock whenever you want to use them (obviously bad).

I do think that locking for two “lowest vs. highest” is a bit different than locking for “a project alongside a set of dependency groups”, because in the latter case, we actually want to resolve the whole thing as a single cohesive set of dependencies (at least in uv). Like if your dev dependencies and production dependencies both depend on packaging somewhere, we’d like to use the same version if we can.

(This is in reference to the example lockfile I constructed.)

I think the challenge for me here is: what would the CLI look like to install the test group from the root project? Today in uv, that’d be like uv sync --package root --only-group test. Under this standard, under the hood, we’d then construct "root~test" (which would be a uv-specific thing) to look up the right group.

That seems fine so far, but how would a Poetry user install that same group from the uv-generated lockfile? The mapping from group to name is sort of “proprietary” to the tool. What if Poetry calls those groups something else?

Would we be required to expose a CLI like uv sync --group "root~test"? That could work as an escape hatch (I think it’d be a step down if that were the only way we exposed groups in the CLI), but it kind of hurts the interoperability.

I don’t know what a good solution looks like here. I could imagine a solution whereby we get rid of the name field on [[groups]], and a group is either a project (unique) or a dependency group of a pyproject.toml (like a file path combined with the group name). So we encode those concepts directly into the spec, rather than making it flexible. But then we lose the ability to do things like “include a resolution for the lowest and highest versions of each dependency in the same lockfile”, which I believe is a goal right now.


I don’t know what “fwict” means.

And you’re right, the PEP says source trees are allowed because when I tried to leave them out people didn’t like that.

And it still doesn’t address the fact we don’t have a standard to hash a directory of files. You have:

But that doesn’t state how you got a hash from that. Are you hashing the sdist?

Sorry, I read too fast and misunderstood what you were asking for.

It seems you’re wanting some way to have almost a placeholder file path, or something relative to an as-yet-unspecified path that is only known at install time. So far no one has asked for that. If you want some wording to allow you to have relative paths anchored to something other than the lock file as specified by the locker I would be open to that.

I would argue it depends on the purpose of your lock file.

I’m going to ignore this as it feels off-topic and a bit of a dig at me (although I don’t think you meant for it to be taken that way).

Yes, that’s what I mean.

I’ve already clarified the PEP.

How so? It’s been a lot of talking, but the goal is the same to me.

I believe the original discussion was to avoid having to fully delete an environment in case it was viewed as costly. I’m also fine in saying that “syncing” can entail deleting, recreating, and then installing into the “new” environment.

But I’m also fine with taking that out. As I said, it wasn’t my idea to specify this in the first place so I’m not attached to it.

Sure, I can’t stop you from doing anything. But that lock file will also not be portable to any other installer, as other installers wouldn’t understand your tool-specific data and the lock file can’t specify on its own what platforms are supported.

It’s tricky because I know you want to support multiple projects in a single lock file for your “workspaces” concept, which often complicates things as the starting point is then no longer a singular thing but several possible things. It does explain why you want dependency groups attached to a package. But dependency groups also do not have to be specified in a pyproject.toml that defines [project], so they can be independent concepts.

I think to resolve this we need to decide what “self-contained” means for lock files since that’s why I introduced [[groups]]. Here are some key scenarios for the inputs into a locker when creating a lock file:

  1. A project and its extras (e.g., [project] from pyproject.toml)
  2. A group of dependencies (e.g., [dependency-groups] from pyproject.toml or freezing what’s already installed in an environment)
  3. A specific way to generate the lock file (e.g., oldest versions of everything, newest of everything)
  4. What environments are supported

And there can be combinations of any of these things.

Now, given only a single lock file, what are people expecting to be able to do? Is the lock file supposed to tell the user about the inputs it was locked on upfront? Or does the user specify what they want to be able to install, and “self-contained” in this instance simply means pyproject.toml isn’t required as long as the installer can be told what root(s) in the lock file to install for? And at what point do we say, “that requires a separate lock file” (and yes, that would require bringing multiple lock files back into the PEP, which I’m fine with if that’s what we want)?

The answers to those questions determines what we write down in a lock file beyond the packages that have been locked.


I went ahead and dropped this and just left installer must support installing into an empty environment.


I meant FWICT (from what I can tell), the lock spec does not forbid locking source trees. And in order to lock a source tree, which may have ignorable garbage that should not affect the hash of the source tree (.pyc files, pytest caches, mypy caches, etc), the way I found to get a reproducible hash was to first build an sdist, then unpack that (since the sdist tarball will generally have non-reproducible timestamps embedded), then hash the dir the sdist was unpacked to. For that I was not specifying the standard that Pex happens to use internally; I figured you’d come up with a standard as the PEP author. For the record though, Pex uses:

  1. os.walk the unpacked sdist dir and collect all file paths in a sorted list.
  2. hash the bytes of that sorted list of names, relativized to the sdist unpack directory, joined by “” and encoded in utf-8
  3. update the hash with the hash of each of the sorted files’ contents in order.

I just want to be able to handle the example I gave. That is, a lock that has at least some locked packages coming from a local file-system --find-links repo, and a consumer of the lock who will have that find-links repo mounted somewhere other than the absolute path it was found at when the lock was created. The case I gave, a --find-links repo somewhere under the user home dir and the code repo using the lock somewhere else, is, again, real and repeated across different orgs. How that is handled I have less concern over, but I gave the example of how Pex handles it today, with absolute paths being able to have placeholders in a lock file.

I’ve misunderstood something. I thought that if I have a lock file valid for only one environment (platform, OS, Python version, etc), then any installer will be able to use it, and all installers would install all packages listed in the file. What happens when trying to use this lock file in another environment, I’m not too concerned (I suppose I would prefer the installer to fail, eg if some package’s marker-conditional requirement isn’t satisfied, or a wheel’s platform tag doesn’t match).

Also I’m ignoring sdists here (internally we build and host wheels for all of our sdists anyway).


I’ve noticed an incompatibility with our workflow: sdist URL (or path) and wheel URLs (or paths) must be specified, if specifying sdist/wheels. Our file download locations are subject to change (and in fact, the query parameters which include authentication change basically every time).

Is discovering the download location of an sdist or wheel (when provided an index URL) considered part of resolving? If so, we can’t use this lock file as we need to provide file hashes. If not, could we make URL/path optional (but mutually-exclusive)?

Correct.

Well, it depends on how stringent you want to be in the face of source trees. The fact you’re using a directory is already dangerous as you’ve pointed out. And getting a reliable hash of the contents would be a bit finicky.

But step 1 – creating an sdist – is not consistent if the build back-end changes on you between hashing/creating the lock file and validating/installing (e.g., a newer build back-end version could change the order of the fields in PKG-INFO). How do you handle that?

Yes, IFF the files can be installed on the platform and there aren’t any files missing for that platform. That’s why you need to record all dependencies per package even if you don’t lock for them; to detect when you are missing something and thus can error out.

That’s currently what would happen.

You mean locking and creating a lock file (which is the only time resolving occurs by design)? I don’t know how else to make it work if you can’t find the files to lock against (unless you’re talking about something else like literally just freezing the state of a virtual environment so there is no searching). Technically, how the locker makes it work isn’t specified by the PEP.

They are already mutually-exclusive (see e.g., packages.sdist.url and packages.sdist.path). And sure, we can make them optional.


That means either pyproject.toml changed (you literally changed it to point to a new backend), or pyproject.toml did not change but the constraints on the build-backend requirement were loose enough to allow for such a change over time, or there was no change at all and the build backend produces non-reproducible sdists. In all 3 cases Pex simply fails at lock install time with a hash mismatch, as it should.

An example of this 3rd case in action is found here:

In that case Pex locks caught a non-reproducible VCS requirement lock at a fixed commit sha.


This is the case I’m thinking about. I don’t have insight into how common of an issue this is, especially as the default semantics are unpinned.

I also don’t know how common this occurs either, but I suspect it isn’t as common as it could be since dicts started having expected iteration order.

What do other people think of hashing source trees as Pex does?

Or do people not care about worrying about this (right now in this PEP)?

I think I’m a little confused. Isn’t it the case that the PEP must either outlaw source trees in locks or else describe how they must be locked? If a lock has hashes for 99 packages but not for 1 source tree, it seems to me the security guarantees of the lock are all for naught.

For example, before Pex gained the ability to lock source trees, it would fail fast for any input requirements that were source trees explaining why the source tree requirements were not allowed.


Personally I don’t think the PEP should require hashes for source trees, in part because the problem of “how to hash a source tree” seems outside the scope here – I feel like that would merit its own PEP entirely (though I’m fairly new to the PEP process so that may be wrong). I don’t think any of the other candidate users of the PEP compute source tree hashes today nor expect it to be a requirement in the format.

Would it not be sufficient to:

  1. Make “flag to (dis)allow source trees” an expectation for installers, like it is for source distributions.
  2. In Pex, write these source tree hashes to [packages.tool]? I don’t think that would be a spec violation since others would still be capable of installing Pex-created lockfiles.

I’m not sure I fully understand the first question, but personally I don’t think URL / path should be optional. There’d now be a requirement that installers support the Simple API to look up available distributions, whereas they don’t have that requirement right now. In some cases we have to expect workflows to accommodate the standard rather than the other way around.