PEP 751: now with graphs!

pf_moore · November 15, 2024, 11:43am

I think there’s another aspect here, that I (for one) have been missing. I’ve been thinking of a lockfile as an independent object, that you can hand to someone on its own and say “install from that”. That’s the situation when replicating an environment, distributing something like a webapp, or specifying how to build a standalone application.

But Hynek seems to be describing a different scenario^[1], where the lockfile is merely one component of a project, and is used as part of the workflow to “sync”^[2] a project’s workspace. In that context, there is clearly a distinguished “current project”, but the lockfile isn’t really being used as an interoperability tool.

I think this is an important distinction, precisely because the “part of the project workflow” lockfile is directly tied to the workflow, and hence the workflow tool that the project uses. As such, this type of lockfile isn’t actually used for interoperability (except in the very broad sense of “make it easier to switch workflow tools”).

Maybe we actually need to take “project workflow lockfiles” out of scope here, and limit the standard to only being about lockfiles for the purpose of reproducing an environment? That could leave us in a situation where tools retain their own proprietary extensions to the lockfile format (whether via the [tool] namespace, or by having their own custom format extensions that aren’t intended to be interoperable) and simply allow exporting a standard-format environment definition lockfile. I know this feels like a significant step backward, as we seem very close to a universal format here, but at the same time, a significant part of the complexity in the spec seems related to “project workflow” aspects of lockfiles (and it appears we’re not done with that case yet…), whereas the “environment reproduction” scenario has been (as far as I can see) covered and stable for a long while now.

On a related but separate note, I’ve been thinking about how I’d accept this PEP once it’s finished, and I’m pretty sure that as things stand I’d want to defer acceptance until all of uv, poetry and PDM have switched to using the new spec as their internal lockfile format, before formally accepting it as a standard. That in itself says to me that there’s a lot of this spec that isn’t actually related to interoperability.

[1] And this may also be how uv and Poetry are viewing lockfiles.
[2] Whatever that means; it’s not a workflow I use, so I’m a little hazy on the details.

steve.dower · November 15, 2024, 11:56am

I’ve been considering this situation as well, but I don’t think it actually matters that much here (perhaps because I dislike the idea that you’re supposed to “install” your current project in order to run it… no, you should just run it).

The lockfile is as much a part of the current project as the sources are, so there’s no reason to specify that collection of files within the lockfile. There’s no reason to “lock” them, because you then end up with a recursive lock that has to lock the file containing the lock. The lockfile is for locking the other files you need, and it works just fine for that using package versions/hashes.

In the side discussion about directory hashes, I pointed out that monorepos have some genuine scenarios where you may want to reference a source tree outside of the lockfile’s directory tree/cone, but that’s only because the root of the repository^[1] isn’t the root of the lockfile’s scope.^[2] Even then, it only means that there are cases where we can assume more files are already locked.

But if I just clone a repository and want to reconstruct the exact development environment, the lockfile does not need to lock that repository as well. It can lock everything external to it, but it doesn’t need to lock the current bit of code.

And if you’re in a situation where you have multiple repositories that flow into each other, and you don’t want to lock them, then you don’t actually want a lockfile. You want a different tool, because you’re doing a different thing.

(Now, people who want a workflow that is based around pip install -e .[dev] in order to get the dev dependencies aren’t going to get it from the current proposed lockfiles. But I’m pretty sure most are only doing it that way because it works today, and would switch to an alternative that worked. Those who prefer this approach don’t need to switch.)

[1] For want of a better term to describe the “project” as a whole.
[2] Again, for want of a better term to describe the directory containing the lockfile that the lockfile applies to, as specified in the PEP (File Name section).

pf_moore · November 15, 2024, 2:20pm

I agree with what you’re saying here, but it feels like we’re getting perilously close to saying that people shouldn’t want what they are telling us they want.

I think that because we’re one step removed from what the actual end users want here, we’re at risk of falling into an XY problem, where we’re trying to develop a lockfile that solves the challenges workflow tools have with locking, rather than solving the problems that users have, which the workflow locking solutions are trying to solve. In theory, those two things are the same, but in practice they might not be (and worse, different workflow tools might be solving the underlying user problems in incompatible ways!)

The difficulty is that there’s no way that uv, poetry and PDM are all going to change the model they have for locking just because we’re trying to create a standard lockfile format. And there’s no reason that they should - that would be a clear case of standards dictating UI/UX decisions. This is why I think that we need to be more strict about only specifying the interoperability aspects of the lockfile format, and encourage tools to use the tool-specific sections of the format to handle workflow-related aspects. Yes, this may mean tools need an “export portable lockfile” command for interoperability, rather than simply being able to share their normal lockfile, but I don’t see why that’s so bad, to be honest.

Stealthii · November 15, 2024, 4:24pm

To clarify one point I made over on uv #7533, it’s worth noting that although version can be defined dynamically in source control, builds of the package generally provide the version statically (for both source tarballs and wheels) as part of a reproducible build (the main purpose of being able to produce and publish these on TestPyPI):

```
❯ ls -1 dist/*
dist/mypackage-0.5.0.dev5+g5d056c1-py3-none-any.whl
dist/mypackage-0.5.0.dev5+g5d056c1.tar.gz
❯ unzip -p dist/*.whl '*.dist-info/METADATA' | grep -E '^Version: '
Version: 0.5.0.dev5+g5d056c1
❯ tar -Oxf dist/*.tar.gz '*.egg-info/PKG-INFO' | grep -E '^Version: '
Version: 0.5.0.dev5+g5d056c1
```
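For anyone who wants to repeat this check without `unzip`/`tar`, here is a minimal standard-library sketch. It reads the wheel’s top-level `*.dist-info/METADATA` and, as an assumption that differs from the command above, the sdist’s root `PKG-INFO` rather than the `*.egg-info` copy; the function names are made up for illustration.

```python
import email.parser
import tarfile
import zipfile
from pathlib import Path


def _version_from_metadata(text: str) -> str:
    # Core metadata is a block of RFC 822-style headers, so the stdlib
    # email header parser can pull out the Version field.
    return email.parser.HeaderParser().parsestr(text)["Version"]


def wheel_version(wheel_path: Path) -> str:
    """Read Version from the wheel's top-level *.dist-info/METADATA."""
    with zipfile.ZipFile(wheel_path) as zf:
        names = [n for n in zf.namelist()
                 if n.count("/") == 1 and n.endswith(".dist-info/METADATA")]
        if len(names) != 1:
            raise ValueError(f"expected one METADATA file, found {len(names)}")
        return _version_from_metadata(zf.read(names[0]).decode("utf-8"))


def sdist_version(sdist_path: Path) -> str:
    """Read Version from the PKG-INFO at the root of a .tar.gz sdist."""
    with tarfile.open(sdist_path, "r:gz") as tf:
        # Root PKG-INFO lives at "<name>-<version>/PKG-INFO" (one slash);
        # nested *.egg-info copies are skipped by the slash count.
        members = [m for m in tf.getmembers()
                   if m.name.count("/") == 1 and m.name.endswith("/PKG-INFO")]
        if len(members) != 1:
            raise ValueError(f"expected one PKG-INFO file, found {len(members)}")
        fobj = tf.extractfile(members[0])
        if fobj is None:
            raise ValueError("PKG-INFO is not a regular file")
        return _version_from_metadata(fobj.read().decode("utf-8"))
```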

The expectation for a local project package that has a dynamic version (specifically source = { editable = "." }) would be to simply not define version in the lock file for the local package (or, as others have suggested, to provide a dynamic reference or omit the entry entirely). By nature this isn’t a lockable reference (especially when the version is set from VCS revision information for an editable package), whereas other uses of lock files, such as multi-project monorepos or packages with static versions, keep the current behaviour.

I am not aware of common use cases where a source tarball computes a dynamic version at build time; that would make it unreproducible, and arguably it couldn’t be supported by lock file usage at all (especially outside the context of developing said package). While I work on mypackage with a dynamic version in its metadata during development, the builds I produce have a static version (the default behaviour for builds from hatch-vcs, setuptools-scm, and poetry-dynamic-versioning).

charliermarsh · November 15, 2024, 7:01pm

Paul Moore:

Maybe we actually need to take “project workflow lockfiles” out of scope here, and limit the standard to only being about lockfiles for the purpose of reproducing an environment? That could leave us in a situation where tools retain their own proprietary extensions to the lockfile format (whether via the [tool] namespace, or by having their own custom format extensions that aren’t intended to be interoperable) and simply allow exporting a standard-format environment definition lockfile. I know this feels like a significant step backward, as we seem very close to a universal format here, but at the same time, a significant part of the complexity in the spec seems related to “project workflow” aspects of lockfiles (and it appears we’re not done with that case yet…), whereas the “environment reproduction” scenario has been (as far as I can see) covered and stable for a long while now.

To speak openly, I’m kind of bummed about it but I find myself nodding along to this. As we get deeper into the conversation… I more and more worry that it’s the correct answer – that we should focus on installer interoperability, and therefore on a more limited format that’s closer to a standardized, fully-featured requirements.txt (in the pip-compile sense). I think such a standard would still achieve basically all of the interoperability goals (dependabot could analyze it, cloud runtimes could support it, installers could support it), except that users would have one more format to worry about and understand (a major downside! One that I pushed hard against in my previous posts, and that I think is worth trying to avoid – but it’s a hard problem).

(In the uv context, this would likely be something like: uv export can export to this format, and uv can install from this format. But it would be downstream of uv.lock.)

I know I advocated for pushing for a single standardized lockfile that all these tools could use, but the requirements keep increasing in scope (I’m partly to blame), and I’m more and more worried about how much coupling we’ll need to introduce between the lockfile standard and the CLI / concepts of the tools that integrate.

I suspect that if we limit the focus to installers, (1) it will become much easier to define spec compliance / expectations, (2) we can sidestep a lot of the hard questions that we’re being faced with here (e.g., I suspect we can avoid dynamic metadata entirely?), and (3) we can simplify the format a fair bit.

Perhaps to put it differently: in its current form, I’m having to think a lot about whether uv can use this, what specific requirements we’d have, and how it would fit into our CLI. On the other hand, if we limited the format to installers, most of those concerns disappear and I can enthusiastically adopt / push it.

brettcannon · November 15, 2024, 11:38pm

Yep, I get it. You still want to record the lock file, but it isn’t expected to leave the project and be distributed in any way.

What’s the artifact you’re installing then? If you’re pointing e.g., pip at a directory with a random wheel and saying, “install that file” via a glob pattern or something then there isn’t something like that currently in the PEP. We would have to introduce an entirely new mechanism to say, “install whatever makes sense in this directory of files, and all without a clear selection criteria beyond hoping there’s a single wheel file”.

Sorry if you thought I wasn’t supporting that discussion! I totally think it’s worth having that conversation, just not in this topic as it’s got enough going on.

Well, based on subsequent comments it sounds like you may have time to start that conversation now, and beat me to being ready to submit the PEP for consideration without even having to ask me to wait.

Sort of, but potentially not in a critical way. I’ll outline two ideas I have below that may save this.

So going back one revision of the PEP to the “set of packages” approach like PDM?

Before I throw in the towel on trying to have a universal lock file and ditch the work I have done the last few months (and yes, I know about the sunk cost fallacy), I can think of two ways of potentially solving this. One is to simply not record the project under development as part of the lock file. If you view what’s in project.dependencies and project.optional-dependencies as what you’re locking, then you don’t really need to worry about the project itself. It’s somewhat like making implicit dependency groups. That does mean your installer would be doing “lock file + local install”, but I don’t think that UI is complicated, as dependency groups probably already make installers specify whether to include the project itself or not. I can see someone asking how to handle extras, though, and I don’t have a clean answer off the top of my head where installer UI doesn’t come into direct play (i.e., the point @charliermarsh brought up about synthetic dependency groups for his root~test example).

The other potential option is to stop thinking about source code and instead think about source locations (which I think @charliermarsh has suggested is the way uv thinks, but I’m not sure if it’s quite as extreme as what I’m proposing). I think that because the common case is installing a wheel file that can easily be hashed, I have been thinking about what source gets installed. But if you take a step back and treat where you get the source as the thing you lock, then suddenly things like version numbers are not universally important for all the ways you can get source code. So for things you get from an index, the package name and version are important to figure out where you can get an sdist and/or wheels to satisfy that requirement (and conveniently you can hash that stuff easily). But what about an editable install or a source tree? In those instances you really care about where you get that code, as that’s what’s required to install it, but the version doesn’t have to play into it. Even in the VCS case you’re thinking about a commit instead of a package version in the end. So identifying which node an edge points to is really about where you plan to get the source to satisfy that requirement; that sometimes requires a version, sometimes it doesn’t.
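One way to picture “lock the location, not the code”: key each node in the graph by where its source comes from, so only index sources carry a version. The class and function names below are illustrative, and treating editable and non-editable installs of the same directory as one node is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IndexSource:
    name: str
    version: str  # needed to find matching sdists/wheels on the index


@dataclass(frozen=True)
class VcsSource:
    url: str
    commit: str  # the commit, not a package version, pins the code


@dataclass(frozen=True)
class DirectorySource:
    path: str
    editable: bool = False  # install option; assumed not part of identity


Source = Union[IndexSource, VcsSource, DirectorySource]


def node_key(source: Source) -> tuple:
    """Identify a lock-graph node by where its code comes from."""
    if isinstance(source, IndexSource):
        # PEP 503 name normalization so "Foo_Bar" and "foo-bar" match.
        name = re.sub(r"[-_.]+", "-", source.name).lower()
        return ("index", name, source.version)
    if isinstance(source, VcsSource):
        return ("vcs", source.url, source.commit)
    return ("directory", source.path)
```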

Now this latter option does mean the “hash everything” folks may not be happy as it makes hashing an optional thing that people include for security only instead of a first principle on how to specify what source to install (but I would still push for including hashing when it doesn’t hinder things, e.g., hashing specific files).

Anyway, those are the ideas that come to mind.

h-vetinari · November 16, 2024, 12:08am

This may be a bit out of left field, but it sounds very much like the “virtuous collapse” phase after reaching the peak of complexity. These terms come from a recent keynote by the principal language architect of Java^[1], whose thrust is general enough to apply here as well.

In other words, this process should be embraced, even though it feels like “giving up” hard-won design work. It leads to a much better place long-term. I really recommend checking it out.

[1] I know that the Java and Python spaces barely overlap, but even for those who have some sort of mutual dislike, I think it’s hard to deny that Java’s language evolution and stewardship in recent years have been exemplary.

dhduvall · November 16, 2024, 12:56am

The project is an app which ultimately gets deployed in a container. The CI build does a non-editable install of the project into the venv, and the venv is copied wholesale to /app. There is no single-file artifact of the build at all, never mind that we wouldn’t be using it to install, distribute, or deploy anything.

Having the commit hash of the project in the image used for a deployment be in the version number is important for ops, and I think that has to be done with some form of dynamic versioning.

If any of this is an anti-pattern, I can certainly do something different.

bluss · November 16, 2024, 8:04pm

I think this is different from what lock implementations do today. uv does not download and compute hashes for all wheels and sdists that it records in its lock file (I can’t see that it would have time to do that with how it operates).

I have a concern that this unintentionally mandates a lot of work lockers don’t want to do (unless I’m missing something?).

mitsuhiko · November 16, 2024, 8:26pm

Hard problems should be solved, but hard problems also take time. In the same keynote you can also see that they did not have a problem with it taking a decade to solve a problem. Slow is smooth and smooth is fast.

I have started a new topic for the dynamic metadata brainstorming: Brainstorming: Elliminating Dynamic Metadata

prophile · November 17, 2024, 2:55pm

It sounds from the outside like the two use-cases basically differ on “should the root project (assuming that’s relevant) also be locked”? To replicate an environment with the root project installed, yes; to replicate enough of the environment for CI, no. If a decision about it can be made at the time of generating the lockfile, would it be enough for tools to just (“just”) allow not including the root project itself without other changes to the format? Presumably the default would be to include it, but some equivalent of “--no-root” from Poetry’s install phase would open up Hynek’s use-case of building a lockfile to reproduce an environment which is logically a snapshot before installing the root project rather than after. The lockfile would still have the semantics of being enough instructions to exactly replicate an environment, differing only in whether that one package is included.
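At lock-generation time, the `--no-root` idea amounts to a filter over the package list. A minimal sketch; `lock_entries` and its `include_root` flag are hypothetical (Poetry’s `--no-root` is an install-time option), and names are compared after PEP 503 normalization.

```python
import re


def _normalize(name: str) -> str:
    # PEP 503 normalization: runs of "-", "_" and "." become "-", lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


def lock_entries(packages: list[dict], root: str, include_root: bool = True) -> list[dict]:
    """Return the packages to write into the lockfile.

    With include_root=False the lock describes the environment as it is
    *before* the root project gets installed.
    """
    if include_root:
        return list(packages)
    root_key = _normalize(root)
    return [p for p in packages if _normalize(p["name"]) != root_key]
```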

I’ve worked on quite a few projects where the standing instructions on Poetry were to use “--no-root”, and perhaps that does suggest there are two different use cases. YMMV.

charliermarsh · November 17, 2024, 9:32pm

Something like that, yeah. A flat list of packages, each with a marker to indicate when it should be included, would make sense to me. A lot of things get simpler if we narrow the scope – for example, I don’t think workspaces are relevant? (Users could just export a separate lock for each member.) So multiple entrypoints is gone, etc.
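A rough sketch of the flat-list shape, where installing means filtering rather than resolving. To stay within the standard library, the `when` dict is a simplified stand-in for PEP 508 marker strings (a real installer would evaluate markers, e.g. with `packaging.markers.Marker`); `Entry` and `select` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    name: str
    version: str
    # Simplified stand-in for a PEP 508 marker: every key must equal the
    # target environment's value. An empty dict means "always install".
    when: dict[str, str] = field(default_factory=dict)


def select(entries: list[Entry], env: dict[str, str]) -> list[Entry]:
    """Pick the entries that apply to one target environment. No resolution
    happens: each entry is simply included or skipped."""
    return [e for e in entries if all(env.get(k) == v for k, v in e.when.items())]
```

Multi-platform support falls out of this shape: the same name can appear several times with mutually exclusive conditions.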

(I hope I’m recalling the format correctly; I’m personally a lot more flexible on the format if the scope is limited to installer interop. I’d mostly want to understand how / if we want the format to handle multi-platform, extras, etc.)

Yeah, I think something like this is reasonable. IIUC, it’d mean that we wouldn’t necessarily record versions in the lockfile for source trees? I’d note, though, that I think installers would still need to compute and “validate” versions at install time, since given a source tree that builds a package foo, it’s valid for other packages (even from remote registries) to declare requirements like foo==1.0.0. If some package in the lockfile requires foo==1.0.0, then the installer would need to error if the computed version of foo no longer satisfies the specifier.
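The install-time validation described here could look something like the following. It is a sketch that only understands exact `==` pins and compares version strings literally, which is stricter than PEP 440 (where `1.0` and `1.0.0` are equal); `check_pins` and its input shapes are assumptions.

```python
def check_pins(built: dict[str, str], requirements: list[tuple[str, str, str]]) -> list[str]:
    """Report locked requirements that a freshly built source tree no longer meets.

    built: package name -> version computed at install time.
    requirements: (requiring package, required name, pinned version) triples,
    e.g. ("bar", "foo", "1.0.0") for bar's "foo==1.0.0".
    """
    errors = []
    for requirer, name, pinned in requirements:
        actual = built.get(name)
        # Literal string comparison: a simplification of PEP 440 "==" matching.
        if actual is not None and actual != pinned:
            errors.append(
                f"{requirer} requires {name}=={pinned}, "
                f"but the source tree built {name} {actual}"
            )
    return errors
```

An installer following this approach would refuse to proceed if the returned list is non-empty.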

This I’m a little less sure of, since I’m not sure how it would support things like workspaces / multiple entrypoints?

steve.dower · November 18, 2024, 4:31pm

The anti-pattern here (and I don’t think you suggested it) is that the lockfile be part of that commit hash, which I hope is fairly obviously impossible (you don’t know the commit hash until you’ve committed the lockfile, at which point updating the lockfile will change the commit hash).

If you have a separate repository containing deployment steps, the lockfile in there can (and should!) refer to a specific, hashed, package of your app. This makes sense, because your lockfile and app are “distributed” separately - different commits. I’m 99% sure that hasn’t been part of earlier suggestions (that the lockfile lock an adjacent directory that will be installed editable).
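On the deployment side, “a specific, hashed package of your app” boils down to a digest check before installing. A minimal sketch with a hypothetical `verify_artifact` helper, assuming the lockfile records a SHA-256 hex digest for the built wheel:

```python
import hashlib
from pathlib import Path


def verify_artifact(path: Path, expected_sha256: str) -> None:
    """Refuse to install an artifact whose digest differs from the lockfile."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in 64 KiB chunks so large wheels don't need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256.lower():
        raise ValueError(f"hash mismatch for {path.name}: got {digest.hexdigest()}")
```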

I’m okay with saying that after listening to them describe what they want. But more likely I’m going to say that this isn’t what they want, and perhaps they’ve got a different understanding of what a “lockfile” is meant to do.

I don’t think I’m taking too much of a tool focus, as I don’t have a tool in play here, and I don’t really use any tools with a lockfile workflow. But I can still do the requirements analysis to see that some of the stated requirements (a) don’t fit the proposed scope, and (b) are easily covered in existing ways, likely for free (if the lockfile and the source files are in the same package/commit/source, they’re already equally locked).

Yeah, I’d be quite happy with a format that is interoperable for installs/reproduction only, and allowing tool-specific metadata so that using the right tool enables other scenarios (e.g. incremental updates, or automatic refresh, or whatever workflow steps they want). Tools can figure out how best to handle lockfiles that don’t already have their own metadata in them - we don’t have to specify that.

offby1 · November 18, 2024, 7:38pm

Certainly one of the approaches library maintainers could take is not to ship a lock file for their tools. Speaking for myself, that’s a less desirable outcome, since it leaves me vulnerable to surprise upstream changes breaking my CI simply because my CI ran at a different point in time and Hyrum’s Law reached out and smacked my fingers.

If the PEP lock file format is adopted by tools like uv without a provision for dynamic versions, then that’s going to be the approach I need to take, because dynamic versioning is too useful to give up for packaging tests (which are among the few tests I have that by definition depend on services outside my CI environment).

steve.dower · November 18, 2024, 8:00pm

Your scenario makes perfect sense (I assume by “ship” you mean “check into my repo so that contributors/CI get matched versions”), and it is what lockfiles are intended for. Why do you think this wouldn’t work?

offby1 · November 18, 2024, 8:24pm

What wouldn’t work? A lock file that doesn’t support dynamic versions means I can’t use it for CI without relaxing the restrictions that are the whole reason it’s checked in in the first place.

steve.dower · November 18, 2024, 8:26pm

I don’t understand why you want to use a tool for locking versions for a task that requires unlocked versions. Why not just use a normal resolver+installer?

pf_moore · November 18, 2024, 9:51pm

Isn’t the point here that Chris wants to lock some versions but not others, and for that situation, a lockfile that doesn’t support leaving some versions unlocked is of no use? The specific case seems to me to be “lock everything but the project” and I don’t understand why that can’t be done with a lockfile that includes precisely that (everything but the project) plus a normal install of the project.

But this whole use case confuses me, so I may be missing something.

steve.dower · November 18, 2024, 10:35pm

Right, and my point is that isn’t a lock file (by the definition of this PEP), but a specification file that has to be fed into a resolver first (specifications can, of course, specify one exact version and a required hash for each package if they want to).

A lock file (by the proposed definition) does not require any version resolution. If we’re saying that lock files may now require version resolution, then only resolvers are going to be able to install them.

pf_moore · November 18, 2024, 11:11pm

Agreed. We’re circling back to the point that the term “lockfile” means many things to many people, but I agree, the key distinguishing feature of this proposal is that no resolver is needed. And if that’s not sufficiently clear, we should be making it clearer, not changing the proposal.

Having said that, a dynamic version doesn’t necessarily mean you need a resolver. The cases I can think of where you would need one all feel very artificial to me^[1]. The natural case, though, is a project whose dependencies don’t change, and where there’s no circular dependency on the project from one of its dependencies (specifically, no version constraint on which versions of the “root” project a dependency works with). In that case, it makes no practical difference what version the project claims to be, as the same set of packages will work regardless. The canonical example here is when the “root project” is specified as a local source tree, to be installed in editable mode. Personally, I consider installing local source trees, especially in editable mode, to be out of scope of what a lockfile should be for. But that’s basically where the debate currently lies: if we declare them out of scope, does that mean workflow tools like uv and Poetry won’t be able to use the standard lockfile format for what they refer to as “lockfiles”?

[1] Of course, to be theoretically sound, the spec has to handle the artificial edge cases as well as the obvious ones, but let’s put that to one side for now.
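The “no circular dependency on the root” condition can be checked mechanically over the locked graph. This sketch is deliberately conservative: any requirement on the root counts, even one without a version constraint. `root_version_matters` and the `dependencies` mapping (package name to required names, specifiers stripped) are illustrative assumptions.

```python
import re


def _normalize(name: str) -> str:
    # PEP 503 normalization so "My_App" and "my-app" compare equal.
    return re.sub(r"[-_.]+", "-", name).lower()


def root_version_matters(root: str, dependencies: dict[str, list[str]]) -> bool:
    """True if any locked package declares a requirement on the root project.

    If nothing points back at the root, the same locked set works whatever
    version the root reports, so a dynamic version needs no resolver.
    """
    target = _normalize(root)
    return any(_normalize(dep) == target for deps in dependencies.values() for dep in deps)
```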