PEP 751: lock files (again)

Thanks Paul. That makes sense, good points all around…

I’m generally willing to go down this route. The problem is important enough to the ecosystem and to uv that it’s worth investing a few days to implement the spec, even if it’s ultimately rejected. (I’d prefer to have a little more consensus on the format before investing that time though.)

3 Likes

If that’s the case, is it an option to[1] split the PEP into two PEPs, tackling the “simple” file locking use case first?


  1. shudder ↩︎

So we’d implement a file-locking format initially, and later add package locking. That sounds worryingly like the previous iteration of the proposal, where the PEP was for file locking (restricted to wheels only in that case) with a follow-up that would add sdist support (there wasn’t really any talk of “package locking” except in the context of sdists, as far as I recall). And everyone basically said they wouldn’t bother with the proposal until the second phase was added.

What assurances do we have that this wouldn’t happen here? In principle, deferring package locking to version 2 of the spec is possible. But for that to work, version 1 (file locking only) has to stand on its own as a viable proposal. And file locking only was Brett’s original proposal in this PEP, with package locking added because people like Poetry, PDM and uv said they needed it. So who would actually implement a “version 1” lockfile spec?

4 Likes

As someone who uses two of these tools: if they need package locking, I think that should be prioritized over file locking. With package locking we can always get the effective behavior of file locking (by locking to a single solution per dependency), but the other way around isn’t the case. In this way, one format can serve both needs.
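For instance, here’s a minimal hypothetical sketch (package name, version, and hash are placeholders; the field names just mirror the lockfile examples later in this thread) of a package-lock entry collapsed to a single fully pinned artifact, which is effectively a file lock for that dependency:

[[package]]
name = "idna"    # placeholder package, for illustration only
version = "3.7"
wheels = [
    # exactly one artifact, no markers: the installer has nothing left to choose
    { filename = "idna-3.7-py3-none-any.whl", hash = "sha256:<placeholder>" },
]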

2 Likes

We were interested in a file-locking specification to replace requirements.txt files and we’d implement it if it existed. However, I feel pretty strongly that it’d be confusing to have multiple standard lockfiles and it’s not clear that we can design a file-locking format that’s definitely compatible with the needs of a package-locking format.

(Feel free to correct me if you disagree, Charlie)

3 Likes

I didn’t worry about that detail honestly. If the format is designed appropriately then it shouldn’t be hard to support both aspects in the same installer.

Yep, and that might be fine depending on the circumstances.

By “expanded metadata” I mean recording the extras and the dependency group details.

:+1:

Thanks very much for that offer!

Which version was that? I looked at 0.2.35 last.

I’ve said this before and the offer still stands: I have no problem trying to standardize on what uv comes up with or using it as a starting point if we all think that’s a better result. I have no ego in this and I just want to drive us to a solution. I’m also fine if the initial solution leaves out any file locking and that comes in a later PEP with explicit input from the security experts (if it’s deemed necessary).

Charlie also raised a similar concern about doing things in that order.

3 Likes

Those changes shipped today in 0.2.37. The core of the format is unchanged, but we now write the requires-dist directly for packages that we consider “mutable”, separate from the dependencies array that points to specific entries in the lockfile.
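For anyone following along, a rough sketch of that distinction (field names mirror uv’s lockfile as shown later in this thread; the package and specifier are invented for illustration):

[[package]]
name = "dummy"
version = "0.4.0"
dependencies = [
    { name = "anyio" },    # resolved edge: points at a concrete entry elsewhere in the lockfile
]

[package.metadata]
# the declared, unresolved requirement is written out verbatim for "mutable" packages
requires-dist = [{ name = "anyio", specifier = ">=4" }]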

1 Like

I think collectively we’ve come to a better understanding of the relationship between open package locks and fully resolved file locks since Brett’s original proposal, though.

If we were to revert to an “only standardise file locking” proposal, we could keep the “require-resolved-environment = false” escape hatch, but require that the details be placed in a separate file referenced from a fallback specification table, rather than attempting to describe that scenario in the standardised lock file.

For example:

[unresolved-install]
installer = "install-project-name"
installer-lock-file = "relative/path/to/tool/specific/lock"

Typical values in that table would then be pdm, poetry, or uv (with a pointer to their respective lock files), or potentially even pip (with a pointer to a requirements.txt file).

The PEP could then go back to focusing on standardising resolved environment locks (which are what the deployment platform use case needs) without attempting to standardise an area that’s still under active research (adequately describing truly universal package locks).

Edit: this wouldn’t be the desired end state; it would just provide a structured way to postpone package locking to version 2 of the spec.

2 Likes

Yeah, this is roughly how I feel.

I actually think a File Lock specification would be really useful (way better than the way we use requirements.txt for this right now), but as an export target for our lockfile (e.g., from a uv.lock, we can generate a pylock.toml for a given target platform). So if we want to standardize the Package Locking format (replace poetry.lock, uv.lock), we’d need a second PEP with a second format or amendments to the File Locking format.

I worry that shipping File Locking first will make it harder to support Package Locking as a standard. That worry might be misplaced… maybe it’s not so bad, if we design the File Locking format with Package Locking in mind, and generalize it later?

Interestingly, I think there is a case to be made that the motivating use-cases for standardization can be captured by a File Lock standard alone. Like, with a File Lock standard, you would still have an interoperable format for installers. You could still have automated installs by cloud providers. You would still have standardized Dependabot support. (I don’t think Dependabot will be able to update Package Locking lockfiles anyway – it will need to invoke the resolving tool, like uv and Poetry, as it does today, since it needs to run a resolver, and those tools will write tool-specific metadata to the lockfile.) The format could also be much simpler. But, we’d still need uv.lock.

So we’d end up in a world where tools have their own tool-specific lockfile (poetry.lock, pdm.lock, uv.lock) but can export to File Locks (pylock.toml). Is that good? I think it’s an improvement, but users would still have to understand two formats (but only one of them standardized), understand why there are two formats, etc.

Personally, I think it’s in the best interests of users to strive for one lockfile format, and I think it’s possible for us to make it work.

5 Likes

Thanks Brett. I also have no ego (in the other direction) about using uv’s format as a starting point or otherwise. But if we want to explore this, I am willing to put in work here.

4 Likes

I’d mostly prefer file locking only, mainly because all package-locking formats rely on one key assumption: that resolution depends only on markers/static metadata. And my experience with, and trust in, the ML ecosystem/PyTorch to satisfy that is weak. PyTorch is a complex library, but it’s also a very popular ML library, so a lock format that struggles with it is a yellow flag. Code like this in setup.py decides which dependencies to use based on environment flags or other hardware. I expect that if you implemented package locking today and tested it with PyTorch across Linux with a GPU, Linux without a GPU, and macOS, you’d get incorrect results.

If packages were required to use markers exclusively to define conditional dependencies, I’d be more supportive of package locking. With the current ecosystem allowing arbitrary dynamic dependency logic across machines, and with the right marker sometimes simply not existing (GPUs, sadly), I’d be very wary that a package lock for the libraries I use would end up giving incorrect results in individual environments.
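To make the marker gap concrete, here’s a hypothetical lock entry (names and versions invented) showing the kind of condition markers can express today versus the one these packages actually branch on:

[[package]]
name = "example-ml-lib"    # invented package, for illustration only
version = "1.0.0"
dependencies = [
    # expressible with standard environment markers:
    { name = "colorama", marker = "platform_system == 'Windows'" },
    # NOT expressible: there is no standard marker for "a CUDA-capable GPU is
    # present", so a locker can only guess at this edge:
    # { name = "nvidia-cublas-cu12", marker = "???" },
]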

Please see the many threads related to variant support, which is trying to better solve the “GPU problem”[1].

Specifically, this comment, where I suggest that a package lock + variants.toml file will be required to handle cases such as the one you identify here.


  1. as a simplified way of describing the general class of problems ↩︎

Does package locking depend on/require that?

If the package-lock PEP is independent of the variant PEP, then it feels like we’ll have a lock file that some parts of the Python ecosystem can use, while for others it will misbehave and likely lead to an unusable environment. And it’s not even straightforward to detect when this will occur.

I’m OK with the answer that the package-lock PEP today, without variants/other extensions, only works for a subset of the ecosystem. I expect it will be confusing to a bunch of users, and the average ML user will not know packaging well enough to anticipate this issue.

1 Like

PDM also has limited support for the second case. Users have to tailor the dependency set themselves using environment specifiers (this is inspired by uv). They can run the following commands in sequence:

pdm lock --platform macos   # Pick and resolve the first `anyio` specification
pdm lock --platform windows --append  # Pick and resolve the second `anyio` specification, then **merge** the result with the first one

Therefore, every lock run must be targeted at selected environments. At a minimum, when neither platform nor implementation is specified, the lock is targeted at the requires-python in pyproject.toml.


There is another difficulty with cross-platform lock for lockers: partial/incomplete releases. For example:

kaleido 0.2.1.post1

kaleido-0.2.1.post1-py2.py3-none-manylinux2014_armv7l.whl

kaleido 0.2.1

kaleido-0.2.1-py2.py3-none-macosx_10_11_x86_64.whl
kaleido-0.2.1-py2.py3-none-macosx_11_0_arm64.whl
kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl
kaleido-0.2.1-py2.py3-none-manylinux2014_aarch64.whl
kaleido-0.2.1-py2.py3-none-win32.whl
kaleido-0.2.1-py2.py3-none-win_amd64.whl

A cross-platform locker like uv or PDM will always prefer 0.2.1.post1 because it is the newer version, without noticing that it only ships a single manylinux wheel. The installer will then fail later because it can find neither a compatible wheel nor an sdist.

4 Likes

When I wrote the original comment about absolute vs. relative paths, I hoped this would just mean adding a couple of sentences and an example to the PEP about your favorite way to represent relative paths. Relative paths are critical for us, and I want to avoid ending up in the same bad place with relative paths as PEP 621, where every tool has its own hack around the lack of support for them in project.dependencies.

I’m fine with any solution that is clear about how to serialize and deserialize relative paths. My preferred solution would be splitting origin into two mutually exclusive fields, url and path, with url being a URL as usual and path following the same rules as packages.directory.path. We’ve had good experience with this approach in uv.
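To make that concrete, a hypothetical sketch (table and field names are assumed here, not the PEP’s actual schema; hashes omitted, URL is a placeholder):

[[packages]]
name = "vendored-build"
version = "0.1.0"
path = "vendored/vendored_build-0.1.0-py2.py3-none-any.whl"    # relative path, resolved against the lock file's directory

[[packages]]
name = "anyio"
version = "4.4.0"
url = "https://files.pythonhosted.org/packages/.../anyio-4.4.0-py3-none-any.whl"    # placeholder URL

Exactly one of url or path would be allowed per entry.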

Now that I’ve read into the standards situation some more, I’d also like to request using “URL” instead of “URI” and linking to the WHATWG URL standard (https://url.spec.whatwg.org/).


As an example, in uv we support the following case:

[project]
name = "dummy"
version = "0.4.0"
requires-python = ">=3.12"
dependencies = ["vendored_build"]

[tool.uv.sources]
vendored_build = { path = "vendored/vendored_build-0.1.0-py2.py3-none-any.whl" }

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

Corresponding to this lockfile (ignore the duplication, the interesting line is package.source):

version = 1
requires-python = ">=3.12"

[[package]]
name = "dummy"
version = "0.4.0"
source = { editable = "." }
dependencies = [
    { name = "vendored-build" },
]

[package.metadata]
requires-dist = [{ name = "vendored-build", path = "vendored/vendored_build-0.1.0-py2.py3-none-any.whl" }]

[[package]]
name = "vendored-build"
version = "0.1.0"
source = { path = "vendored/vendored_build-0.1.0-py2.py3-none-any.whl" }
wheels = [
    { filename = "vendored_build-0.1.0-py2.py3-none-any.whl", hash = "sha256:fb6bd5addc5daffbc3201f49fed9a27cf25b1df735a89765d10041dbe4420654" },
]
1 Like

Strong +1 from me on having an explicit way of referencing relative paths, that does not rely on the unclear and inconsistent “support” that URIs/URLs provide for them.

2 Likes

75% of people who took the poll said they wanted a single file format. Three of the four of Hatch/Poetry/PDM/uv also said a single file format. So to me that speaks clearly in favor of aiming for a single file format.

Thanks to everyone who took the poll!

How in flux is the format? Are you ready for questions about it now?

I’m going to ask about this below.

Lock for specific platforms or Python versions - PDM suggests you just error out if requires-python is not enough to get a clean lock. Is that correct?

In that case I think you have to say the project messed up (i.e. I wonder if they meant to do a release using a build number for the one wheel to fix a botched wheel?). Otherwise you need constraints support.

If you feel there’s a gap in [project] then please consider proposing an update to the spec (in a separate topic please :grin:).

Sure, although I don’t know if any of the other packaging specs are that specific. I don’t think it will cause issues since I think it’s assumed when the specs say “URL”.

OK, I can update the PEP.


My next question is: what is our ultimate goal for package locking, and what do we want to be possible?

If I have understood @radoering correctly, Poetry currently requires a resolution both at lock time and at install time, but they are open to/planning to try dropping the resolution at install time.

@frostming said up above that PDM is moving away from trying to make universal locks, to the point of deprecating the cross_platform locking strategy. It seems like PDM will at least attempt a broad lock, but is willing to just say no and require specifying the appropriate details to make a lock succeed (e.g., platform, implementation). PDM does not require any resolution at installation time; I believe you can read the lock file sequentially to figure out what to install.

@charliermarsh seems to have suggested uv can do, or is at least attempting, universal/open-ended locks. They are doing it without requiring a resolution at install time, though there is a simple graph traversal.

I believe @ofek and Hatch are waiting to see what happens to this PEP.

Hopefully I’m representing everyone appropriately, and if I’m wrong then please let me know!

Assuming I’m not getting it wrong, though, the questions I have are:

  • @charliermarsh : do you think you have something that PDM lacks which gives you better resolves and thus can do broader locks than what @frostming seems to think is possible?
  • @radoering : do you think @frostming has it right that universal resolves are a fool’s errand?
  • @radoering : do you think a sequential reading like the one @frostming does, or the graph traversal that @charliermarsh does, is better? (You can say they both work and it’s just personal preference.)
  • @frostming : since you currently have the most restrictive lock file, I don’t have any questions for you, but feel free to share anything you want.

So if the line of questioning above doesn’t make it obvious, I’m trying to figure out what we think is reasonably possible so we know what we are aiming for.

2 Likes

Yes, Hatch is waiting on the outcome here to introduce locking. Am I correct in understanding that this single file format would still have a way to define environments (purely direct links to artifacts) such that resolution during installation is not required, or is that gone now?

2 Likes

Yes happy to take questions. We only have one more change planned, which is to allow the user to constrain the set of environments that we solve for upfront (1). But it won’t affect the schema much.

Beyond that, we’ll just make changes in response to what we learn from users over time.

Yeah, I’m happy to talk through how the resolver works in detail. I think we can reliably produce these locks, though I agree that it’s a very hard problem, and you have to make a few assumptions (or, alternatively phrased, there are a few unsolved problems that we have to make assumptions to work around):

First: that metadata is consistent across platforms for a given package-version (i.e., all wheels built from the source distribution will produce the same metadata). This one is familiar.

Second: if a package-version is missing a source distribution, it’s really hard to determine the set of platforms on which it can be installed (so we just assume it’s installable ~everywhere). Like, given a set of wheels, it’s hard to determine which platforms are supported and which are not. So in some cases, we can produce resolutions that don’t successfully install, because we end up not being able to find a wheel for a given platform at install-time (despite finding a valid package-version).

2 Likes

Exactly.

I think it works well enough if you make some assumptions/restrictions. The two points Charlie mentioned came immediately to my mind, too.

Actually, I think that either I misunderstand something or PDM’s lock file is still “cross-platform” if you specify a target only with requires_python = ">=3.8". If the only restriction is that you need at least a requires_python, and the lock then works on all platforms within the specified Python versions, I would say it is still quite universal. Even though you can omit the Python constraint in Poetry, that is not a real-world use case in my opinion.

I think the graph traversal lock file is just a “less resolved” version of the sequential lock file (under some assumptions). If you only use markers to decide which nodes in the graph have to be traversed, and you have a means to unambiguously identify package entries (name/version is not enough; you need name/version/source or similar), both approaches are equivalent. I think you can even transform a graph traversal lock file into a sequential lock file by doing some marker algebra. (However, you probably cannot do it the other way around; it is not a bijective operation.)
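As a hypothetical illustration of that transformation (package names, versions, and markers are just examples), here is the same dependency edge expressed both ways:

# Graph-traversal style: colorama is only reachable through click's edge.
[[package]]
name = "click"
version = "8.1.7"
dependencies = [
    { name = "colorama", marker = "platform_system == 'Windows'" },
]

# Sequential style: the reachability condition has been folded into the entry
# itself, so it can be read in isolation without walking the graph.
[[package]]
name = "colorama"
version = "0.4.6"
marker = "platform_system == 'Windows'"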

From a user point of view, I prefer the sequential lock file because when reviewing you can just grep one package entry and determine if it will be installed in a certain environment without having to look at any other package entries. In a graph traversal lock file, you probably always need a tool for this task.

4 Likes