I don't know, and if @frostming is right, there aren't any. (Well, at least not any publicly available ones.) Like I said, it's not something I need. I just mean that it seems like for the package-locking case, we can gauge the fitness of the PEP by looking to things like poetry and seeing whether the PEP can support at least that functionality. But if we have no existing examples of what people want for file-locking, it's a little harder to know whether this PEP would be enough to do what people would want. It sounds like right now we just have you as a data point, which is fine as far as it goes, but ideally more people who need this would chime in on that aspect. Otherwise it's not clear to me what need that part of the PEP is meeting for a larger audience.
We can look at other languages for that, though, so I'm not working from a position of no background knowledge or experience. I actually run into people all the time who are shocked Python doesn't have a per-file locking solution.
But it's the simplest part, so I don't know if people feel the need. We also went through all of that for my last PEP, which was per-wheel locking only, so this isn't a new discussion point.
I'm afraid you might have to trust me on this one, then, based on my research and knowledge that it's important.
If there are formats in other ecosystems providing inspiration for the file locking side, it's likely worth listing them in the PEP (similar to the short list of existing Python tools).
It may also be worth clearly stating the most basic form of a file lock that we discussed in the last thread (and inspired the current target enumeration design for file locks):
Lock each environment as a separate file, recording all environment markers and wheel tags so an installer can determine if the lock file is applicable
Clean up each lock file by dropping all environment markers and wheel tags that had no effect on the package selection process (or never record those in the first place)
Combine the multiple lock files into a single lock file, ensuring any common packages are only listed once
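As a toy illustration of those three steps (the data shapes here are invented purely for illustration; real lockers would operate on full resolver output, not these simplified dicts):

```python
# Toy model: each per-environment lock maps package name -> (version, markers
# consulted during resolution). These inputs are precomputed stand-ins for
# step 1 ("lock each environment as a separate file").
env_locks = {
    "linux-cp312": {"requests": ("2.32.3", {"sys_platform"})},
    "windows-cp312": {"requests": ("2.32.3", {"sys_platform"})},
}


def prune(env_locks):
    """Step 2: drop markers that had no effect on package selection."""
    pruned = {}
    for env, packages in env_locks.items():
        pruned[env] = {}
        for name, (version, markers) in packages.items():
            # If every environment resolved the same version, the markers
            # made no difference to selection and can be dropped.
            versions = {pkgs[name][0] for pkgs in env_locks.values() if name in pkgs}
            pruned[env][name] = (version, markers if len(versions) > 1 else set())
    return pruned


def combine(env_locks):
    """Step 3: merge into one lock, listing each (name, version) only once."""
    combined = {}
    for packages in env_locks.values():
        for name, (version, markers) in packages.items():
            combined.setdefault((name, version), set()).update(markers)
    return combined
```

Here `combine(prune(env_locks))` yields a single entry for requests with no markers, since both environments agreed on the version; had they diverged, the markers would survive the pruning step.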
One way I think of the difference is that a package lock is more instructions on how new environments should be built, while a file lock is more a description of how a given set of environments are built. Both can technically be used for either purpose, but their primary intent is different enough to make it worth explicitly naming the target environments in the file lock use case (since that list of supported targets is the primary new information a file lock adds).
Hmm, that gives me a thought: do the file lock and package lock fields actually need to be mutually exclusive?
hybrid installers check for a matching file lock first, and if they don't find one, fall back to using the package lock fields
If we're open to repainting the bikeshed: I think calling file locks "target locks" would better express that they work backwards from an identified target environment to a specific list of files to install, while a "package lock" works forward from a defined set of package versions and environment markers to the corresponding artifacts for a given platform.
Alternatively, we could just slightly tweak the full name of "file locks" to be "named file locks" (without changing the syntax) to emphasise that their main benefit over pure package locks (beyond being able to use wheel tags as part of the selection criteria) is being able to give particular targets a name.
On that note, I also think I finally thought of an elegant way to allow overlapping file locks: use wheel tag priority order to pick the most preferred lock for a given target. That neatly allows a lock file to describe both C accelerated and more portable builds in the same file.
I also wonder if we should add a field to define required values for OS environment variables in file locks, otherwise the dev/staging/production use case seems difficult to express (since those should be using similar hardware and hence have the same environment markers and wheel tags). Such an escape hatch would also cover arbitrary selectors, like CPU and GPU capability details.
If your dependency requires funky flags to be passed to pip, is there a way to specify that somewhere? I don't see it. This particular package is a pain point for me because I can't put all the options in the dependencies list in pyproject.toml (or pretty much any other tool that is supposed to automate installs).
Relatedly, I feel like the motivation section of the PEP could actually be stronger than it is right now.
Other ecosystems have lockfile formats, but I don't know of many that have a lockfile standard. (For example, in the JavaScript ecosystem, I believe that npm, pnpm, Yarn, and Bun all use their own lockfile formats.) Having a good format is important, but that's really the responsibility of the individual tools. So why is it important to have a standard? The bulk of the motivation section is about improving the format, and there's less on the purpose of a standard:
The lack of a standard also has some drawbacks. For instance, any tooling that wants to work with lock files must choose which format to support, potentially leaving users unsupported (e.g., if Dependabot chose not to support PDM, support by cloud providers who can do dependency installations on your behalf, etc.).
The Dependabot case makes sense (to continue with my example, I believe Dependabot supports npm, pnpm, and Yarn, but not Bun's format - if they used a standard, Bun could've gotten that for free). Are there other use-cases that we can expand on? For example: in our Discord today, one member mentioned that lockfiles could enable installers to perform locked/reproducible installs. Imagine you use Poetry to ship a CLI application, and you want your users to be able to do pip install --locked my-app. Is that kind of thing an eventual goal that this PEP is building towards? Or a non-goal? Would we alter the design at all if we knew that this was eventually intended to ship as part of a built distribution?
What's the benefit of that? One is more explicit in an upfront way while the other isn't. It feels like you're saying, "I want to be precise in these select cases, but otherwise YOLO", which to me goes against the purpose of per-file locking.
How? The highest tag? The total or average of all the tags? And whose tag order are you using to pick your priority since it could vary from lock file to lock file?
That's between you and pip (if it supports this PEP). They could provide a way to do it on the command line or in [packages.tool] (off the top of my head).
I was going to clarify that for packages.directory when I added editables support (and I am planning to say, "yes, relative to the lock file"), but I'm not sure what you're after here for packages.files. Do you mean in packages.files.origin? I can clarify that if it's a file: URI, it can be relative.
Funny you bring that up because I got editorial push-back for keeping that paragraph at all. But yes, I can expand on it.
It's at least a hope of mine. Tool interoperability and portability are part of why I'm doing this. Much like with pyproject.toml, tool lock-in goes away and lets us focus on the artifacts we all work with when we have shared commonality.
Having just been burned on PEP 667 not repeating the rationale for parts of the design that it shared with PEP 558, it's probably worth including at least a paraphrase of this bit:
Other programming language communities have also shown the usefulness of lock files by developing their own solution to this problem. Some of those communities include:
The trend in programming languages in the past decade seems to have been toward providing a lock file solution.
That still isn't quite what I was suggesting is missing, though. Instead, I'm curious whether the lock formats in each of those ecosystems would correspond to package locks or file locks in PEP 751 terms (I genuinely don't know, and I think it's relevant which of them have an equivalent to named file locks).
It was primarily just a thought that struck me while pondering the question of why file locks and package locks are genuinely different things: "Wait, these fields are orthogonal, so they can happily coexist in one file (including the ability to check them for internal consistency), so why is the PEP forcing mutual exclusivity rather than allowing locking tool developers to make the decision between combined files and separate files as a UX design choice?"
That said, the first practical use case that comes to mind is situations where the file locks are being used as an optimisation tool by selecting for things that regular environment markers (and hence package locks) can't express. Falling back to a less optimised package lock is then a way of providing graceful degradation for unknown environments rather than a hard failure.
Similarly, if environment variables were added to handle the dev/ci/staging/production use case, it would likely make sense to express ci, staging & production requirements as file locks, while leaving dev as a package lock.
By changing the meaning of the wheel-tags array from "all markers must match" to "at least one marker must match" (and typically only listing the most specific marker used in the artifacts that correspond to that file lock) and then using the following algorithm (which resembles the one for choosing wheels):
First, select all named file locks where all marker-value expressions are true for the current target. If no matching file locks are found, a file lock install is not possible. (If the ability to check OS environment variables when installing from a lock file is added, it would apply here.)
If multiple file locks are found, only one is permitted to omit the wheel-tags array; the rest must include it. If multiple matching file locks without a wheel-tags array are found, that is an ambiguity error, and a file lock install is not possible.
Then, iterate over the valid wheel tags for the current target in the usual priority order used for selecting wheel files to download. As soon as a file lock is found that contains a matching wheel-tags entry, use that file lock and stop searching. If multiple file locks are found for the first matching wheel tag, that is an ambiguity error and a file lock install is not possible (at a spec level, the wheel-tags array entries for all file locks with matching environment markers must form disjoint sets, and any lock file not abiding by that rule is ill-formed).
If no matching file locks containing a wheel-tags array are found, but there is a matching file lock without a wheel-tags array, use that file lock.
Otherwise, a file lock install is not possible (there is no file lock defined with both matching environment markers and at least one compatible wheel tag).
As when choosing wheel files to install, installers may choose to allow users to override the wheel tag priority order when installing, but they're not required to do so.
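To make the proposed selection steps concrete, here is a rough Python sketch. The FileLock structure and field names are my own illustration, not the PEP's schema, and marker evaluation is reduced to a precomputed boolean:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FileLock:
    """One named file lock entry (illustrative shape, not the PEP's schema)."""
    name: str
    markers_match: bool                   # result of evaluating marker-value expressions
    wheel_tags: list[str] | None = None   # None models an omitted wheel-tags array


def select_file_lock(locks: list[FileLock], tag_priority: list[str]) -> FileLock | None:
    """Pick the file lock to install from, or return None if none applies."""
    # Step 1: keep only locks whose environment markers all match the target.
    candidates = [lock for lock in locks if lock.markers_match]
    if not candidates:
        return None  # a file lock install is not possible
    # Step 2: at most one matching lock may omit the wheel-tags array.
    untagged = [lock for lock in candidates if lock.wheel_tags is None]
    if len(untagged) > 1:
        raise ValueError("ambiguous: multiple matching locks omit wheel-tags")
    tagged = [lock for lock in candidates if lock.wheel_tags is not None]
    # Step 3: walk the target's wheel tags in priority order; first hit wins.
    for tag in tag_priority:
        hits = [lock for lock in tagged if tag in lock.wheel_tags]
        if len(hits) > 1:
            raise ValueError(f"ambiguous: multiple locks match tag {tag!r}")
        if hits:
            return hits[0]
    # Step 4: fall back to the untagged lock, if there is one; otherwise fail.
    return untagged[0] if untagged else None
```

This also shows the "C accelerated vs portable" case: a lock tagged for a platform-specific wheel wins on targets whose tag priority includes that tag, while everything else degrades to the untagged lock.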
This is a good point. A concrete example I've run into recently (bootstrapping pdm in GitHub Actions) is that without a standardised lock file format, pretty much every CI process developer is forced to make a choice between:
1. Export the actual tool-specific lock file to the baseline requirements.txt format, and use pip to install that in CI (using a standard Python CI environment)
2. Add a bootstrapping step to the CI process that gets the relevant tool installed (while respecting the locked requirements)
3. Speed up option 2 by defining custom base images for CI with the relevant tool preinstalled
Option 1 becomes a lot more attractive if it isn't restricted by the limitations of the requirements.txt format (and that not only saves the effort that implementing option 2 or 3 correctly would otherwise require, it also avoids the high risk of inadvertently introducing unlocked CI dependencies by attempting option 2 or 3 and not getting the bootstrapping right).
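A minimal sketch of option 1 as a GitHub Actions job (the tool invocation and file names are illustrative assumptions, not prescribed anywhere; pdm's export command is one way to produce the requirements file):

```yaml
# Hypothetical CI job for option 1: the developer exports the tool-specific
# lock to requirements.txt (e.g. "pdm export -o requirements.txt") and commits
# it, so CI only needs stock Python and pip - no bootstrapping of pdm itself.
jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: python -m pip install -r requirements.txt
      - run: python -m pytest
```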
Something relevant that came up for me recently - there is no standardised priority order for wheel tags. There isn't even a standard for what tags apply on a platform. Packaging and distribution appear to give different answers, for example.
So I'd be a strong -1 on having the reproducibility of lockfiles depend on tag priority.
I know a couple of comments have been made about unifying file/package locks. So maybe this is a dumb question, but can't a locker just choose to produce a package-lock style file that's described in a way that is a "file lock"?
...unless I'm missing something... is there an example of a file-lock scenario where you couldn't produce an equivalent package lock that yields the same strictness?
The decision to lock down to specific wheel-tags/marker-values, a la file-lock, seems like an install-time decision you could make given a package-lock file, no?
I was able to spend some more time today reviewing the format in detail. I have some high-level thoughts and a few that are more specific. Apologies for the wall of text...
First, as a meta-point: I know I'm new to the PEP process and perhaps too naive, but I'll just admit that it feels hard to commit to a format (and a standard) without more examples and real-world stress-testing. I appreciate that there are examples in the PEP, and I know it's a lot of work to produce them, but the included examples are fairly simple and fall into the happy path.
I'm not at all suggesting that our format is perfect, but in building uv.lock, we've already iterated on it a ton based on filed issues and hard test cases. The proposed schema is fairly different from anything that exists today, so it's hard for me to know where it will and won't work in practice. For example: what would this format look like when resolving transformers with all of its extras enabled?
As an example of something we only discovered after our initial implementation: version and package name alone are not sufficient to act as unique identifiers (1), and this PEP relies on that. E.g., imagine a pyproject.toml like this:
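The pyproject.toml in question was not quoted here; a minimal sketch of the kind of configuration being described might look like the following (the index names, URLs, and the uv-specific [tool.uv.sources] marker-conditional index selection are illustrative assumptions):

```toml
[project]
name = "example"
version = "0.1.0"
dependencies = ["bar==0.1.0"]

# Two different indexes, each serving its own distribution of "bar" 0.1.0.
# Names and URLs are hypothetical.
[[tool.uv.index]]
name = "bar1"
url = "https://example.com/bar1/simple"

[[tool.uv.index]]
name = "bar2"
url = "https://example.com/bar2/simple"

# uv-style source selection: pick the index based on an environment marker.
[tool.uv.sources]
bar = [
    { index = "bar1", marker = "sys_platform == 'linux'" },
    { index = "bar2", marker = "sys_platform != 'linux'" },
]
```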
Where both bar1 and bar2 contain definitions for a package named bar at the same version (let's say 0.1.0). They're not the same package, but they have the same name and version. I don't believe the Package Locking format in the PEP is capable of representing this, since you can only include one node for bar==0.1.0, and that one node has to have a single marker. In reality, you need two nodes with distinct markers.
If you accept that name and version aren't sufficient, then you need some other input to package identity. In uv.lock, we use the idea of a package "source", which could be a registry, a direct URL, a Git URL, etc.:
Entries are thus uniquely identifiable by name, version, and source. But if you go down that road, then the format and semantics have to change a bunch too, since you no longer support mixing multiple kinds of sources, whereas the PEP allows directory, vcs, and files all at once.
(Relatedly: what use-case is that designed to support?)
Second, I worry that attempting to support both File Locking and Package Locking in a single format adds a lot of complexity to the schema. For example:
packages.directory can be present when File Locking is enabled, but is ignored.
packages.files.lock and packages.vcs.lock can be present when Package Locking is enabled, but are ignored.
(Can the current format be represented as JSON Schema, to enable in-editor validation? I'm not sure - maybe that's a non-goal, but it's kind of a helpful barometer for complexity.)
I know you'll get critiques in the other direction, but personally, I'd rather see two totally separate formats and files. The use-cases and the things you care about in File Locking vs. Package Locking just seem really different to me, and the formats could be optimized to support those use-cases.
If we committed to separate formats and delegated multi-platform support to Package Locking, the File Locking case could even be simplified to a flat list of entries for a single Python platform (i.e., commit to being a "receipt" of exact distributions to install). That would be maximally auditable - and extremely simple! Whereas now, the File Locking format is more complex than it needs to be (in my opinion), in order to support the Package Locking and multi-environment use-cases. For example:
Should File Locking care about extras? Probably not? But Package Locking might need to.
In the File Locking case, by looking at a single package entry you can no longer tell whether it's going to be installed on your Python platform of interest - you have to cross-reference the locks entries with the table at the top.
You could even imagine generating File Locking "lockfiles" for specific platforms from a Package Locking "lockfile". If that's true, it seems odd that they would use the same filename, schema, etc., since one would effectively be a derivative artifact of the other.
For uv, there are a few possible outcomes here... If you'll forgive me, I'll speculate on what they might be (assuming the PEP is accepted in some form):
The PEP is accepted, but only File Locking is supported. In that case, we'd like to support PEP 751 as an export format. Seems straightforward.
The PEP is accepted roughly as-is. In that case, we wouldn't be able to use PEP 751 as our "primary" lockfile, but we'd like to support both the File Locking and Package Locking formats as export targets.
The PEP is accepted, and the Package Locking design is modified such that it's a functionally viable alternative to uv.lock (e.g., extras and dependency groups are solved in some way, etc.). In that case, we'd still need to decide whether we want to use it as our "primary" format (replacing uv.lock), or as an export target for uv.lock. I'm not sure what we'd do there yet. It would take some testing and experimentation to come to an answer.
As an example of something that would matter for the last point, but not for the first two: it's critical for us that we can "resolve" from a lockfile, to enable fast "Is this lockfile up-to-date with the requirements?" checks. In short, with uv.lock, we can validate that it's "acceptable" for the current set of input requirements based on information that's stored in the lockfile alone. Could we support this with PEP 751? I think so, but we'd have to try it out. (One example: it requires that we record the requested revision, not just the Git SHA. But perhaps we can put that in the tool section.) It's not necessarily intended as a critique of the current format, but rather an example of something we'd need in order to fully adopt PEP 751 (but wouldn't need if we were only exporting to these formats).
Smaller things:
Can packages.multiple-entries be marked as optional? In other words: it'd be nice if anything that isn't required to resolve (i.e., it's either redundant or purely informational) were marked as optional. (If we were implementing the format, we'd probably omit a lot of those fields, like the description, since one of our goals is to have a succinct format.)
I appreciate that the tool escape hatches exist. There are a variety of things that we could not support in the current format, but that the tool escape hatches would help with. (For example, the PEP allows writing dependencies, but they're marked as optional and must use PEP 508 syntax, which doesn't fully capture (e.g.) editables. Perhaps we could write to packages.tool for anything we're missing there.)
Thanks, as always, for all the work that's gone into the PEP and discussion thus far.
Separately: if anyone is interested, I asked Weihang Lo, a Cargo maintainer, if they had any design documents or RFCs from the initial Cargo.lock design. (The Rust ecosystem is of course very different, but Cargo.lock does have to support packages at multiple versions, optional features, VCS and path dependencies, etc.)
While he couldn't recall any such document from the initial design, he kindly sent me a list of things that they would reconsider with hindsight:
We don't, though. Package identity is determined by name and version (more precisely, a package is uniquely identified by its name, and any package may have multiple versions, but any two distributions claiming to be the same package name and version are required to be functionally identical). I don't think it's explicitly stated in any of the existing standards, but it's an assumption that is made throughout the ecosystem, and it's extremely likely that it can be deduced from a sufficiently careful reading of the standards[1].
In my view, your example is simply wrong - the two dependencies should have different names.
But as stated, the user would see different behavior between (1) running pip install or equivalent on a machine with Python 3.10 and Python 3.11, and (2) running pip install or equivalent from a lockfile generated by the same input requirements with Python 3.10 and Python 3.11. That seems not good to me.
I think the scenario I've described here is not entirely contrived. Imagine you're using a Git dependency, and you tend to just leave the version in the repo at 0.0.1, but want to use different commits or different branches for different Python versions.
Or, imagine that you want to use requests from PyPI for Python 3.10 and below, but patch it with a Git fork on Python 3.11.
The response to these scenarios might be for the user to do something different, but I find it really unintuitive from a user perspective. Like, it's surprising that this would universally pick one of a7919970 or 4fe0aeba (implementation-defined, I think?) regardless of the user's sys_platform:
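The requirement entries being referred to were not reproduced here; from the commit hashes and the marker mentioned in this thread, they were presumably something along these lines (the exact marker values are my assumption):

```
flask @ git+https://github.com/pallets/flask@a791997041b94b8a5effebc296cb427fde8e0ee5 ; sys_platform == 'win32'
flask @ git+https://github.com/pallets/flask@4fe0aebab79a092615f5f86a24b91bac07fb2ef2 ; sys_platform != 'win32'
```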
Indeed, it's not good. But only in the same way as any situation where the user has a bug in their system.
Your mistake is "leaving the version in the repo at 0.0.1". You should do something like constructing the version number with a local identifier of the commit ID.
You change the version to add a local identifier in your patch.
I really don't know how to respond to this. It's so fundamental to how Python packaging works that I'm struggling to find a way to explain it that isn't just "don't do that".
One higher-level problem here is that the packaging ecosystem is based around distribution. Managing development environments, which are fundamentally far more fluid than a released artefact, is a very different situation. And while people have, over the years, forced tools like pip into use in a development workflow context, they were (in general) never intended for that purpose, and the cracks do show, at times.
Maybe if we were starting from a clean slate, we'd do things differently, but at least in terms of the standards and older tools like pip, we don't have that luxury. You're coming at things from a different perspective with uv, and can start without preconceived assumptions. But I would strongly advise you to look at "managing in-development code" and "installing distributed packages" as two separate things, otherwise you'll keep hitting this sort of misunderstanding.
Why would it do that? If those two items are in a requirements file, or a dependency list, only one of the markers would evaluate to true, and that item would be picked and installed. I don't know what uv does, but pip would check out the appropriate commit to a temporary directory, build a wheel from that, and install it. The version number and name of the installed package would be whatever the wheel build said they were (in the metadata) - pip would fail with an error if the name wasn't "flask", but it would accept whatever version was generated.
As I said, you're coming at this from a different perspective. Which is good. What concerns me is that you (in the context of uv) might be trying to solve a slightly different problem than the existing ecosystem is focused on - and as a result, what you're asking from the standards has far wider implications than you imagine. And while I'd very much want to incorporate your insights into the standards, so that we don't end up with a split where uv is forced to "do its own thing"[1], there's only so much we can achieve with the resources we have.
Of course, it's also possible I'm making too much of this - maybe @brettcannon can see an easy way to incorporate what you're suggesting into the lockfile proposal. In which case, I'll be happy and will apologise for making a fuss over nothing.
Thanks Paul. I'll try to keep this one brief for now.
Thanks, I appreciate this sentiment (around avoiding some kind of split) and I share it.
This is very interesting for me - again, thanks for sharing.
Have the lines blurred here over time? For example, taken to the extreme, would it not be correct (or at least valid) for pip install git+https://github.com/pallets/flask@4fe0aebab79a092615f5f86a24b91bac07fb2ef2 after pip install git+https://github.com/pallets/flask@a791997041b94b8a5effebc296cb427fde8e0ee5 to be a no-op, since you already have flask==3.1.0.dev0 in your environment? The commits don't match, but they have the same package-version pair, and so should be functionally equivalent. (Feel free to just ignore this question if you feel it's too far afield from the discussion at hand.)
I agree with these! I'm just pointing out that the results are unintuitive. Users will hit this behavior and report bugs. And it just seems entirely preventable with a different schema. But I will take some time to reflect on this.
I just want to clarify this one: yes, uv would do the same thing!
But the Package Locking lockfile has no way to represent this, right? There can be exactly one package entry for "flask==3.1.0.dev0" (name and version must be unique across entries), with one marker. That package entry can have a single [packages.vcs] sub-table. That sub-table can only point to a single commit. My point is that the lockfile will not be able to capture this scenario (perhaps I'm wrong and we're talking past each other) - so any installer would subsequently get it "wrong" (at least compared to taking that dependency list and installing it with pip or uv).
That's a legitimate concern, and the main reason the process allows for provisional acceptance periods, where we may amend provisionally accepted specifications if a spec is determined to be sufficiently flawed that it's better to take the pain of an early compatibility break over living with the flaw indefinitely.
(I can't recall a case where we've ever actually used that escape hatch, though. Most post-acceptance fixes have come in the form of clarifying ambiguities rather than having to make genuinely backwards incompatible changes to provisionally accepted specs.)
While we try to avoid assigning semantics to the local version labels in the shared specifications, I think this is a case where it's legitimate to mandate a particular way of using them. (For anyone not clear on what Paul, Charlie, and I are referring to, it's the ...+<mostly arbitrary label> part of the version identifier spec.)
Handling situations where name==public_version is ambiguous is exactly the reason the local identifier escape hatch exists. Being able to compose multiple local version identifier segments even without fully defined semantics is the reason the spec reserves . as a field separator.
So while I think you're right that PEP 751 should cover this situation, I also think that coverage may be as simple as saying something like:
A lock file may need to represent modified versions of packages with the same nominal public version (for example, a library may need patches applied for compatibility with different platforms or Python versions). To handle such situations, the locking tool MUST generate (or be given) appropriate local identifier suffixes to use in the [[packages]] array entry for the otherwise ambiguous versions such that the combination of packages.name and packages.version remains unique within the [[packages]] array. If the nominal version of the package already includes a local version identifier, the disambiguation suffix MUST be appended as a new local version identifier segment (after a separating .).
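As an illustration of the suggested rule (my own sketch using the `packaging` library, not text from the PEP; the labels "bar1", "bar2", and "cu118" are invented examples):

```python
from packaging.version import Version

# Two patched variants of the same nominal release, disambiguated with
# local version identifier suffixes so (name, version) stays unique.
nominal = Version("0.1.0")
variant_a = Version("0.1.0+bar1")
variant_b = Version("0.1.0+bar2")

# The two variants now compare as distinct versions.
assert variant_a != variant_b
# Per the version specification, local versions sort after the bare release.
assert variant_a > nominal and variant_b > nominal

# If the nominal version already carries a local identifier (e.g. a build
# tag), the suggested rule appends a new dot-separated local segment.
already_local = Version("2.1.0+cu118")
disambiguated = Version("2.1.0+cu118.1")
assert disambiguated.local == "cu118.1"
assert disambiguated > already_local
```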
Pip special-cases installs of "things that come from places that might change" (and URLs are one of those, even though these particular URLs are static) and rebuilds/reinstalls.
I don't recall the precise details because it's messy "do what I mean" logic and doesn't fit with pip's underlying model, but it's something like that.