I don't know, and if @frostming is right, there aren't any. (Well, at least not any publicly available ones.) Like I said, it's not something I need. I just mean that it seems like for the package-locking case, we can gauge the fitness of the PEP by looking to things like poetry and seeing whether the PEP can support at least that functionality. But if we have no existing examples of what people want for file-locking, it's a little harder to know whether this PEP would be enough to do what people would want. It sounds like right now we just have you as a data point, which is fine as far as it goes, but ideally more people who need this would chime in on that aspect. Otherwise it's not clear to me what need that part of the PEP is meeting for a larger audience.
We can look at other languages for that, though, so I'm not working from a position of no background knowledge or experience. I actually run into people all the time who are shocked Python doesn't have a per-file locking solution.
But it's the simplest part, so I don't know if people feel the need. We also went through all of that for my last PEP, which was per-wheel locking only, so this isn't a new discussion point.
I'm afraid you might have to trust me on this one, then, based on my research and knowledge that it's important.
If there are formats in other ecosystems providing inspiration for the file locking side, it's likely worth listing them in the PEP (similar to the short list of existing Python tools).
It may also be worth clearly stating the most basic form of a file lock that we discussed in the last thread (and inspired the current target enumeration design for file locks):
Lock each environment as a separate file, recording all environment markers and wheel tags so an installer can determine if the lock file is applicable
Clean up each lock file by dropping all environment markers and wheel tags that had no effect on the package selection process (or never record those in the first place)
Combine the multiple lock files into a single lock file, ensuring any common packages are only listed once
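As a toy illustration of those three steps (the data shapes here are invented purely for illustration; real lockers would operate on full resolver output, not these simplified dicts):

```python
# Toy model: each per-environment lock maps package name -> (version, markers
# consulted during resolution). These inputs are precomputed stand-ins for
# step 1 ("lock each environment as a separate file").
env_locks = {
    "linux-cp312": {"requests": ("2.32.3", {"sys_platform"})},
    "windows-cp312": {"requests": ("2.32.3", {"sys_platform"})},
}


def prune(env_locks):
    """Step 2: drop markers that had no effect on package selection."""
    pruned = {}
    for env, packages in env_locks.items():
        pruned[env] = {}
        for name, (version, markers) in packages.items():
            # If every environment resolved the same version, the markers
            # made no difference to selection and can be dropped.
            versions = {pkgs[name][0] for pkgs in env_locks.values() if name in pkgs}
            pruned[env][name] = (version, markers if len(versions) > 1 else set())
    return pruned


def combine(env_locks):
    """Step 3: merge into one lock, listing each (name, version) only once."""
    combined = {}
    for packages in env_locks.values():
        for name, (version, markers) in packages.items():
            combined.setdefault((name, version), set()).update(markers)
    return combined
```

Here `combine(prune(env_locks))` yields a single entry for requests with no markers, since both environments agreed on the version; had they diverged, the markers would survive the pruning step.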
One way I think of the difference is that a package lock is more instructions on how new environments should be built, while a file lock is more a description of how a given set of environments are built. Both can technically be used for either purpose, but their primary intent is different enough to make it worth explicitly naming the target environments in the file lock use case (since that list of supported targets is the primary new information a file lock adds).
Hmm, that gives me a thought: do the file lock and package lock fields actually need to be mutually exclusive?
hybrid installers check for a matching file lock first, and if they don't find one, fall back to using the package lock fields
If we're open to repainting the bikeshed: I think calling file locks "target locks" would better express that they work backwards from an identified target environment to a specific list of files to install, while a "package lock" works forward from a defined set of package versions and environment markers to the corresponding artifacts for a given platform.
Alternatively, we could just slightly tweak the full name of "file locks" to be "named file locks" (without changing the syntax) to emphasise that their main benefit over pure package locks (beyond being able to use wheel tags as part of the selection criteria) is being able to give particular targets a name.
On that note, I also think I finally thought of an elegant way to allow overlapping file locks: use wheel tag priority order to pick the most preferred lock for a given target. That neatly allows a lock file to describe both C accelerated and more portable builds in the same file.
I also wonder if we should add a field to define required values for OS environment variables in file locks, otherwise the dev/staging/production use case seems difficult to express (since those should be using similar hardware and hence have the same environment markers and wheel tags). Such an escape hatch would also cover arbitrary selectors, like CPU and GPU capability details.
If your dependency requires funky flags to be passed to pip, is there a way to specify that somewhere? I don't see it. This particular package is a pain point for me because I can't put all the options in the dependencies list in pyproject.toml (or pretty much any other tool that is supposed to automate installs).
Relatedly, I feel like the motivation section of the PEP could actually be stronger than it is right now.
Other ecosystems have lockfile formats, but I don't know of many that have a lockfile standard. (For example, in the JavaScript ecosystem, I believe that npm, pnpm, Yarn, and Bun all use their own lockfile formats.) Having a good format is important, but that's really the responsibility of the individual tools. So why is it important to have a standard? The bulk of the motivation section is about improving the format, and there's less on the purpose of a standard:
The lack of a standard also has some drawbacks. For instance, any tooling that wants to work with lock files must choose which format to support, potentially leaving users unsupported (e.g., if Dependabot chose not to support PDM, support by cloud providers who can do dependency installations on your behalf, etc.).
The Dependabot case makes sense (to continue with my example, I believe Dependabot supports npm, pnpm, and Yarn, but not Bun's format - if they used a standard, Bun could've gotten that for free). Are there other use-cases that we can expand on? For example: in our Discord today, one member mentioned that lockfiles could enable installers to perform locked/reproducible installs. Imagine you use Poetry to ship a CLI application, and you want your users to be able to do pip install --locked my-app. Is that kind of thing an eventual goal that this PEP is building towards? Or a non-goal? Would we alter the design at all if we knew that this was eventually intended to ship as part of a built distribution?
What's the benefit of that? One is more explicit in an upfront way while the other isn't. It feels like you're saying, "I want to be precise in these select cases, but otherwise YOLO", which to me goes against the purpose of per-file locking.
How? The highest tag? The total or average of all the tags? And whose tag order are you using to pick your priority since it could vary from lock file to lock file?
That's between you and pip (if it supports this PEP). They could provide a way to do it on the command line or in [packages.tool] (off the top of my head).
I was going to clarify that for packages.directory when I added editables support (and I am planning to say, "yes, relative to the lock file"), but I'm not sure what you're after here for packages.files. Do you mean in packages.files.origin? I can clarify that if it's a file: URI, it can be relative.
Funny you bring that up because I got editorial push-back for keeping that paragraph at all. But yes, I can expand on it.
It's at least a hope of mine. Tool interoperability and portability are part of why I'm doing this. Much like with pyproject.toml, tool lock-in goes away and lets us focus on the artifacts we all work with when we have shared commonality.
Having just been burned on PEP 667 not repeating the rationale for parts of the design that it shared with PEP 558, it's probably worth including at least a paraphrase of this bit:
Other programming language communities have also shown the usefulness of lock files by developing their own solution to this problem. Some of those communities include:
The trend in programming languages in the past decade seems to have been toward providing a lock file solution.
That still isn't quite what I was suggesting is missing, though. Instead, I'm curious whether the lock formats in each of those ecosystems would correspond to package locks or file locks in PEP 751 terms (I genuinely don't know, and I think it's relevant which of them have an equivalent to named file locks).
It was primarily just a thought that struck me while pondering the question of why file locks and package locks are genuinely different things: "Wait, these fields are orthogonal, so they can happily coexist in one file (including the ability to check them for internal consistency), so why is the PEP forcing mutual exclusivity rather than allowing locking tool developers to make the decision between combined files and separate files as a UX design choice?"
That said, the first practical use case that comes to mind is situations where the file locks are being used as an optimisation tool by selecting for things that regular environment markers (and hence package locks) can't express. Falling back to a less optimised package lock is then a way of providing graceful degradation for unknown environments rather than a hard failure.
Similarly, if environment variables were added to handle the dev/ci/staging/production use case, it would likely make sense to express ci, staging & production requirements as file locks, while leaving dev as a package lock.
By changing the meaning of the wheel-tags array from "all markers must match" to "at least one marker must match" (and typically only listing the most specific marker used in the artifacts that correspond to that file lock) and then using the following algorithm (which resembles the one for choosing wheels):
First, select all named file locks where all marker-value expressions are true for the current target. If no matching file locks are found, a file lock install is not possible. (If the ability to check OS environment variables when installing from a lock file is added, it would apply here.)
If multiple file locks are found, only one is permitted to omit the wheel-tags array; the rest must include it. If multiple matching file locks without a wheel-tags array are found, that is an ambiguity error, and a file lock install is not possible.
Then, iterate over the valid wheel tags for the current target in the usual priority order used for selecting wheel files to download. As soon as a file lock is found that contains a matching wheel-tags entry, use that file lock and stop searching. If multiple file locks are found for the first matching wheel tag, that is an ambiguity error and a file lock install is not possible (at a spec level, the wheel-tags array entries for all file locks with matching environment markers must form disjoint sets, and any lock file not abiding by that rule is ill-formed).
If no matching file locks containing a wheel-tags array are found, but there is a matching file lock without a wheel-tags array, use that file lock.
Otherwise, a file lock install is not possible (there is no file lock defined with both matching environment markers and at least one compatible wheel tag).
As when choosing wheel files to install, installers may choose to allow users to override the wheel tag priority order when installing, but they're not required to do so.
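To make the proposed selection steps concrete, here is a rough Python sketch. The FileLock structure and field names are my own illustration, not the PEP's schema, and marker evaluation is reduced to a precomputed boolean:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FileLock:
    """One named file lock entry (illustrative shape, not the PEP's schema)."""
    name: str
    markers_match: bool                   # result of evaluating marker-value expressions
    wheel_tags: list[str] | None = None   # None models an omitted wheel-tags array


def select_file_lock(locks: list[FileLock], tag_priority: list[str]) -> FileLock | None:
    """Pick the file lock to install from, or return None if none applies."""
    # Step 1: keep only locks whose environment markers all match the target.
    candidates = [lock for lock in locks if lock.markers_match]
    if not candidates:
        return None  # a file lock install is not possible
    # Step 2: at most one matching lock may omit the wheel-tags array.
    untagged = [lock for lock in candidates if lock.wheel_tags is None]
    if len(untagged) > 1:
        raise ValueError("ambiguous: multiple matching locks omit wheel-tags")
    tagged = [lock for lock in candidates if lock.wheel_tags is not None]
    # Step 3: walk the target's wheel tags in priority order; first hit wins.
    for tag in tag_priority:
        hits = [lock for lock in tagged if tag in lock.wheel_tags]
        if len(hits) > 1:
            raise ValueError(f"ambiguous: multiple locks match tag {tag!r}")
        if hits:
            return hits[0]
    # Step 4: fall back to the untagged lock, if there is one; otherwise fail.
    return untagged[0] if untagged else None
```

This also shows the "C accelerated vs portable" case: a lock tagged for a platform-specific wheel wins on targets whose tag priority includes that tag, while everything else degrades to the untagged lock.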
This is a good point. A concrete example I've run into recently (bootstrapping pdm in GitHub Actions) is that without a standardised lock file format, pretty much every CI process developer is forced to make a choice between:
1. Export the actual tool-specific lock file to the baseline requirements.txt format, and use pip to install that in CI (using a standard Python CI environment)
2. Add a bootstrapping step to the CI process that gets the relevant tool installed (while respecting the locked requirements)
3. Speed up option 2 by defining custom base images for CI with the relevant tool preinstalled
Option 1 becomes a lot more attractive if it isn't restricted by the limitations of the requirements.txt format (and that not only saves the effort that implementing option 2 or 3 correctly would otherwise require, it also avoids the high risk of inadvertently introducing unlocked CI dependencies by attempting option 2 or 3 and not getting the bootstrapping right).
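A minimal sketch of option 1 as a GitHub Actions job (the tool invocation and file names are illustrative assumptions, not prescribed anywhere; pdm's export command is one way to produce the requirements file):

```yaml
# Hypothetical CI job for option 1: the developer exports the tool-specific
# lock to requirements.txt (e.g. "pdm export -o requirements.txt") and commits
# it, so CI only needs stock Python and pip - no bootstrapping of pdm itself.
jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: python -m pip install -r requirements.txt
      - run: python -m pytest
```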
Something relevant that came up for me recently - there is no standardised priority order for wheel tags. There isn't even a standard for what tags apply on a platform. Packaging and distribution appear to give different answers, for example.
So I'd be a strong -1 on having the reproducibility of lockfiles depend on tag priority.
I know a couple of comments have been made about unifying file/package locks. So maybe this is a dumb question, but can't a locker just choose to produce a package-lock style file that's described in a way that is a "file lock"?
...unless I'm missing something... is there an example of a file-lock scenario where you couldn't produce an equivalent package lock that yields the same strictness?
The decision to lock down to specific wheel-tags/marker-values, a la file-lock, seems like an install-time decision you could make given a package-lock file, no?
I was able to spend some more time today reviewing the format in detail. I have some high-level thoughts and a few that are more specific. Apologies for the wall of text...
First, as a meta-point: I know I'm new to the PEP process and perhaps too naive, but I'll just admit that it feels hard to commit to a format (and a standard) without more examples and real-world stress-testing. I appreciate that there are examples in the PEP, and I know it's a lot of work to produce them, but the included examples are fairly simple and fall into the happy path.
I'm not at all suggesting that our format is perfect, but in building uv.lock, we've already iterated on it a ton based on filed issues and hard test cases. The proposed schema is fairly different from anything that exists today, so it's hard for me to know where it will and won't work in practice. For example: what would this format look like when resolving transformers with all of its extras enabled?
As an example of something we only discovered after our initial implementation: version and package name alone are not sufficient to act as unique identifiers (1), and this PEP relies on that. E.g., imagine a pyproject.toml like this:
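The pyproject.toml in question was not quoted here; a minimal sketch of the kind of configuration being described might look like the following (the index names, URLs, and the uv-specific [tool.uv.sources] marker-conditional index selection are illustrative assumptions):

```toml
[project]
name = "example"
version = "0.1.0"
dependencies = ["bar==0.1.0"]

# Two different indexes, each serving its own distribution of "bar" 0.1.0.
# Names and URLs are hypothetical.
[[tool.uv.index]]
name = "bar1"
url = "https://example.com/bar1/simple"

[[tool.uv.index]]
name = "bar2"
url = "https://example.com/bar2/simple"

# uv-style source selection: pick the index based on an environment marker.
[tool.uv.sources]
bar = [
    { index = "bar1", marker = "sys_platform == 'linux'" },
    { index = "bar2", marker = "sys_platform != 'linux'" },
]
```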
Where both bar1 and bar2 contain definitions for a package named bar at the same version (let's say 0.1.0). They're not the same package, but they have the same name and version. I don't believe the Package Locking format in the PEP is capable of representing this, since you can only include one node for bar==0.1.0, and that one node has to have a single marker. In reality, you need two nodes with distinct markers.
If you accept that name and version aren't sufficient, then you need some other input to package identity. In uv.lock, we use the idea of a package "source", which could be a registry, a direct URL, a Git URL, etc.:
Entries are thus uniquely identifiable by name, version, and source. But if you go down that road, then the format and semantics have to change a bunch too, since you no longer support mixing multiple kinds of sources, whereas the PEP allows directory, vcs, and files all at once.
(Relatedly: what use-case is that designed to support?)
Second, I worry that attempting to support both File Locking and Package Locking in a single format adds a lot of complexity to the schema. For example:
packages.directory can be present when File Locking is enabled, but is ignored.
packages.files.lock and packages.vcs.lock can be present when Package Locking is enabled, but are ignored.
(Can the current format be represented as JSON Schema, to enable in-editor validation? I'm not sure - maybe that's a non-goal, but it's kind of a helpful barometer for complexity.)
I know you'll get critiques in the other direction, but personally, I'd rather see two totally separate formats and files. The use-cases and the things you care about in File Locking vs. Package Locking just seem really different to me, and the formats could be optimized to support those use-cases.
If we committed to separate formats and delegated multi-platform support to Package Locking, the File Locking case could even be simplified to a flat list of entries for a single Python platform (i.e., commit to being a "receipt" of exact distributions to install). That would be maximally auditable - and extremely simple! Whereas now, the File Locking format is more complex than it needs to be (in my opinion), in order to support the Package Locking and multi-environment use-cases. For example:
Should File Locking care about extras? Probably not? But Package Locking might need to.
In the File Locking case, by looking at a single package entry you can no longer tell whether it's going to be installed on your Python platform of interest - you have to cross-reference the locks entries with the table at the top.
You could even imagine generating File Locking "lockfiles" for specific platforms from a Package Locking "lockfile". If that's true, it seems odd that they would use the same filename, schema, etc., since one would effectively be a derivative artifact of the other.
For uv, there are a few possible outcomes here... If you'll forgive me, I'll speculate on what they might be (assuming the PEP is accepted in some form):
The PEP is accepted, but only File Locking is supported. In that case, we'd like to support PEP 751 as an export format. Seems straightforward.
The PEP is accepted roughly as-is. In that case, we wouldn't be able to use PEP 751 as our "primary" lockfile, but we'd like to support both the File Locking and Package Locking formats as export targets.
The PEP is accepted, and the Package Locking design is modified such that it's a functionally viable alternative to uv.lock (e.g., extras and dependency groups are solved in some way, etc.). In that case, we'd still need to decide whether we want to use it as our "primary" format (replacing uv.lock), or as an export target for uv.lock. I'm not sure what we'd do there yet. It would take some testing and experimentation to come to an answer.
As an example of something that would matter for the last point, but not for the first two: it's critical for us that we can "resolve" from a lockfile, to enable fast "Is this lockfile up-to-date with the requirements?" checks. In short, with uv.lock, we can validate that it's "acceptable" for the current set of input requirements based on information that's stored in the lockfile alone. Could we support this with PEP 751? I think so, but we'd have to try it out. (One example: it requires that we record the requested revision, not just the Git SHA. But perhaps we can put that in the tool section.) It's not necessarily intended as a critique of the current format, but rather an example of something we'd need in order to fully adopt PEP 751 (but wouldn't need if we were only exporting to these formats).
Smaller things:
Can packages.multiple-entries be marked as optional? In other words: it'd be nice if anything that isn't required to resolve (i.e., it's either redundant or purely informational) were marked as optional. (If we were implementing the format, we'd probably omit a lot of those fields, like the description, since one of our goals is to have a succinct format.)
I appreciate that the tool escape hatches exist. There are a variety of things that we could not support in the current format, but that the tool escape hatches would help with. (For example, the PEP allows writing dependencies, but they're marked as optional and must use PEP 508 syntax, which doesn't fully capture (e.g.) editables. Perhaps we could write to packages.tool for anything we're missing there.)
Thanks, as always, for all the work that's gone into the PEP and discussion thus far.
Separately: if anyone is interested, I asked Weihang Lo, a Cargo maintainer, if they had any design documents or RFCs from the initial Cargo.lock design. (The Rust ecosystem is of course very different, but Cargo.lock does have to support packages at multiple versions, optional features, VCS and path dependencies, etc.)
While he couldn't recall any such document from the initial design, he kindly sent me a list of things that they would reconsider with hindsight:
We don't, though. Package identity is determined by name and version (more precisely, a package is uniquely identified by its name, and any package may have multiple versions, but any two distributions claiming to be the same package name and version are required to be functionally identical). I don't think it's explicitly stated in any of the existing standards, but it's an assumption that is made throughout the ecosystem, and it's extremely likely that it can be deduced from a sufficiently careful reading of the standards[1].
In my view, your example is simply wrong - the two dependencies should have different names.
But as stated, the user would see different behavior between (1) running pip install or equivalent on a machine with Python 3.10 and Python 3.11, and (2) running pip install or equivalent from a lockfile generated by the same input requirements with Python 3.10 and Python 3.11. That seems not good to me.
I think the scenario I've described here is not entirely contrived. Imagine you're using a Git dependency, and you tend to just leave the version in the repo at 0.0.1, but want to use different commits or different branches for different Python versions.
Or, imagine that you want to use requests from PyPI for Python 3.10 and below, but patch it with a Git fork on Python 3.11.
The response to these scenarios might be for the user to do something different, but I find it really unintuitive from a user perspective. Like, it's surprising that this would universally pick one of a7919970 or 4fe0aeba (implementation-defined, I think?) regardless of the user's sys_platform:
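The requirement entries being referred to were not reproduced here; from the commit hashes and the marker mentioned in this thread, they were presumably something along these lines (the exact marker values are my assumption):

```
flask @ git+https://github.com/pallets/flask@a791997041b94b8a5effebc296cb427fde8e0ee5 ; sys_platform == 'win32'
flask @ git+https://github.com/pallets/flask@4fe0aebab79a092615f5f86a24b91bac07fb2ef2 ; sys_platform != 'win32'
```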
Indeed, it's not good. But only in the same way as any situation where the user has a bug in their system.
Your mistake is "leaving the version in the repo at 0.0.1". You should do something like constructing the version number with a local identifier of the commit ID.
You change the version to add a local identifier in your patch.
I really don't know how to respond to this. It's so fundamental to how Python packaging works that I'm struggling to find a way to explain it that isn't just "don't do that".
One higher-level problem here is that the packaging ecosystem is based around distribution. Managing development environments, which are fundamentally far more fluid than a released artefact, is a very different situation. And while people have, over the years, forced tools like pip into use in a development workflow context, they were (in general) never intended for that purpose, and the cracks do show, at times.
Maybe if we were starting from a clean slate, we'd do things differently, but at least in terms of the standards and older tools like pip, we don't have that luxury. You're coming at things from a different perspective with uv, and can start without preconceived assumptions. But I would strongly advise you to look at "managing in-development code" and "installing distributed packages" as two separate things, otherwise you'll keep hitting this sort of misunderstanding.
Why would it do that? If those two items are in a requirements file, or a dependency list, only one of the markers would evaluate to true, and that item would be picked and installed. I don't know what uv does, but pip would check out the appropriate commit to a temporary directory, build a wheel from that, and install it. The version number and name of the installed package would be whatever the wheel build said they were (in the metadata) - pip would fail with an error if the name wasn't "flask", but it would accept whatever version was generated.
As I said, you're coming at this from a different perspective. Which is good. What concerns me is that you (in the context of uv) might be trying to solve a slightly different problem than the existing ecosystem is focused on - and as a result, what you're asking from the standards has far wider implications than you imagine. And while I'd very much want to incorporate your insights into the standards, so that we don't end up with a split where uv is forced to "do its own thing"[1], there's only so much we can achieve with the resources we have.
Of course, it's also possible I'm making too much of this - maybe @brettcannon can see an easy way to incorporate what you're suggesting into the lockfile proposal. In which case, I'll be happy and will apologise for making a fuss over nothing.
Thanks Paul. I'll try to keep this one brief for now.
Thanks, I appreciate this sentiment (around avoiding some kind of split) and I share it.
This is very interesting for me - again, thanks for sharing.
Have the lines blurred here over time? For example, taken to the extreme, would it not be correct (or at least valid) for pip install git+https://github.com/pallets/flask@4fe0aebab79a092615f5f86a24b91bac07fb2ef2 after pip install git+https://github.com/pallets/flask@a791997041b94b8a5effebc296cb427fde8e0ee5 to be a no-op, since you already have flask==3.1.0.dev0 in your environment? The commits don't match, but they have the same package-version pair, and so should be functionally equivalent. (Feel free to just ignore this question if you feel it's too far afield from the discussion at hand.)
I agree with these! I'm just pointing out that the results are unintuitive. Users will hit this behavior and report bugs. And it just seems entirely preventable with a different schema. But I will take some time to reflect on this.
I just want to clarify this one: yes, uv would do the same thing!
But the Package Locking lockfile has no way to represent this, right? There can be exactly one package entry for "flask==3.1.0.dev0" (name and version must be unique across entries), with one marker. That package entry can have a single [packages.vcs] sub-table. That sub-table can only point to a single commit. My point is that the lockfile will not be able to capture this scenario (perhaps I'm wrong and we're talking past each other) - so any installer would subsequently get it "wrong" (at least compared to taking that dependency list and installing it with pip or uv).
That's a legitimate concern, and the main reason the process allows for provisional acceptance periods, where we may amend provisionally accepted specifications if a spec is determined to be sufficiently flawed that it's better to take the pain of an early compatibility break over living with the flaw indefinitely.
(I can't recall a case where we've ever actually used that escape hatch, though. Most post-acceptance fixes have come in the form of clarifying ambiguities rather than having to make genuinely backwards incompatible changes to provisionally accepted specs.)
While we try to avoid assigning semantics to the local version labels in the shared specifications, I think this is a case where it's legitimate to mandate a particular way of using them. (For anyone not clear on what Paul, Charlie, and I are referring to, it's the ...+<mostly arbitrary label> part of the version identifier spec.)
Handling situations where name==public_version is ambiguous is exactly the reason the local identifier escape hatch exists. Being able to compose multiple local version identifier segments even without fully defined semantics is the reason the spec reserves . as a field separator.
So while I think you're right that PEP 751 should cover this situation, I also think that coverage may be as simple as saying something like:
A lock file may need to represent modified versions of packages with the same nominal public version (for example, a library may need patches applied for compatibility with different platforms or Python versions). To handle such situations, the locking tool MUST generate (or be given) appropriate local identifier suffixes to use in the [[packages]] array entry for the otherwise ambiguous versions such that the combination of packages.name and packages.version remains unique within the [[packages]] array. If the nominal version of the package already includes a local version identifier, the disambiguation suffix MUST be appended as a new local version identifier segment (after a separating .).
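As an illustration of the suggested rule (my own sketch using the `packaging` library, not text from the PEP; the labels "bar1", "bar2", and "cu118" are invented examples):

```python
from packaging.version import Version

# Two patched variants of the same nominal release, disambiguated with
# local version identifier suffixes so (name, version) stays unique.
nominal = Version("0.1.0")
variant_a = Version("0.1.0+bar1")
variant_b = Version("0.1.0+bar2")

# The two variants now compare as distinct versions.
assert variant_a != variant_b
# Per the version specification, local versions sort after the bare release.
assert variant_a > nominal and variant_b > nominal

# If the nominal version already carries a local identifier (e.g. a build
# tag), the suggested rule appends a new dot-separated local segment.
already_local = Version("2.1.0+cu118")
disambiguated = Version("2.1.0+cu118.1")
assert disambiguated.local == "cu118.1"
assert disambiguated > already_local
```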
Pip special-cases installs of "things that come from places that might change" (and URLs are one of those, even though these particular URLs are static) and rebuilds/reinstalls.
I don't recall the precise details because it's messy "do what I mean" logic and doesn't fit with pip's underlying model, but it's something like that.