PEP 751: lock files (again)

brettcannon · July 25, 2024, 6:51pm

This was all last discussed in Lock files, again (but this time w/ sdists!). Probably the biggest change since the initial post of that topic is adding support for per-package locking instead of only per-file locking (it’s explained in the PEP what those terms mean). I also focused on making the format work well when read as a diff for changes, so there’s a bit more information for people to understand why a package got pulled in along w/ minimizing having to read other parts of the file (when you look at the per-file locking example at mousebender/pylock.example.toml at pep · brettcannon/mousebender · GitHub you can also look at a couple of recent commit diffs I did just to show what a version change looks like). This also means the file itself is smaller than my initial proposal.

And while I thank them in the acknowledgements section of the PEP, I want to thank again @pf_moore, @radoering of Poetry, @ofek of Hatch, and @sethmlarson of Python security for reading the initial draft of this PEP back in March (the PEP hasn’t changed substantially since then; mostly tightening up some ambiguity).

woodruffw · July 25, 2024, 7:51pm

Thank you @brettcannon! This is incredibly exciting, and I think the dual approach (both package and file being available) makes a lot of sense.

Something that I realized I forgot to post in the original discussion thread (or if I did, I’ve long since lost it): with PEP 740 we now have a notion of a signing “identity” behind packages, which is effectively a piece of metadata (not baked into the dist, but tracked/checked by the index). In practice, at the moment, these “identities” are really just Trusted Publishers (i.e. the GitHub, GitLab, etc. URL that publishes the package).

To make PEP 740 useful to downstream users (i.e. doing verification not just on the index, but within clients), we could include those identities in the lockfile. This would enable TOFU-style schemes, making it harder for an attacker to compromise package foo either by changing the location of foo’s upstream repository or by compromising the index itself (since the index can’t impersonate foo’s original signing identity).

How would you feel about including these kinds of identities in the lockfile schema? As a rough sketch I think something like [[package.files.identities]] as an array of tables would suffice, e.g.:

[[package.files]]
name = "example"
# ...

# this package can be signed by either of these identities
[[package.files.identities]]
type = "GitHub"
repository = "example/example"
workflow = "release.yml"

[[package.files.identities]]
type = "GitLab"
repository = "example/example"
(That’s a very rough sketch – it could probably be made less verbose, if desirable.)

I’m curious what you think about this – I could see this being too big of an ask/too much additional design space for version = "1.0" as stipulated in the PEP, but figured I’d open it up for consideration.

(+CC @dustin, since I know this interests him as well)

sethmlarson · July 25, 2024, 8:20pm

Thank you for putting this together Brett, I wanted to commend you on your tenacity and care for this important topic!

I have some concern that users won’t know (nor should they need to know) the difference between package-locks and file-locks, thinking that a package-lock has file-lock properties or the other way around. That means that implementations will have to do some deciding on behalf of users via defaults.

In my mind, the file locking scenario is what I first think of when I think of a “lock file”. I don’t know if that train of thought should manifest as a recommendation in the PEP for implementations, or something similar? For example, tools generating lock files should favor the “file-lock scenario”, unless configured explicitly to require locking with multiple candidates per package.

Maybe there won’t be many tools supporting both approaches? Either way, users will need to audit whatever is in their lock files regardless of the locking approach since they are all candidates to be installed.

My own thoughts on this would be that this information would get encoded into the “package.tool” or “tool” section of the lock file. Also, happy to help you write a standard to encode this information into lock files once the lock file PEP is accepted.

woodruffw · July 25, 2024, 8:22pm

Blarg, I completely missed that tool meant the same thing as in pyproject.toml. Agreed that it makes sense there, and that putting it there avoids any special accommodations!

pradyunsg · July 25, 2024, 8:30pm

Are the separate package.tool and tool intentional? What’s the goal with this separation, if so?

pradyunsg · July 25, 2024, 8:32pm

Ah, one of them is on a per-package basis vs top level. I think having clearer descriptions in them would be helpful because they’re meaningfully distinct and also distinct from how (specifically) pyproject.toml might use them today - no one’s expecting black configuration to live in the lock file.

woodruffw · July 25, 2024, 8:34pm

Yep, that was my read. For PEP 740 purposes package.tool is useful to me, since each package’s set of valid identities is discrete and only valid for that specific package name

pf_moore · July 25, 2024, 9:03pm

I have yet to read the new PEP - I followed the drafts, and have a broad picture, but it was a while ago, and my memory is hazy. Having said that, I’m not convinced that we can avoid users needing to know the difference between the two types of lock. A user choosing between PDM and pip-tools has to understand how the forms of locking provided by those tools differ.

Like it or not, I think this PEP is going to have to formally establish some “official” terminology that we can use in tool documentation. It’s possible that the terms we choose will never catch on in popular usage (much like the terms “distribution package” and “import package” tend not to be used in casual discussions) but that’s less important than having precise terms that can be used in formal contexts. What’s important is that we have common terminology defined in the standards - so that users don’t provide a “package lock” as input to an installer that only handles “file locks”, for example.

The “Locking Scenarios” section covers this to an extent, but I’m thinking of it more in terms of how a tool like PDM, or uv pip compile, would describe in a usage summary what sort of lockfile they produce. As a user, I’d like to be able to type pdm lock --help and uv pip compile --help, and be able to understand from the help summary, what sort of locking the tools will do, and what differences I can expect in terms of capabilities and limitations. With well-defined terminology, “Creates a package lock for the current project” or “Creates a file lock from the specified requirements.in file” could be sufficient to do that.

brettcannon · July 25, 2024, 9:45pm

I don’t know if I want to have that fight because I can already see people coming at it from both directions. Plus, w/o the concept of per-file locking already existing in Python packaging I don’t know if that’s actually what most people will think of when they think “lock file”.

I’ll come up w/ something.

Famous last words as you very well know.

I tried to do that in the PEP knowing that talking about concepts got a bit muddled last time due to a lack of common vocabulary.

brettcannon · July 25, 2024, 9:58pm

Clarified w/ an example thanks to @woodruffw.

pf_moore · July 25, 2024, 10:01pm

As I said, I’ve not read the whole thing yet - if you meant the “Locking Scenarios” section, then it’s the sort of discussion I’m thinking of, but framed as “what I want to do” where I’m suggesting that people are more likely to want to see it from the perspective of “what does this tool do?”

I’ll have a proper read of the whole PEP, and try to come up with some more concrete suggestions.

sethmlarson · July 25, 2024, 10:10pm

Ack, I’m okay with not encoding a recommendation in the PEP.

barry · July 25, 2024, 10:58pm

Minor point, but this sentence in the Rationale doesn’t parse to me:

It is also to facilitate easy understanding of what would be installed if the lock file without necessitating running a tool, once again to help with auditing.

A typo perhaps, but I’m not sure what was intended?

While I’m here, there’s another minor typo:

[…] minimmize […]

barry · July 25, 2024, 11:26pm

@brettcannon - Have you considered more clearly separating file locking and package locking by defining them in separately named files? Since file-lock is mutually exclusive with package-lock and there aren’t a lot of other top level keys ^[1], you could potentially differentiate the two use cases using different file names. It would make the specification a little more complicated, but perhaps ease the reasoning burden on the end-user ^[2].

E.g. pypkglock.toml vs pyfilelock.toml with the rest of the File Name section relatively unchanged.
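A small Python sketch of what name-based dispatch could look like for a tool. The two file names are the ones floated above; the `lock_kind` helper and the choice to reject any other name are assumptions, and variants such as named lock files are ignored here.

```python
from pathlib import PurePath

# File names suggested above, mapped to the kind of locking they would signal.
LOCK_KINDS = {"pypkglock.toml": "package", "pyfilelock.toml": "file"}


def lock_kind(path: str) -> str:
    """Classify a lock file by its name alone, without parsing its contents."""
    name = PurePath(path).name
    try:
        return LOCK_KINDS[name]
    except KeyError:
        raise ValueError(f"{name!r} is not a recognised lock file name") from None


print(lock_kind("project/pypkglock.toml"))
print(lock_kind("project/pyfilelock.toml"))
```

An installer that only handles one kind could then reject the other from the name, before reading the file at all.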

[1] minimizing the DRY involved
[2] and potentially, parsing of the files by tools

DanCardin · July 26, 2024, 12:58am

It’s not obvious to me from the PEP text how extras, pyproject.toml optional-dependencies, or the equivalent of poetry “dependency groups” would work here (henceforth using “extras” to mean all three). They don’t seem to be part of the “marker” detail, which was the only place that seemed obvious.

And if it’s not in the lockfile, unless i’m misunderstanding something, it feels like there would have to be 1 lockfile per combination of available extras? Without a callout for it on the dependency itself, i can’t see how it’d be evaluated without multiple files.

Also, perhaps it’s left to the implementer to decide this, but having separate files feels problematic. It feels like unless things like extras are considered during the resolution of every environment/lockfile, it would be trivially easy to generate two lockfiles (at the same time) that generate mutually incompatible dependencies.

ncoghlan · July 26, 2024, 7:04am

Very nice @brettcannon!

Something I’d like to see added to the design discussion part of the PEP is the trade-offs between implicitly allowing all packages to be optional or target dependent (the approach in the PEP), and having separate top-level lists for [[common-packages]] and [[conditional-packages]].

While the separation would create some redundancies in the spec and file processing, it still seems genuinely valuable to me from an auditing perspective when the common packages are unequivocally separated out rather than having to infer the common packages from:

package.marker is not set (when [package-lock] is used)
only one [[package.files]] entry is defined for that package (when [[file-lock]] is used)

(I don’t think skipping this is a huge deal, since programmatic scanners will be easy to write regardless, allowing this to be checked in CI or pre-commit hooks, I just liked the idea of having the presence or absence of conditional packages also be easy for a human reader to determine)

Listing those conditions like that did highlight a naming inconsistency with [[package.files]]: it is pluralised, while [[file-lock]] and [[package]] both use the singular form. Perhaps it would be worth pluralising all of them to emphasise that these are lists of tables rather than singular tables?

I was about to concur wholeheartedly with this question, and then realised I actually saw genuine benefits in allowing pylock.toml to use either format: whether to use [package-lock] or multiple [[file-lock]] entries doesn’t actually change the purpose of the files (installation consistency), it’s just a question of which installation strategy is most appropriate for a given use case. Putting that information inside the file means that only the installer tools themselves need to care about the technical details, the surrounding tools only need to know about the one filename pattern that needs to be passed to Python installer tools, they don’t need to know about the two different options.

The one point that gives me pause on that front is whether it would ever make sense to develop a [[file-lock]]-only installer. If external environments might want to handle package locks and file locks with different tools (including disallowing the use of package locks entirely), then encoding that information in the filenames would be helpful rather than irritating. (I’d be more in favour of pylock.* and pyfilelock.* as the prefixes corresponding to package locking and file locking rather than renaming both, though - the fact package locking is the version of locking that already has widely used implementations seems to me to offer sufficient justification for giving it the more obvious name).

I agree that framing is currently missing from the PEP text, and I believe both descriptions would benefit from explicit usage recommendations along the lines of:

“Per-file locking should be used when the installation attempt should fail outright if there is no explicitly pre-approved set of installation artifacts for the target platform. For example: locking the deployment dependencies for a managed web service.”
“Per-package locking should be used when the exact set of potential target platforms is not known when generating the lock file, as it allows installation tools to choose the most appropriate artifacts for each platform from the pre-approved set. For example: locking the development dependencies for an open source project.”
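The behavioural difference in those two recommendations can be sketched in Python like this. The dictionaries, the `platform` and `tag` keys, and both helper functions are hypothetical and do not reflect the PEP’s actual schema; the sketch only shows “fail outright” versus “choose from a pre-approved set”.

```python
class NoLockedArtifact(Exception):
    """Raised when the lock file pre-approves nothing for the target."""


def pick_file_lock(file_locks: list[dict], platform: str) -> dict:
    """Per-file locking: fail outright unless an entry names this platform."""
    for entry in file_locks:
        if entry["platform"] == platform:
            return entry
    raise NoLockedArtifact(f"no file lock for {platform!r}")


def pick_package_file(files: list[dict], preferred_tags: list[str]) -> dict:
    """Per-package locking: choose the best pre-approved file for this platform."""
    for tag in preferred_tags:  # ordered from most to least preferred
        for candidate in files:
            if candidate["tag"] == tag:
                return candidate
    raise NoLockedArtifact("no pre-approved file matches this platform")


# Hypothetical data: one exact platform entry vs. a set of candidate files.
file_locks = [{"platform": "linux-x86_64", "files": ["foo-1.0-cp312-manylinux.whl"]}]
files = [
    {"name": "foo-1.0-py3-none-any.whl", "tag": "py3-none-any"},
    {"name": "foo-1.0.tar.gz", "tag": "sdist"},
]

chosen = pick_package_file(files, ["cp312-cp312-win_amd64", "py3-none-any", "sdist"])
print(chosen["name"])
```

With the per-file data above, asking for a platform other than `linux-x86_64` raises immediately, while the per-package data still yields an approved artifact on a platform nobody listed in advance.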

ncoghlan · July 26, 2024, 7:11am

Extras and dependency groups don’t exist when installing a locked set of dependencies, as they’re resolution time concepts, and resolution only happens when locking, not when installing.

In a lock file, extras would only appear in the top level dependencies list that is used to record the requested dependencies that were used to derive the rest of the lock file, and in the informational (and hence optional) per-package dependents and dependency lists in the individual package entries.

Dependency groups, if they appeared at all, would only appear in a tool-specific [tool] table entry that reported how the top-level dependencies list itself was derived from the locking tool’s own inputs.

But yes, if you did want to lock for different top level combinations of extras and dependency groups, then you would need to generate a separate lockfile for each combination of interest. That use case is one of the reasons for supporting multiple lockfiles rather than assuming each project will only ever need exactly one.
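To give a feel for how the number of files grows if every combination were of interest, here is a small Python sketch. The `pylock.<names>.toml` naming scheme with hyphen-joined extras is purely an assumption for illustration, and in practice a project would likely only lock the combinations it cares about.

```python
from itertools import combinations


def lockfile_names(extras: list[str]) -> list[str]:
    """List one hypothetical lock file name per combination of extras."""
    names = []
    for size in range(len(extras) + 1):
        for combo in combinations(sorted(extras), size):
            suffix = "-".join(combo)
            names.append(f"pylock.{suffix}.toml" if suffix else "pylock.toml")
    return names


names = lockfile_names(["tests", "docs"])
print(names)
print(len(names))  # 2 ** number_of_extras when every combination is locked
```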

groodt · July 26, 2024, 9:53am

I’ve just read the latest PEP and I think it’s simple. I mean that in the most flattering way possible. Simple is very, very difficult.

I think this is an excellent piece of work and I just want to thank you for your relentless dedication to this problem.

DanCardin · July 26, 2024, 11:14am

I don’t really see why the conditional nature of installing an optional dependency (i.e. extra) is inherently different from a python_version marker that’d cause conditional installation of some dependency within a single lockfile.

Also extras-installation is definitely still an installation-time concept. By relegating it to separate files, you’re just externalizing the decision to the tool or the user, rather than encoding it into the file scheme. That kind of makes it seem antithetical to the also explicit idea that you enable a project being just a resolver or an installer. For this lockfile scheme to be useful to tools like PDM/Poetry, which have APIs for declaring extras, they’ll need to devise a scheme for generating multiple files, and the installer would need to correspondingly know the scheme in order to know which lockfile to use.

And again, i feel like there are significant drawbacks to splitting them among files. For the simple case of dependencies and test dependencies, it doubles the number of lines of duplicate lockfile content. And it implies the two files aren’t dependent on one another, when they should be. With a PEP 631-compliant example:

[project]
dependencies = ["sqlalchemy>=1.3,<2"]

[project.optional-dependencies]
tests = ["foo"]

where foo depends on sqlalchemy<1.4, when 1.4 is released you have a problem, unless the resolver is simultaneously generating both files from the same locking information. By locking a pylock.toml and a pylock.tests.toml separately, my only way of installing tests will resolve to sqlalchemy==1.3, whereas my “production” build will produce 1.4. If there’s some incompatibility i’m not aware of, my tests will imply everything is fine, when it’s not. Locking of the extras-inclusive version “needs” to essentially be based on the exact lockfile set of dependencies and the pyproject.toml specifiers for the test dependencies.
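The divergence described here can be reproduced with a toy resolver in Python. Everything in it is a simplification chosen for illustration: versions are tuples, only two sqlalchemy versions exist, and “resolving” just means picking the newest version that satisfies every constraint.

```python
# Toy universe: the only sqlalchemy versions that exist.
AVAILABLE = [(1, 3), (1, 4)]


def newest(constraints) -> tuple:
    """Pick the newest available version satisfying every constraint."""
    candidates = [v for v in AVAILABLE if all(check(v) for check in constraints)]
    return max(candidates)


def project_constraint(version: tuple) -> bool:
    """The project's own requirement: sqlalchemy >=1.3, <2."""
    return (1, 3) <= version < (2, 0)


def foo_constraint(version: tuple) -> bool:
    """The requirement that foo (a tests-only dependency) adds: sqlalchemy <1.4."""
    return version < (1, 4)


# Locking the two files independently of each other.
production = newest([project_constraint])
with_tests = newest([project_constraint, foo_constraint])

print("production lock:", production)
print("tests lock:", with_tests)
```

The two locks disagree on the sqlalchemy version, so the test suite would be exercising a different version from the one that ships.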

Contrast that with extras being baked into the format, it would always lock the full declared dependency-set at once, and the installation-time decision to choose an extra or not would work the same way as e.g. python_version: to omit certain dependencies when they’re not applicable to the requested install command.
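A minimal Python sketch of that alternative, assuming each locked package records which extras pull it in. The `extras` key and the `to_install` helper are hypothetical and not part of the PEP; the point is only that the filter at install time has the same shape as a marker check.

```python
# Hypothetical single lock: every package is resolved together, and each
# entry records which extras (if any) it belongs to.
LOCKED = [
    {"name": "sqlalchemy", "version": "1.3.24", "extras": []},
    {"name": "foo", "version": "1.0", "extras": ["tests"]},
]


def to_install(locked: list[dict], requested: set[str]) -> list[str]:
    """Keep unconditional packages plus those belonging to a requested extra."""
    return [
        pkg["name"]
        for pkg in locked
        if not pkg["extras"] or requested & set(pkg["extras"])
    ]


print(to_install(LOCKED, set()))
print(to_install(LOCKED, {"tests"}))
```

Because both installs read the same locked versions, the tests run against the same sqlalchemy that production gets.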

charliermarsh · July 26, 2024, 1:30pm

Thanks Brett! Really appreciate all the work that’s gone into the PEP. I know lockfiles have been a journey.

I’ll try to keep it brief, but some background on how this all works in uv today. Historically, we’ve used the requirements.txt format as both resolver input and resolver output in the uv pip interface. (E.g., the convention is to use requirements.in as the name of an input file, and requirements.txt as the output, but the fundamental format is of course the same.) So requirements.txt is used as a sort of “lockfile” by way of being the uv pip compile output.

Separately, we have a series of APIs that are available in preview but haven’t been stabilized, which use pyproject.toml as input and a new uv.lock format as output (defined here in code; here’s an example snapshot). The uv.lock file is designed to handle what was referred to as Poetry- or PDM-style “universal” resolution, such that we produce a single lock for all environments (like “Package Locking” in this PEP). Like this proposal, it doesn’t require doing a “resolution” at install time, though there are some differences in the format.

My previous opinion was that we’d support the File Locking behavior that was put forth in the last proposal, but that we couldn’t yet commit to support the Package Locking behavior (if the proposal were amended) since we weren’t yet sure on our own requirements.

After reading the current proposal, my feeling is that, if standardized as-is, our intent would be to support PEP 751-style lockfiles, but as an alternative to requirements.txt rather than uv.lock (so, uv pip compile could produce them, and our installer APIs could consume them). The Package Locking proposal is close to what we need, but there are some important differences. For example:

We include all extras and dependency groups (i.e., development dependencies) in the lockfile, so that users can toggle the enabled extras at install time. (Poetry does this too, IIRC?)
We include uv-specific metadata in the lockfile (e.g., whether pre-releases were enabled; whether the user locked with minimum or maximum version resolution), so that we can invalidate it if the user changes those settings.
We have a distinction between packages that are locked as editable vs. those that are not.

(Of course, I’m open to discussing these requirements and whether they make sense to tackle within the scope of the PEP; more just reporting the world as it exists today.)

My honest opinion (which is biased) is that the lockfile proposed here is a clear improvement over the way we use requirements.txt today, but not a clear improvement over the formats used by Poetry, PDM, uv, etc. for their lockfile use-cases, and so I’m worried that there won’t be enough value-add to convince those tools to move over, given that they already serve as both resolvers and installers (and so benefit less from standardization here when weighed against the ability to iterate independently on their own proprietary formats). And, in that light, I kind of prefer a proposal that just does File Locking and eschews the complexity of Package Locking. But again, if standardized, my intent would be for us to support it as an input and output format in the uv pip APIs and elsewhere as appropriate.