PEP 751: lock files (again)

(I do have some questions that are separate from the previous opinion, so adding them as a standalone post.)

Lockers MUST NOT generate multiple [file-lock] tables which would be considered compatible for the same environment.

This means that the environments must be completely disjoint, right? Or could you have a “Python 3.12” environment and then a “Python 3.12 on ARM Windows” environment?


file-lock.wheel-tags

  • An unordered array of wheel tags which must be supported by the environment.

Can we clarify that the environment must match all tags, rather than any? Or you can just call this a comprehension issue on my part. Originally, I was confused as to how an installer would choose between these, given that they both contain py3-none-any (from the example):

[[file-lock]]
name = 'cp312-cp312-manylinux_2_17_x86_64'
marker-values = {}
wheel-tags = ['cp312-cp312-manylinux_2_17_x86_64', 'py3-none-any']

[[file-lock]]
name = 'cp312-cp312-win_amd64'
marker-values = {}
wheel-tags = ['cp312-cp312-win_amd64', 'cp312-none-win_amd64', 'py3-none-any']

I wish there were more guidance on how to generate the [[file-lock]] entries – what each entry should “be”, how to determine the marker values, the wheel tag values, etc. In the example above, each entry name is a specific wheel tag, and then the wheel-tags are the relaxed variants of that tag. Are there other examples of what you might do here that would be reasonable?
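For what it's worth, if "all" is the intended semantics, the installer's choice reduces to a subset check. A toy sketch (the supported-tag set below is hand-written for illustration, standing in for what a real installer would derive from the environment, e.g. via `packaging.tags.sys_tags()`):

```python
# Hypothetical [[file-lock]] entries, copied from the PEP example above.
file_locks = [
    {"name": "cp312-cp312-manylinux_2_17_x86_64",
     "wheel-tags": ["cp312-cp312-manylinux_2_17_x86_64", "py3-none-any"]},
    {"name": "cp312-cp312-win_amd64",
     "wheel-tags": ["cp312-cp312-win_amd64", "cp312-none-win_amd64", "py3-none-any"]},
]

# Hand-written stand-in for the tags this environment supports
# (CPython 3.12 on x86-64 Linux).
supported = {
    "cp312-cp312-manylinux_2_17_x86_64",
    "cp312-none-manylinux_2_17_x86_64",
    "py3-none-any",
}

def matching_locks(file_locks, supported_tags):
    # "All" semantics: every listed tag must be supported (a subset check),
    # not merely some tag ("any" semantics).
    return [fl["name"] for fl in file_locks
            if set(fl["wheel-tags"]) <= supported_tags]

print(matching_locks(file_locks, supported))
```

With "all" semantics only the manylinux entry matches here, even though both entries contain py3-none-any; with "any" semantics both would, which is exactly the ambiguity described above.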

4 Likes

The key thing that results in a different lockfile is a different declared set of input dependencies. project and project[tests] are different dependency declarations (the latter presumably includes additional dependencies that the former omits), so they’re expressed as different lock files.

By contrast, project; python_version < "3.12" is a single input dependency (even when the project is listed multiple times with different version constraints depending on the Python version), it’s just one where the effect of the dependency declaration may vary based on the Python version of any given target environment.

The current tools that support locking already allow for the notion of “constraint” files: these are extra inputs to the resolver that say “If a distribution package listed in the constraint file is included in the declared dependency set, then apply these additional requirements to it” (the additional requirements usually take the form of an exact version pin).
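As a toy illustration of that rule (the names and data structures are invented for this sketch, not any tool's actual API), applying a constraint file amounts to pinning only the packages already present in the dependency set:

```python
def apply_constraints(requirements, constraints):
    """Pin each requested package to its constrained version, if one exists.

    `requirements` maps package names to requested specifiers; `constraints`
    maps names to exact pins. Constraints for packages that are not in the
    dependency set are simply ignored, mirroring how pip's -c files behave.
    """
    return {name: constraints.get(name, spec)
            for name, spec in requirements.items()}

reqs = {"requests": ">=2.0", "rich": "*"}
cons = {"requests": "==2.31.0", "numpy": "==1.26.4"}  # numpy pin is ignored

print(apply_constraints(reqs, cons))
```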

That capability won’t go away just because there is a formally specified lock file format available (indeed, the interoperability between tools in that regard should be enhanced, since the lock file format being proposed should also be viable as a constraint file format).

In the scenario described, the bare project would be locked first, and then that lock file would be used as a constraint file to lock any extras-inclusive lock files that are to be generated (and locking tools that are also project workflow management tools may even do that implicitly when locking a single project lockfile, rather than requiring that it be requested explicitly).

1 Like

Exactly. Regarding one file vs. multiple files @ncoghlan @DanCardin:

With multiple files you need one file for each combination of extras and groups. That does not scale well with many extras/groups.

Poetry resolves all (root project) extras and dependency groups into one lock file, so that you can choose at install time which extras and dependency groups to install from the lock file. Extras can be expressed via markers, so that case is easy, but for dependency groups you need something else. Currently, Poetry does not lock markers and groups but resolves again with a “lock file repository” at install time; however, triggered by Brett's initial draft, I worked on Lock markers and groups by radoering · Pull Request #9427 · python-poetry/poetry · GitHub so that we will be able to adopt a standardized package-lock lock file with less effort.

@brettcannon Not sure if this should be addressed now or later, but package-lock might be influenced by PEP 735 – Dependency Groups in pyproject.toml | peps.python.org. If you want to lock all groups (like Poetry already does), you need a package.groups to record which groups include the package, and package.marker can be different per group.

For better understanding: In Lock markers and groups by radoering · Pull Request #9427 · python-poetry/poetry · GitHub (in Poetry’s format) it can look like this if the marker is the same for all groups:

groups = ["github-actions", "test"]
markers = "python_version < \"3.11\""

or like this if they are different:

groups = ["main", "github-actions", "test"]
markers = { main = "os_name == \"nt\"", github-actions = "sys_platform == \"win32\"", test = "sys_platform == \"win32\"" }

(Real world examples from Poetry’s own lock file.)

If there is no standardized way to define groups, we can still put it in the per-package tool section. We can put a dummy marker (the union of all markers?) in package.marker and the per-group markers into the tool section. (Of course, that would mean that other installers that do not consider Poetry’s tool section may install additional packages that are not required for a specific group in a specific environment.)

We have that, too. This could be put in the (per-package) tool section. However, that means that the lock file is more bound to a specific installer, i.e. other installers would ignore it…

I think such information can be put into the (global) tool section. It is not relevant for the installer, is it? Thus, it should not even hurt to be tool specific.

4 Likes

Disagree. What I imagine everyone must want from a lockfile is that the locked dependencies for project and project[tests] are identical, with the dependencies included by tests simply omitted when not selected. Essentially identical to python_version. It differs in that it’s not a standardized value specified ahead of time, but in effect it’s the same thing: some specific marker of relevance to the current installation command.

The moment you start treating them as two separate, independent dependency declarations, it will start using different versions of packages between the two. Which for the example of a “tests” extra, seems like the worst thing that can happen.

I can appreciate the distinction you’re trying to make with “constraint” files, but I don’t think it applies to extras, because the locked versions are the same regardless; it’s just choosing to install fewer packages. Whereas with “constraint” files, the result may be less specific, which is what it seems like you’re saying.

4 Likes

That matches my mental model.

3 Likes

I like this framing, and if I’m understanding things correctly, it would give us a way to decide how to lock in the face of additional environmental constraints that aren’t currently captured in markers and tags. I’m thinking specifically about the whole GPU question[1]. @msarahan can probably speak to the intersection between GPU support and lock files more coherently than I can, but I’m also wondering if either @brettcannon or @ncoghlan have thought about how these topics might interact. There are (at least) two things I think we’d need to think about:

  • Locking for a specific environment. E.g. You resolve all your variants for this exact configuration of hardware and software, and capture those in per-file locking. Change your GPU and you invalidate your lock file.
  • Locking for a flexible/portable environment. E.g. you don’t fully resolve all your variants at lock time, but you do know the package set you want to install. However, you want the lock files to be portable between machines with potentially different variants, so you have to leave some of the installation choices up to the installer and not the locker. E.g. move your lock file to a machine with a different GPU and everything still “works”. I assume this would fall under per-package locking.

  1. although I use “GPU” in the more generic sense ↩︎

I see the logic, but I will point out that it’s simply not how things work at the moment. pip install project and pip install project[tests] can quite happily install a different version of a dependency, because the base list of requirements is different in the two cases.

The question here is essentially whether a lockfile is a persistently stored representation of what would happen if you did a pip install, or whether it’s a more complex construct that captures a single result that can satisfy any of a number of installation requests.

I think the more complex idea has merit, but to my knowledge it’s not been raised at all in any of the lockfile discussions we’ve had until now, and I believe it’s going to require some non-trivial rethinking of both the design and the user experience if we want to go down that route.

For example, suppose we have a package A, that depends on B. And A has an extra X that depends on C, which in turn depends on B<2.0. And suppose B 1.0 and B 2.0 exist. Then a lock of the type you describe for A will require B 1.0, in order to support being used to install A[X]. That seems very odd to me - I lock for A, and install the lockfile, and I get a different version of B than I’d get if I simply installed A. I may not even know that A has that extra, and I quite likely don’t care about it.

Personally, I feel like this sort of locking might need to be deferred to a later iteration of the lockfile specification, unless there’s an existing implementation of it that has seen significant use, and which can be used to provide practical answers for situations like the one I proposed above. I don’t think this is something we should attempt to design without proper real-world experience of how it plays out in an actual tool.

2 Likes

I am used to thinking about lockfiles as creating environments, but I think it is better how the PEP describes them: to install a set of packages, whether it is to create a new environment or into an existing one. My wording may reflect my older mentality, but I do agree that the generalization away from environment-specific ideas is important and helpful.

This is a really cool idea - thanks for elaborating, @pf_moore. The way I interpreted the “file-lock” spec was as the former idea: a finite set of distributions that match wheel tags or some other environment specifier. This is a natural place for extension to GPU/generic metadata, though I’m not sure yet whether wheel tags will work for the generic case (out of the scalability concerns that Paul expressed in other threads). The general notion of matching one and only one (or one “best”) set of packages to install is solid.

It sounds like what uv is doing, and what the package-lock spec is doing to a lesser extent, is to be more greedy in terms of capturing everything that an installation of those packages might be expanded to mean. I like the explicit way that the PEP handles this possibility for per-file locking:

Lockers MAY want to provide a way to let users provide the information necessary to install for multiple environments at once when doing per-file locking, e.g. supporting a JSON file format which specifies wheel tags and marker values much like in [[file-lock]] for which multiple files can be specified, which could then be directly recorded in the corresponding [[file-lock]] table (if it allowed for unambiguous per-file locking environment selection)
{
  "marker-values": {"": ""},
  "wheel-tags": [""]
}

However, I’m confused on how much “resolution” the package-lock strategy avoids at runtime. Do I understand correctly that:

  • the lockfile is a graph of packages, where each package may have a different version
  • install time means traversing the graph with a particular set of inputs (platform metadata, extras selection, future generic metadata (one day?))
  • This traversal means concrete-ifying the graph from package name/version into resolved dist file references

Could it be described as a kind of graph reduction/simplification/subsetting, and what the lockfile captures is that graph along with the traversal starting point?

1 Like

As an installer maintainer, my view is that a lockfile should completely avoid any need for resolution at install time. I say that because I’m very aware of the complexities that can arise in even what seem like simple cases. So in particular, I would not expect an installer to have to choose a package version based on dependency data - and indeed, the PEP clearly notes that package.dependencies “does not provide information which influences the installer”.

That’s not my understanding. I believe an installer should simply need to sequentially read the list of package/version entries, and select any for which the package.marker expression evaluates to true. I assume it would be an error if this process selected more than one version for a given package.

There’s no graph visible to the installer. The graph edges are defined by the dependency data, so the graph is present but the installer is specified to not be influenced by it.
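For illustration, that installer-side linear scan could look like the sketch below. Markers are stubbed out as plain predicates over a tiny environment dict (a real installer would parse and evaluate packaging-style marker expressions instead), and versions of Python are tuples to avoid string-comparison pitfalls; all package names here are invented:

```python
# Locked entries as (name, version, marker); a marker of None means
# "always install".
locked = [
    ("tomli", "2.0.1", lambda env: env["python_version"] < (3, 11)),
    ("attrs", "23.2.0", lambda env: env["python_version"] >= (3, 8)),
    ("attrs", "22.2.0", lambda env: env["python_version"] < (3, 8)),
]

def select(locked, env):
    # Sequential scan, no graph: keep entries whose marker holds, and treat
    # selecting two versions of the same package as an error.
    chosen = {}
    for name, version, marker in locked:
        if marker is None or marker(env):
            if name in chosen:
                raise ValueError(f"multiple versions selected for {name}")
            chosen[name] = version
    return chosen

print(select(locked, {"python_version": (3, 12)}))
```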

2 Likes

I’m going to change it to:

It is also designed to facilitate easy understanding of what would be installed from the lock file without necessitating running a tool, once again to help with auditing.

The amount of pushback over the file name from some, along with people not liking having multiple files, did not cause me to think about totally separate files. At that point you have two PEPs, and I got so much pushback from just doing per-file locking that I’m worried that will sink this whole endeavour (unless people actually say they would accept the per-file part as-is and that per-package locking is close and just needs some tweaks).

Off the top of my head, the separation isn’t as clear to me for per-file locking.

I did it that way because I expect package.files to typically be inlined, so singular spelling looked weird. As for the rest, I thought I found something that suggested that was TOML practice, but I can’t find it now, so I’ll change the table arrays to be plural.

I will blatantly copy that text into the PEP. :grin:

But baked in how? I think you’re assuming pyproject.toml is always the input and thus you have a singular concept of what the potential extras are because they are coming from project.optional-dependencies. But the PEP is written such that the input could be anything as not every Python project has a pyproject.toml file, nor uses a [project] table.

How does the user specify what extras they want? Is the assumption it always stems from a pyproject.toml file? Are you assuming everyone has a [project] table?

Do [tool] or [packages.tool] not work in that instance?

What are you specifically after instead of packages.directory?

I tried that and got pushback that wasn’t good enough.

Yes, otherwise how does the installer know which file lock applies? The only way I can see this working would either be introducing priorities to [[file-locks]] entries or something along the lines of, “if two [[file-locks]] entries match, select the one where more wheel tags match” or something. But every time I thought about that it felt a bit arbitrary.

Sorry, I thought not saying “any” implied “all”; I’ll make it more explicit.

Whatever it takes to make them disjoint. For most cases I would expect wheel tags alone will do it. Maybe the Python version or some other marker might be enough. Otherwise you can record everything that went into resolving.

I only used the tag name because it was the easiest to do. It could just as easily have been, e.g., “Python 3.13 on Windows”.

I think we all need to remember two things:

  1. PEP 735 is still draft, so I technically can’t rely on it existing.
  2. I don’t know if we as a group have decided that if PEP 735 gets accepted that we expect pyproject.toml to become the way we do inputs into packaging tools.

Because if we don’t make pyproject.toml the input for everything, then people asking for extras support are coming from a place that is specific to one way of doing input into tools, but not the only way (e.g., my PoC takes what to install on the command line, so there is no concept of an “extra” or “dependency group”, and yet it still works).

It depends on how strict you want to be for per-file locking (per-package locking should be fine and is just a question of how to encode the details of which file to install for what GPU). Would you expect to lock to the GPU as well or not?

It hasn’t, probably because of the input question I raised above (i.e. PEP 735 isn’t a thing yet).

Paul is right. PEP 751 – A file format to list Python dependencies for installation reproducibility | peps.python.org covers what installers are expected to do and you will notice it’s a linear scan of the listed package versions.

The only “graph” is the one you can draw if you wanted to w/ optional data to help visualize your dependency graph, but it isn’t used by an installer.

6 Likes

Yes, I would. That means that a per-file lock wouldn’t be portable to a machine with a different GPU, but I think that’s no different than the portability of that same per-file lock to a CPU with a different architecture. But since…

There can be multiple environments specified in a single file, each with their own set of files to install. By specifying the exact files to install, installers avoid performing any resolution to decide what to install.

… I think there’s no problem, as it would be possible (theoretically) to have a different environment with different file-locks for the different-GPU machine.

I think this punts the question back to the GPU/variants threads, but it does mean that any locker calculating the set of files that go into a per-file lock section must perform the same dynamic resolution algorithm as specified in that future PEP.

1 Like

Oops, I completely missed this. Yes, this would work for the case I described (storing tool-specific metadata), and in general that flexibility seems helpful.

When locking, or when installing? When locking, yes, we assume that the extra groups are defined in a pyproject.toml file. When installing, the user provides the enabled extras on the command-line. The extras are encoded in the lockfile.

We have a distinction in the uv lockfile between those packages that should be installed as editable and those that should not. The schema is the same between those two kinds of entries, but the installer installs them as editable or not based on that key. I’m anticipating a response here along the lines of, “That shouldn’t be part of the lock, it should be an input from the user at install-time”?

Different people will have different opinions on it :slight_smile: Perhaps my reaction just comes from being wary about scope – we had a lot of discussion and iteration on File Locking, but Package Locking was new in this draft and a significant expansion. We can see how much consensus we’re able to build around the Package Locking proposal during this period!

(Just to clarify (because uv was mentioned in that post): in uv, we do write the graph edges to the lockfile along with the relevant markers, instead of writing them to the nodes as in this proposal. At install time, we just do a breadth-first traversal of the graph – there’s no solver, you just follow the edges and ignore edges with non-matching markers.)
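For concreteness, that edge-following traversal might look something like the toy sketch below (invented package names, and marker expressions stubbed as predicates; this is not uv's actual data model):

```python
from collections import deque

# Hypothetical lock graph in the style described: edges carry markers,
# nodes don't. Each edge is (target, marker), where None means "always".
graph = {
    "root": [("a", None), ("b", lambda env: env["os"] == "windows")],
    "a": [("c", None)],
    "b": [("d", None)],
    "c": [],
    "d": [],
}

def install_set(graph, start, env):
    # Breadth-first traversal; edges with non-matching markers are simply
    # ignored, so no solving happens at install time.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target, marker in graph[node]:
            if (marker is None or marker(env)) and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen - {start}

print(sorted(install_set(graph, "root", {"os": "linux"})))    # a and c only
print(sorted(install_set(graph, "root", {"os": "windows"})))  # plus b and d
```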

(obviously whatever poetry/pdm/uv are already doing is going to be better thought out than my example solution, but for example…)

I don’t see why this would necessarily imply a pyproject.toml, so long as the information about the set of groups/extras is included in the lockfile. Record the groups at the top level of the file (generalizing across dependency groups and extras, which should be the same as far as installation goes), and then record a groups: ["foo", "bar"] per dependency that belongs to those groups (if that dependency is not already required outside of any group).

An installer with no concept of this feature is, as you mention, simply in the no-extras scenario, i.e. it installs only dependencies with no groups. An installer with the ability to select extras/groups, given some set of groups (after maybe checking that the groups are valid), would install a dependency if at least one of that dependency’s groups matches the groups provided.

It doesn’t seem obvious to me how this concept is any different than any of the static environmental markers. Some python_version==3.9 in the lockfile is really just the same thing as if someone supplied a python39 extra, as far as the comparative installation behavior goes…beyond the fact that python_version involves math and applies to transitive dependencies, whereas extras/groups do neither.

If nothing else, I feel the PEP should be explicit about how/why extras/groups are not important, or how they can be punted on and standardized backwards-compatibly later.

1 Like

FWIW if you asked me how I would think lockfiles should work, I would not expect this property to hold. Feels to me like this counts as different “input dependency specifiers”.
Also, to go even further than Paul’s example, it is perfectly valid for a project to have multiple extras whose requirements conflict. If foo[tests] and foo[docs] conflict (including transitively), I think the behavior you describe to

is not possible. There can’t be a single set of package-versions that satisfies all possible extras. And if the solution were for the installer to determine which version combinations can satisfy the requested extras at install time or whether they can’t be satisfied, you’re basically back at doing resolution.

2 Likes

What’s stopping us from (ab)using the semi-specified extra environment marker as a way to add extras support? It may lead to different versions of dependencies being installed based on the selection of extras specified at install time, but I think that’s what the user signs up for in this case anyway.

Making package entries disjoint should be easy: whenever an extra uses a different version (of B, in Paul’s example) (marker = 'extra == "test"'), then the original would have the opposite marker (marker = 'extra != "test"'). [1]

You may then want a list of all possible extras in [package-lock], where the UX is up to the installer (e.g. what to do when a user requests an unknown extra, or when markers conflict).
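To make the idea concrete, here is a toy sketch of that disjoint-marker selection, interpreting extra == "test" as membership in the set of requested extras per the footnote (package names and data structures are invented for illustration):

```python
# Markers stubbed as ("extra", name) / ("not-extra", name) pairs standing
# in for `extra == "..."` and `extra != "..."` marker expressions.
locked = [
    ("A", "1.0", None),                   # always installed
    ("B", "2.0", ("not-extra", "test")),  # extra != "test"
    ("B", "1.0", ("extra", "test")),      # extra == "test"
    ("C", "1.0", ("extra", "test")),      # only pulled in by the extra
]

def evaluate(marker, requested_extras):
    # Per the footnote: extra == "test" -> "test" in requested_extras.
    if marker is None:
        return True
    kind, name = marker
    member = name in requested_extras
    return member if kind == "extra" else not member

def select(locked, requested_extras):
    # The opposite markers keep the two B entries disjoint, so a plain
    # linear scan never selects both versions of B.
    return {name: version for name, version, marker in locked
            if evaluate(marker, requested_extras)}

print(select(locked, set()))      # plain A
print(select(locked, {"test"}))   # A[test]
```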

I have a feeling I’ve not considered something; please let me know when you discover what it is.


Some critique:

  • Is there a specification of what to do with unknown TOML table keys? If there is, I’ve missed it (I’d expect it just under the “File Format” heading).
  • Not all TOML table specifications say that their keys should stay in PEP order. It can’t hurt to add that.

I want to use file locking while installing the package in the repo which contains the lock file. I don’t need to lock this package itself because it’s in source control, but I do want to lock the build requirements. Is there a way to use this lock file to lock the build requirements?


  1. my interpretation of the environment marker string extra == "test" is that it gets transformed to the Python code "test" in requested_extras, and similar for != ↩︎

1 Like

For per-file locking, the [[common-packages]] array would be limited to those entries where there is only one [[packages.files]] entry for the package (or no file entries with a packages.vcs entry instead), and that file or VCS entry would list every defined file lock in its lock array.

If the two arrays were split, [[common-packages]] entries could potentially just omit their lock arrays entirely, since being in that array inherently means “all of them”.

The TOML project itself doesn’t take an official stance on the topic (e.g. see the discussion in Naming conventions for arrays of tables · toml-lang/toml · Discussion #932 · GitHub ).

From that TOML project discussion, you may have been thinking of this mention in the cargo docs that favours the singular forms: Cargo Targets - The Cargo Book

Naming conventions for arrays of tables is the one area of TOML where I’ve never found an approach I’m genuinely happy with. The pluralised names look weird when used to refer to a single entry in the expanded form, and the singular names look weird when used to refer to an inline list, as well as when accessing the arrays programmatically.

Looking at pyproject.toml for inspiration, it universally favours the plural form, so that’s a reasonable precedent for sticking with plurals here. There is one array that manages to duck the question entirely by being named with an adjective, and leaving the noun implied: dynamic (it’s short for “dynamic fields” or a phrase to that effect). The adjective trick is hard to generalise, though.

That means “array names are plural nouns or adjectives” is likely the cleanest and most consistent naming convention we can adopt for TOML packaging specs. After your recent edits, packages.files.lock and packages.vcs.lock are the only PEP 751 field names that aren’t following that convention (they’re arrays of strings, but currently have a singular name), so they should both be pluralised.

Based on the feedback from the poetry and uv devs, as well as the concrete example of ensuring that a project’s test dependencies are a strict superset of the project’s deployment dependencies, I think we do have enough real world experience to design a suitable solution here (in particular, if the poetry and uv devs agree it covers the way their lock files already work, we won’t have messed it up).

The use case does still tie in to the way constraint files work, but it’s backwards from the way I suggested it might work in an earlier message: rather than locking the subset, and then using that subset as a constraints file when locking the optional dependencies, you would instead generate the all-inclusive lock, and then define installation filters that select subsets of that full dependency set, similar to passing an all-inclusive lock as a constraints file when locking a dependency subset. Extras declarations and dependency groups would then just be common ways of declaring locked installation filters, rather than the locking format assuming those are the only way to define installation filters.

Similar to other aspects of the locking process, PEP 751 won’t need to concern itself with how the dependency subsets are passed to the locking tool, nor even with the way installation tools are told to only install a subset of the declared dependencies instead of all of them. Instead, it only needs to cover:

  • which dependency subsets have been defined (and hence are available to be requested)
  • which packages should be installed when a given subset is requested

One possible design sketch for enabling that:

  • add a new top-level [[optional-filters]] array to the lock file format with the following fields in each entry:

    • name: the name of the dependency subset
    • dependencies: the subset of the top-level dependencies list that this filter will install (as with other such fields in the lock file, this list is informational, installers don’t actually refer to it)
  • add a new optional packages.filters array field that specifies which named installation filters include that package. If this field is omitted, the package is always installed (subject to the other environment related installation checks).

By default, installers should only install packages that omit the packages.filters entry entirely. The interface for requesting installation of the optional dependencies would be installation-tool dependent (while there’s a case to be made for standardising how the names of project extras should be translated to installation filter names, that’s a detail that I believe genuinely needs to be postponed until we have concrete experience with an initial iteration of the lockfile spec).

Edit: while the field names are different, this is essentially the same approach that @DanCardin proposed here. I didn’t read that message until after writing this one, but the fact we came up with the same approach independently suggests it has potential.

The suggested format here is intentionally different from the format of optional-dependencies in pyproject.toml, as while the two concepts are closely related, they’re not exactly the same, so reusing the exact optional-dependencies structure would be misleading. In particular, while installation filters might be derived from extra names, they’re not required to be - they might come from optional dependency groups, or some other source. The proposed format also sticks with the convention established by [[file-locks]] and [[packages]] of using table arrays with name fields, rather than tables with dynamically defined keys (the latter can be nice for human-editable files, but the static structure is better for machine-generated formats).

For the case @a-reich mentions where different extras or dependency groups have conflicting requirements, that becomes a locking tool UX issue where (for example) pylock.toml contains the fully resolved lock for everything except the docs build environment, and pylock.docs.toml locks just the docs build environment. Installation filters only need to cover the case where the dependencies don’t conflict, but it is desired to ensure that they are kept consistent (with test dependencies being a genuinely compelling example of that concern).

2 Likes

There is: Poetry

It is exactly as you describe it, you will get B 1.0 even if you install without the extra. That is because the resolver prefers to choose one version of B that satisfies both cases (installing with/without extra) and does not split the graph on its own. However, there are workarounds to get B 2.0 like defining that you want B>=2.0;extra!=X.

To prevent misunderstandings: that only happens if A is the root project and you lock A. If A is a dependency (and no other dependency requires A[X]), then it is fixed whether you require A or A[X]: the resolver ignores extra requirements and you will get B 2.0.

Have you considered the case where different groups/filters can require different markers as described in

?

2 Likes

I guess this is a UX question.

A tool (like Poetry - thanks @radoering for the example of a real-world case!) might provide a locking capability as “lock this project” which only works on a project with a pyproject.toml. But a different tool might provide “lock this list of requirements” which works from a requirements.txt file or a simple list of requirements on the command line. The former might do a “solve for all extras at once” operation and lock that, whereas the latter might do “solve for what’s specified” and lock that.

As long as the lockfile format can handle both cases, I think we’re OK.

Personally I’d hate a tool where locking a project could potentially give a different result than installing it, but that’s not important as long as the standard allows tools that I do like to exist :slightly_smiling_face:

PS: @radoering out of curiosity, what does Poetry do if a project has two independent extras, for example test and docs, that have conflicting requirements (A depends on B, A[test] requires B>1.0, A[docs] requires B<1.0)? Does it refuse to lock in that case?

3 Likes

From a this-file-format perspective, it seems like that extra wouldn’t be allowed in that file-locks table. But generally, I’d expect resolution to fail.

Yes, that would fail to lock. Although in practice I don’t find failures like this to be frequent, I have found that not locking them together will lead to different versions.

Again, you’re already getting this with the different environment markers; I don’t see how this is any different. The point is that the resulting lockfile is more consistent and will give you the same versions across environment marker constraints and extras/groups.

1 Like

Yes, it fails. That is a limitation for sure, but there is some ongoing work[1] to make this possible if you define mutually exclusive requirements, for example:

B>1.0 ; extra == "test"
B<1.0 ; extra == "docs"

will fail because there is no solution if both extras are active but

B>1.0 ; extra == "test" and extra != "docs"
B<1.0 ; extra == "docs" and extra != "test"

will succeed because both requirements are mutually exclusive.


  1. see Pass Install Extras to Markers by reesehyde · Pull Request #9553 · python-poetry/poetry · GitHub ↩︎

1 Like