PEP 751: lock files (again)

This is a UX decision, not an interoperability decision, so we’d be unlikely to require anything.

At most, we’d suggest that sdists aren’t the intent, or might reduce any security benefits to the point where users ought to be made aware of it before doing it. Whether a tool chooses to make it opt-in or not at that point is up to the tool, not the specification.

Thanks Brett.

To be extremely clear: [[resolved-environments]] is entirely computable from the rest of the lockfile, is that right? It represents environments that were materialized ahead-of-time?

Additionally, to confirm my understanding: package.dependencies is only optional if unresolved-environments-allowed = false, right? Otherwise, you need it in order to traverse the graph?

I find the strategy a little bit resolve-y… As a counterpoint, in uv.lock, we don’t write version specifiers in the dependency list. We just write the dependencies with their markers:

[[package]]
name = "build"
version = "1.2.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
    { name = "colorama", marker = "os_name == 'nt'" },
    { name = "packaging" },
    { name = "pyproject-hooks" },
]

If the package is ambiguous (i.e., there are multiple versions in the lockfile), then we include the version and source – here’s a fictional example for `anyio > 3 ; sys_platform == 'darwin'` and `anyio < 3 ; sys_platform == 'win32'`:

[[package]]
name = "foo"
version = "0.0.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
    { name = "anyio", version = "2.2.0", source = { registry = "https://pypi.org/simple" }, marker = "sys_platform == 'win32'" },
    { name = "anyio", version = "4.4.0", source = { registry = "https://pypi.org/simple" }, marker = "sys_platform == 'darwin'" },
]

(This is a trick we borrowed from Cargo, which only shows versions in the dependency list if there are multiple versions of a given package in the lockfile, and leads to a very succinct representation. Others may disagree with it.)

The use of (name, version, source) means we don’t need to worry about version specifiers, parsing them, evaluating them, etc. at install time. It also means that lookups are very fast, because we index the lockfile by (name, version), and so can lookup dependencies without having to scan the table.
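To make the lookup scheme concrete, here is a small sketch of the indexing described above. The entry shapes mirror the uv.lock snippets earlier in the thread, but the `resolve` helper and the `packaging` entry are my own invention, not uv's actual implementation:

```python
# Hypothetical sketch of indexing a lockfile by (name, version).
# Entry shapes mimic the uv.lock examples above; helper names are invented.
packages = [
    {"name": "foo", "version": "0.0.1", "dependencies": [
        {"name": "anyio", "version": "2.2.0", "marker": "sys_platform == 'win32'"},
        {"name": "anyio", "version": "4.4.0", "marker": "sys_platform == 'darwin'"},
        {"name": "packaging"},  # unambiguous: only one version in the lockfile
    ]},
    {"name": "anyio", "version": "2.2.0", "dependencies": []},
    {"name": "anyio", "version": "4.4.0", "dependencies": []},
    {"name": "packaging", "version": "24.1", "dependencies": []},
]

# Build the index once; every dependency edge then resolves without a scan.
index = {(p["name"], p["version"]): p for p in packages}

def resolve(dep):
    """Resolve a dependency entry to its package record."""
    if "version" in dep:  # ambiguous name: direct (name, version) lookup
        return index[(dep["name"], dep["version"])]
    # no version written => exactly one entry with this name exists
    matches = [p for p in packages if p["name"] == dep["name"]]
    assert len(matches) == 1
    return matches[0]
```

Because the key is the full (name, version) pair, installing never requires parsing or evaluating a version specifier – the edge either points at exactly one record or the lockfile is malformed.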

I think requiring == in any specifiers could achieve a similar effect but personally I find it nice to have a schema that can’t even represent those ambiguities. (There may be a reason that I’ve overlooked as to why we’re writing version specifiers though.)

Will continue thinking on whether this could work for us.

Fair enough, but a “SHOULD” would be reasonable.

Yes. A benefit of this is that if you make the requirements for the resolved environment tight enough, you can avoid any shift in what might get installed in, e.g., production if something upstream unexpectedly updates.

Or your package has no dependencies, e.g., packaging. I’m fine requiring it if that’s less confusing for people as it can be set to an empty list.

That could also work as it’s the same as == in that case if people are good w/ that restriction. I personally would be fine w/ this approach if people are okay w/ implicitly having == for the version specifier.

How does the source play into this? As in you pre-select wheel vs. sdist? Otherwise why not store the source w/ the package version? Or does this tie into your discussion w/ @pf_moore earlier about whether support for different code w/ the same version should be supported?

Yes sorry – it ties into that earlier discussion. “Source” is what we use to enable supporting (e.g.) use of packages with the same name and version from different URLs on different platforms.

If people don’t like this, I would probably just model pyproject.toml pretty closely:

[requirements]
[requirements.project]
dependencies = ["..."]
optional-dependencies = [
  {"..." = ["..."]},
]

[requirements.dependency-groups]
# ... Match PEP 735, or if necessary only the naming and leave `include-group` out for a version update later.

I didn’t call the table pyproject because it doesn’t have to come from pyproject.toml, and pyproject.project looks weird to me. And I don’t think having dependencies/optional-dependencies at the same level as dependency-groups makes sense as it might confuse folks not used to pyproject.toml.

Another optimization that uv does that @charliermarsh didn’t mention is they have separate keys for sdists and wheel files. A more complete uv example showing this is:

[[package]]
name = "rich"
version = "13.7.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
    { name = "markdown-it-py" },
    { name = "pygments" },
    { name = "typing-extensions", marker = "python_version < '3.9'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/b3/01/c954e134dc440ab5f96952fe52b4fdc64225530320a910473c1fe270d9aa/rich-13.7.1.tar.gz", hash = "sha256:9be308cb1fe2f1f57d67ce99e95af38a1e2bc71ad9813b0e247cf7ffbcc3a432", size = 221248 }
wheels = [
    { url = "https://files.pythonhosted.org/packages/87/67/a37f6214d0e9fe57f6ae54b2956d550ca8365857f42a1ce0392bb21d9410/rich-13.7.1-py3-none-any.whl", hash = "sha256:4edbae314f59eb482f54e9e30bf00d33350aaa94f4bfcd4e9e3110e64d0d7222", size = 240681 },
]

What do people think about that?

You will also notice they shrunk things by having the index server’s URL and then inferring the project detail URL, since it’s f"{index_url}/{project_name}". That could be another uv-inspired simplification/clarification if people like it: change package-index-project-detail-url to package-index-url and cut the project name out of the URL (I would argue for keeping package-index-project-detail-url at the file level as a fallback).
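For clarity, the inference is just string concatenation; the function name below is made up for illustration:

```python
# Sketch of the URL shrinking described above: store only the index URL in the
# lock file and derive each project's detail URL as f"{index_url}/{project_name}".
def project_detail_url(index_url: str, project_name: str) -> str:
    return f"{index_url.rstrip('/')}/{project_name}"

project_detail_url("https://pypi.org/simple", "rich")
# "https://pypi.org/simple/rich"
```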

General idea looks good to me, and I quite like [name] as a syntax for extras (although extra-name might be clearer, and the dependency groups PEP could reserve that prefix).

The restriction of resolved environments to a single dependency group doesn’t seem right (I would expect an array listing all the included groups). The default group should also get a real name so resolved environments can list whether they include it or not.

It’s not clear to me if sdists are expected to be supported or not (there seems to be some disagreement there?), but if they are to be supported, it would make sense to be able to lock the version of the build dependencies, and to be able to specify config_settings and possibly environment variables (given setting config_settings remains somewhat painful at the moment).

They are.

That’s what [packages.build-requires] is for.

Why do you expect that? Because you expect multiple extra combinations to end up w/ the same files to be installed?

When installing, I can request combinations of dependency groups (e.g. default + dev is very common). I expected the dependency group info in a resolved environment entry to be a listing of the included dependency groups, rather than the locking tool generating a synthetic dependency group name for the combination.

I also had a thought regarding special dependency groups (at least “default”, and perhaps “self” for project lock files): perhaps they could use surrounding colons (like “:default:”) the way pip does for “:all:” in its CLI options? The “:all:” special group (for attempting to install all groups) would be implied rather than written out in the lock file.

:red_question_mark: This may be survivable for the ci use case, although npm ci and yarn install --frozen-lockfile support a production vs. dev distinction from a single lock file. Likewise, there’s support for this in Poetry. It feels like a lowest-common-denominator spec.

:warning: For me, at least, this is a deal-breaker for the superset use case. In fact, I’d hazard a guess that extras are actually more universal than multiple architectures.

I’ve just skimmed through the comments of the last two weeks, so please let me know if I overlooked something.

I think Poetry is in a pretty similar situation to uv. We will definitely try to support the format. It might be sufficient as our primary lock file format, but we cannot be sure until we try it out. I think it will be possible with the packages.tool escape hatch, though I might be overlooking something, as I overlooked the following:

That will also help Poetry because it supports the same use case as uv (two entries with the same name and version but a different source/artifact).

All in all, that feels too resolve-y to me. Actually, it feels much like what Poetry is doing at the moment except for the “no backtracking” part. I think especially

will not work well without backtracking. What if a direct dependency allows both versions, but a transitive dependency requires the older version? Am I missing something, or will the algorithm choose the newer version in the first round and fail later?

As already mentioned in PEP 751: lock files (again) - #23 by radoering, Poetry will probably shift from “very resolve-y” to “barely resolve-y” in the future. As a result, my vision is more like (example, only relevant fields):

[[packages]]
name = "..."
version = "..."
dependencies = ["..."]  # Optional (not relevant for installer, only for re-locking)
extras = { "..." = ["..."] }  # Optional (not relevant for installer, only for re-locking)
dependency-groups = ["main", "dev", "test"]  # package is required in these groups (no matter if directly or transitively)
# Resulting markers, not just the direct markers from pyproject.toml:
marker.main = "python_version > '3.9' and (extra == 'a' or extra == 'b')"
marker.dev = "*"  # always required for dev
marker.test = "python_version > '3.9'"
# ...

The locker locks the following for each packages entry:

  • relevant resulting dependency groups
  • relevant resulting environment marker per dependency group
  • (top-level) extras are encoded in the marker
  • extras of dependencies are resolved/transparent, i.e. implicitly encoded in resolved (non-extra) marker and dependency-groups

The installer will not have to do a graph traversal; it only iterates over a flat list of packages entries and, for each entry, checks whether

  • the intersection of the relevant (locked) dependency groups and the requested (at install time) dependency groups is not empty, and
  • the union of the locked markers for the relevant requested dependency groups is true for the target environment with the requested (at install time) extras,

and then chooses the best wheel / sdist.

Of course, the installer part only works this way if the groups/markers for multiple entries of a specific package are disjoint. (Poetry will ensure that in the mentioned PR.) If you wanted to allow overlapping markers/groups, the installer would have to evaluate all entries of a specific package and choose the best afterwards.
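The flat-list installer pass I have in mind can be sketched like this. A real locker would store PEP 508 marker strings; here each marker is modeled as a predicate on the target environment, and the entries and group names are made up, just to keep the example self-contained:

```python
# Sketch of the flat-list installer pass described above. Markers are modeled
# as predicates instead of PEP 508 strings; entries and groups are invented.
def select_packages(entries, requested_groups, env):
    chosen = []
    for entry in entries:
        # 1. intersection of locked and requested dependency groups must be non-empty
        relevant = set(entry["dependency-groups"]) & set(requested_groups)
        if not relevant:
            continue
        # 2. union of the locked markers for the relevant groups must be true
        if any(entry["marker"][g](env) for g in relevant):
            chosen.append(entry["name"])  # 3. (wheel/sdist choice omitted here)
    return chosen

entries = [
    {"name": "pytest", "dependency-groups": ["test"],
     "marker": {"test": lambda env: env["python_version"] > (3, 9)}},
    {"name": "black", "dependency-groups": ["dev"],
     "marker": {"dev": lambda env: True}},  # always required for dev
]

env = {"python_version": (3, 12)}
select_packages(entries, ["main", "test"], env)  # selects only "pytest"
```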

Just an FYI for readers, anyone looking at Brett’s outline should know that the code block scrolls. I was confused looking at the proposal because the structure happens to perfectly end in such a way to visually indicate that there isn’t any more, so I didn’t realize it was a multi-line code block.

I’m happy with your proposal Brett from a security perspective, the most important part for me is that there’s a way to instruct an installer to install specific files (and using similarly specific builders) and that there’s a mechanism to make it clear that no resolution must occur (via unresolved-environments-allowed = false).

I had some smaller thoughts on the format:

  • Let’s capture the size of the file in addition to its hash in the lock.
  • Differences between [[resolved-environments]].packages[].package and [[packages]].name: do we want to make it more obvious how one maps to the other? Thinking about visually inspecting deeply nested structures with little context, I think using .package (and even using .file under [[packages]].files[].file instead of .name) would make it easier to know “where you are” in a code review.
  • Having more distinction between wheels and non-wheel sources under [[resolved-environments]].packages would make me happy, just so it’s easier to audit. Perhaps wheels and sources instead of one block of packages? Open to have this one declined or other suggestions.
  • Is [[resolved-environments]].packages.wheel supposed to be a URL? A filename? I am not certain what the values look like there.
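On the first point (capturing size in addition to hash), an installer-side check might look like the sketch below. The lock-entry shape (`size` plus an `algo:digest` hash string, as in the uv example earlier) and the function name are assumptions for illustration:

```python
import hashlib

# Sketch of checking a downloaded artifact against a lock entry that records
# both a hash and a size; the lock-entry shape here is invented.
def verify_artifact(data: bytes, lock_entry: dict) -> bool:
    algo, _, expected = lock_entry["hash"].partition(":")
    size_ok = len(data) == lock_entry["size"]  # cheap reject before hashing
    hash_ok = hashlib.new(algo, data).hexdigest() == expected
    return size_ok and hash_ok

blob = b"example artifact bytes"
entry = {"size": len(blob), "hash": "sha256:" + hashlib.sha256(blob).hexdigest()}
verify_artifact(blob, entry)  # True
```

The size check adds little security on its own, but it lets an installer reject a truncated or padded download before spending time hashing a large file.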

+1, fully agree with this.

I like this idea, but I think the distinction we’d want to make is between fully static artifacts (wheels and source distributions that are PEP 621 compliant with no dynamic metadata) vs non-static artifacts (source distributions that are dynamic or use setup.py), although I’m not sure if that’s something that would be easy for a locker to do today.

With my security + pip-audit hat on: I believe this fully addresses the needs/use cases I have in mind!

In general, I think pip-audit’s needs (and therefore my needs) are satisfied so long as each of the following is possible:

  1. The lockfile format expresses a “total” (potentially best effort, not necessarily representing all possible environment resolutions) closure of the package’s dependency graph, with each graph member locked to its specific version.
  2. The format can express a “specific” closure of the graph, i.e. for a particular environment or marker/extra resolution, with each graph member locked to its specific version.

My understanding (which might be wrong, please correct me!) is that (1) is enabled by [[packages]], while (2) is enabled by [[resolved-environments]].
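Under that understanding, the “total” closure an auditor needs falls directly out of the [[packages]] array: every entry is already pinned to a single version, so collecting (name, version) pairs over the whole array covers all possible resolutions at once. A tiny sketch, with made-up package data:

```python
# Sketch of extracting the "total" closure from [[packages]]: each entry is a
# graph node pinned to one version, so no traversal is needed to enumerate
# everything that could possibly be installed. Package data is invented.
packages = [
    {"name": "app", "version": "1.0", "dependencies": ["rich"]},
    {"name": "rich", "version": "13.7.1", "dependencies": ["pygments"]},
    {"name": "pygments", "version": "2.18.0", "dependencies": []},
]

# The audit surface: every (name, version) pair in the lockfile.
total_closure = {(p["name"], p["version"]) for p in packages}
```

A “specific” closure for one environment would instead start from a [[resolved-environments]] entry and collect only the packages it references.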

This would be an immense boon for pip-audit!

My biggest concern about this mechanism isn’t so much in terms of security, but rather discoverability. My understanding is that the locker will choose (possibly based on an opt-in by the user, but maybe just by picking a default that the designers of the locking tool favour) whether the lockfile needs some form of resolution at install time.

Following on from that, installers will be in an awkward position. Obviously they can support all forms of lockfile, but to do that they will need to include a resolver (of some form - my impression is that it’s not completely clear yet whether backtracking could be needed, for example). If an installer chooses not to do this, they will have to reject certain lockfiles as not supported - and that’s going to result in confused users, who just wanted to lock their application and send the resulting lockfile to their hosting provider.

I’m reluctant to try to impose UI choices on either lockers or installers, but I don’t think a standard with optional features is going to serve users well. If nothing else, it’s not exactly “interoperable” if a tool claiming to support the spec can fail to handle some files.

I think interoperability can mean a few things. The format is interoperable, but the features provided by the tools that can process the format don’t strictly need to be identical?

I think there are some standardized packaging capabilities that not all tools support already?

Pip doesn’t support installation from a PEP 621 pyproject.toml, not all indexes support PEP 658 metadata, and there are probably others.

I know these aren’t great examples, but maybe it can be left for tools to choose to not support features. Users can choose tools that support what they need in their scenarios?

My comment was a general point, not intended to be read too literally, but in general (again, I may have forgotten some exceptions) tools either support a standard or don’t. The difference here is that we are talking about installers supporting part of this spec, but not all of it.

To reiterate, though, my main point was that it’s confusing for users if not all lockfiles are equal, in terms of which tools support them.

I’m not sure we’re going to be able to avoid that in this case, since whether or not to fall back to the “figure it out” option is necessarily going to be up to the installer if we want to allow for use cases with a low tolerance for any ambiguity in what is going to be installed.

That means folks deploying to an environment that enforces full lock time resolution are going to get deployment failures if they don’t play by the target environment’s rules (and that UX decision will be made by the platform provider, trading off the security benefits of requiring full lock time resolution against the potential inconvenience of users getting installation failures when they omit the required details from their lock file).

I’m not sure the proposed flag to “disallow” falling back to install time resolution makes sense though. It would at best be advisory, and if a target system genuinely offers a choice on whether to fall back to installation time resolution or not, that feels like it should be a setting in the deployment platform’s own config (potentially in the lockfile’s tool table), rather than a standardised setting in the Python lockfile.
