PEP 751: lock files (again)

My original intention was not to oppose universal resolution so vehemently. I am just trying to illustrate that universal resolution is merely a difference between broad and narrow restrictions, so we don’t need to single it out for discussion, and the two formats can be unified based on this mental model. Although PDM deprecates the cross_platform strategy, it still tries to create a broad resolution by default, the same as before.

Exactly

Oh yes. :sweat_smile: I don’t know if we ever got to a point in the last discussion on the topic beyond that people should be consistent. The PEP currently says:

  • When creating a lock file for [package-lock], the locker SHOULD read the metadata of all files that end up being listed in [[packages.files]] to make sure all potential metadata cases are covered
  • If a locker chooses not to check every file for its metadata, the tool MUST either provide the user with the option to have all files checked (whether that is opt-in or out is left up to the tool), or the user is somehow notified that such a standards-violating shortcut is being taken (whether this is by documentation or at runtime is left to the tool)

I phrased it that way because the last time this came up I don’t think we came up w/ a way to standardize the requirement short of just telling projects that if it differs they may be causing users issues.

So you essentially always assume there is some way to install a package version?

What “assumptions” are you thinking?

And @charliermarsh , what were the reasons for going w/ a graph?

This would be motivation to still have an explicit file lock format.

:+1:

Actually, I do not know a reason why it should not be possible to create a sequential lock file from a graph traversal lock file in theory. However, I think there are some challenges:

  • I can imagine it becomes more complicated/expensive the more fields are relevant to decide which edges of the graph have to be traversed. Currently, I think there are only markers, plus groups if you want to support dependency groups.
  • Insufficient marker algebra that fails to detect empty markers might also be an issue.
  • Concerning marker algebra, performance might be another issue.
  • Loops in the graph are a challenge because they may influence the resulting marker. [1]

And what if you want to know in which environments a specific package will be installed and not only if it will be installed in a specific environment?


  1. I think we found a solution for that in the mentioned Poetry PR, but it is not battle tested so I might be wrong. ↩︎

A few things:

  1. We thought it would make it easier to understand why a lockfile changed, i.e., trace the dependencies. (May or may not be true.)
  2. The markers can be much smaller, since you don’t need to mark each package with the “fully-joined” markers.

We do have a routine for converting the graph to a “flat” format though, so I think it’s pretty easy to do. (We use it for pip compile --universal, which has to write to requirements.txt format.)
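As a rough illustration of what such a graph-to-flat conversion involves (this is not uv's routine; the data layout and the names GRAPH, flatten, and render are hypothetical), the sketch below walks the graph from the root, ANDs the edge markers along each path, and ORs the alternatives per package. Markers are treated as opaque atomic strings, so it does no real marker algebra beyond dropping conditions that are implied by a weaker one, which is also what keeps it from looping on cycles.

```python
from collections import deque

# Hypothetical graph-style lock data: package -> list of (dependency, edge marker).
# An empty marker string means the edge is unconditional.
GRAPH = {
    "root": [("httpx", "platform_python_implementation == 'CPython'"), ("requests", "")],
    "httpx": [("anyio", ""), ("idna", "")],
    "requests": [("idna", "")],
    "anyio": [("idna", ""), ("typing-extensions", "python_full_version < '3.11'")],
    "idna": [],
    "typing-extensions": [],
}


def flatten(graph, root):
    """Return {package: set of frozensets}; each frozenset is one AND-ed path condition."""
    conditions = {root: {frozenset()}}
    queue = deque([(root, frozenset())])
    while queue:
        name, cond = queue.popleft()
        for dep, marker in graph[name]:
            new = cond | {marker} if marker else cond
            known = conditions.setdefault(dep, set())
            # Absorption: skip a condition that a weaker recorded one already covers.
            if any(old <= new for old in known):
                continue
            # Drop recorded conditions that the new, weaker one makes redundant.
            known.difference_update({old for old in known if new <= old})
            known.add(new)
            queue.append((dep, new))
    return conditions


def render(alternatives):
    """Join alternatives into one marker string ('' means always installed).

    Assumes each edge marker is atomic (contains no top-level 'or').
    """
    if frozenset() in alternatives:
        return ""
    parts = sorted(" and ".join(sorted(alt)) for alt in alternatives)
    return " or ".join(f"({p})" for p in parts) if len(parts) > 1 else parts[0]
```

With this data, idna ends up unconditional because requests reaches it without a marker, while typing-extensions carries both the CPython and the Python-version condition.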

Quick apologies for my delayed response; fell ill last week w/ a cold.

Does any of this make you change your mind about …

?

I’m just saying that if a graph approach was taken then for security purposes the file lock could be more explicit in what would be installed. I’m not worrying about design or anything, just there’s a potential option there.

Have you gotten any feedback suggesting this assumption has held up?

That I totally believe, but that impacts aesthetics, performance, and readability, not the ability to lock for something.

OK, so it seems the graph choice is a UX thing, not a technical requirement in order to enable some capability.


I think what I’m going to do next is generate lock files with Poetry, PDM, and uv using the same requirements and then we can look at the differences and similarities to figure out what makes the most sense.

No, because I have not experienced any severe issues with Lock markers and groups by radoering · Pull Request #9427 · python-poetry/poetry · GitHub yet.

It might make sense to use Lock markers and groups by radoering · Pull Request #9427 · python-poetry/poetry · GitHub for that because then you will get a sequential lock file that does not require re-resolving at install time and I am quite sure that it will be included in the next release if nobody discovers critical issues with it.

Apologies for being late to the party, but I only just became aware of this discussion. If this is not the appropriate place for this, I apologize.

As a platform engineer, one thing I struggle with is deploying Python apps/libraries that may need to run on different versions of Python, and which have differing package version requirements that I would like to pin the hashes for. I have read the PEP but it is not clear to me if this would be in scope.

For example, the package botocore contains the following requirements:-

    'urllib3>=1.25.4,<1.27 ; python_version < "3.10"',
    'urllib3>=1.25.4,!=2.2.0,<3 ; python_version >= "3.10"',

For now, I generate two sets of requirements.txt files using pip-tools and manually splice them together. If this new format could make this easier, I would be very appreciative!
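For illustration only, the manual splicing step might look roughly like the sketch below. It assumes each pip-tools output has already been parsed into a {name: version} mapping; the function name splice, the boundary argument, and the pinned versions in the usage are all hypothetical. Hashes are left out, so any --hash entries would still have to be carried along with each line.

```python
def splice(old_pins, new_pins, boundary="3.10"):
    """Merge two {name: version} pin sets, split at a Python version boundary."""
    lines = []
    for name in sorted(old_pins.keys() | new_pins.keys()):
        old, new = old_pins.get(name), new_pins.get(name)
        if old == new:
            # Same pin on both sides of the boundary: no marker needed.
            lines.append(f"{name}=={old}")
            continue
        if old is not None:
            lines.append(f'{name}=={old} ; python_version < "{boundary}"')
        if new is not None:
            lines.append(f'{name}=={new} ; python_version >= "{boundary}"')
    return lines


# Illustrative pins only, not taken from a real resolution.
for line in splice(
    {"botocore": "1.35.0", "urllib3": "1.26.20"},
    {"botocore": "1.35.0", "urllib3": "2.2.3"},
):
    print(line)
```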

The existing higher level lockers (uv, poetry, and PDM) handle disjoint dependency branches in their respective lock file formats and the standardised format will preserve that capability.

I tried to get this to work but I kept getting errors:

The Poetry configuration is invalid:
  - Additional properties are not allowed ('tool' was unexpected)
  - Additional properties are not allowed ('build-system' was unexpected)

But everything worked fine under Poetry 1.8.3, so I’m not sure what I did wrong. If you can get it working with the example pyproject.toml I have in my gist mentioned below I will happily include it.


I created a GitHub gist containing lock files from PDM, Poetry, and uv. I used the following as the requirements:

dependencies = ["httpx;platform_python_implementation=='CPython'", "requests"]

The reason I did this was to see how the lock files looked when a top-level dependency had a marker on it (and I chose a marker that I didn’t expect a package to use for its own dependencies), that dependency had unique dependencies of its own (i.e. its dependencies should only be installed if the marker for the top-level dependency passed), and the top-level dependencies shared some dependencies.

:warning:NOTE: some of the inline TOML examples may require scrolling to see all the content!

Simple example

Let’s look at a small example: idna, which both httpx and requests require and which has no dependencies of its own.

PDM

[[package]]
name = "idna"
version = "3.8"
requires_python = ">=3.6"
summary = "Internationalized Domain Names in Applications (IDNA)"
groups = ["default"]
files = [
    {file = "idna-3.8-py3-none-any.whl", hash = "sha256:050b4e5baadcd44d760cedbd2b8e639f2ff89bbc7a5730fcc662954303377aac"},
    {file = "idna-3.8.tar.gz", hash = "sha256:d838c2c0ed6fced7693d5e8ab8e734d5f8fda53a039c0164afb0b82e771e3603"},
]

Poetry

[[package]]
name = "idna"
version = "3.8"
description = "Internationalized Domain Names in Applications (IDNA)"
optional = false
python-versions = ">=3.6"
files = [
    {file = "idna-3.8-py3-none-any.whl", hash = "sha256:050b4e5baadcd44d760cedbd2b8e639f2ff89bbc7a5730fcc662954303377aac"},
    {file = "idna-3.8.tar.gz", hash = "sha256:d838c2c0ed6fced7693d5e8ab8e734d5f8fda53a039c0164afb0b82e771e3603"},
]

uv

[[package]]
name = "idna"
version = "3.8"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/e8/ac/e349c5e6d4543326c6883ee9491e3921e0d07b55fdf3cce184b40d63e72a/idna-3.8.tar.gz", hash = "sha256:d838c2c0ed6fced7693d5e8ab8e734d5f8fda53a039c0164afb0b82e771e3603", size = 189467 }
wheels = [
    { url = "https://files.pythonhosted.org/packages/22/7e/d71db821f177828df9dea8c42ac46473366f191be53080e552e628aad991/idna-3.8-py3-none-any.whl", hash = "sha256:050b4e5baadcd44d760cedbd2b8e639f2ff89bbc7a5730fcc662954303377aac", size = 66894 },
]

If you ignore things that are the same but just use different key names, there are a few key differences:

  • Different approaches to recording what group a dependency belongs in
    • PDM has groups
    • Poetry has optional
    • uv doesn’t have it embedded in the package data because they use a graph traversal to determine what to install
  • uv leaves out the Python requirement while PDM and Poetry embed it (uv covers it with a global requires-python)
  • uv separates out the sdist and wheels explicitly while PDM and Poetry leave it up to file name processing
  • uv records the file size
  • uv records the source of the package data
  • PDM and Poetry record the summary from the package
  • PDM and Poetry record the filename while uv records the URL (I don’t know what uv does if the URL doesn’t end in the expected filename which is guaranteed by the HTML Simple index API but not the JSON API)
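To make the last two file-related differences concrete, here is a minimal sketch of recovering a filename from a recorded URL and of telling sdists from wheels by file name alone. It is not how any of the three tools does it; the helper names are hypothetical, and it assumes the URL path really does end in the distribution filename, which is exactly the assumption questioned above.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Take the last path segment of the URL, ignoring query string and fragment."""
    return posixpath.basename(unquote(urlsplit(url).path))


def kind(filename):
    """Classify a distribution file by its name alone."""
    if filename.endswith(".whl"):
        return "wheel"
    if filename.endswith((".tar.gz", ".zip")):
        return "sdist"
    return "unknown"


url = (
    "https://files.pythonhosted.org/packages/e8/ac/"
    "e349c5e6d4543326c6883ee9491e3921e0d07b55fdf3cce184b40d63e72a/idna-3.8.tar.gz"
)
print(filename_from_url(url), kind(filename_from_url(url)))
```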

A dependency has a marker requirement

Not only is anyio installed only if httpx is installed, but its dependency on typing-extensions applies only if Python < 3.11.

PDM

[[package]]
name = "anyio"
version = "4.4.0"
requires_python = ">=3.8"
summary = "High level compatibility layer for multiple asynchronous event loop implementations"
groups = ["default"]
marker = "platform_python_implementation == \"CPython\""
dependencies = [
    "exceptiongroup>=1.0.2; python_version < \"3.11\"",
    "idna>=2.8",
    "sniffio>=1.1",
    "typing-extensions>=4.1; python_version < \"3.11\"",
]
files = [
    {file = "anyio-4.4.0-py3-none-any.whl", hash = "sha256:c1b2d8f46a8a812513012e1107cb0e68c17159a7a594208005a57dc776e1bdc7"},
    {file = "anyio-4.4.0.tar.gz", hash = "sha256:5aadc6a1bbb7cdb0bede386cac5e2940f5e2ff3aa20277e991cf028e0585ce94"},
]

Poetry

[[package]]
name = "anyio"
version = "4.4.0"
description = "High level compatibility layer for multiple asynchronous event loop implementations"
optional = false
python-versions = ">=3.8"
files = [
    {file = "anyio-4.4.0-py3-none-any.whl", hash = "sha256:c1b2d8f46a8a812513012e1107cb0e68c17159a7a594208005a57dc776e1bdc7"},
    {file = "anyio-4.4.0.tar.gz", hash = "sha256:5aadc6a1bbb7cdb0bede386cac5e2940f5e2ff3aa20277e991cf028e0585ce94"},
]

[package.dependencies]
exceptiongroup = {version = ">=1.0.2", markers = "python_version < \"3.11\""}
idna = ">=2.8"
sniffio = ">=1.1"
typing-extensions = {version = ">=4.1", markers = "python_version < \"3.11\""}

[package.extras]
doc = ["Sphinx (>=7)", "packaging", "sphinx-autodoc-typehints (>=1.2.0)", "sphinx-rtd-theme"]
test = ["anyio[trio]", "coverage[toml] (>=7)", "exceptiongroup (>=1.2.0)", "hypothesis (>=4.0)", "psutil (>=5.9)", "pytest (>=7.0)", "pytest-mock (>=3.6.1)", "trustme", "uvloop (>=0.17)"]
trio = ["trio (>=0.23)"]

uv

[[package]]
name = "anyio"
version = "4.4.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
    { name = "exceptiongroup", marker = "python_full_version < '3.11'" },
    { name = "idna" },
    { name = "sniffio" },
    { name = "typing-extensions", marker = "python_full_version < '3.11'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/e6/e3/c4c8d473d6780ef1853d630d581f70d655b4f8d7553c6997958c283039a2/anyio-4.4.0.tar.gz", hash = "sha256:5aadc6a1bbb7cdb0bede386cac5e2940f5e2ff3aa20277e991cf028e0585ce94", size = 163930 }
wheels = [
    { url = "https://files.pythonhosted.org/packages/7b/a2/10639a79341f6c019dedc95bd48a4928eed9f1d1197f4c04f546fc7ae0ff/anyio-4.4.0-py3-none-any.whl", hash = "sha256:c1b2d8f46a8a812513012e1107cb0e68c17159a7a594208005a57dc776e1bdc7", size = 86780 },
]

  • PDM propagates the marker requirement for httpx as you can scan its lock file from start to finish linearly, while Poetry and uv don’t work that way
  • uv and Poetry separate out version requirements from markers on a per-requirement basis while PDM keeps the requirements as full strings
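
A linear scan over PDM-style entries can be sketched as follows. This is not PDM's installer code: the function names are hypothetical, packages are plain dicts rather than parsed TOML, and marker_matches is a deliberately tiny stand-in that only understands a single `name == "value"` comparison where a real tool would use a full marker evaluator.

```python
import re

# One `name == "value"` comparison; anything richer needs a real marker parser.
_SIMPLE = re.compile(r"""^\s*(\w+)\s*==\s*["']([^"']*)["']\s*$""")


def marker_matches(marker, env):
    """Evaluate an empty marker or a single equality comparison against env."""
    if not marker:
        return True
    match = _SIMPLE.match(marker)
    if match is None:
        raise ValueError(f"unsupported marker in this sketch: {marker!r}")
    return env.get(match.group(1)) == match.group(2)


def select(packages, env, groups):
    """Scan the package list once; keep entries whose group and marker both apply."""
    wanted = set(groups)
    return [
        pkg["name"]
        for pkg in packages
        if wanted & set(pkg.get("groups", []))
        and marker_matches(pkg.get("marker", ""), env)
    ]
```

Because each entry already carries its fully propagated marker, no dependency edges need to be followed to decide what to install.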

Knowing the top-level dependencies

How do you know what was specified?

PDM

[metadata]
groups = ["default"]
strategy = ["inherit_metadata"]
lock_version = "4.5.0"
content_hash = "sha256:ad8d737d864c796dd999f825b17b9226e856eca6f3e7b8c2c654917eec423d19"

[[metadata.targets]]
requires_python = ">=3.8"

Poetry

Based on default and doing a full resolve.

uv

[[package]]
name = "lock-example"
version = "2024"
source = { virtual = "." }
dependencies = [
    { name = "httpx", marker = "platform_python_implementation == 'CPython'" },
    { name = "requests" },
]

[package.metadata]
requires-dist = [
    { name = "httpx", marker = "platform_python_implementation == 'CPython'" },
    { name = "requests" },
]

  • PDM has the concept of groups, but otherwise you scan the lock file from top to bottom and evaluate each package independently, so the top-level dependencies are not explicitly listed
  • uv makes the package you are installing from its own package and thus acts as its own top-level dependency list
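
For contrast with the linear scan, a graph-style lock needs a traversal that starts at the root package. The sketch below is an assumption-laden illustration and not uv's implementation: PACKAGES is a trimmed, hypothetical subset of the entries above, marker_ok is a caller-supplied predicate standing in for real marker evaluation, and packages are keyed by name only, so multiple locked versions of one package are not handled.

```python
# Trimmed, hypothetical subset of a graph-style lock; real entries list more dependencies.
PACKAGES = [
    {"name": "lock-example", "dependencies": [
        {"name": "httpx", "marker": "platform_python_implementation == 'CPython'"},
        {"name": "requests"},
    ]},
    {"name": "httpx", "dependencies": [{"name": "anyio"}, {"name": "idna"}]},
    {"name": "requests", "dependencies": [{"name": "idna"}]},
    {"name": "anyio", "dependencies": [{"name": "idna"}]},
    {"name": "idna"},
]


def install_set(packages, root, marker_ok):
    """Walk dependency edges from root, following only edges whose marker applies."""
    by_name = {pkg["name"]: pkg for pkg in packages}
    seen = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:  # also stops the walk from looping on dependency cycles
            continue
        seen.add(name)
        for dep in by_name[name].get("dependencies", []):
            if marker_ok(dep.get("marker", "")):
                stack.append(dep["name"])
    seen.discard(root)  # the root stands for the project itself, not a download
    return sorted(seen)
```

Choosing a different root is what would let one lock serve several workspace members: only the starting node changes, not the traversal.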

Overall file metadata

What details about the file itself are recorded?

PDM

# This file is @generated by PDM.
# It is not intended for manual editing.

[metadata]
groups = ["default"]
strategy = ["inherit_metadata"]
lock_version = "4.5.0"
content_hash = "sha256:ad8d737d864c796dd999f825b17b9226e856eca6f3e7b8c2c654917eec423d19"

[[metadata.targets]]
requires_python = ">=3.8"

Poetry

# This file is automatically @generated by Poetry 1.8.3 and should not be changed by hand.

# ... content

[metadata]
lock-version = "2.0"
python-versions = ">=3.8"
content-hash = "d120ac5b1079e801d817067b736686bfa7d27e81331a17922554282c3c847dab"

uv

version = 1
requires-python = ">=3.8"

  • PDM and Poetry record the content hash of pyproject.toml

The questions I have after looking at this are:

  • @frostming any reason why you didn’t explicitly record the top-level dependencies?
  • @charliermarsh
    • any reason you didn’t record the filename separately for each file from the URL?
    • why did you go with using the project’s own name to record top-level details instead of a dedicated section at the top of the lock file?
    • What would you do if there’s ever a wheel 2 format? Keep wheels and choose based on filename, separate key, or not really thought about it yet?
  • @radoering any reason you don’t record the overall required Python version?

Thank you very much for this comparison, it’s clear at a glance.

I think you have already pointed out the reason: PDM can scan the lock file linearly to get the installation list, so top-level dependencies are not needed. Whenever it rebuilds the lock, it always gets the dependencies from pyproject.toml. And metadata.content_hash corresponds to a snapshot of pyproject.toml (not pdm.lock).
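
The general idea of a content hash as a staleness check can be sketched like this. Exactly which parts of pyproject.toml PDM or Poetry feed into their hashes is tool-specific and not shown in this thread, so the choice of inputs here (a dependency list plus requires-python) and both function names are assumptions made purely for illustration.

```python
import hashlib
import json


def content_hash(dependencies, requires_python):
    """Hash a canonical JSON form of the inputs that affect resolution."""
    payload = json.dumps(
        {"dependencies": sorted(dependencies), "requires-python": requires_python},
        sort_keys=True,
        separators=(",", ":"),
    )
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def lock_is_stale(recorded_hash, dependencies, requires_python):
    """True when the pyproject.toml inputs no longer match what the lock was built from."""
    return recorded_hash != content_hash(dependencies, requires_python)


deps = ["httpx;platform_python_implementation=='CPython'", "requests"]
recorded = content_hash(deps, ">=3.8")
print(lock_is_stale(recorded, deps, ">=3.8"))              # unchanged inputs
print(lock_is_stale(recorded, deps + ["rich"], ">=3.8"))   # a dependency was added
```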

  1. We just encoded the assumption that they would be identical to keep the lockfile concise, so yeah we’d get it wrong if they differed. But if we did support that, I think we’d only write the filename if it differed from the filename in the URL.

  2. It means that the root package isn’t special at all which generalizes much better for workspaces. E.g., from a single lock, the user can install the project in the current directory, or they can provide uv sync --package member and we can install from there. (This was inspired by Cargo which also doesn’t special-case the current project.) We do now store some details at the top in some cases though (e.g., if you’re in a workspace, which contains multiple packages, we write the workspace members to the top for various reasons).

  3. Haven’t really thought about it but would probably use the filename if we could?

By the way, for what it’s worth, I’m now considering writing the fully-resolved markers to the uv lockfile rather than relying on propagating markers across the graph. I think it makes it easier to audit and debug the output file.

Finally, not that anyone asked, but we intentionally did not include a content hash in the lockfile, because it makes merge conflict resolution more painful. Specifically, you can’t resolve conflicts without re-resolving, because neither hash is valid anymore.

Bad timing, sorry. You probably got a version of poetry-core that was not yet compatible with the PR. The PR should work again.

Examples with the PR:

[[package]]
name = "idna"
version = "3.8"
description = "Internationalized Domain Names in Applications (IDNA)"
optional = false
python-versions = ">=3.6"
groups = ["main"]
files = [
    {file = "idna-3.8-py3-none-any.whl", hash = "sha256:050b4e5baadcd44d760cedbd2b8e639f2ff89bbc7a5730fcc662954303377aac"},
    {file = "idna-3.8.tar.gz", hash = "sha256:d838c2c0ed6fced7693d5e8ab8e734d5f8fda53a039c0164afb0b82e771e3603"},
]

Difference: groups are locked.

[[package]]
name = "anyio"
version = "4.4.0"
description = "High level compatibility layer for multiple asynchronous event loop implementations"
optional = false
python-versions = ">=3.8"
groups = ["main"]
markers = "platform_python_implementation == \"CPython\""
files = [
    {file = "anyio-4.4.0-py3-none-any.whl", hash = "sha256:c1b2d8f46a8a812513012e1107cb0e68c17159a7a594208005a57dc776e1bdc7"},
    {file = "anyio-4.4.0.tar.gz", hash = "sha256:5aadc6a1bbb7cdb0bede386cac5e2940f5e2ff3aa20277e991cf028e0585ce94"},
]

Difference: groups and markers are locked.

We do. You probably missed it because it is at the end of the file:

[metadata]
lock-version = "2.1"
python-versions = ">=3.8"
content-hash = "92ee2f33e52fd1c6238cf132a8d2de51cbc240afcb20da1278ea9f96a562ae9b"

Because Charlie mentioned it: We have the hash in the lock file because we take the stance that it does not make sense to merge lock files without re-resolving.

@brettcannon Further, your examples do not contain the interesting case where different groups have different markers. It is an edge case, but it might make sense to consider it. See PEP 751: lock files (again) - #23 by radoering for an example.

You’re welcome!

:+1: (and I edited my original post to clarify what the content hash is of).

:+1:

To make sure I’m understanding correctly, you’re thinking of taking the linear install approach, so the markers that apply to a package are recorded on the package itself (like PDM does and Poetry is probably moving to) instead of on the edges of the graph?

That was early feedback that I got as well.

No worries! So if I try again do I need to install poetry-core from main?

Sorry about that! I will update my original post accordingly.

I think you’re suggesting an example where each extra has the same dependency but with different markers; that can be the next comparison.

Yeah, with some caveats. I’m considering writing out the full markers, so you could do a linear install if you wanted to install everything in the lockfile. But per your question about top-level packages, we still need the dependencies because we support installing subsets of the lockfile.

I was gonna put up a PR to write the full markers and see how they look on some example cases.

No.[1] I pinned the poetry-core version in the PR/branch (not just in the lock file but also in the pyproject.toml) so you can just install from the branch via pipx or similar without having to care about the correct version.

That can also be interesting, but I mean groups, not extras. Extras can be expressed via markers, groups cannot. Groups may be out of scope because PEP 735 is still a draft, but on the other hand PDM, Poetry and uv all support groups (if I am not wrong) so they will have to handle it somehow - if necessary in the tool section.


  1. The latest main is not compatible yet because it requires another PR first that is still under review. You need an older version of main. ↩︎

This section recently came up in a discussion on the pip issue tracker, and it left me with a question about locking build requirements (apologies if this is already answered, it’s a long thread, I did try searching it).

Let’s say I have two source distribution packages “foo” and “bar”, and they both have a build requirement on “setuptools>50”. Is it possible for me to create the lock file such that “foo” is locked to “setuptools 59.8.0” and “bar” is locked to “setuptools 73.0.1”?

Another question with regard to build requirement locking, triggered by the same issue on the pip tracker. How would recursive build requirements be handled? For example, if a project requires a build backend of foo-build, locked in [[packages.build-requires]], and foo-build is only available as source, and uses another build backend for its build (let’s say flit, just to avoid a long chain of what-ifs), how would the locked version of flit which needs to be used to build the locked version of foo-build be specified?

And in theory, that chain of build dependencies can be arbitrarily deep…

More on the whole can of worms that is bootstrapping: Bootstrapping problem (how to bootstrap all-source environments) · Issue #342 · pypa/packaging-problems · GitHub

That did make me realise that it isn’t clear to me how any of the lockers cope with dependency cycles (even for runtime deps). Do they just truncate the tree from the point where the backlink is detected, adding in any relevant environment markers?