PEP 751: now with graphs!

I agree, this is an important question that has been pushed to the sidelines for a while because we’ve been working on making sure the format supports all of the lockers that exist.

But the installer side is just as important. And I’ve been thinking mostly about “how would pip install this”, which is relatively straightforward, because pip is low level and can afford to demand that the user is explicit about all the details. But that’s not going to fit well in higher level tools, as you point out.

Also, there’s an interesting wrinkle with uv, as the uv pip interface would presumably follow pip’s low-level approach, which could look odd alongside the native uv sync interface. I’m going to assume that’s not a problem, though, as uv pip is intended to not look like the native uv commands.

I’d be very disappointed if “interoperability” for lockfiles meant nothing more than “high level tools continue as now, supporting only their own lockfiles, but the format is standard so pip can install from it as well”.

In one way, I agree. It’s impractical to define an interoperable format without agreeing on the core concepts. But on the other hand, this runs extremely close to dictating user interface, which is not what we want standards to do, as it prevents tools from making their own choices.

4 Likes

I agree. I believe we’re aligned based on your response, but just to clarify (after reading the “only their own lockfiles” language): in the scenario I was suggesting, uv would still use the standard lockfile in lieu of uv.lock. It’s just that commands like uv sync --package root or uv sync --group test might only work with uv-generated lockfiles, and to use a Poetry-generated lockfile, users would need to drop into a lower-level API where they specify the exact section of the lockfile to install.

Like you, I don’t think it’d be a great outcome, but it’s one clear option where we’d still get interoperability in the sense that “any installer can install from a lockfile generated by any tool”.

Again, I agree.

2 Likes

Correct me if I’m wrong, but wasn’t the discussion mostly just about having a singular hash per dist/wheel, and not a singular hash algorithm throughout the file? I’m calling this out specifically because, looking at how people operate private file indexes today, you often end up with a single hash (in the URL). If that hash does not match what the lock file format requires, you’re out of luck and have to fall back to fetching the file. For instance, we use dumb-pypi, which really only generates HTML that looks like this:

<li>
    <a
        href="https://pypi.devinfra.sentry.io/wheels/Babel-2.10.3-py3-none-any.whl#sha256=ff56f4892c1c4bf0d814575ea23471c230d544203c7748e8c68f0089478d48eb"
            data-requires-python="&gt;=3.6"
    >Babel-2.10.3-py3-none-any.whl</a>
    (2.10.3, 2022-08-31 15:05:19, git@cabb6cf)
</li>

Why not just let each hash be prefixed with its algorithm? E.g. sha256:ff56f4892c1c4bf0d814575ea23471c230d544203c7748e8c68f0089478d48eb.

(Which I believe all of uv, pdm and poetry already do today)
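For illustration, here is roughly what an installer could do with such a prefixed hash. This is a minimal sketch assuming the `algorithm:hexdigest` form shown above; the function name is made up, not any tool’s actual API:

```python
import hashlib


def verify_artifact(data: bytes, locked_hash: str) -> bool:
    """Check artifact bytes against an "algorithm:hexdigest" lock entry.

    Sketch only: the ":"-separated form matches what uv/pdm/poetry
    emit today, but nothing here is specified by the PEP.
    """
    algorithm, _, expected = locked_hash.partition(":")
    if algorithm not in hashlib.algorithms_available:
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    digest = hashlib.new(algorithm, data).hexdigest()
    return digest == expected
```

The point is that the installer never has to know the file-wide algorithm up front; each entry carries its own.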


More specifically, this is what an uv lockfile looks like today if you add an md5-hashed source alongside a sha256-locked source:

[[package]]
name = "anyio"
version = "4.4.0"
source = { registry = "http://localhost:12345/simple/" }
dependencies = [
    { name = "idna" },
    { name = "sniffio" },
]
wheels = [
    { url = "http://localhost:12345/wheels/anyio-4.4.0-py3-none-any.whl", hash = "md5:9188b782e810cc427518e96bfa6f1a49" },
]

[[package]]
name = "blinker"
version = "1.9.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/21/28/9b3f50ce0e048515135495f198351908d99540d69bfdc8c1d15b73dc55ce/blinker-1.9.0.tar.gz", hash = "sha256:b4ce2265a7abece45e7cc896e98dbebe6cead56bcf805a3d23136d145f5445bf", size = 22460 }
wheels = [
    { url = "https://files.pythonhosted.org/packages/10/cb/f2ad4230dc2eb1a74edf38f1a38b9b52277f75bef262d8908e60d957e13c/blinker-1.9.0-py3-none-any.whl", hash = "sha256:ba0efaa9080b619ff2f3459d1d500c57bddea4a6b424b60a91141db6fd2f08bc", size = 8458 },
]
2 Likes

Which is why I have been thinking about it; I’ve just had some positive, busy real-life stuff going on this month (and it will continue through the month). :wink:

I think the key concern for the issue you’re bringing up is that it requires tying dependency groups to the project they originate from (if there is a project). Since I assume we would want to support dependency groups from a pyproject.toml file even if it lacks a [project] table, that means we need some concept of standalone dependency groups. With that, the question for supporting what uv is after becomes: do we define dependency groups inline with any project they may belong to, or do we let dependency groups optionally reference the project?

E.g.

[[groups]]
name = "orphaned-group"
requirements = [
   ...
]

[[packages]]
name = "root"
[packages.groups]
test = [
   ...
]

or

[[groups]]
name = "orphaned-group"
requirements = [
   ...
]

[[groups]]
name = "test"
part-of = "root"
requirements = [
   ...
]

I would have placed money that there was a discussion about it. I even have a memory of convincing @ncoghlan that it was a good idea, but I can’t find the related posts, so I will just assume I’m wrong. :sweat_smile: Sorry to everyone for apparently misleading folks.

First, I think they all do it because Poetry does it, but I haven’t heard anyone until now really push back on the proposal.

Second, and this is minor, I would push for algorithm=hash, as that’s what’s defined in the Binary distribution format spec in the Python Packaging User Guide.

Third, the original motivation for the single hash algorithm was that it makes auditing easy: you can tell at a glance that the hash algorithm is strong enough for the whole lock file. Having to check every file’s hash algorithm choice, and risk accidentally forgetting to check one or misreading one while reviewing a diff, is somewhat of a risk.

Now if more people think my rationale isn’t important and would rather have the locker use the hash provided directly by the index, then that’s fine and I’m willing to change the PEP.

I think I may have convinced myself before you had a chance to convince me. The gist was:

  • lockers routinely regenerate the entire lock file anyway, so migrating to a new hash algorithm is straightforward even with a common algorithm required across the whole file
  • lockers already have to rehash all the artifacts to check their hashes anyway, so extra work is only needed for artifacts where the index doesn’t provide the artifact hash for the algorithm used in the lock file

Given those points, the auditing benefit of using a consistent hash algorithm across the whole lock file was considered a good trade-off (at least, the topic didn’t come up again until Armin asked about it).

Edit: I found the original exchange about this: Lock files, again (but this time w/ sdists!) - #300 by a-reich (a general Discuss search didn’t pick it up, but searching that thread specifically got there). Brett’s original recollection was correct: I changed my mind after considering a question he asked.

2 Likes

Since it just came up in Lock files don't mesh with dynamic package version info · Issue #7533 · astral-sh/uv · GitHub – it looks like neither this PEP nor the discussions have any provisions for dynamic package version metadata. I have to admit that I’m not deep enough in the topic to make a cogent argument WRT the PEP, but let me explain why I think it would be a mistake to pass it without:

Isn’t this just for packages? Lockfiles are for applications!

This is correct, but many projects with higher churn and dependencies that tend to break[1] already have full locks of their dependencies to keep their CI stable – usually using pip-tools or Poetry. Updates + fixes happen in a controlled manner on one’s own time. uv’s speed and DX made me hope to embrace it for some of my projects too.

Why dynamic version metadata?

Projects like setuptools-scm or hatch-vcs are highly popular because they let you deduplicate the VCS metadata (usually: Git tags) and the package metadata into one source of truth: the one that drives CI and e.g. GitHub releases, which is a big deal together with PyPI Trusted Publishing. In certain configurations, the version of the package changes with every commit. One of them appends the number of commits since the last tag, which also allows you to continuously upload your package to TestPyPI to ensure your whole packaging pipeline works as expected (as popularized by @hugovk).

But this is just one use-case; it does make sense to have a direct connection from the version number to a certain commit in the repo. It just means that the lock file would need to change after every commit.
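For reference, the hatch-vcs flavor of this is roughly the following pyproject.toml configuration (a sketch based on hatch-vcs’s documented setup; setuptools-scm has an equivalent):

```toml
[build-system]
requires = ["hatchling", "hatch-vcs"]
build-backend = "hatchling.build"

[project]
name = "my-package"
dynamic = ["version"]  # the version comes from Git tags, not this file

[tool.hatch.version]
source = "vcs"
```

With `dynamic = ["version"]`, nothing in the repository states the version statically; it only materializes at build time from the VCS state.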

What do I want?

I want a way to say that a certain (current!) package’s version is “dynamic” and shouldn’t be part of the lock file. There’s already been plenty of discussion in Lock files don't mesh with dynamic package version info · Issue #7533 · astral-sh/uv · GitHub if you want to dig deeper.


I’m sorry for suggesting another color, but it seems to me like a miss if we pass this without good support for PyPI packages and the workflows employed by high-profile projects. This could be a stepping stone for more stable CIs using only standards, and many pieces are already in place (e.g., tox).


  1. E.g., attrs has top-locks on both Mypy and Pyright as I’m writing. ↩︎

11 Likes

Dynamic versions are convenient because we already need to track the version in git tags, so we decided to make git the source of truth and base package building on it.

This worked nicely, because all package building and package management was procedural: You were calling setup.py, you were calling pip install, etc., and it was you deciding if something in the venv needed updating or a re-install. With lockfile-based package management, the workflow changes from procedural to declarative: You declare the list of packages you want, and the package manager materializes a coherent lockfile and environment with those packages present. In the case of {poetry,uv,..} run, you even skip the environment part and go directly to running your code with the packages you need (the environment becomes an implementation detail).

While reading information from git is convenient for users, it has some non-trivial implications for a lockfile. With static metadata, pyproject.toml is self-contained: we can read the pyproject.toml file(s) and use them for cache invalidation. If they match the lockfile and the user didn’t request an update, the resolution is fresh. Editables are .pth-linked so they don’t need updating either (native packages and code generation bring their own heap of cache-invalidation problems, but that’s a separate discussion), so we do a venv freshness check just to be sure (optional, depending on your package manager flavor) and launch the user code.
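The static-metadata freshness check described above reduces to comparing file hashes. A minimal sketch, assuming a digest of pyproject.toml was recorded at lock time (names are illustrative, not uv’s actual API):

```python
import hashlib
from pathlib import Path


def resolution_is_fresh(pyproject: Path, recorded_digest: str) -> bool:
    """Static-metadata freshness check.

    With static metadata, the inputs to the resolution are plain
    files, so "is the lockfile stale?" reduces to hashing
    pyproject.toml and comparing against a digest recorded when the
    lock was produced.  No arbitrary code needs to run.
    """
    current = hashlib.sha256(pyproject.read_bytes()).hexdigest()
    return current == recorded_digest
```

With dynamic metadata, no such pure file comparison exists: the "input" includes the git repository state, which is exactly the complexity jump described below.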

With dynamic metadata, all caching now depends on the state of the git repo! To determine whether the lockfile and the environment are fresh or not, we either need to make all package managers git-aware (and have some solution for other VCS) or we need to always run arbitrary code, which is slow (admittedly this is a problem mainly for uv, but uv sync often runs faster than starting a Python interpreter), fallible, and has a (solvable) bootstrapping problem. It also means the state moves from a plain file to something only accessible through a helper program.

In other ecosystems, at least cargo and npm, there was only static metadata and static versions from the beginning, and tooling developed around that (unlike Python, which started with only the ability to run arbitrary code; pyproject.toml isn’t that old yet). In Rust, there are for example https://github.com/crate-ci/cargo-release and https://github.com/MarcoIeni/release-plz. The latter inverts control by making the version change and creating the git tag for you[1]. It has built-in compatibility checks, so it knows which level of version bump is right for the next release. I fully acknowledge that in Python this would be a costly change to a stable workflow, and that the Python ecosystem isn’t as far along with publishing tools.

(Disclaimer: I’ve never published a real js package myself, so please correct me if I’m wrong on those.) In the javascript ecosystem, npm version has an on-by-default option to create a git tag (npm-version | npm Docs). This is also an inversion of control, where the tagging isn’t done by the git CLI (git tag) but by the npm CLI (npm version). This results in a workflow such as:

npm version patch # Update package.json, commit and tag
git push --tags origin main
npm publish

npm also supports reading the version from-git (npm-version | npm Docs), reusing the latest git tag. Yarn 1 only seems to support git tagging, not reading from git (yarn version | Yarn).

I’m not saying that we can’t or shouldn’t make dynamic metadata work with lockfiles, but I want to point out the high cost this has for package managers and the jump in complexity, and also show the pros and cons of workflows centered around static metadata.


  1. Incidentally, we’ve also switched to creating the tag in CI in uv. For linux distros, we can’t modify the tag, so we first make a draft release, and only when this passes, tag the release in git. If there is an error, we can make another commit and retry publish without breaking downstream distributors. ↩︎

5 Likes

In an ideal world, dynamic metadata would not exist. Whether it’s avoidable in Python today is a completely different question, but I think it would be great for the ecosystem if we could evaluate what would be needed to not have dynamic metadata.

7 Likes

I don’t think there’s any realistic chance in the short to medium term that we can avoid dealing with dynamic versions, but I do think it’s worthwhile to remind people of the problems they cause, as you do here. Personally, I hadn’t realised that this was a uniquely Python-related issue: from the strong views people have about “the VCS being the single source of truth” I assumed that getting versions from VCS was a common, language-independent practice. So if nothing else, thanks for teaching me that this isn’t the case.

Having said that, it’s worth remembering that dynamic versions are only possible in source distributions - wheels are required to have a static version. So any issues with accurately locking dynamic versions can be considered as part of the general problem that installing from sdists can’t be considered to be fully reproducible anyway. That doesn’t mean we can ignore the issue, but if we have to make compromises, I think that’s acceptable. As I said in a previous message, I think we have to stop trying to solve every issue with every use case. Tools that want to try to handle this case can do so using the tool-specific keys. Such lockfiles won’t be interoperable, but that’s fine.

We need something that’s basically workable (and I don’t know what that is, unfortunately) but it’s fine if it’s minimal and/or incomplete.

1 Like

I’m in the same boat as Hynek (I usually just copy what he does, infrastructure-wise :wink:), so I feel confident commenting. We use wheels in our CI pipelines, and technically the wheels do have a static version. It’s just that this version changes every commit, so in reality it’s not really static. In other words, this isn’t just an sdist issue.

2 Likes

Dynamic versions are important to packaging and distribution, too; we don’t want PyPI (test or otherwise) to allow replacement of the artifacts for any version, those need to be immutable. So, unless we want to impose a manual versioning requirement on every commit that might be uploaded, dynamic versioning from SCM metadata is going to remain a necessary part of the package lifecycle.

1 Like

My current workflow (to solve this precise issue) is to not include the current project in the lock file at all, and simply install it separately. In terms of PEP 751 (this PEP), that has the benefits of both side-stepping the version issues and avoiding including the project’s extras that I don’t want in the lock file.

If I were to start including it, I would have to host the project’s built distributions somewhere (or use --find-links on a wheel directory), and somehow pre-compute the hash of the wheel for that version. Also, the requirement for all extras to be included means I can’t just assume everything in the lock file will be installed.

I should clarify, I do this to have the package and lock file be stamped and reliable from the same Git tag, without even the need for a version commit.

Correct, no one has brought up dropping the version details in this discussion.

OK, that seems reasonable if that’s what you’re publishing and you want to distinguish the state of, e.g., an sdist built from a VCS checkout.

OK, still not seeing a problem …

Are you wanting this for your code (e.g., what you’re testing in CI), or for any dependency? If it’s the former then I get it, I’ll figure something out (it might assume an editable install), and you can stop reading the rest of my response to you. :blush: But if it’s the latter …

I think this is where we differ, because that’s kind of the point of a lock file (according to the PEP): to reproduce an environment based on what you recorded in the lock file (I’m assuming you read at least the motivation and rationale sections of the PEP). The expectation is you install the same thing for everyone who uses the lock file, whether you have a million releases or two. It’s the same reason the PEP specifies Git commit IDs/hashes and not branches: one leads to a consistent result and one may not.

So I’m going to turn this around and ask: what are you expecting a lock file to record if packages lack a version (and seemingly a Git commit hash)? How are you to be sure you’re getting the same code – and also dependencies – if the version floats? How do I make sure you and I install the same version of the code for reproducible environments (e.g., testing the same thing in CI)? As of right now everything short of a directory of files has some hash-like thing to make sure the same thing gets installed, but you seem to be wanting to avoid version numbers not because they might be annoying to get from, e.g., a Git repo, but because you don’t want to pin to a specific version number at all.

Maybe if you said how you’re specifying what to install today and how you expect to get the same files installed every time then that might help clarify things for me (and what lock files did it the way you expected).

That’s a separate topic/conversation.

I would say it’s not necessarily built into the packaging tooling in other ecosystems. That doesn’t mean people aren’t using scripts to write out the version number based on a Git tag.

… or a source tree (which includes VCS checkouts). But as of right now, to get an sdist listed in the lock file you pointed the locker at an index or a directory of files and the sdist would still have the version number in the file name in either case.

Correct.

I don’t think anyone is saying dynamic versioning should go away from a packaging perspective. The question is how are they expected to interact or come into play with lock files.

3 Likes

I guess the point I’m trying, albeit imperfectly, to make is: lock files for libraries are a pretty reasonable thing. Testing library release processes is also a reasonable thing, and dynamic versions are a key part of the latter.

I think what follows from that is that any lock file specification needs to explicitly decide what its stance on dynamic package metadata is, one way or the other, so that installers and lockers can build accordingly, ideally in a PEP-conformant way. I’d certainly prefer tools such as uv implement PEP-751 than go their own way.

I’d be interested to know why uv wants to include the version in the lockfile, or if it’d be all right to loosen the lockfile when the version is dynamic. Same for the PEP; is the enclosing package’s version critical to the locker or the installer role?

Then I will ask you the same question I asked @hynek : Are you wanting this for your code (e.g., what you’re testing in CI), or for any dependency? If it’s the former then I get it and I’ll figure something out (it might assume an editable install). :blush: But if it’s the latter see all the follow-up questions I had for Hynek.

My code. I think restricting the option to editable installs makes a fair amount of sense, actually.

2 Likes

I’m so sorry, Brett. :wink:

I honestly can’t imagine how the latter would make sense. :wink:

What I want is something like this uv.lock parlance:

[[package]]
name = "attrs"
version = { dynamic = true }
source = { editable = "." }

Or even ignore = true, because there might be more reasons not to lock the version of the local project into the lock file? I honestly don’t know why locking the version would make sense for editable projects in the first place.

I don’t want to repeat the points that Tin and Chris already made; I think/hope the use case is clear. This is not about publishing this lock file in any way, it’s just to bring stability to development/CI.

I’d just like to re-stress the point that with the advent – and explicit PyPA encouragement – of CI-based PyPI uploads, it makes more sense than ever to make SCMs a first-class citizen.

5 Likes

Perhaps I’m misunderstanding some of the comments, but I wouldn’t want dynamic versions tied to editable installs. During development, that’s fine, but in my CI builds, I turn that off and fully install the local project … and still want to be able to give it a dynamic version.

1 Like

Let me push back on this a bit. The reason I’m suggesting having that conversation (outside of this thread!) is because dynamic metadata as we have it today is not a good idea. But the use cases it addresses are real. If we define a lock file, we should have an answer for how dynamic metadata is to be dealt with.

Today dynamic metadata does not just cause performance issues for tools like uv, removing a lot of potential for optimizations; it also causes real usability concerns. uv itself right now is gaining a lot of functionality to deal with manual cache invalidation, which is entirely independent of the standardization effort. E.g., you can do this today:

[tool.uv]
cache-keys = [{ git = { commit = true } }]
reinstall-package = ["my-completely-magic-package"]

I think understanding what dynamic metadata does today, what problems it causes, and how we can avoid it and provide a better experience is a conversation we should have. We should not have it here but we should at least understand where we are standing or want to go prior to agreeing on how to deal with dynamic metadata in lock files.

1 Like

I suspect we need to more clearly draw the line between resolvers and installers again. These lockfiles are intended to be the intermediate format from resolver to installer, and the installer part shouldn’t be making any decisions about what package to choose.[1]

Realistically, that means it gets all the URIs that it needs, and at most chooses yes or no for each (and doesn’t reevaluate any other part as a result of that selection).

The most we can do here is have an entry type that says “install the files at this URI[2] without validation[3]”. Making the installer do any more work than that is making it into a resolver, and therefore out of scope.


  1. Apart from what we put in the spec, and require to be manually specified by the caller. ↩︎

  2. Which may include a local file or source tree. ↩︎

  3. Established earlier, there’s no real reliable way to validate an unpublished package, and it’s probably not desirable anyway. ↩︎

4 Likes