PEP 751: now with graphs!

anon62990384 · November 6, 2024, 6:18pm

I don’t see how they could. Unless the meaning of these locks is so watered down as to say, some items will be verifiable via hash, some not. I personally want nothing to do with that sort of spec and will just stick to what I have.

anon62990384 · November 6, 2024, 6:22pm

Perhaps I was not clear in my assessment of source trees. As far as I can tell, a package in a lock can have only a source tree, no sdist, no wheels, as the spec stands today. It’s for that sort of case it seems to me you must either outlaw source trees or specify how to hash them.

charliermarsh · November 6, 2024, 6:34pm

I don’t understand why source trees have to be outlawed entirely. Why is it not sufficient to enable users to specify whether they want to allow them or not, similar to source distributions in the spec as-written?

charliermarsh · November 6, 2024, 6:52pm

In uv, a node in the graph looks like this (complete entry; nothing omitted):

[[package]]
name = "flask"
version = "3.0.3"
source = { registry = "https://pypi.org/simple" }
dependencies = [
    { name = "blinker" },
    { name = "click" },
    { name = "itsdangerous" },
    { name = "jinja2" },
    { name = "werkzeug" },
]
sdist = { url = "https://files.pythonhosted.org/packages/41/e1/d104c83026f8d35dfd2c261df7d64738341067526406b40190bc063e829a/flask-3.0.3.tar.gz", hash = "sha256:ceb27b0af3823ea2737928a4d99d125a06175b8512c445cbd9a9ce200ef76842", size = 676315 }
wheels = [
    { url = "https://files.pythonhosted.org/packages/61/80/ffe1da13ad9300f87c93af113edd0638c75138c42a0994becfacac078c06/flask-3.0.3-py3-none-any.whl", hash = "sha256:34e815dfaa43340d1d15a5c3a02b8476004037eb4840b34910c6e21679d288f3", size = 101735 },
]

The unique identifier for the node is the tuple of (name, version, source). (We call this a PackageId.)

Here are some example values for source:

{ registry = "https://pypi.org/simple" }
{ editable = "../library" }
{ path = "../library" }
{ url = "https://example.org/foo-1.0.zip" }
{ git = "https://github.com/agronholm/anyio?tag=4.6.2#c4844254e6db0cb804c240ba07405db73d810e0b" }

Each entry in dependencies points to a [[package]] by listing the PackageId – so each entry in dependencies is semantically (name, version, source), but we omit fields that aren’t necessary, to keep the format concise. In the above example, there’s only one entry for blinker in the lockfile, so the dependency is just { name = "blinker" }, instead of { name = "blinker", version = "1.8.2", source = { registry = "https://pypi.org/simple" } }.

In this way, we represent the resolved graph, rather than the requirements. Installation is much simpler than in the PEP, because you’re just looking up [[package]] nodes by PackageId rather than testing version specifiers.
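
As a rough illustration of the lookup described above (not uv’s actual implementation), the following Python sketch resolves an abbreviated dependency entry to a single [[package]] node. The function name `resolve_dependency` and the plain-dict representation of the lockfile are assumptions made for the example.

```python
def resolve_dependency(dep, packages):
    """Return the unique package node matching a (possibly abbreviated) dependency entry.

    `dep` may omit `version` and/or `source`; omitted fields match anything,
    which is only unambiguous when a single node satisfies the given fields.
    """
    matches = [
        pkg for pkg in packages
        if all(pkg.get(field) == dep[field]
               for field in ("name", "version", "source") if field in dep)
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one node for {dep!r}, found {len(matches)}")
    return matches[0]


# Hypothetical, hand-written data mirroring the shape of the entries above.
PYPI = {"registry": "https://pypi.org/simple"}
packages = [
    {"name": "flask", "version": "3.0.3", "source": PYPI,
     "dependencies": [{"name": "blinker"}]},
    {"name": "blinker", "version": "1.8.2", "source": PYPI, "dependencies": []},
]

flask = resolve_dependency({"name": "flask"}, packages)
blinker = resolve_dependency(flask["dependencies"][0], packages)
print(blinker["version"])
```

If two nodes shared a name, the abbreviated entry would be ambiguous and the sketch raises; that is the case where a writer would need to spell out version or source as well.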

Separately from the above, we also write the raw package metadata to the lockfile, but it’s only used for lockfile invalidation.

I’m not attached to specific details but I am a big fan of this schema. I find it easy to reason about, easy to read, and easy to audit manually when viewing actual uv.lock files. It works for highly complex resolutions while remaining very concise.

pf_moore · November 6, 2024, 6:53pm

Understood, and that’s your choice - although I will note that the approach @charliermarsh suggested of writing source tree hashes to the [tool.pex] section means that you could use the standard format with no loss of functionality. IMO, it would be a shame if pex chooses a proprietary lockfile format purely over a matter of principle, but that’s up to you.

At some point, we have to set some boundaries here, otherwise the standard will become an unusable “kitchen sink” of every feature anyone ever wanted.

I agree - speaking with my “PEP delegate” hat on, I would not want this PEP to define “how to hash a source tree”. Specifically, the problem of calculating a hash for a source tree is something that would have applicability far beyond just lockfiles, and therefore people looking for the official way to do so should not have to go looking in the lockfile PEP.

If someone wants to write a “how to hash a source tree” PEP, the lockfile spec could reference it. That’s something that could be deferred to a v2 lockfile, though.

anon62990384 · November 6, 2024, 7:29pm

That’s correct. However now a subset of the locks I create actually work for a given set of users. They are the ones left in the lurch when they hit a subset that doesn’t work.

To be concrete: Pex adopts this spec and moves its primary lock file format to it. Existing and new users lock things. They use the newfound lock interoperability in their AWS Lambda deployments (a use case I detailed earlier). Everything works great. Months go by and a package they depend on bumps to a version that has a (transitive) VCS requirement on a tag. Pex can still lock this and uses the tool escape hatch Charlie mentioned. AWS won’t support that and their deployment goes boom. If I were that user I’d get angry at Pex. If I were that user and knew a little more though and liked to blog, I might post the umpteenth “Python packaging is broken” screed.

pf_moore · November 6, 2024, 7:41pm

If pex doesn’t warn the user that the lock requires a non-portable feature of pex and won’t guarantee identical results on other installers, that is something pex should fix. And of course “goes boom” simply means “installs the source tree without checking the hash”, so it probably works just fine, in practice.

But it sounds like you’re not willing to compromise on this (as is your right) so I’ll stop trying to persuade you otherwise at this point.

anon62990384 · November 6, 2024, 7:48pm

I am willing to compromise. I think the only valid one is to eliminate package items that are not verifiable from a spec about lock files. I think Brett mentioned people wanted this though. That makes no sense to me. How can a lock standard permit items in the lock with no verifiable source? That’s the sort of spec that I want no part of. Either it fundamentally is in support of verifiable locks and restricts itself to meet that goal: say no sdists, no source trees, or it does allow those and specifies how to reach verifiability, but a half state between the two is not good. It pushes the onus on tool adopters to implement warnings and users of those tools to now deal with researching those. When I could not support source trees, as I mentioned, I failed fast with a useful message. Then I added support later and removed that restriction / message. To now choose to adopt a standard that forces Pex and its users back to a mushy middle ground where warnings are emitted and tea-leaves read seems like a step backwards.

anon62990384 · November 6, 2024, 7:51pm

@brettcannon and @pf_moore I think you can take Pex out of your considerations here going forward. As Paul alluded to, it’s hard to juggle a bunch of players in a spec and I think I’m an outlier here in my views. Good luck!

EpicWink · November 6, 2024, 8:50pm

This quote was in relation to source trees, but I think it equally can apply for lack of package URLs: a user can specify to allow the installer to get the URLs from the simple API. Moreover, I would be happy if that’s optional for the installer to support.

My idea of a lock file seems to be different to other participants: I don’t want to specify how to install, rather just what to install (with validation).

mikeshardmind · November 6, 2024, 8:57pm

If the spec allows things that can’t be verified, I’m probably going to have to avoid using it professionally. Not that my place of work contributes back to open source anything frequently, but another data point to do what you will with.

brettcannon · November 6, 2024, 9:05pm

I don’t think so (and to be clear, the VCS case is appropriately covered, so this is only when dealing with a directory of files that doesn’t have a VCS backing it).

It’s definitely weakened, but I don’t think all security from the lock file becomes useless. Security in depth proposes doing what you can at all levels and not relying on a single layer to handle everything. In this case there’s a weak link, but hopefully there are other protections in place. Plus it limits your point of exposure, so you can add whatever protections you want to fill in that gap if you choose to (which could be a separate PEP to define how to hash a directory of files).

Why is that? Two sections of PEP 751 (“A file format to record Python dependencies for installation reproducibility”, peps.python.org) are there to support sdists and wheels, respectively.

I can word the PEP to say installers MAY support searching for sdists and wheels if no URL or path is specified, but that installers MUST specify a path or URL when reasonable to do so (and in the case @EpicWink has, it wouldn’t be reasonable).

Does anyone else have an opinion?

Could you provide the complete list?

How do you detect when the lock file will fail on a platform? In the last draft of the PEP I had that declared upfront, but there were some concerns on how easy it would be to amalgamate details into one and whether the details I was specifying made sense (e.g., I think listing supported wheels was a concern). And in this draft I list all requirements so you can tell when an edge goes to nowhere. But I’m trying to think if that strategy still works with this as you’re effectively simplifying the requirements down to just what to install without having to make any decisions at install time? I assume there’s a marker/markers key to make a requirement conditional. I guess having all the requirements could still work as the edges would still resolve to keys that don’t exist or lack a wheel for the platform. Or you could have requirements that are marked as unsupported and thus known to go nowhere.

And the general key concept matches in my head what you were basically going to suggest, so at least I understand it. I will say the current PEP breaks things up more into keys than what this format does, but if we went down this path and didn’t want to generate key names then I think the PEP would have to shift a bit to rely on the suggested source key more.

brettcannon · November 6, 2024, 9:31pm

Thanks for the input you did provide!

I understood it was for files, not source trees, and so that’s how I updated the PEP. I’m not sure how you would specify what you’re after as you would just have a dangling package version with details of what you locked against, i.e., was it a wheel, source tree, etc.

So you essentially want to strip all paths and URLs from the lock file so that the installer has to go find the, e.g., source tree or wheels on its own? But you still want all the other details, e.g., wheel filenames, hashes, file size, etc.? That would be a bigger shift to the PEP as all the places that paths and URLs are would need to have a second state of “go figure it out on your own where to find it, but this is for a directory of files”.

Do you just want permission in the PEP to have an installer that ignores paths and URLs and simply treat them as metadata at time of locking (or just ignore the PEP in the case of using paths and URLs in your own installer and you just know it’s not compliant in that one regard)?

What does your work do now in this situation? Are you using Pex to get that guarantee or some other tool that’s verifying a directory of files hasn’t changed since locking? And is it the allowance of any unverified package source, period, or that you’re not sure if an installer will provide a way to opt out of an unverifiable directory of files (i.e. have the PEP say installers must provide a way to opt-in/out of using any install mechanism that cannot be verified as the same as lock time)? And how do you expect editable installs to play into this as they are, by definition, designed to be edited and changed and thus not exactly verifiable as unchanged since you created the lock file? Or would you not want editable installs either (which uv has explicitly requested based on their own user feedback)?

pf_moore · November 6, 2024, 9:37pm

One other question with all of this - is it critical that the lockfile format doesn’t allow even the possibility of these problematic cases, or is it sufficient that someone can audit (or better, run a tool over) the lockfile and get a report confirming that those features aren’t used?

Because while I can see it being important to prevent people using lockfiles that are insecure according to your policy, I find it hard to understand why the format having a feature that you choose not to use is so bad as to make the proposal unusable for you.

charliermarsh · November 6, 2024, 9:49pm

For sure – the source is here.

They are:

{ registry = "https://pypi.org/simple" }
{ url = "https://example.org/foo-1.0.zip" }
{ git = "https://github.com/agronholm/anyio?tag=4.6.2#c4844254e6db0cb804c240ba07405db73d810e0b" }
{ path = "../library/foo-1.0.0-py3-none-any.whl" } (a local distribution, like a .whl or .tar.gz)
{ editable = "../library" } (an editable source tree)
{ directory = "../library" } (a non-editable source tree)
{ virtual = "../library" } (this one is the strangest, it means: install the project’s dependencies, but not the project itself)

The whole editable vs. directory vs. virtual thing is somewhat debatable but hopefully the intent is clear at least.
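
With the source kinds enumerated like this, a check that a lockfile uses no source-tree entries (the kind of audit asked about earlier in the thread) could be sketched as follows. This is only an illustration: it assumes the lockfile has already been parsed into a list of dicts, and the choice of which kinds count as unverifiable (`editable`, `directory`, `virtual`) is an assumption for the example, not something the PEP or uv defines.

```python
# Assumption for this sketch: these source kinds point at a mutable directory
# and so have no artifact hash to verify at install time.
UNVERIFIABLE_KINDS = {"editable", "directory", "virtual"}


def audit_sources(packages):
    """Return (name, kind) pairs for packages whose source is a source tree."""
    flagged = []
    for pkg in packages:
        for kind in pkg["source"]:
            if kind in UNVERIFIABLE_KINDS:
                flagged.append((pkg["name"], kind))
    return flagged


# Hypothetical lockfile contents.
packages = [
    {"name": "flask", "source": {"registry": "https://pypi.org/simple"}},
    {"name": "library", "source": {"editable": "../library"}},
]

for name, kind in audit_sources(packages):
    print(f"{name}: source kind '{kind}' cannot be hash-verified")
```

An empty result would mean every package comes from a registry, URL, Git reference, or local distribution file.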

Brett Cannon:

How do you detect when the lock file will fail on a platform? In the last draft of the PEP I had that declared upfront, but there were some concerns on how easy it would be to amalgamate details into one and whether the details I was specifying made sense (e.g., I think listing supported wheels was a concern). And in this draft I list all requirements so you can tell when an edge goes to nowhere. But I’m trying to think if that strategy still works with this as you’re effectively simplifying the requirements down to just what to install without having to make any decisions at install time? I assume there’s a marker/markers key to make a requirement conditional. I guess having all the requirements could still work as the edges would still resolve to keys that don’t exist or lack a wheel for the platform. Or you could have requirements that are marked as unsupported and thus known to go nowhere.

I think there are two parts to this…

First, we typically lock for all platforms, so the graph is “complete”. In that case, we traverse the graph, and if we can’t find a compatible wheel or source distribution for a given package (or, e.g., we can’t find a wheel and source distributions are disabled), we error.
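
A minimal sketch of that traversal, under stated assumptions: packages are plain dicts keyed by name only (a real lockfile keys on the full PackageId), `wheel_ok` is a hypothetical caller-supplied predicate standing in for platform tag matching, and `allow_sdist` models the “source distributions are disabled” switch. It is not uv’s implementation.

```python
def plan_install(root, packages, wheel_ok, allow_sdist=True):
    """Walk the locked graph from `root` and pick one artifact URL per package.

    Raises RuntimeError when a reachable package has no usable artifact.
    """
    by_name = {pkg["name"]: pkg for pkg in packages}
    plan, stack, seen = {}, [root], set()
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        pkg = by_name[name]
        # Prefer the first compatible wheel; fall back to the sdist if allowed.
        wheel = next((w for w in pkg.get("wheels", []) if wheel_ok(w["url"])), None)
        if wheel is not None:
            plan[name] = wheel["url"]
        elif allow_sdist and "sdist" in pkg:
            plan[name] = pkg["sdist"]["url"]
        else:
            raise RuntimeError(f"no compatible distribution for {name}")
        stack.extend(dep["name"] for dep in pkg.get("dependencies", []))
    return plan


# Hypothetical data; the URLs are placeholders, not real files.
packages = [
    {"name": "app", "dependencies": [{"name": "lib"}],
     "wheels": [{"url": "https://example.org/app-1.0-py3-none-any.whl"}]},
    {"name": "lib", "dependencies": [],
     "sdist": {"url": "https://example.org/lib-1.0.tar.gz"}},
]


def pure_python_only(url):
    return url.endswith("-py3-none-any.whl")


print(plan_install("app", packages, pure_python_only))
```

Calling `plan_install("app", packages, pure_python_only, allow_sdist=False)` raises, because `lib` has no wheel and its sdist is no longer an option.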

However, we also allow users to instruct us to only lock for a subset of environments via tool.uv.environments. So they can say, “Only lock for macOS”:

[tool.uv]
# Resolve for macOS, but not for Linux or Windows.
environments = ["sys_platform == 'darwin'"]

(The environments listed in tool.uv.environments must be disjoint.)

If tool.uv.environments is specified, we include that in the lockfile. And at install-time, we determine whether the current platform is compatible with any of the entries in tool.uv.environments.

mikeshardmind · November 6, 2024, 9:52pm

Brett Cannon:

mikeshardmind:

If the spec allows things that can’t be verified, I’m probably going to have to avoid using it professionally.

What does your work do now in this situation? Are you using Pex to get that guarantee or some other tool that’s verifying a directory of files hasn’t changed since locking? And is it the allowance of any unverified package source, period, or that you’re not sure if an installer will provide a way to opt out of an unverifiable directory of files (i.e. have the PEP say installers must provide a way to opt-in/out of using any install mechanism that cannot be verified as the same as lock time)? And how do you expect editable installs to play into this as they are, by definition, designed to be edited and changed and thus not exactly verifiable as unchanged since you created the lock file? Or would you not want editable installs either (which uv has explicitly requested based on their own user feedback)?

Currently, almost every Python package used in production must come from a wheel. We don’t go as far as building everything from source (currently…); it is within our current posture to allow certain trusted dependencies and their preexisting build processes with some amount of review and verification. This is enforced currently with what started as an internal fork of pip (that I don’t like that we have for reasons beyond the scope of this PEP), specifically to change some behavior around multiple indexes and to make requiring hashes non-optional. A special exception exists for this tool itself, as well as 2 other internal tools that exist for similar reasons, but changes to these tools are infrequent, and have their own review process.

A lock file spec that allows unlocked files would be seen as a source of potential issues waiting to happen. It wouldn’t be enough for this to be a setting, as I understand our current posture: we’ve been making a point to migrate to and/or fork more and more tools such that they can’t be configured/used insecurely.

Editable installs don’t seem to make sense to lock, and I have no reason to need this either professionally or personally.

I understand this may put this out of your view of scope for this PEP, but I see this as a possible route forward to have less packaging fracturing due to internal needs like this. Whether or not that’s something you think is worth being part of this PEP is up to you and not something I feel comfortable pushing for, only informing of, given my work’s reluctance to share back.

mikeshardmind · November 6, 2024, 10:17pm

Frankly, lockfiles are a security tool. I and many other people who have to deal with security questions don’t believe newly designed security tools should be possible to have an unsafe configuration. The possibility for misuse and requiring additional tooling to get the correct result makes this an “inappropriately designed tool”.

Even setting aside requirements stricter than what many people currently accept here, I don’t get the logic of including something which by definition can’t be locked in a lockfile.

h-vetinari · November 6, 2024, 10:23pm

I would say they are a stability tool, and that’s a much wider set of use cases & requirements than just for security.

mikeshardmind · November 6, 2024, 10:33pm

I’m not sure calling it a stability tool changes anything here. If it can’t be locked, then why include it in the lockfile? What stability does that provide?

brettcannon · November 6, 2024, 11:11pm

I’m worrying about everyone else who doesn’t lock for every platform, e.g., I believe PDM stopped doing universal lock files.

Charlie Marsh:

we also allow users to instruct us to only lock for a subset of environments via tool.uv.environments. So they can say, “Only lock for macOS”:

[tool.uv]
# Resolve for macOS, but not for Linux or Windows.
environments = ["sys_platform == 'darwin'"]

(The environments listed in tool.uv.environments must be disjoint.)

If tool.uv.environments is specified, we include that in the lockfile. And at install-time, we determine whether the current platform is compatible with any of the entries in tool.uv.environments.

So basically it records the markers that you lock for as provided by the user, which could also be used to record assumptions made by the locker.

Do @frostming or @radoering have any input on how to detect when an install would fail on a platform? I believe both PDM and Poetry record the full requirements so they aren’t necessarily resolving all edges to a specific node.

Maybe recording something equivalent to source=null to signify this is a dead-end if you hit it? Or is recording markers upfront for any assumptions something people want brought back into the PEP?

Fair enough. As @pf_moore has suggested, a separate PEP could introduce a way to hash a directory of files and then update the lock file spec; it just doesn’t have to hold up this “1.0” PEP for those for whom this isn’t a showstopper (and I’m happy to work with folks on writing that future PEP, but I don’t have the patience to start trying to figure out this exact detail right now by having yet one more thing to argue over).

I believe it’s useful in a monorepo scenario.

I think you have a more stringent definition of “locked” than some others do. To some, “locking” is writing down the dependencies so you know what will get installed at the time you run the installer. But I think your definition of “locking” is an exact match of what the locker saw and what the installer installs. So the former definition lets you know what files will get installed at install-time, but the latter makes sure that every file matches from locker to install.
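
For files, the stricter reading amounts to a check like the one below at install time. This is a generic sketch using Python’s standard `hashlib`, with the `algorithm:hexdigest` string format taken from the uv example earlier in the thread; the function name `matches_lock_hash` is made up for illustration, and nothing here addresses how a directory of files would be hashed.

```python
import hashlib


def matches_lock_hash(data: bytes, locked_hash: str) -> bool:
    """Check downloaded bytes against a lock entry such as 'sha256:<hexdigest>'."""
    algorithm, _, expected = locked_hash.partition(":")
    actual = hashlib.new(algorithm, data).hexdigest()
    return actual == expected


# Simulate locking a file and then installing it.
payload = b"example wheel contents"
locked = "sha256:" + hashlib.sha256(payload).hexdigest()

print(matches_lock_hash(payload, locked))                 # unchanged file
print(matches_lock_hash(payload + b" tampered", locked))  # modified file
```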