PEP 751: one last time

The latest/last major draft of PEP 751 can be found at PEP 751 – A file format to record Python dependencies for installation reproducibility | peps.python.org. This version is starting out as a standard to replace/supplant using requirements.txt for a lock file (e.g., what pip-tools creates).

I say “starting out” as there are some open issues to go through which could make this work as a lock file replacement for, e.g., pdm.lock. But I also want to make sure things will work for people in general and not just a single tool.

I also don’t want this conversation dragging on forever, and so I plan to have this PEP done and submitted for pronouncement before April. That does mean the discussion needs to be done well before then to update my PoC, give people time to think about it, etc. So I will call “time” on this discussion at some point if it drags on. Also, in the name of time, if I feel like you didn’t read the PEP and it contains an answer, I will probably flat-out tell you to read the PEP w/o answering your question.

Open issues

So here is how this is going to work. For open issues which don’t massively affect how installation works, people in general can convince me to go one way or another (I’ll share where I’m leaning on each point). For things that are big shifts it will take at least two tools to say, “if you add this we can rely on this format” to make the change. For instance, if PDM and Poetry say, “add this and we can drop our custom lock file format” then that should be enough to get it into the PEP (no pressure, @frostming, @radoering, and @charliermarsh :wink:).

Also, I’m not really open to new ideas that are not listed below unless they come from a tool maintainer who is going to generate these lock files.

Simplification

Drop recording the package version

As this is written, the package version is optional since it can only be reliably recorded when an sdist or wheel file is used. And since both sources record the version in file names, it is technically redundant. But having the version explicitly called out could be viewed as helping with auditing by not having to find and parse file names (especially if an sdist file name doesn’t conform to the latest spec).
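To make “find and parse file names” concrete, here is a rough Python sketch of recovering the version from file names. It assumes spec-conforming names (wheel names per the binary distribution format, `.tar.gz` sdist names per PEP 625); the function names are hypothetical, and a real tool would more likely reach for `packaging.utils.parse_wheel_filename()` / `parse_sdist_filename()`.

```python
def version_from_wheel(filename: str) -> str:
    """Return the version component of a wheel file name."""
    # {name}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename}")
    # Wheel names escape "-" in the project name, so the version is always second.
    return parts[1]


def version_from_sdist(filename: str) -> str:
    """Return the version component of a spec-conforming .tar.gz sdist name."""
    if not filename.endswith(".tar.gz"):
        raise ValueError(f"not a .tar.gz sdist file name: {filename}")
    name, sep, version = filename[: -len(".tar.gz")].rpartition("-")
    if not sep or not name:
        raise ValueError(f"unexpected sdist file name: {filename}")
    # Only reliable when "-" in the name was replaced with "_", as current names do.
    return version


print(version_from_wheel("attrs-23.2.0-py3-none-any.whl"))  # 23.2.0
```

The sdist half is exactly where older, non-conforming names (e.g. a project name containing `-`) can be parsed wrongly, which is the auditing argument for keeping an explicit version.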

I’m leaning against this idea.

Drop the requirement to specify the location of an sdist and/or wheels

At least one person has commented how their work has unstable URLs for all sdists and wheels. As such, they have to search for all files at install regardless of where the file was found previously. Dropping the requirement to provide the URL or path to a file would help solve the issue of recording known-bad information.

To support this, though, would require installation to support finding files via a package index or some other mechanism specified outside of this PEP. The former adds complexity (discussed as another open issue), while the latter means this PEP cannot fully explain the installation process.

I’m neutral on this.

Drop requiring file size and hashes

At least one person has said that their work modifies all wheels and sdists with internal files. That means any recorded hashes and file sizes will be wrong. By making the file size and hashes optional – very likely through some opt-out mechanism – then they could continue to produce lock files that meet this PEP’s requirements.

As it weakens security by not making hashes and file sizes mandatory, it somewhat dilutes the purpose of this PEP. It also only works with external projects if the creator of the lock file is external to the company modifying the files and chose to leave out hashes. It also is only beneficial if the file modifications are not idempotent, thus causing random changes in hashes and file size.
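For reference, a minimal Python sketch of how an installer might enforce the recorded size and hashes; a file modified after locking (the internal-files case above) fails here by design. The function name is hypothetical, the in-memory stream stands in for a real download, and treating “no hashes recorded” as a failure is an assumed policy matching the PEP’s current mandatory-hash stance.

```python
import hashlib
import io
from typing import BinaryIO


def matches_lock_entry(stream: BinaryIO, size: int, hashes: dict[str, str]) -> bool:
    """Return True only if the stream matches the recorded size and every recorded hash."""
    if not hashes:
        # Nothing to check integrity against; treat as a failure, not a pass.
        return False
    digests = {algorithm: hashlib.new(algorithm) for algorithm in hashes}
    total = 0
    for chunk in iter(lambda: stream.read(64 * 1024), b""):
        total += len(chunk)
        for digest in digests.values():
            digest.update(chunk)
    if total != size:
        return False
    return all(digests[alg].hexdigest() == expected for alg, expected in hashes.items())


payload = b"example wheel bytes"
entry = {"size": len(payload), "hashes": {"sha256": hashlib.sha256(payload).hexdigest()}}
print(matches_lock_entry(io.BytesIO(payload), **entry))  # True
```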

I’m leaning against this idea.

Drop recording the sdist file name

While incompatible with dropping the URL/path requirement, the package version, and hashes, recording the sdist file name is technically not necessary at all (right now recording the file name is optional). The file name only encodes the project name and version, so no new info is conveyed about the file (when the package version is provided). And if the location is recorded then getting the file is handled regardless of the file name.

But recording the file name can be helpful when looking for an appropriate file when the recorded file location is no longer available (while sdist file names are now standardized thanks to PEP 625, that has only been true since 2020 and thus there are many older sdists with names that may not be guessable).
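To show what “guessable” means, here is a small Python sketch that builds the standardized sdist file name from a project name and version. It assumes the version is already normalized and follows the current rule of lowercasing the name and collapsing runs of `-`, `_` and `.` into `_`; the helper name is hypothetical.

```python
import re


def standard_sdist_name(project: str, version: str) -> str:
    """Build the spec-conforming sdist file name (version assumed normalized)."""
    # Lowercase the name and collapse runs of "-", "_" and "." into a single "_".
    name = re.sub(r"[-_.]+", "_", project).lower()
    return f"{name}-{version}.tar.gz"


print(standard_sdist_name("Foo.Bar-baz", "1.0"))  # foo_bar_baz-1.0.tar.gz
```

Nothing like this can reconstruct a legacy name such as `Foo-Bar-1.0.zip`, which is the case a recorded file name covers.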

I’m leaning against this idea.

Support installing files via a package index

With a package index URL and a file name, one can find the location of a file at install-time. This not only allows making the recorded URL or path optional, it could also act as a fallback if the original location is no longer valid.
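A rough Python sketch of that lookup against the HTML Simple API (PEP 503): normalize the project name, build the project page URL, and find the anchor whose file name matches the recorded one. Fetching the page is left out (the HTML is passed in), and all names here are hypothetical; whatever is found still has to pass the recorded hash check.

```python
import re
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


def project_page_url(index_url: str, project: str) -> str:
    """Build the Simple API page URL for a project (PEP 503 name normalization)."""
    normalized = re.sub(r"[-_.]+", "-", project).lower()
    return urljoin(index_url.rstrip("/") + "/", normalized + "/")


class _FileLinks(HTMLParser):
    """Collect {file name: URL} from the anchors on a Simple API project page."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.links: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") if tag == "a" else None
        if href:
            # Drop the "#sha256=..." fragment; the lock file's own hash is authoritative.
            url = urljoin(self.page_url, href).partition("#")[0]
            # PEP 503: the anchor text must equal the URL's final path component.
            self.links[url.rsplit("/", 1)[-1]] = url


def find_file_url(index_url: str, project: str, filename: str, page_html: str) -> Optional[str]:
    """Look up a recorded file name on an (already fetched) project page."""
    parser = _FileLinks(project_page_url(index_url, project))
    parser.feed(page_html)
    return parser.links.get(filename)
```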

This does increase the burden on tools performing installation as they would now have to support this fallback. It could be made an optional feature, although the chances are people will expect it to be implemented as it shouldn’t increase the complexity of an installer drastically.

I’m leaning towards supporting this idea.

Make packages.wheels a table

One could see writing out wheel file details as a table keyed on the file name. For example:

[[packages]]
name = "attrs"
version = "23.2.0"
requires-python = ">=3.7"
index = "https://pypi.org/simple/"
[packages.wheels]
"attrs-23.2.0-py3-none-any.whl" = {upload-time = 2023-12-31T06:30:30.772444Z, url = "https://files.pythonhosted.org/packages/e0/44/827b2a91a5816512fcaf3cc4ebc465ccd5d598c45cefa6703fcf4a79018f/attrs-23.2.0-py3-none-any.whl", size = 60752, hashes = {sha256 = "99b87a485a5820b23b879f04c2305b44b951b502fd64be915879d77a7e8fc6f1"}

[[packages]]
name = "numpy"
version = "2.0.1"
requires-python = ">=3.9"
index = "https://pypi.org/simple/"

[packages.wheels]
"numpy-2.0.1-cp312-cp312-macosx_10_9_x86_64.whl" = {upload-time = 2024-07-21T13:37:15.810939Z, url = "https://files.pythonhosted.org/packages/64/1c/401489a7e92c30db413362756c313b9353fb47565015986c55582593e2ae/numpy-2.0.1-cp312-cp312-macosx_10_9_x86_64.whl", size = 20965374, hashes = {sha256 = "6bf4e6f4a2a2e26655717a1983ef6324f2664d7011f6ef7482e8c0b3d51e82ac"}
"numpy-2.0.1-cp312-cp312-macosx_11_0_arm64.whl" = {upload-time = 2024-07-21T13:37:36.460324Z, url = "https://files.pythonhosted.org/packages/08/61/460fb524bb2d1a8bd4bbcb33d9b0971f9837fdedcfda8478d4c8f5cfd7ee/numpy-2.0.1-cp312-cp312-macosx_11_0_arm64.whl", size = 13102536, hashes = {sha256 = "7d6fddc5fe258d3328cd8e3d7d3e02234c5d70e01ebe377a6ab92adb14039cb4"}
"numpy-2.0.1-cp312-cp312-macosx_14_0_arm64.whl" = {upload-time = 2024-07-21T13:37:46.601144Z, url = "https://files.pythonhosted.org/packages/c2/da/3d8debb409bc97045b559f408d2b8cefa6a077a73df14dbf4d8780d976b1/numpy-2.0.1-cp312-cp312-macosx_14_0_arm64.whl", size = 5037809, hashes = {sha256 = "5daab361be6ddeb299a918a7c0864fa8618af66019138263247af405018b04e1"}
"numpy-2.0.1-cp312-cp312-macosx_14_0_x86_64.whl" = {upload-time = 2024-07-21T13:37:58.784393Z, url = "https://files.pythonhosted.org/packages/6d/59/85160bf5f4af6264a7c5149ab07be9c8db2b0eb064794f8a7bf6d/numpy-2.0.1-cp312-cp312-macosx_14_0_x86_64.whl", size = 6631813, hashes = {sha256 = "ea2326a4dca88e4a274ba3a4405eb6c6467d3ffbd8c7d38632502eaae3820587"}
"numpy-2.0.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl" = {upload-time = 2024-07-21T13:38:19.714559Z, url = "https://files.pythonhosted.org/packages/5e/e3/944b77e2742fece7da8dfba6f7ef7dccdd163d1a613f7027f4d5b/numpy-2.0.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", size = 13623742, hashes = {sha256 = "529af13c5f4b7a932fb0e1911d3a75da204eff023ee5e0e79c1751564221a5c8"}
"numpy-2.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl" = {upload-time = 2024-07-21T13:38:48.972569Z, url = "https://files.pythonhosted.org/packages/2c/f3/61eee37decb58e7cb29940f19a1464b8608f2cab8a8616aba75fd/numpy-2.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", size = 19242336, hashes = {sha256 = "6790654cb13eab303d8402354fabd47472b24635700f631f041bd0b65e37298a"}
"numpy-2.0.1-cp312-cp312-musllinux_1_1_x86_64.whl" = {upload-time = 2024-07-21T13:39:19.213811Z, url = "https://files.pythonhosted.org/packages/77/b5/c74cc436114c1de5912cdb475145245f6e645a6a1a29b5d08c774/numpy-2.0.1-cp312-cp312-musllinux_1_1_x86_64.whl", size = 19637264, hashes = {sha256 = "cbab9fc9c391700e3e1287666dfd82d8666d10e69a6c4a09ab97574c0b7ee0a7"}
"numpy-2.0.1-cp312-cp312-musllinux_1_2_aarch64.whl" = {upload-time = 2024-07-21T13:39:41.812321Z, url = "https://files.pythonhosted.org/packages/da/89/c8856e12e0b3f6af371ccb90d604600923b08050c58f0cd26eac9/numpy-2.0.1-cp312-cp312-musllinux_1_2_aarch64.whl", size = 14108911, hashes = {sha256 = "99d0d92a5e3613c33a5f01db206a33f8fdf3d71f2912b0de1739894668b7a93b"}
"numpy-2.0.1-cp312-cp312-win32.whl" = {upload-time = 2024-07-21T13:39:52.932102Z, url = "https://files.pythonhosted.org/packages/15/96/310c6f6d146518479b0a6ee6eb92a537954ec3b1acfa2894d1347/numpy-2.0.1-cp312-cp312-win32.whl", size = 6171379, hashes = {sha256 = "173a00b9995f73b79eb0191129f2455f1e34c203f559dd118636858cc452a1bf"}
"numpy-2.0.1-cp312-cp312-win_amd64.whl" = {upload-time = 2024-07-21T13:40:17.532627Z, url = "https://files.pythonhosted.org/packages/b5/59/f6ad378ad85ed9c2785f271b39c3e5b6412c66e810d2c60934c9f/numpy-2.0.1-cp312-cp312-win_amd64.whl", size = 16255757, hashes = {sha256 = "bb2124fdc6e62baae159ebcfa368708867eb56806804d005860b6007388df171"}

It’s entirely a structural change which some may (not) prefer.

I’m neutral on this.

Self-Referential

Record what tool created the lock file

Right now the PEP does not record any details about the tool that created a file. That’s purely for simplicity. Which tool was used may be implicitly recorded by a [tool] table. But one could record various amounts of detail about the tool to help recreate the file. Key details like the tool name, the installation requirements when the tool is hosted on PyPI (encoded as dependency specifiers), and the command used to create the file would allow another tool to re-run the tool. It would also help discover what tool was used.

I’m neutral on this (and expect tool maintainers to inform deciding on this).

Drop the [tool] table

The [tool] table is included as it has been found to be very useful for pyproject.toml files. Providing similar flexibility to this PEP is done in hopes that similar benefits will materialize.

But some people are concerned that such a table will be too enticing to tools and will lead to files that are tool-specific and unusable by other tools. This could cause issues for tools trying to do installation, auditing, etc. as they would not know what details in the [tool] table are somehow critical.

I’m neutral on this (and expect tool maintainers to inform deciding on this).

Restrict the [tool] table to data that is disposable

The [tool] table is included as it has been found to be very useful for pyproject.toml files. Providing similar flexibility to this PEP is done in hopes that similar benefits will materialize.

But some people are concerned that such a table will be too enticing to tools and will lead to files that are tool-specific and unusable by other tools. As such, some have suggested only recording data that could be tossed at any time and have no negative effect (e.g., caching info). That would allow another tool to update a file and delete the [tool] tables without fear of impacting the file adversely.
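Under the “disposable” rule, another tool could do something like this Python sketch when rewriting a file: drop the top-level [tool] table and every [packages.tool] table, and still have a valid lock file. It operates on already-parsed data (e.g. from `tomllib`); the helper name and sample data are hypothetical.

```python
import copy


def drop_tool_tables(lock: dict) -> dict:
    """Return a copy without the top-level [tool] table or any [packages.tool] table."""
    cleaned = copy.deepcopy(lock)
    cleaned.pop("tool", None)
    for package in cleaned.get("packages", []):
        package.pop("tool", None)
    return cleaned


example = {
    "tool": {"some-locker": {"cache-key": "abc"}},
    "packages": [{"name": "attrs", "tool": {"some-locker": {}}}, {"name": "numpy"}],
}
print(drop_tool_tables(example))  # {'packages': [{'name': 'attrs'}, {'name': 'numpy'}]}
```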

I’m neutral on this (and expect tool maintainers to inform deciding on this).

List the requirement inputs for the file

Right now the file does not record the requirements that acted as inputs to the file. This is for simplicity reasons and to not explicitly constrain the file in some unforeseen way (e.g., updating the file after initial creation for a new platform that has different requirements, all without having to resolve how to write a comprehensive set of requirements).

But it may help in auditing and any recreation of the file if the original requirements were somehow recorded. This could be a single string or an array of strings if multiple requirements were used with the file.

I’m neutral on this.

Auditing

Recording dependencies

Recording the dependencies of a package is not necessary to install it. As such, it has been left out of the PEP as it can be included via [tool].

But knowing how costly a package is to include may be beneficial to users when determining why a certain package was included in the lock file. A flexible approach could be used to record the dependencies, e.g., as much detail as to differentiate from any other entry for the same package in the file (inspired by uv).

I’m neutral on this.

Recording dependents

Recording the dependents of a package is not necessary to install it. As such, it has been left out of the PEP as it can be included via [tool].

But knowing how critical a package is to other packages may be beneficial. This information is included by pip-tools, so there’s prior art in including it. A flexible approach could be used to record the dependents, e.g., as much detail as to differentiate from any other entry for the same package in the file (inspired by uv).
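The two ideas are closely linked: given recorded dependencies, the dependents (what pip-tools shows as “via”) fall out of inverting the graph, as in this Python sketch. The package names and graph are hypothetical.

```python
from collections import defaultdict


def dependents_of(dependencies: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert {package: [its dependencies]} into {package: [what requires it]}."""
    dependents: dict[str, set[str]] = defaultdict(set)
    for package, requirements in dependencies.items():
        for requirement in requirements:
            dependents[requirement].add(package)
    return {name: sorted(parents) for name, parents in sorted(dependents.items())}


# Hypothetical resolved graph, not real package metadata.
graph = {"app": ["lib-a", "lib-b"], "lib-a": ["lib-c"], "lib-b": ["lib-c"]}
print(dependents_of(graph))
# {'lib-a': ['app'], 'lib-b': ['app'], 'lib-c': ['lib-a', 'lib-b']}
```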

I’m neutral on this.

Including index-hosted attestations

We now have a spec that specifies attestation details for files uploaded to a package index like PyPI. Including some of those details may help detect issues with packaging when auditing the file (e.g., the publisher suddenly changing). The key reason this isn’t included in the PEP is that the specification is entirely focused on JSON. To bring it into this PEP, we would need to either specify how to translate JSON to TOML, embed the JSON payload as a string, or re-specify some or all of the attestation spec.

I’m leaning towards supporting outlining how to translate the JSON to TOML.

Expanding the feature set

This PEP is currently oriented towards standardizing on something that can replace a requirements.txt file that acts as a lock file (e.g., what pip-tools produces). But with an expansion of features, the file format may be able to replace the internal lock file format used by tools like PDM and Poetry, especially when a pyproject.toml file is viewed as the ideal input for creating a lock file.

This stuff I definitely need tool buy-in for.

Record the requirements for extras of a package

A project with a pyproject.toml file may define some extras which add dependencies to install. In the simple case this would just be a matter of marking an entry in [[packages]] as only applying when a specific extra is requested. Unfortunately the simple case doesn’t cover all cases.

Consider the following example where the latest release of NumPy is 2.2.1 and the last NumPy 1 release was 1.26.4:

[project.optional-dependencies]
extra-1 = ["numpy"]
extra-2 = ["numpy~=1.0"]

Individually those extras cause no issue. But extra-2 does “overpower” extra-1 when it comes to what version of NumPy to install. That leads to the issue of needing a way to record the fact that if extra-1 is requested on its own then NumPy 2.2.1 should be recorded in the lock file, but if extra-2 is specified (either on its own or in conjunction with extra-1), then NumPy 1.26.4 should be recorded.

There are two possible solutions to this.

A single version across all extras in a single lock file

One solution to the problem is to do what uv does and lock to a single version for a package no matter what. That would mean any use of NumPy which could occur in any scenario would use NumPy 1.26.4. Some argue that leads to consistency as you won’t be wondering what version of NumPy you will end up with based on what extras you select.

But this does mean that if you want the version of NumPy to vary across extras, you will need to create separate lock files for the various NumPy versions you want. While not technically an issue, it is ergonomically a bit annoying when this is necessary. It’s also not known how often package versions actually vary depending on which extra(s) are chosen, and, when they do, whether people still want that variance or prefer the approach uv has taken.

If this solution were to be taken, then very likely an extras key would be added which would list the extras that the entry in [[packages]] should be used for. This works thanks to extras being additive, and thus only contributing more packages.
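A minimal Python sketch of how an installer could apply such a key, assuming (hypothetically) that an entry without `extras` always applies and an entry with it applies when any listed extra is requested. The entries are illustrative, using the single-version outcome from the NumPy example.

```python
def entries_for(packages: list[dict], requested_extras: set[str]) -> list[dict]:
    """Pick the lock entries that apply for a given set of requested extras."""
    selected = []
    for entry in packages:
        extras = entry.get("extras")  # hypothetical key, per the idea above
        # No "extras" key: the entry always applies. Otherwise any overlap counts,
        # which is sound only because extras are additive.
        if extras is None or requested_extras & set(extras):
            selected.append(entry)
    return selected


# Single-version locking: NumPy 1.26.4 serves both extras.
PACKAGES = [
    {"name": "attrs", "version": "23.2.0"},
    {"name": "numpy", "version": "1.26.4", "extras": ["extra-1", "extra-2"]},
]
print([e["name"] for e in entries_for(PACKAGES, set())])        # ['attrs']
print([e["name"] for e in entries_for(PACKAGES, {"extra-1"})])  # ['attrs', 'numpy']
```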

Support Boolean logic for extra selection

Another solution to this problem is specifying the conditions under which a package version applies. This would mean supporting Boolean logic to fully express the conditions under which a package applies.

But historically extras have not been expressed this way. The use of the extra clause in Requires-Dist is always singular and with a == operator. This also means the operators on extra have not been designed to treat the extras specified as a set, and so an expression simultaneously using == and != is not well-defined when it comes to extra. This all means that using extra == 'extra-1' and extra != 'extra-2' to appropriately express what is needed for extra-1 to work has not been done before. It would also mean potentially more use of an extra key, as the default package version may need to explicitly exclude all extra groups when other groups restrict what package versions apply.

For this to work we would either need to expand the use of the extra clause so it can be used in packages.marker or have an extras key which expresses a Boolean expression for under which the package should be used. In both situations the spec around extra would need to be expanded by this PEP – or another PEP before this one is accepted – to lay out how Boolean expressions would work in this case.
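As a sketch of what set semantics could look like, the Python below reads `extra == X` as “X is in the requested set” and `extra != X` as “X is not”, encodes the NumPy example’s two entries as predicates, and checks every combination of extras to confirm exactly one entry (or none) applies. The predicates and names are hypothetical; they stand in for whatever marker syntax a spec would define.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Optional

EXTRAS = ["extra-1", "extra-2"]

# Hypothetical per-entry conditions, reading `extra == X` as "X is requested"
# and `extra != X` as "X is not requested".
NUMPY_ENTRIES: dict[str, Callable[[FrozenSet[str]], bool]] = {
    "2.2.1": lambda req: "extra-1" in req and "extra-2" not in req,
    "1.26.4": lambda req: "extra-2" in req,
}


def numpy_version(requested: FrozenSet[str]) -> Optional[str]:
    """Return the NumPy version whose condition holds, or None if NumPy isn't needed."""
    matches = [version for version, applies in NUMPY_ENTRIES.items() if applies(requested)]
    if len(matches) > 1:
        raise ValueError(f"ambiguous lock file, several entries apply: {matches}")
    return matches[0] if matches else None


# Exhaustively check every combination of extras (2**n of them).
for n in range(len(EXTRAS) + 1):
    for combo in combinations(EXTRAS, n):
        print(sorted(combo), numpy_version(frozenset(combo)))
```

This prints `None` for no extras, `2.2.1` for extra-1 alone, and `1.26.4` whenever extra-2 is requested, matching the behaviour described for the NumPy example. The exhaustive check doubles with every extra added, which hints at why validating such expressions is harder than the single-version approach.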

Record dependency groups

Dependency groups have the same concerns as extras mentioned above along with lacking any pre-existing clause for use in dependency specifiers. And so dependency groups have the added issue that to use Boolean expressions would require defining a new clause type.


I realize this is a lot, but I don’t want to go through another round on this PEP after this, so I tried to cover everything that people have brought up about the PEP that I’m open to changing.

Let the conversation begin!


It’s very exciting to start reading in detail and follow the next (final?:crossed_fingers:) round of discussion!

Are there any downsides to including this data?

Your description captures the potential benefits well. I think it may turn out to be particularly valuable for users, even if tool maintainers themselves are neutral about it. In particular,

It would also help discover what tool was used.

This point seems very useful for complex projects which are combining multiple tools.

So I’m not clear on what would argue for omission?


Has anyone from pipenv spoken in - or been invited to - this discussion?

If not, that seems like an omission. They even live in the pypa GitHub org, and I think they get considerably more downloads than, e.g., PDM.


Thanks for your tenacity on this Brett! Here are my thoughts on specific issues:

  • Regarding provenance information, @woodruffw’s and my original plan was to encode this information into the lock file in the [tool] table, so if there’s going to be an actual defined way to encode that information into the lock file, then my opinions on the [tool] table change. Happy to discuss this more!

  • Drop requiring file size and hashes: I’m also against this idea, especially for hashes. If users have a lock file, they should feel confident using it to reproduce an environment; excluding hashes means users can’t make the 1-to-1 mental connection that installing from a lock file will either reproduce the environment securely or fail.

  • [tool] table is disposable: I think this is acceptable. Assuming applications are committing lock files to version control, having this information be disposable is fine? cc @woodruffw as a potential [tool] table user.

  • Recording dependencies: If this information were dropped from lock files (i.e., no tracking of dependencies or dependents), then recreating the dependency graph would require checking package metadata against the lock file. Not a hard blocker, but if lockers already have this information in hand when locking, it seems fine to write it to the lock file to save others some work? From a human POV, there might be some utility in “shaking” unused dependencies using this info, but it’s probably marginal?


Only insofar as it’s more information, and the last time I brought it up there was disagreement on what to include. People were also a bit concerned about giving people a false sense that they could regenerate the lock file exactly as it was. And since there could be variance between runs due to new releases, it won’t be the same.

Add in the file getting updated by e.g., Dependabot which wouldn’t be captured by this information and it gets a bit murky as to the utility of the data. Maybe just the tool name is good enough? Or tool and version? But once again, the instant another tool like Dependabot touches the file it isn’t totally accurate anymore.

That plan existed because I started PEP 751 before PEP 740 got accepted. Now that PEP 740 is accepted and a standard I personally think it’s worth including as it improves the security details to help with auditing and I assume installation as well (it might not be obvious, but I tried to go through PyPA specifications - Python Packaging User Guide and pull in anything that seemed relevant for installation).

So yes, let’s have a discussion with @woodruffw about this. :smiling_face: (And I’m going to mention @dustin as the other security person I know who has participated in previous discussions of this PEP.)

Because I said substantial ideas can only come from tool authors, I was very liberal with open issues. I really don’t see this happening without a major groundswell of support, but since someone brought it up to me, I thought it was only fair to include it.

Someone once said they wouldn’t want to use the lock file if it contained superfluous data that wasn’t relevant to installing. Tracking the dependencies/dependents of a project in no way affects installation; it’s purely information to the user. And with the amount of push-back I have gotten on practically every facet of this proposal at some point, I’m not assuming anything when it comes to optional data.

Having said all of that, I have personally found pip-tools listing the dependents of a project useful when puzzling over why something was installed or how important something was. But this could also go into a [packages.tool] table pretty easily as it doesn’t affect installation since we aren’t going down the dependency graph route anymore to determine what to install.

So yes, it’s useful information, but it doesn’t have to be there in order for a lock file to be successful. That’s why I’m neutral. But it’s an open issue as I’m open for being convinced it’s worth making it an official part of the spec.

The only way an unused dependency ends up in a lock file is if it’s an unused top-level dependency, there was a bug in the tool generating the lock file, or you are doing a pip freeze-style generation of your lock file. Only that last case really has any utility, and even there the lock file itself doesn’t play a critical role, as you could have also found out what you were going to record anyway since it’s a 1:1 translation from installed to recorded. As such, I don’t think it’s that critical of a use-case.


Same for me – IMO a lack of hashes would significantly dilute the “locking” property. It would also pose a performance challenge for using this format with attestations, since hashes allow the installing client to pre-verify the attestation without having to download the entire wheel.

(OTOH I have no strong feelings about dropping the file size – from an integrity perspective the hash is sufficient and much stronger. But maybe there are separate desirable properties that come from also having the file size?)

Yes, let’s! As a framing device, here’s what we need (at a minimum), to make attestation verification possible and useful. I’m using [tool] for the example, but per the point about PEP 740 being accepted and implemented this could easily live elsewhere :slightly_smiling_face:

[[packages]]
name = "foobar"
version = "1.2.3"
index = "https://pypi.org/simple/"
wheels = [
  { name = "foobar-1.2.3-blah.whl", url = "https://...", hashes = { sha256 = "abcd..." } }
]

[packages.sdist]
name = "foobar-1.2.3.tar.gz"
url = "https://..."
hashes = { sha256 = "abcd..." }

[[packages.tool.attestation-identities]]
# ALT: could be `url = "https://github.com/..."` and similar for GitLab, etc.
kind = "GitHub"
repository = "owner/repo"
workflow = "publish.yml"
environment = "pypi"

(All naming here is subject to bikeshedding!)

In this case, the semantics are as follows:

  • Every package definition has zero or more attestation-identities, which specify the expected signing identities for all files specified in the package.
  • If a package has attestation-identities, they should be used in “any” (boolean OR) fashion: if any attestation identity matches the wheel (or sdist) that’s been selected during the resolution process, then verification is said to succeed. Otherwise, verification is said to fail.
  • The fields within an attestation-identity are verified in an “all” (boolean AND) fashion: the client should e.g. reject a selected file if it has a valid attestation for publish.yml @ github.com/owner/repo if the environment doesn’t match.
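Those semantics are small enough to sketch in Python: fields within one identity are ANDed, identities are ORed. This assumes the observed publisher fields come from an attestation whose signature has already been verified (that part is out of scope here), and returning True when no identities are recorded is an assumed policy for the “zero” case; the function names are hypothetical.

```python
from typing import Any


def identity_matches(expected: dict[str, Any], observed: dict[str, Any]) -> bool:
    # "All" semantics: every field listed for this identity must match exactly.
    return all(observed.get(field) == value for field, value in expected.items())


def verify_publisher(identities: list[dict[str, Any]], observed: dict[str, Any]) -> bool:
    # Zero recorded identities: nothing to enforce for this package.
    if not identities:
        return True
    # "Any" semantics: one matching identity is enough.
    return any(identity_matches(identity, observed) for identity in identities)


expected = [{"kind": "GitHub", "repository": "owner/repo", "workflow": "publish.yml", "environment": "pypi"}]
observed = {"kind": "GitHub", "repository": "owner/repo", "workflow": "publish.yml", "environment": "pypi"}
print(verify_publisher(expected, observed))                              # True
print(verify_publisher(expected, {**observed, "environment": "other"}))  # False
```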

The example above is also minimal, but an attestation-identity could also specify other claims to be verified. There are quite a few of these, however, so I’m not sure whether it makes sense for this PEP to enumerate them (which goes back to whether this should be a semi-opaque part of packages.tool or not, I suppose :slightly_smiling_face:)

So, I think some main discussion items:

  1. Assuming this can/should go somewhere other than packages.tool, how detailed does the PEP need to be w/r/t claims and their semantics? I don’t want to bog this PEP down with the dozen+ things that could be specified in the claim set, so one possibility is to start with a minimum-viable claim set (owner/repo, workflow, environment) and leave more complex policy-building to external users/tooling.
  2. The example above puts the attestations at the per-package level, which is architecturally slightly distinct from where they are in PEP 740 (at the per-file level). In practice I think this is a non-issue in terms of either security or usability, but it’s the reason for an “OR” between attestation-identities. The alternative to this would be to have packages.sdist.attestation-identities and packages.wheels[*].attestation-identities, which would (1) be duplicative, and (2) not actually reduce the complexity much, since files can of course still have multiple attestations.

I’m curious what @dustin thinks about the above as well!


It prevents a possible DoS by tricking someone into downloading a file that never ends or is at least extremely large (one of the maintainers of urllib3 who initially suggested recording the file size can tell me if I’m wrong :wink:).
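A minimal Python sketch of using the recorded size that way: read in chunks, never ask for more than one byte past the recorded size, and abort as soon as that extra byte shows up. It buffers in memory for simplicity (a real installer would write to disk); any object with a `read(n)` method works, and the function name is hypothetical.

```python
import io
from typing import BinaryIO


def read_at_most(stream: BinaryIO, expected_size: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read a download, giving up as soon as it grows past the recorded size."""
    buffer = bytearray()
    while True:
        # Ask for at most one byte beyond the recorded size; that byte proves overflow.
        chunk = stream.read(min(chunk_size, expected_size + 1 - len(buffer)))
        if not chunk:
            break
        buffer.extend(chunk)
        if len(buffer) > expected_size:
            raise ValueError(f"download exceeded the recorded {expected_size} bytes")
    if len(buffer) != expected_size:
        raise ValueError(f"download was {len(buffer)} bytes, expected {expected_size}")
    return bytes(buffer)


print(read_at_most(io.BytesIO(b"abc"), 3))  # b'abc'
```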

Looking at Index hosted attestations - Python Packaging User Guide, none of these fields are explicitly specified. Looking at PEP 740, the example seems to line up with Index hosted attestations - Python Packaging User Guide . That lines up with the Publisher class in the spec which is very open-ended.

So why be specific in this PEP while being so open-ended in the spec? Is it because only a subset of details is useful for installation and so it makes sense to be explicit? Do you think it will be better UX in this instance?

I think that’s the big question. Should this PEP be more explicit than Index hosted attestations - Python Packaging User Guide? I’m okay if it is, but then how do we line it up with the index hosted attestations spec? And if it isn’t explicit, does it make using the information harder?

If it’s just going to be duplicated information for most files for a package then doing it at the package level for readability – both in the file as a whole and diffs when there are a lot of wheel files – seems reasonable to me.


BTW, I also want to say I’m a bit excited at the idea of getting attestation details into the PEP as I’m not aware of other lock file formats that have it. I think this is an area where we can be leaders in the open source community in upping our security posture without inconveniencing users and maybe motivate other ecosystems to start including equivalent details themselves.

It also doesn’t hurt that better security will make my report chain at work happy as well (but they also didn’t ask for it; I just said better security was a motivator for me doing this work and they supported that as a goal). :grin:


Thanks Brett! Appreciate all your work here. It will take me some time to get through everything listed above, but the proposal broadly matches my expectations… I’ll try to share some initial reactions now.

For what it’s worth, I’m against this idea for the PEP, but we did start omitting these in uv recently when locking source trees with dynamic versions.

(In the world of PEP 751, though, the “parent project” itself isn’t part of the lockfile, right? Like, if locking a pyproject.toml, the project defined by the pyproject.toml itself isn’t included in the lockfile? So I don’t think that’s necessary here?)

For what it’s worth, I’m against this idea. It adds complexity to the PEP and to the installer implementation.

I don’t have strong feelings on this (other than that we should be allowed to include them, of course).

Like you, I lean against this.

I personally lean against this, since it adds yet another burden onto installers, but it’s not make-or-break. Doesn’t this also imply that hashes must be optional?

I lean against it; I find it harder to parse visually. (Entirely subjective.)

If we just use this as a replacement for requirements.txt, then it seems nice to omit [tool] entirely.

If we instead want to use it as a replacement for uv.lock in some cases, then we’ll probably need [tool] (and I wouldn’t be surprised if other tools feel similarly), but it’d be nice to put restrictions on it. Specifically, if we need the lockfile to help inform resolution, then we’ll need to store more data in it than is possible without [tool].

For example: we store various user-provided settings in the lockfile, so that if the user runs uv lock with new settings, we can invalidate it. I believe Poetry stores a checksum for similar purposes (validation). But ideally none of these settings inform behavior at install-time…

So, “disposable” seems like my vote? Assuming that tools are hoping to replace their proprietary lockfiles with this format? (If not, then omitting it seems preferable, though I expect to be in a minority on that one.)

This isn’t fully accurate. We have that behavior by default, but we do allow users to define extras that are mutually incompatible. Then we solve for them separately, and include multiple versions of any relevant packages. Then, at install-time, we enforce that the user didn’t activate two conflicting extras.

The most common example here is for PyTorch, where users can declare that they need different PyTorch versions depending on whether a cpu or gpu extra is enabled.
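The install-time enforcement described here can be sketched in a few lines of Python. This does not reproduce uv.lock’s format; representing declared conflicts as a list of sets of extra names is a hypothetical choice for illustration, as is the function name.

```python
def conflicting_extras(requested: set[str], conflicts: list[set[str]]) -> list[list[str]]:
    """Return each declared-incompatible group that the request activates more than once."""
    return [sorted(requested & group) for group in conflicts if len(requested & group) > 1]


# Hypothetical declaration: "cpu" and "gpu" may not be enabled together.
CONFLICTS = [{"cpu", "gpu"}]
print(conflicting_extras({"cpu", "docs"}, CONFLICTS))  # []
print(conflicting_extras({"cpu", "gpu"}, CONFLICTS))   # [['cpu', 'gpu']]
```

An installer would refuse to proceed whenever the returned list is non-empty.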


Makes sense! I was wondering if it was for “slow-download” type scenarios – I think I’m a little more bearish than others on whether those have a security dimension to them (since the user can always interrupt the download), but that addresses my question regardless :slightly_smiling_face:

Fair point! The lack of precision in the spec was my (maybe misguided) attempt to avoid encoding specific OIDC IdP constraints, although in practice these are present regardless.

I think the UX is acceptable while leaving it open-ended, since it’ll mostly be a matter of documentation in terms of educating users on which claims they should provide depending on the kind of attestation. It might also make sense to have some docs advising clients on what to do if they encounter a claim they don’t know how to check, although that probably falls outside of the scope of this PEP itself.

Yeah, the more I think about it the more I think it’d be good to be as explicit as the attestations spec, not more :slightly_smiling_face: – being too precise here will make it hard to extend the attestations spec in future ways that users have requested, e.g. email-identity-signed attestations.

So TL;DR I’m good with having each attestation-identity really just be a dict[str, Any] where kind is the only explicitly specified field, just like in index attestations!
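A minimal sketch of what "kind is the only explicitly specified field" could mean for a consumer: validate kind and pass every other key through untouched. The function name and the non-kind keys in the example are illustrative assumptions, not part of either spec.

```python
from typing import Any


def validate_attestation_identity(identity: Any) -> str:
    """Check that an attestation identity is a table with a non-empty string `kind`.

    All other keys are deliberately left uninterpreted, so new identity kinds
    (e.g. email-based ones) don't require a lock file spec change.
    """
    if not isinstance(identity, dict):
        raise TypeError("attestation identity must be a table/dict")
    kind = identity.get("kind")
    if not isinstance(kind, str) or not kind:
        raise ValueError("attestation identity requires a non-empty string 'kind'")
    return kind


# Keys other than "kind" are illustrative only.
identity = {"kind": "GitHub", "repository": "example/project", "workflow": "release.yml"}
print(validate_attestation_identity(identity))  # GitHub
```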

I think so. Or at least I don’t see a reason to include it if you’re viewing the lock file as part of the project (just like you don’t include it in a requirements.txt file).

I don’t think so. I view installing from an index to either satisfy the use-case that an index’s file URLs are not stable, the URL goes 404, or the file is missing on disk. The hash in that case becomes even more important as it helps verify you are still getting the same file you wanted in the first place.

Whether “tools are hoping to replace their proprietary lockfiles with this format” is an open question until @frostming or @radoering say they will or won’t (I’m assuming uv won’t based on previous comments, but feel free to surprise me :wink:).

I’m so sorry about that! I must have misread something at some point.

Right, using the dependency graph to figure out what to install. Makes sense.

I’m assuming “user” is “tool author” as I’m not expecting end users to make that call. I’m honestly assuming the data will be self-explanatory enough for users to notice when something is off in the attestation data in the lock file (e.g. the GitHub project suddenly changed).

Two things. First, are you proposing what’s in the Publisher pseudo-code entirely (i.e. kind, _rest, and claims), or only kind and _rest? From my reading, what claims represents sounds out of scope for what a lock file would care about. I also want to make sure you’re not proposing anything else, since Publisher is part of AttestationBundle, which is part of Provenance. Including just Publisher would explain the “attestation-identities” key you implicitly suggested.

Second, is Publisher all you’re proposing to include?


What are thoughts around how caching/proxies will work with indices/URLs specified? Can installers substitute out URLs/paths with a different one with the same hash (and therefore should lockers try to canonicalise the URLs to PyPI if a matching file exists on PyPI when the locker is pointed at a different index)?
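To make the question concrete, a locker-side sketch of "canonicalising" could look through PyPI's JSON project listing (PEP 691) for the entry whose SHA-256 matches a file resolved from another index, and record that URL instead. The function name and example data are hypothetical; whether lockers or installers should do this at all is exactly what's being asked.

```python
from typing import Any, Optional


def find_by_sha256(page: dict[str, Any], sha256: str) -> Optional[dict[str, Any]]:
    """Return the file entry on a PEP 691 JSON project page whose sha256 matches."""
    wanted = sha256.lower()
    for entry in page.get("files", []):
        if entry.get("hashes", {}).get("sha256", "").lower() == wanted:
            return entry
    return None


# Trimmed, made-up project page for illustration.
pypi_page = {
    "files": [
        {"filename": "pkg-1.0.tar.gz", "url": "https://files.example/pkg-1.0.tar.gz",
         "hashes": {"sha256": "aa11"}},
        {"filename": "pkg-1.0-py3-none-any.whl", "url": "https://files.example/pkg-1.0-py3-none-any.whl",
         "hashes": {"sha256": "bb22"}},
    ]
}
match = find_by_sha256(pypi_page, "BB22")
print(match["url"] if match else "no identical file on PyPI")
```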

As a pure installer[1] maintainer, I’d expect to do as little as possible when installing from a lockfile. So if there’s a URL, I would download it, and fail if the hash didn’t match. If there’s no URL, but some sort of index spec (something I’m not keen on, but I accept could be necessary), then I’d fetch the exact name/version given, and again fail if the hash failed to match.
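A minimal sketch of that strict behaviour using only the standard library: fetch exactly the recorded URL and refuse the file if its SHA-256 doesn't match. The function names are hypothetical, and a real installer would stream large files and handle retries.

```python
import hashlib
import urllib.request


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Compare the SHA-256 of `data` against the hex digest recorded in the lock file."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()


def fetch_locked_file(url: str, expected_sha256: str, timeout: float = 30.0) -> bytes:
    """Download exactly the recorded URL (no substitution) and verify its hash."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    if not sha256_matches(data, expected_sha256):
        # Report rather than work around: the lock file itself needs fixing.
        raise ValueError(f"hash mismatch for {url}; the lock file may need to be regenerated")
    return data
```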

I wouldn’t expect to swap out an explicitly given URL, and I’d be unhappy with a spec that required me to.


  1. as opposed to a combined locker and installer, which will have different priorities


Yep, sorry for the confusion there – “user” meant “tool author” in that context.

The latter!

(Conceptually, I can imagine someone would want to lock on something like “this release came from this exact commit,” but the claims key is a wart in the attestation spec and isn’t required for that property, since _rest is a catch-all for everything.)

Yep, exactly.

(I’m going to look at updating the living spec to rename it from Publisher to AttestationIdentity (AttestingIdentity?) or similar, since I think Publisher will become confusing as more attestation signers besides Trusted Publishing become possible. But structurally, all I’m proposing is the Publisher model :smile:)


Unfortunately, I cannot promise that we will replace Poetry’s lock file if this and that is added. I can only promise that we will try/evaluate. However, even this will take some time (at least months, I assume).

In the current state, we probably would just use it as an export format - and to be honest that is perfectly fine for us. (Less work, less risk.) [1]

Considering the open issues:

I’m also leaning against it. Optional is fine.

Currently, we only record the index URL (or no URL at all for the default PyPI) and search the package in the index at install time so I think we are neutral on this.

I would not drop hashes but wonder if file sizes are necessary. Is it always possible to get file sizes without downloading the file? (It would be a pity if we had to download all wheels during locking just to get file sizes even though the hashes are provided by the index.)

I’m also leaning against it.

As already mentioned, this is fine for us.

Neutral.

I’m supporting the idea.

I am against it if the format is to replace tool-specific lock files. In my opinion, dropping the [tool] table is a clear step from “might replace tool specific lock files” to “just an export format”. I do not think we would try to replace our lock file without a [tool] table. (As already mentioned this outcome would also be perfectly fine for us.)

I am against it if the format is to replace tool-specific lock files. If we want to replace our tool specific lock file, I do not think it will always be possible to make the [tool] table disposable. Over time, when extending the format the [tool] table might become less relevant though.

I think I do not understand exactly what that means.

We would probably put this into the [tool] section since it is just required for convenience.

Currently, we calculate this information (if requested) from dependencies.

Although Poetry prefers to lock a single version of a package if it satisfies all constraints, Poetry does already support Boolean logic for extra selection.

I think that is exactly what Poetry is doing. [2] If we wanted to replace our lock file, we would use extra in packages.marker.

This is required if we want to replace our lock file. We could put it into the tool section so that other installers will always install all groups. On the other hand, this only works as long as there are no conflicting groups. [3]

To put it in a nutshell, we might try to replace our lock file and put relevant information in the tool section (if this section is not dropped). Support for dependency groups is crucial for us, and we might put that information in the tool section as well. Further, the way we cope with extras might be incompatible with other installers. We would probably also only accept lock files created by Poetry itself (i.e., ones containing the relevant tool.poetry sections) as input, at least in the beginning. Alternatively, we just use it as an export format, which avoids the controversial topics “dependency groups” and “extras”.


  1. Alternatively, we might even try to use it as lock file with crucial information in the tool section. For details see below.

  2. See “Dependency specification” in the Poetry documentation and “markers: add special handling for `extra`” (python-poetry/poetry-core Pull Request #636 on GitHub) for details.

  3. Currently, Poetry does not support conflicting groups.


It is if the index is using API version 1.1.
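For illustration: with Simple API version 1.1 (PEP 700), each file entry in the JSON response (PEP 691) carries a size, so a locker can record it without downloading anything. A sketch under that assumption; the function names and example project are hypothetical, and older responses fall back to None.

```python
from typing import Any, Optional


def _api_version(page: dict[str, Any]) -> tuple[int, int]:
    """Parse meta.api-version (e.g. "1.1") into a comparable tuple."""
    raw = page.get("meta", {}).get("api-version", "1.0")
    major, _, minor = raw.partition(".")
    return int(major), int(minor or 0)


def file_size_from_simple_json(page: dict[str, Any], filename: str) -> Optional[int]:
    """Return the recorded size of `filename`, or None if unavailable."""
    if _api_version(page) < (1, 1):
        return None  # `size` is only guaranteed from API version 1.1 onwards
    for entry in page.get("files", []):
        if entry.get("filename") == filename:
            size = entry.get("size")
            return size if isinstance(size, int) else None
    return None


# Trimmed, made-up response for illustration.
page = {
    "meta": {"api-version": "1.1"},
    "name": "example",
    "files": [{"filename": "example-1.0-py3-none-any.whl", "size": 1234}],
}
print(file_size_from_simple_json(page, "example-1.0-py3-none-any.whl"))  # 1234
```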


Re: wheel URL (or path) requirement, and installing from package indices.

To support our use case, would it be acceptable to make it optional for installers to fall back to package indexes? That is: “if a wheel file can’t be retrieved, installers MAY fall back to finding wheel files via package indexes”.

I would expect most current resolver+installers (eg pip, uv) to already have the required components to support this, and I’d be fine if it’s not the default behaviour.

If this is in the spec, I am neutral on whether wheel URL (or path) stays mandatory, though @pf_moore indicates they would prefer not to ignore provided URLs.

The same goes for sdists.


What do you mean by “index spec”? I’m just talking about recording the Simple API URL to use to potentially get the file, so there isn’t a “spec” beyond the Simple repository API specification in the Python Packaging User Guide.

But regardless, it sounds like you would prefer that a file be installed either via the Simple API or by URL, but not both for the same file (and to be clear, this isn’t for discovery, just for looking up the URL where a specific file can be found, either because no URL was provided or because the URL has subsequently changed, e.g. when changing file hosting providers).

You can also get the file size via an HTTP HEAD call.
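As a sketch of the HEAD approach with the standard library (function names are hypothetical). Some servers omit Content-Length or don't support HEAD, hence the None fallback.

```python
import urllib.request
from typing import Optional


def parse_content_length(value: Optional[str]) -> Optional[int]:
    """Convert a Content-Length header value to an int, or None if absent/invalid."""
    if value is None:
        return None
    value = value.strip()
    return int(value) if value.isdigit() else None


def remote_file_size(url: str, timeout: float = 30.0) -> Optional[int]:
    """Ask the server for the file size with a HEAD request, without downloading the body."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_content_length(response.headers.get("Content-Length"))
```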

I’m also willing to tweak the PEP to say it SHOULD be included when the file size is easy to get – e.g. package index Simple API response or HTTP HEAD call – since no one seems too concerned about the denial-of-service protection it provides.

What requirements were used to generate the lock file, i.e. the top-level dependencies.

Fair enough! I’ll wait to see if @frostming has anything to say before making a final call on how hard to lean into the “export-only” target.

Based on what @pf_moore says below, I don’t think so; it would be either URL or package index, but not both (but I might also be misunderstanding Paul):


I was assuming that to get the file, you’d need to use the project name and version to get the file list and then use compatibility tags to pick the right file. But thinking further about it, do you mean that I’d have the project name and filename, so I could get https://pypi.org/simple/<project>/ and pick the entry with the given filename and use the URL from that?

If so, then that’s reasonable. But it’s not how pip (or any other installer, I imagine) uses the index at the moment, so it would be new code we’d need to write. I’m not saying it’s hard[1], it’s just not something I anticipated needing to do.
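The lookup described here, assuming the PEP 691 JSON form of the Simple API, is roughly the following. The function names are hypothetical, and it deliberately leaves out caching, authentication and error handling.

```python
import json
import re
import urllib.request
from typing import Any, Optional
from urllib.parse import urljoin

SIMPLE_JSON = "application/vnd.pypi.simple.v1+json"


def normalize_project_name(name: str) -> str:
    """PEP 503 name normalisation, as used in Simple API project URLs."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_file_url(page: dict[str, Any], filename: str, page_url: str) -> Optional[str]:
    """Return the URL for `filename` from a JSON project page, if it is listed."""
    for entry in page.get("files", []):
        if entry.get("filename") == filename:
            # urljoin resolves relative URLs and leaves absolute ones unchanged.
            return urljoin(page_url, entry["url"])
    return None


def lookup_on_index(index_url: str, project: str, filename: str) -> Optional[str]:
    """Fetch <index_url>/<project>/ as JSON and locate one specific, already-locked file."""
    page_url = f"{index_url.rstrip('/')}/{normalize_project_name(project)}/"
    request = urllib.request.Request(page_url, headers={"Accept": SIMPLE_JSON})
    with urllib.request.urlopen(request) as response:
        page = json.load(response)
    return find_file_url(page, filename, page_url)
```

Something like lookup_on_index("https://pypi.org/simple/", project, filename) would then return the current URL for the exact file named in the lock file, whose hash still gets checked after download.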

I would rather that lockfile creators always provide URLs. In the edge cases where for some reason they can’t (where an index doesn’t provide stable URLs, for example) then a simple index URL plus a filename would be acceptable. But I’m not clear why a locker couldn’t provide a URL, except in cases where I think we’d have other problems anyway.

Most of the cases I know of where indexes don’t provide a stable URL are generally cases where authentication is required, and that’s a whole different can of worms - if we assume lockfiles won’t contain credentials, what will lockfiles that reference files which need authentication look like? And what will the user interaction be when installing them? I’m happy if the answers are “lockfiles contain no authentication information” and “the user has to specify credential information out of band in an installer-specific form”, but it’s worth noting that not all installers will have ways of providing credentials (even pip’s credential handling is not sufficient for some more awkward use cases, if the issues we see are anything to go by).

I’m not going to say that I refuse to accept it being optional to do that. But I wouldn’t rely on pip doing it, so how useful that would be would likely depend on how realistic it is to expect people to be using installers other than pip. For example, if uv supported that case would it be acceptable to @EpicWink to be told “if you want that, you need to use uv”?

More abstractly, my position is that if the locker specified a URL, then that URL should be considered canonical, and the file must be fetched from that location. If the file can’t be fetched from the given location, that’s a problem that should be reported to the user so that they can get the lockfile fixed, not something that we should work around.

I’m not going to go quite as far as saying that “if a locker can’t specify a URL, then what you’ve got isn’t a lockfile” because people have very different understandings of what a lockfile is. But for me, personally, I tend towards that strict interpretation.


  1. it’s only a few lines of code, if you already use something like requests that does the hard work - but you can then scale it up to five times that or more once you add caching, error handling, etc., etc.
