The latest/last major draft of PEP 751 can be found at PEP 751 – A file format to record Python dependencies for installation reproducibility | peps.python.org. This version is starting out as a standard to replace/supplant using requirements.txt for a lock file (e.g., what pip-tools creates).
I say “starting out” as there are some open issues to go through which could make this work as a lock file replacement for, e.g., pdm.lock. But I also want to make sure things will work for people in general and not a single tool.
I also don’t want this conversation dragging on forever, and so I plan to have this PEP done and submitted for pronouncement before April. That does mean the discussion needs to be done well before then so I can update my PoC, give people time to think about it, etc. So that means I will call “time” on this discussion at some point if it drags on. Also, in the name of time, if I feel like you didn’t read the PEP and it contains an answer I will probably flat-out tell you to read the PEP w/o answering your question.
Open issues
So here is how this is going to work. For open issues which don’t massively affect how installation works, people in general can convince me to go one way or another (I’ll share where I’m leaning in each point). For things that are big shifts it will take at least two tools to say, “if you add this we can rely on this format” to make the change. For instance, if PDM and Poetry say, “add this and we can drop our custom lock file format” then that should be enough to get it into the PEP (no pressure, @frostming, @radoering, and @charliermarsh).
Also, I’m not really open to new ideas that are not listed below unless they come from a tool maintainer who is going to generate these lock files.
Simplification
Drop recording the package version
As this is written, the package version is optional since it can only be reliably recorded when an sdist or wheel file is used. And since both sources record the version in file names it is technically redundant. But having the version explicitly called out could be viewed as helping with auditing by not having to find and parse file names (especially if an sdist file name doesn’t conform to the latest spec).
I’m leaning against this idea.
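To make the redundancy concrete, here is a rough sketch of what an auditing tool would have to do if the version were dropped and only file names were recorded. The function name is made up for illustration, and it assumes file names that follow the wheel naming convention or PEP 625; real tools would more likely lean on the helpers in the `packaging` project.

```python
def version_from_filename(filename: str) -> str:
    """Pull the version out of a wheel or PEP 625-style sdist file name."""
    if filename.endswith(".whl"):
        # Wheel: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
        parts = filename[: -len(".whl")].split("-")
        if len(parts) not in (5, 6):
            raise ValueError(f"unexpected wheel file name: {filename!r}")
        return parts[1]
    if filename.endswith(".tar.gz"):
        # sdist (PEP 625): {name}-{version}.tar.gz
        name, sep, version = filename[: -len(".tar.gz")].rpartition("-")
        if not sep or not name:
            raise ValueError(f"unexpected sdist file name: {filename!r}")
        return version
    raise ValueError(f"not a wheel or .tar.gz sdist: {filename!r}")


print(version_from_filename("attrs-23.2.0-py3-none-any.whl"))
print(version_from_filename("numpy-2.0.1.tar.gz"))
```

Older sdists whose names predate PEP 625 may not split this cleanly, which is the auditing concern raised above.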
Drop the requirement to specify the location of an sdist and/or wheels
At least one person has commented how their work has unstable URLs for all sdists and wheels. As such, they have to search for all files at install regardless of where the file was found previously. Dropping the requirement to provide the URL or path to a file would help solve the issue of recording known-bad information.
To support this, though, would require installation to support finding files via a package index or some other mechanism specified outside of this PEP. The former adds complexity (discussed as another open issue), while the latter means this PEP cannot fully explain the installation process.
I’m neutral on this.
Drop requiring file size and hashes
At least one person has said that their work modifies all wheels and sdists with internal files. That means any recorded hashes and file sizes will be wrong. By making the file size and hashes optional (very likely through some opt-out mechanism), they could continue to produce lock files that meet this PEP’s requirements.
As it weakens security by not making hashes and file sizes mandatory, it somewhat dilutes the purpose of this PEP. It also only works with external projects if the creator of the lock file is external to the company modifying the files and chose to leave out hashes. It also is only beneficial if the file modifications are not idempotent, thus causing random changes in hashes and file size.
I’m leaning against this idea.
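For context, this is roughly the check that mandatory sizes and hashes make possible at install time. It is only a sketch: `verify_file` is a hypothetical helper, and the `size` and `hashes` names simply mirror the keys used in the lock file examples in this post.

```python
import hashlib


def verify_file(data: bytes, size: int, hashes: dict[str, str]) -> bool:
    """Return True only if the bytes match the recorded size and every hash."""
    if len(data) != size:
        return False
    for algorithm, expected in hashes.items():
        if hashlib.new(algorithm, data).hexdigest() != expected:
            return False
    return True


# Stand-in bytes; a real installer would read the downloaded wheel or sdist.
payload = b"pretend wheel contents"
entry = {"size": len(payload), "hashes": {"sha256": hashlib.sha256(payload).hexdigest()}}

print(verify_file(payload, **entry))
# A file with internal files added no longer matches what was recorded.
print(verify_file(payload + b"internal file", **entry))
```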
Drop recording the sdist file name
While incompatible with dropping the URL/path requirement, the package version, and hashes, recording the sdist file name is technically not necessary at all (right now recording the file name is optional). The file name only encodes the project name and version, so no new info is conveyed about the file (when the package version is provided). And if the location is recorded then getting the file is handled regardless of the file name.
But recording the file name can be helpful when looking for an appropriate file when the recorded file location is no longer available (while sdist file names are now standardized thanks to PEP 625, that has only been true since 2020 and thus there are many older sdists with names that may not be guessable).
I’m leaning against this idea.
Support installing files via a package index
With a package index URL and a file name, one can find the location of a file at install-time. This not only allows recording the URL or path to become optional, it could also act as a fallback if the original location is no longer valid.
This does increase the burden on tools performing installation as they would now have to support this fallback. It could be made an optional feature, although chances are people will expect it to be implemented as it shouldn’t increase the complexity of an installer drastically.
I’m leaning towards supporting this idea.
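A sketch of what the fallback could look like from an installer's point of view. The helper names are hypothetical; the name normalization and the `{index}/{normalized-name}/` project page layout follow the simple repository API, and the installer would then look for the recorded file name on that page.

```python
import re


def normalize_name(name: str) -> str:
    """Normalize a project name the way the simple repository API expects."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_page_url(index_url: str, project_name: str) -> str:
    """Build the project page URL on a simple-API index."""
    return f"{index_url.rstrip('/')}/{normalize_name(project_name)}/"


def lookup_order(package: dict, wheel: dict) -> list[str]:
    """Where an installer might look for a wheel, most specific first."""
    locations = []
    if "url" in wheel:
        locations.append(wheel["url"])  # the recorded location, if any
    if "index" in package:
        # Fallback: search the project page for the recorded file name.
        locations.append(project_page_url(package["index"], package["name"]))
    return locations


package = {"name": "attrs", "index": "https://pypi.org/simple/"}
print(lookup_order(package, {}))
```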
Make packages.wheels a table
One could see writing out wheel file details as a table keyed on the file name. For example:
[[packages]]
name = "attrs"
version = "23.2.0"
requires-python = ">=3.7"
index = "https://pypi.org/simple/"
[packages.wheels]
"attrs-23.2.0-py3-none-any.whl" = {upload-time = 2023-12-31T06:30:30.772444Z, url = "https://files.pythonhosted.org/packages/e0/44/827b2a91a5816512fcaf3cc4ebc465ccd5d598c45cefa6703fcf4a79018f/attrs-23.2.0-py3-none-any.whl", size = 60752, hashes = {sha256 = "99b87a485a5820b23b879f04c2305b44b951b502fd64be915879d77a7e8fc6f1"}}
[[packages]]
name = "numpy"
version = "2.0.1"
requires-python = ">=3.9"
index = "https://pypi.org/simple/"
[packages.wheels]
"numpy-2.0.1-cp312-cp312-macosx_10_9_x86_64.whl" = {upload-time = 2024-07-21T13:37:15.810939Z, url = "https://files.pythonhosted.org/packages/64/1c/401489a7e92c30db413362756c313b9353fb47565015986c55582593e2ae/numpy-2.0.1-cp312-cp312-macosx_10_9_x86_64.whl", size = 20965374, hashes = {sha256 = "6bf4e6f4a2a2e26655717a1983ef6324f2664d7011f6ef7482e8c0b3d51e82ac"}}
"numpy-2.0.1-cp312-cp312-macosx_11_0_arm64.whl" = {upload-time = 2024-07-21T13:37:36.460324Z, url = "https://files.pythonhosted.org/packages/08/61/460fb524bb2d1a8bd4bbcb33d9b0971f9837fdedcfda8478d4c8f5cfd7ee/numpy-2.0.1-cp312-cp312-macosx_11_0_arm64.whl", size = 13102536, hashes = {sha256 = "7d6fddc5fe258d3328cd8e3d7d3e02234c5d70e01ebe377a6ab92adb14039cb4"}}
"numpy-2.0.1-cp312-cp312-macosx_14_0_arm64.whl" = {upload-time = 2024-07-21T13:37:46.601144Z, url = "https://files.pythonhosted.org/packages/c2/da/3d8debb409bc97045b559f408d2b8cefa6a077a73df14dbf4d8780d976b1/numpy-2.0.1-cp312-cp312-macosx_14_0_arm64.whl", size = 5037809, hashes = {sha256 = "5daab361be6ddeb299a918a7c0864fa8618af66019138263247af405018b04e1"}}
"numpy-2.0.1-cp312-cp312-macosx_14_0_x86_64.whl" = {upload-time = 2024-07-21T13:37:58.784393Z, url = "https://files.pythonhosted.org/packages/6d/59/85160bf5f4af6264a7c5149ab07be9c8db2b0eb064794f8a7bf6d/numpy-2.0.1-cp312-cp312-macosx_14_0_x86_64.whl", size = 6631813, hashes = {sha256 = "ea2326a4dca88e4a274ba3a4405eb6c6467d3ffbd8c7d38632502eaae3820587"}}
"numpy-2.0.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl" = {upload-time = 2024-07-21T13:38:19.714559Z, url = "https://files.pythonhosted.org/packages/5e/e3/944b77e2742fece7da8dfba6f7ef7dccdd163d1a613f7027f4d5b/numpy-2.0.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", size = 13623742, hashes = {sha256 = "529af13c5f4b7a932fb0e1911d3a75da204eff023ee5e0e79c1751564221a5c8"}}
"numpy-2.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl" = {upload-time = 2024-07-21T13:38:48.972569Z, url = "https://files.pythonhosted.org/packages/2c/f3/61eee37decb58e7cb29940f19a1464b8608f2cab8a8616aba75fd/numpy-2.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", size = 19242336, hashes = {sha256 = "6790654cb13eab303d8402354fabd47472b24635700f631f041bd0b65e37298a"}}
"numpy-2.0.1-cp312-cp312-musllinux_1_1_x86_64.whl" = {upload-time = 2024-07-21T13:39:19.213811Z, url = "https://files.pythonhosted.org/packages/77/b5/c74cc436114c1de5912cdb475145245f6e645a6a1a29b5d08c774/numpy-2.0.1-cp312-cp312-musllinux_1_1_x86_64.whl", size = 19637264, hashes = {sha256 = "cbab9fc9c391700e3e1287666dfd82d8666d10e69a6c4a09ab97574c0b7ee0a7"}}
"numpy-2.0.1-cp312-cp312-musllinux_1_2_aarch64.whl" = {upload-time = 2024-07-21T13:39:41.812321Z, url = "https://files.pythonhosted.org/packages/da/89/c8856e12e0b3f6af371ccb90d604600923b08050c58f0cd26eac9/numpy-2.0.1-cp312-cp312-musllinux_1_2_aarch64.whl", size = 14108911, hashes = {sha256 = "99d0d92a5e3613c33a5f01db206a33f8fdf3d71f2912b0de1739894668b7a93b"}}
"numpy-2.0.1-cp312-cp312-win32.whl" = {upload-time = 2024-07-21T13:39:52.932102Z, url = "https://files.pythonhosted.org/packages/15/96/310c6f6d146518479b0a6ee6eb92a537954ec3b1acfa2894d1347/numpy-2.0.1-cp312-cp312-win32.whl", size = 6171379, hashes = {sha256 = "173a00b9995f73b79eb0191129f2455f1e34c203f559dd118636858cc452a1bf"}}
"numpy-2.0.1-cp312-cp312-win_amd64.whl" = {upload-time = 2024-07-21T13:40:17.532627Z, url = "https://files.pythonhosted.org/packages/b5/59/f6ad378ad85ed9c2785f271b39c3e5b6412c66e810d2c60934c9f/numpy-2.0.1-cp312-cp312-win_amd64.whl", size = 16255757, hashes = {sha256 = "bb2124fdc6e62baae159ebcfa368708867eb56806804d005860b6007388df171"}}
It’s entirely a structural change which some may (not) prefer.
I’m neutral on this.
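To show that the two layouts carry the same information, here is a small sketch that converts between them once the TOML has been parsed into Python data. The `filename` key for the array form is an assumption made for illustration, not something taken from the PEP text.

```python
def wheels_as_table(wheels: list[dict]) -> dict[str, dict]:
    """Turn an array of wheel entries into a table keyed on file name."""
    table = {}
    for wheel in wheels:
        details = dict(wheel)  # copy so the input is left untouched
        filename = details.pop("filename")
        if filename in table:
            raise ValueError(f"duplicate wheel file name: {filename!r}")
        table[filename] = details
    return table


def wheels_as_array(table: dict[str, dict]) -> list[dict]:
    """Turn the keyed table back into an array of wheel entries."""
    return [{"filename": filename, **details} for filename, details in table.items()]


array_form = [{"filename": "attrs-23.2.0-py3-none-any.whl", "size": 60752}]
table_form = wheels_as_table(array_form)
print(table_form)
```

One practical difference is that the table form cannot hold two entries with the same file name, while the array form has to check for that itself.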
Self-Referential
Record what tool created the lock file
Right now the PEP does not record any details about the tool that created a file. That’s for simplicity reasons only. Which tool is used may be implicitly recorded by a [tool] table. But one could record various amounts of details about the tool to help recreate the file. Key details like tool name, the installation requirements when the tool is hosted on PyPI (encoded as dependency specifiers), and the command used to create the file would allow another tool to re-run the tool. It would also help discover what tool was used.
I’m neutral on this (and expect tool maintainers to inform deciding on this).
Drop the [tool] table
The [tool] table is included as it has been found to be very useful for pyproject.toml files. Providing similar flexibility to this PEP is done in hopes that similar benefits will materialize.
But some people are concerned that such a table will be too enticing to tools and will lead to files that are tool-specific and unusable by other tools. This could cause issues for tools trying to do installation, auditing, etc. as they would not know what details in the [tool] table are somehow critical.
I’m neutral on this (and expect tool maintainers to inform deciding on this).
Restrict the [tool] table to data that is disposable
The [tool] table is included as it has been found to be very useful for pyproject.toml files. Providing similar flexibility to this PEP is done in hopes that similar benefits will materialize.
But some people are concerned that such a table will be too enticing to tools and will lead to files that are tool-specific and unusable by other tools. As such, some have suggested only recording data that could be tossed at any time and have no negative effect (e.g., caching info). That would allow another tool to update a file and delete the [tool] tables without fear of impacting the file adversely.
I’m neutral on this (and expect tool maintainers to inform deciding on this).
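If the data is truly disposable, then a tool updating someone else's lock file could do something as blunt as the sketch below. It assumes the parsed file has a top-level `tool` table and optionally a `tool` table on each `packages` entry; `strip_tool_tables` is a hypothetical helper.

```python
import copy


def strip_tool_tables(lock: dict) -> dict:
    """Return a copy of the parsed lock file with every tool table removed."""
    cleaned = copy.deepcopy(lock)
    cleaned.pop("tool", None)  # top-level table
    for package in cleaned.get("packages", []):
        package.pop("tool", None)  # per-package table
    return cleaned


lock = {
    "tool": {"some-locker": {"cache-key": "abc123"}},
    "packages": [
        {"name": "attrs", "version": "23.2.0", "tool": {"some-locker": {"hint": 1}}},
    ],
}
print(strip_tool_tables(lock))
```

The restriction under discussion is what would make this safe: nothing left in the file may depend on what was deleted.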
List the requirement inputs for the file
Right now the file does not record the requirements that acted as inputs to the file. This is for simplicity reasons and to not explicitly constrain the file in some unforeseen way (e.g., updating the file after initial creation for a new platform that has different requirements, all without having to resolve how to write a comprehensive set of requirements).
But it may help in auditing and any recreation of the file if the original requirements were somehow recorded. This could be a single string or an array of strings if multiple requirements were used with the file.
I’m neutral on this.
Auditing
Recording dependencies
Recording the dependencies of a package is not necessary to install it. As such, it has been left out of the PEP as it can be included via [tool].
But knowing how costly a package is to include may be beneficial to users when determining why a certain package was included in the lock file. A flexible approach could be used to record the dependencies, e.g., recording only as much detail as is needed to differentiate from any other entry for the same package in the file (inspired by uv).
I’m neutral on this.
Recording dependents
Recording the dependents of a package is not necessary to install it. As such, it has been left out of the PEP as it can be included via [tool].
But knowing how critical a package is to other packages may be beneficial. This information is included by pip-tools, so there’s prior art in including it. A flexible approach could be used to record the dependents, e.g., recording only as much detail as is needed to differentiate from any other entry for the same package in the file (inspired by uv).
I’m neutral on this.
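Dependencies and dependents are two views of the same graph, so a tool that has one can derive the other. A minimal sketch, with made-up package names and a hypothetical helper:

```python
def dependents_from_dependencies(dependencies: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert a package -> dependencies mapping into package -> dependents."""
    dependents: dict[str, list[str]] = {}
    for package, requires in dependencies.items():
        dependents.setdefault(package, [])  # keep packages nobody depends on
        for requirement in requires:
            dependents.setdefault(requirement, []).append(package)
    return {name: sorted(users) for name, users in sorted(dependents.items())}


graph = {"app": ["lib-a", "lib-b"], "lib-a": ["lib-b"], "lib-b": []}
print(dependents_from_dependencies(graph))
```

That suggests recording both would be redundant, although which direction is more convenient depends on whether the reader is asking "what does this pull in?" or "why is this here?".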
Including index-hosted attestations
We now have a spec that specifies attestation details for files uploaded to a package index like PyPI. Including some of those details may help detect issues with packaging when auditing the file (e.g., the publisher suddenly changing). The key reason this isn’t included in the PEP is because the specification is entirely focused on JSON. In order to bring it to this PEP, we would need to either specify how to translate the JSON to TOML, embed the JSON payload as a string, or re-specify some or all of the attestation spec.
I’m leaning towards supporting outlining how to translate the JSON to TOML.
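As a rough illustration of the trade-off between embedding and translating: embedding is a one-liner, while translating has to decide what to do with anything JSON can express that TOML cannot, the obvious case being `null`. The payload below is invented for the example and is not the real attestation schema; both helpers are hypothetical.

```python
import json


def embed_as_string(attestation: dict) -> str:
    """Option: store the JSON payload verbatim as a single TOML string value."""
    return json.dumps(attestation, sort_keys=True, separators=(",", ":"))


def find_nulls(value, path="$"):
    """Yield paths to JSON nulls, which TOML has no way to represent."""
    if value is None:
        yield path
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from find_nulls(item, f"{path}.{key}")
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from find_nulls(item, f"{path}[{index}]")


# Invented payload shape, for illustration only.
payload = {"version": 1, "publisher": {"kind": "example", "claims": None}}

print(embed_as_string(payload))
print(list(find_nulls(payload)))
```

Any translation rules would need to say whether such keys are dropped, rejected, or encoded some other way.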
Expanding the feature set
This PEP is currently oriented towards standardizing on something that can replace a requirements.txt file that acts as a lock file (e.g., what pip-tools produces). But with an expansion of features, the file format may be able to replace the internal lock file format used by tools like PDM and Poetry, especially when a pyproject.toml file is viewed as the ideal input for creating a lock file.
This stuff I definitely need tool buy-in for.
Record the requirements for extras of a package
A project with a pyproject.toml file may define some extras which add dependencies to install. In the simple case this would just be a matter of marking an entry in [[packages]] as only applying when a specific extra is requested. Unfortunately the simple case doesn’t cover all cases.
Consider the following example where the latest release of NumPy is 2.2.1 and the last NumPy 1 release was 1.26.4:
[project.optional-dependencies]
extra-1 = ["numpy"]
extra-2 = ["numpy~=1.0"]
Individually those extras cause no issue. But extra-2 does “overpower” extra-1 when it comes to what version of NumPy to install. That leads to the issue of needing a way to record the fact that if extra-1 is requested on its own then NumPy 2.2.1 should be recorded in the lock file, but if extra-2 is specified (either on its own or in conjunction with extra-1), then NumPy 1.26.4 should be recorded.
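The “overpowering” can be reproduced with a toy resolver. This is only a sketch: it knows about the two NumPy versions named above and handles just the two requirement shapes in the example (a bare name and a `~=X.Y` compatible-release clause); a real tool would use a proper specifier implementation such as the one in the `packaging` project.

```python
AVAILABLE = ["1.26.4", "2.2.1"]
EXTRAS = {"extra-1": ["numpy"], "extra-2": ["numpy~=1.0"]}


def as_tuple(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, requirement: str) -> bool:
    """Check a version against a bare name or a `name~=X.Y` requirement."""
    _name, sep, floor = requirement.partition("~=")
    if not sep:
        return True  # bare name: any version is acceptable
    floor_parts = as_tuple(floor)
    candidate = as_tuple(version)
    prefix = floor_parts[:-1]  # ~=X.Y means >=X.Y and ==X.*
    return candidate >= floor_parts and candidate[: len(prefix)] == prefix


def locked_version(requested: set[str]) -> str:
    """Pick the newest available version allowed by every requested extra."""
    requirements = [req for extra in requested for req in EXTRAS[extra]]
    candidates = [v for v in AVAILABLE if all(satisfies(v, r) for r in requirements)]
    return max(candidates, key=as_tuple)


for requested in ({"extra-1"}, {"extra-2"}, {"extra-1", "extra-2"}):
    print(sorted(requested), "->", locked_version(requested))
```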
There are two possible solutions to this.
A single version across all extras in a single lock file
One solution to the problem is to do what uv does and lock to a single version for a package no matter what. That would mean any use of NumPy which could occur in any scenario would use NumPy 1.26.4. Some argue that leads to consistency as you won’t be wondering what version of NumPy you will end up with based on what extras you select.
But this does mean that if you want the version of NumPy to vary across extras you will need to create separate lock files for the various NumPy versions you want. While not technically an issue, it is ergonomically a bit annoying when this is necessary. But it’s not known how frequently package versions vary depending on which extra(s) are chosen, nor whether, when they do, people still want the variance or prefer the approach uv has taken.
If this solution were to be taken, then very likely an extras key would be added which would list the extras that the entry in [[packages]] should be used for. This works thanks to extras being additive, and thus only contributing more packages.
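A sketch of how an installer might consume such a key, assuming `extras` is an array of extra names on each `[[packages]]` entry and that an entry without the key always applies. The package data and the helper name are made up.

```python
def packages_for(packages: list[dict], requested_extras: set[str]) -> list[str]:
    """Select the entries to install for the requested extras."""
    selected = []
    for package in packages:
        extras = package.get("extras")
        # No `extras` key: always installed. Otherwise any overlap is enough,
        # since extras are additive and there is only one version per package.
        if extras is None or requested_extras & set(extras):
            selected.append(f"{package['name']}=={package['version']}")
    return selected


packages = [
    {"name": "example-dep", "version": "1.0"},
    {"name": "numpy", "version": "1.26.4", "extras": ["extra-1", "extra-2"]},
]
print(packages_for(packages, set()))
print(packages_for(packages, {"extra-1"}))
```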
Support Boolean logic for extra selection
Another solution to this problem is specifying the conditions under which a package version applies. This would mean supporting Boolean logic to fully express the conditions under which a package applies.
But historically extras have not been expressed this way. The use of the extra clause in Requires-Dist is always singular and with a == operator. This also means the operators on extra have not been designed to treat the extras specified as a set, and so an expression simultaneously using == and != is not well-defined when it comes to extra. This all means that using extra == 'extra-1' and extra != 'extra-2' to appropriately express what is needed for extra-1 to work has not been done before. It would also mean potentially more use of an extra key as the default package version may need to explicitly exclude all extra groups when other groups restrict what package versions apply.
For this to work we would either need to expand the use of the extra clause so it can be used in packages.marker or have an extras key which holds a Boolean expression for the conditions under which the package should be used. In both situations the spec around extra would need to be expanded by this PEP (or by another PEP before this one is accepted) to lay out how Boolean expressions would work in this case.
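To illustrate the set-based semantics that would need to be specified, here is a sketch where each NumPy entry carries a condition evaluated against the set of requested extras. The combinators are hypothetical stand-ins for whatever syntax a spec would define; they are not existing marker semantics.

```python
def has(extra):
    """Condition: the given extra was requested."""
    return lambda requested: extra in requested


def not_(condition):
    return lambda requested: not condition(requested)


def all_of(*conditions):
    return lambda requested: all(cond(requested) for cond in conditions)


# extra == 'extra-1' and extra != 'extra-2', read as set membership tests.
NUMPY_ENTRIES = [
    ("2.2.1", all_of(has("extra-1"), not_(has("extra-2")))),
    ("1.26.4", has("extra-2")),
]


def numpy_versions(requested: set[str]) -> list[str]:
    """List the NumPy entries whose condition holds for the requested extras."""
    return [version for version, applies in NUMPY_ENTRIES if applies(requested)]


for requested in (set(), {"extra-1"}, {"extra-2"}, {"extra-1", "extra-2"}):
    print(sorted(requested), "->", numpy_versions(requested))
```

Written this way, at most one NumPy entry applies for any combination of extras, which is the property a lock file would need the conditions to guarantee.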
Record dependency groups
Dependency groups have the same concerns as the extras mentioned above, along with lacking any pre-existing clause for use in dependency specifiers. And so dependency groups have the added issue that using Boolean expressions would require defining a new clause type.
I realize this is a lot, but I don’t want to go through another round on this PEP after this, so I tried to cover everything that people have brought up about the PEP that I’m open to changing.
Let the conversation begin!