PEP 665, take 2 -- A file format to list Python dependencies for reproducibility of an application

One might imagine, say, locking one commit for Linux and another for Windows, because each needs different patches.

Ah, ok. But even then, what are the benefits of keying by version? Or the drawbacks of having a flat list per package?

This is why supporting sdists and source trees is a separate concern that may end up with its own PEP (as well as why the file format is versioned); as of right now there’s no reason to layer on extra things to tell that something is a wheel. We don’t need to prematurely optimize for something that isn’t in the PEP yet and that will definitely require additional support from all tools involved if it is added in the future.

Organizational. If your lock file pulls in 2 or 3 different versions of a package to meet various platform requirements, do you want to have to scan the body of each array entry to see which version it is, or would you rather have it clear from the section? Or put another way, semantically the various files for the same version are related, but files from different versions have no relation to each other beyond the fact that the same project released them. Grouping by release is (at least) how I view things on PyPI as a whole, and a release, not the overall project, is what you typically lock by.
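To make the two layouts concrete, here is a minimal sketch in plain Python dicts. The project name, filenames, and field names are all hypothetical and are not taken from the PEP's actual schema; the point is only to show how a flat per-project list compares with the same data grouped by release.

```python
from collections import defaultdict

# Hypothetical flat layout: one list per project, with the version
# repeated inside every entry.
flat = {
    "example-pkg": [
        {"version": "1.0", "filename": "example_pkg-1.0-cp39-cp39-win_amd64.whl"},
        {"version": "1.0", "filename": "example_pkg-1.0-cp39-cp39-manylinux1_x86_64.whl"},
        {"version": "2.0", "filename": "example_pkg-2.0-py3-none-any.whl"},
    ],
}


def group_by_version(flat_lock):
    """Regroup a flat per-project list into project -> version -> files."""
    grouped = {}
    for project, entries in flat_lock.items():
        by_version = defaultdict(list)
        for entry in entries:
            # The version moves into the key, so it is dropped from the entry body.
            rest = {key: value for key, value in entry.items() if key != "version"}
            by_version[entry["version"]].append(rest)
        grouped[project] = dict(by_version)
    return grouped


grouped = group_by_version(flat)
for version, files in grouped["example-pkg"].items():
    print(version, [f["filename"] for f in files])
```

In the grouped form, the set of locked versions is visible from the keys alone, without reading any entry body.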

Well, that’s enough co-authors to accept that as the solution. :smiley:

Because there are a bunch of fields that PEP 621 mandates due to core metadata that an app simply doesn’t need to care about (e.g. you don’t really need to give your app a name).

It’s up to the locker to decide what to put into metadata.requires. This is on purpose so the resulting dependency graph can be whatever it needs to be to resolve appropriately.

Updated in the PEP in PEP 665: close out the open issue about "best-fitting wheel" · python/peps@95fe2fc · GitHub .

IMO this needs an entire PEP to discuss. It is not unreasonable to reuse project.dependencies as inputs, but there are still a bunch of unanswered questions. The input, for example, must allow the user to specify an alternative index to fetch packages (and allow index fallback etc.), and once you do that you’ll also need to discuss how the format can be successfully taught with minimal confusion regarding how the index specification is only effective if you use pyproject.toml directly, not when you package it into a distribution (i.e. you can’t specify indexes, put the package to PyPI, and expect pip to fetch the package’s dependencies from your own index instead of PyPI). This is even arguably enough rationale against reusing project.dependencies as input to the lock file.


I figured this is where this is going to end up, but I figured I would at least ask to see if there was a chance people actually agreed on this. :wink:

Interesting difference in opinion as my brain always thinks of that as something to specify on the command-line rather than in the configuration file. But I do know that requirements files support this, so maybe its usefulness is broader than I realize/think.

OK, I will remove this as an open issue, leaving us just with the sdist discussion to resolve before the PEP is ready for pronouncement!


There’s a partial PEP in progress somewhere to define a shareable format for this kind of configuration.

I don’t think this necessarily has to be locked, though it’s definitely convenient to have the source listed provided installers allow it to be overridden (e.g. I should be able to use an internal PyPI mirror/cache rather than being forced to go via the internet).

I just added the following to try and make it as clear as possible that the PEP is flexible around anything it doesn’t specify on purpose:


As Flexible as Possible

Realizing that workflows vary greatly between companies, projects, and
even people, this PEP tries to be strict where it’s important and
undefined/flexible everywhere else. As such, this PEP is strict where
it is important for reproducibility and compatibility, but does not
specify any restrictions in other situations not covered by this PEP;
if the PEP does not specifically state something then it is assumed to
be up to the locker or installer to decide what is best.


Note that I specifically said “input” without specifying where the input should come from :wink:

The input can come from the lock file used for installation, but it can also come from additional user inputs (e.g. command line options, configuration files, or environment variables) specified to the locker and installer, or even only available in the original application manifest without the information being locked.

Also note that which indexes were used to generate the lock file is inherently not meaningful knowledge to the installer, since the lock file already provides enough information for the installer to find exact artifacts without index knowledge. So the installer only needs to allow two use cases:

  1. No index override, where the installer simply uses the artifact URLs provided by the lock file.
  2. An explicit index override, where the installer ignores all the artifact URLs and finds artifacts based on versions, filenames, hashes etc. instead.
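
A rough sketch of those two cases, under stated assumptions: the field names (`filename`, `sha256`, `url`) and the idea that the override index has already been queried into a list of candidate files are mine, not something the PEP specifies. The function name is hypothetical too.

```python
from typing import Optional


def select_artifact(locked: dict, index_files: Optional[list] = None) -> str:
    """Return the URL to download for one locked file.

    index_files is None      -> case 1: use the URL recorded in the lock file.
    index_files is a list    -> case 2: ignore the locked URL and match the
                                file offered by the override index instead.
    """
    if index_files is None:
        return locked["url"]
    for candidate in index_files:
        # Match on filename and hash only; the locked URL plays no part.
        if (candidate["filename"] == locked["filename"]
                and candidate["sha256"] == locked["sha256"]):
            return candidate["url"]
    raise LookupError(f"{locked['filename']} not found on the override index")


# Hypothetical data for illustration only.
locked = {
    "filename": "example_pkg-1.0-py3-none-any.whl",
    "sha256": "abc123",
    "url": "https://files.example.org/example_pkg-1.0-py3-none-any.whl",
}
mirror = [
    {
        "filename": "example_pkg-1.0-py3-none-any.whl",
        "sha256": "abc123",
        "url": "https://mirror.internal.example/example_pkg-1.0-py3-none-any.whl",
    },
]

print(select_artifact(locked))          # URL from the lock file
print(select_artifact(locked, mirror))  # URL from the override index
```

Matching on the hash as well as the filename means a mirror that serves a different file under the same name is rejected instead of silently used.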

While I understand the motivation here, how do we avoid lockers depending on a particular installer? (Past experience leads me to believe that people will expect anything install-related to be handled by pip, so “what pip does” could end up being a de facto standard for anything not covered by the PEP).

It’s possible this won’t be an issue in practice - do you have an example of something where the PEP deliberately doesn’t restrict something in the way you describe?

No, but some offline feedback I got was that some of the worries about not supporting sdists may be coming from people feeling that if it isn’t in the PEP, it can’t be solved outside of it.


OK, but sdist support is covered by the PEP, at least to the extent that it is currently an open issue and presumably at some point will be moved to “rejected ideas”. So there will certainly have to be some information in the PEP, even if the implementation details remain open.

I’d expect that when sdists are moved to the “rejected ideas” section (assuming that’s what happens), the following points would be made:

  1. The PEP clarifies that while there are use cases for sdist support, those are considered out of scope for PEP 665.
  2. It’s explicitly noted that adding sdist support via a follow-up PEP was discussed and is how this PEP expects sdist support to be added at a future date, if required.
  3. I’d imagine a key reason for the rejection would be that it’s much harder to achieve reproducibility for sdists - so the rejection should note this and clarify that any future sdist support PEP needs to describe how reproducibility will be handled.

I don’t know whether that matches people’s expectations about “solving sdist support outside of PEP 665”. In particular, though, if people want to experiment with implementing such support, they will need to write their own lockfile installer - pip isn’t the place for such experimentation.

To be explicit, pip will need sdist support to be backed by an approved PEP before we add it.

I’d also like to see (assuming sdist support gets rejected) an explicit paragraph in the motivation section explaining that the PEP chooses to only support wheels because that allows reproducibility to be guaranteed without needing build systems to provide reproducibility guarantees as well (which, as far as I know, none of them formally do). I think that would make the current emphasis in the PEP on reproducibility less distracting for people who don’t have a strong need for it.

Sure, but the rest of the PEP is written as if sdists are not supported.

Yes.

Assuming the “Supporting sdists and source trees in PEP 665” discussion doesn’t lead to sdists making it into this PEP, I would summarize what’s there, including what needs to be resolved by any future PEP.

PEP 665 -- A file format to list Python dependencies for reproducibility of an application | Python.org was meant to capture that, but I can strengthen it a bit to be more explicit about this.

Added a paragraph in PEP 665: point out why relying on wheels is a good thing · python/peps@e4d35d7 · GitHub.


Poetry has said they won’t support this PEP, both from a standardization and export viewpoint.

Not supporting sdists, source trees, and VCSs is one sticking point. The other is the per-file dependencies, which were added to the PEP after Donald provided direct feedback on that very topic.

I have recorded this feedback in PEP 665: record Poetry's views on the PEP · python/peps@6bbde29 · GitHub .


I just posted a new draft of the PEP with sdist support listed in the rejected section. That closes out all open issues!

Rendered versions at:


So now that the PEP no longer supports sdists, can we do a review of whether it still covers enough use cases to be viable? In my experience, I don’t think I’ve ever heard anyone ask for pip to support reproducible installs explicitly, so while I get that the idea is that PEP 665 is about ensuring reproducible installs, what evidence do we have that enough people actually want reproducible installs to make it worth having a standardised lockfile whose main (sole?) purpose is to provide them?

In particular, there’s already a conversation starting about a follow-up proposal for adding sdist support. I’m not particularly happy about the possibility that PEP 665 can’t stand on its own merits and is merely a starting point for adding sdists. We already have enough PEPs that have been approved but that no one is working on implementing, and I don’t want to add another one to that list…

I know of enterprise users who solve this problem by having a PyPI mirror because pip isn’t secure by default in terms of what it installs. They can’t trust users to use any other index but their own private one that only contains code that has been cleared for use (i.e. tightly control what dependencies may get pulled in).

I know @kushaldas greatly cares about reproducible installs for SecureDrop.

For me personally, I have to audit every call to pip in CI for work to make sure that every flag that is mentioned in PEP 665 -- A file format to list Python dependencies for reproducibility of an application | Python.org is used to avoid supply chain attacks.
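
For illustration, that kind of audit could be sketched as below. The three flags are real pip options, but treating exactly these three as the required set is an assumption on my part, and the helper name is hypothetical. The check only looks for the presence of each flag; it does not validate values such as `:all:`.

```python
import shlex

# Assumed set of flags to require; adjust to whatever your policy mandates.
REQUIRED_FLAGS = ("--require-hashes", "--no-deps", "--only-binary")


def missing_flags(command: str) -> list:
    """Return the required flags that do not appear in a pip command line."""
    tokens = shlex.split(command)
    # Accept both "--only-binary :all:" and "--only-binary=:all:" spellings.
    present = {token.split("=", 1)[0] for token in tokens}
    return [flag for flag in REQUIRED_FLAGS if flag not in present]


ok = "pip install --require-hashes --no-deps --only-binary :all: -r requirements.txt"
risky = "pip install --require-hashes -r requirements.txt"

print(missing_flags(ok))     # []
print(missing_flags(risky))  # ['--no-deps', '--only-binary']
```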

Sorry, I explained badly. I wasn’t asking about reproducibility in its own right, I was asking about whether there was still sufficient demand for PEP 665, in its new form.

The original justification for PEP 665 was that “people want lockfiles” - this is pretty obvious, we have projects like pip-tools, poetry, pipenv, etc, all providing some form of lockfile functionality, so there’s clearly a need.

The complexities of supporting sdists in line with the goal of reproducibility meant that the PEP dropped the idea of supporting them, and focused solely on wheels. That’s fine, but it means that many of the users who we’d previously assumed would benefit from PEP 665 will no longer be able to use it. What proportion? I don’t know, and that’s essentially what I’m asking.

We can’t assume that people using a PyPI mirror will be able to use the new PEP, as they will be mirroring sdists as well, and may well need to install them. That applies to any cases of people concerned about supply chain attacks - it’s possible to mitigate the risk while still using sdists, and it’s entirely possible that solutions which block sdists won’t be acceptable in all cases.

OK, let’s put the question to an informal poll:

  • PEP 665 without wheel support is sufficient for my use cases
  • I won’t be able to use PEP 665 until sdist support is added
  • I will use PEP 665 without sdists, but I will need to handle sdists manually for my workflow
  • I don’t need PEP 665, I use existing solutions and am happy with them
  • Lock files don’t matter to me, and/or I have no opinion


One major problem with relying on votes is that “people following the PEP discussion” isn’t a particularly representative group. If someone wants to reach out to the wider community for feedback, that would be helpful here.

I just noticed, I inadvertently worded the “PEP 665 is sufficient” option as “PEP 665 without wheel support is sufficient for my use cases”. Aargh - and I can’t edit the poll now :slightly_frowning_face:

I hope it was obvious to everyone that I meant “PEP 665 with wheel support is sufficient for my use cases”. If anyone wants to change their vote, please comment. Or if people think this is enough of a mistake that I should restart the poll, then I’m happy to do that too.

Sorry for the error.