PEP 665, take 2 -- A file format to list Python dependencies for reproducibility of an application

Are you saying you view url only for the direct case?

Right, but what’s wrong with that? As long as that lock file doesn’t leave the control of the environment then there shouldn’t be an issue. This implies the lock file is being shipped with closed-source code, so the potential of the lock file getting out with private URLs is going to be the least worrisome issue if the entire code base leaked.

In my work environment I don’t think having an internal URL for internal code is problematic.

I don’t see why installers can’t be instructed to prefer hitting indexes instead of URLs and making the URL the fallback.
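To illustrate that ordering, here is a minimal sketch. It assumes a hypothetical `LockedPackage` record and a caller-supplied `index_has` lookup; neither name comes from the PEP or from any real installer.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class LockedPackage:
    # Hypothetical lock entry: a pinned name/version plus an optional direct URL.
    name: str
    version: str
    url: Optional[str] = None


def choose_source(pkg: LockedPackage, index_has: Callable[[str, str], bool]) -> str:
    """Prefer the index; use the recorded URL only as a fallback."""
    if index_has(pkg.name, pkg.version):
        return f"index:{pkg.name}=={pkg.version}"
    if pkg.url is not None:
        return f"url:{pkg.url}"
    raise LookupError(f"no source available for {pkg.name}=={pkg.version}")
```

Whether this policy belongs in the installer or in the locker is exactly the open question; the sketch only shows that the preference itself is easy to express.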

But this actually sounds more like a locker policy/option to not insert URLs unless absolutely necessary. And so I would assume it would be controlled at that point and not at the installer level.

I assume you mean by having someone speak up? Otherwise I’m not quite sure how to explore this specifically beyond asking Twitter for feedback from a wider audience.

1 Like

Hello! I have implemented something that looks a lot like this format in a proposed pip resolve command (pypa/pip Pull Request #10748, “metadata-only resolve with new `pip resolve` command!”), until @pf_moore noted that the JSON output I was hacking together felt a lot like this PEP. I’m getting more familiar with the specification atm, but I will almost definitely be modifying the output of the pip resolve command to produce this file format instead. It also may not be called pip resolve; instead pip install --dry-run --report was proposed as an alternative which I think I like more.

3 Likes

I wanted to announce that the PEP has been rejected (@pf_moore messaged me privately about it). The reason for the rejection is that the lack of sdist support was seemingly enough to cause a lukewarm reception to the PEP overall. I am now going to go think about what an opinionated lock file would look like from me and consider implementing it as part of my mousebender project.

Thanks to everyone who provided feedback! Thanks to @frostming @sdispater for providing early feedback. Extra thanks to @frostming for writing a proof-of-concept locker. And a special thanks to my co-authors, @pradyunsg and @uranusjr .

7 Likes

My thanks as well to everyone for their participation in the discussion here.

As Brett said, we had a discussion offline as to how to proceed with the PEP, and the conclusion was that it should be withdrawn/rejected.

This discussion has persuaded me that we definitely need a lockfile format that’s better than the current “pinned requirements file” approach[1], but I think we need a better understanding as a community of what we actually want in terms of functionality. The question of how and if we handle sdist support is the biggest issue, but a clearer understanding of use cases in general (many of which are “hidden” behind closed-source corporate environments) is also needed.

Thanks to the PEP authors, @brettcannon @uranusjr and @pradyunsg for their work on this, and I’m sorry we didn’t get the result you hoped for.


  1. Recent examples of supply chain issues with npm and similar demonstrate that locking is becoming far more critical.

2 Likes

Hi, thanks for the update. What I’m not sure about is where to go next. If someone were to pick this up, what would they do to get the ball rolling? You alluded to the sdist problem, but what about the rest? (I’m not sure how much of an overview of corporate environments we can get here, from the community’s POV.)

PS. I’m personally sad that the PEP failed due to not supporting sdist. I was hoping we were taking an iterative approach as was the case with PEP-517 and then PEP-660. Start with wheel support, and let a future PEP extend/handle sdists. If this path is not feasible perhaps the rejection motivation should clearly state why not, so a future iteration can address it.

4 Likes

Honestly, I think the key thing is for any new PEP to start from a very clear and well-defined list of scenarios that it is designed to support/address. This was where I felt PEP 665 fell down: it focused on reproducibility, rather than on specific use cases. And as a result, it was extremely hard to determine whether it intended to handle particular scenarios.

For example, I never really got a clear answer on whether the PEP was intended to support deployment of an app to a service like Heroku. In a broad sense, “obviously” it would, but when you get into details, suppose I had a dependency that’s not on PyPI as a wheel, how do I ship the combination of “lockfile + wheel” to Heroku in a way that works regardless of what lockfile installer Heroku chooses to use? That was the root of the “should we allow relative paths in the URL field” question, which was still open at the point where we decided to reject the PEP.

I think the reality is that someone is going to have to pick (or maybe even invent) some use cases that they believe are important, and write a PEP based on those, and be prepared to defend the PEP on the basis of those use cases. That would at least mean that readers could quickly see whether the PEP helps them in their particular situation (and then, if necessary, propose additional use cases that the PEP should add).

It wasn’t solely sdists. As I say, it was mostly a lack of clear definition of what use cases the PEP was trying to support, and “not all of our dependencies are distributed as wheels” is part of that.

The PEP was still (in my mind) perfectly viable when the scope was reduced to wheel only. In fact, on a personal note, I prefer not requiring all installers to be able to build from source. What finally tipped the scales, though, was the question of how to distribute a lockfile with a set of wheels, which is important for the use case where the locker hits some sdist-only packages, and wants to build wheels for them and provide those wheels with the lockfile. That problem is solvable in a number of ways, but we got stuck on the question of whether that is a problem the PEP even wanted to solve.

I’m still perfectly happy with that approach, as long as the initial (wheel support only) phase is clearly viable in its own right, and not simply a placeholder that no-one will use until sdist support gets added.

2 Likes

I’m not certain everyone saw @cosmicexplorer’s post above, but it seems clear to me that whatever comes next will surely be based on/utilize the functionality introduced in pypa/pip Pull Request #10748 (“metadata-only resolve with `pip download --dry-run --report`!”, by cosmicexplorer).

As someone who writes python at work and who was uninvolved with this PEP and who just found out about it being abandoned, I am just thoroughly disappointed in the python ecosystem for apparently being uniquely unable to adopt a standardized lock file format in over three decades, despite having a “standard” package repository. This, together with pyproject.toml peps, is arguably much more important to the python ecosystem than anything that was added to the python language since 2.7 combined.

Imho, it is single-handedly a reason to avoid Python as much as possible in both small and large projects. While I can write python fairly quickly and it is one of my preferred languages, the time savings are usually eaten up by having to deal with dependency hell both immediately and over the lifetime of the project, so I am literally better off using Rust at work in many cases purely because Rust has a sane deployment story thanks to Cargo.

So my user feedback is, please, please please prioritize this or something like this and do not bikeshed it. This is incredibly important. It does not need to be perfectly backwards compatible with every single piece of the python ecosystem, it just needs to be good enough to be adopted so that dependency hell eventually disappears and so that Python-the-ecosystem as opposed to Python-the-language becomes less horrible to deal with.

3 Likes

To be clear, I’m not sure how much of the concept of a “lock file” existed back in 1991 when Guido released Python for the first time. And PyPI didn’t exist until the early 2000s anyway. And even then, I still don’t know how widely the lock file concept was known or thought about at that point.

I’m still working on this issue, but honestly this sort of tone is not motivating. Please keep the tone positive and do not denigrate all the hard work everyone has put into this community, even if it isn’t perfect.

18 Likes

Is there an ecosystem that adopted a standardised lock file format? Those that seemingly have only one lock file format do so because there is only one de facto tool for the ecosystem, while the format is still technically an implementation detail of the tool, and subject to change without notice. From what I know, Python is the first ecosystem that attempts to standardise such a format.

2 Likes

The great schism of the Python ecosystem: those involved in packaging think having multiple tool chains is a feature, while those outside think it’s a bug.

4 Likes

To be fair, plenty of those outside of packaging would also think it was a bug if it didn’t, too, because a single tool would almost inevitably be able to serve only inadequately the large and varied range of requirements, use cases and application domains that the Python tools do. Python is the most prominent “glue language” out there right now, and is popular in a variety of very distinct domains (web, science, scripting, AI, applications, CLI tools, sysadmin, education, etc).

It’s hard to imagine a single tool being able to span all the way from making pure-Python library packaging as simple and trivial as possible for non-experts (Flit), to implementing novel PEP 582-inspired packaging concepts (PDM) to application packaging (Pipenv/Poetry) to platform-specific deployment (Pyinstaller, Py2App/Exe) to building C, C++ and Fortran-compiled binaries and extensions with CMake (Scikit-Build) or Meson (Meson-Python), to building Rust-based Python packages (Maturin) to deploying WASM browser/NodeJS applications (Pyodide) to packaging for supercomputing clusters (Spack). Is there another modern ecosystem that spans this wide a gamut when it comes to the requirements of their packaging space?

There’s certainly a lot we can do to improve the situation, as is being discussed on various other more appropriate threads, and in an ideal world, the optimal number of tools to cover these use cases would be a good deal less than we have (and they would integrate more tightly, i.e. this proposal, and with clearer documentation making it clear which one users should choose, etc.). However, given the requirements above, it’s hard to argue that the optimal number could realistically be “one”.

4 Likes

I do not disagree with you,[1] and I wasn’t necessarily passing judgement on the situation. But from a user-facing perspective, when the very reasonable question “Why can we not have a lock file like so many other languages?” [2] is met with the answer “because we have N distinct ways to build and distribute packages”, that does not sound like such a good thing.

Again, I don’t dispute the fact that there are good reasons for the situation being what it is, but on topics like this, i.e. PEP 665, it always looks like Python is the one language where things are very hard that elsewhere are (appear?) very easy.


  1. Except maybe with your point that no one tool chain could ever handle everything from the very simple to the very complicated – why not? Every complication is, after all, optional.

  2. I say “languages” because no one, realistically, makes a distinction between the language and its packaging ecosystem.

2 Likes

If the ecosystem has only one tool to interact with lock files, it does not matter whether there’s only one format or not (in fact, most ecosystems have multiple formats and silently convert without you noticing). And for those ecosystems that do have multiple formats (e.g. Node), they never unify at all.

So the question I want to ask is, why is having one lock file format or not meaningful to you? The original intention of the PEP is toward exchanging between tools, which is somewhat unique to Python because other ecosystems do not have as many competing tools in the first place. Whether having multiple tools is a good thing or not is a topic you can argue on (I’m not taking a position here), but using that as the reason to complain about having multiple lock file formats does not make sense to me since those are orthogonal topics.

1 Like

Those with an understanding of the history of packaging remember when “setuptools is the only build system” was the bug that users were complaining about, while those arriving new to the scene think that “there are many build systems” is the only bug users have ever complained about.

:slightly_smiling_face:

In all seriousness, it’s all about finding an acceptable compromise. And the standards process is about allowing people to iterate on different parts of the puzzle independently. If a single “best of breed” solution exists, I hope we find it and it becomes the consensus solution. But until it does, I’m glad we have the freedom to experiment.

Remember - if the “we want one opinionated solution, not choices” philosophy worked, we would still have setuptools, eggs and easy_install as the packaging solution.

11 Likes

Could someone summarize the issues with this effort without me having to read and understand two years’ worth of discussion history?

I tried following it, and from what I gathered, objections regarding sdists and VCS checkouts basically amount to the fact that, even if the sources themselves can be verified, the compilation result might change due to differences in the compilation environment. Did I get that right? Did Poetry solve that problem? From the looks of it, it didn’t. Is this a showstopper problem for the PEP?

Yes, from what I recall that is the main objection. Poetry and other workflow tools (within or outside of Python) do not solve it either.

I think this problem is practically a showstopper for advertising the PEP as a reproducibility tool, unless the community vastly changes how reproducibility is considered. The PEP is not dead per se, but it would need to be “rebranded” under another sufficiently convincing use case, and I don’t have one now.

Fundamentally this is correct. It’s a symptom of a more underlying issue, though, which is that while we have consensus that “we need a lockfile standard”, we don’t appear to have consensus on what a lockfile is. Reproducibility was the key feature targeted by PEP 665, and in that context sdists are the showstopper. Other people were insisting that sdist support was mandatory, but they weren’t too concerned about (absolute) reproducibility, being more interested in having something that replicated the “fully pinned requirements file” model as far as I could tell.

Until we get consensus on what we mean in the first place when we say “we want a lockfile” I fear we’ll repeat this cycle.

1 Like

Ok, thanks. IIRC someone also suggested that wheels built from those sources could be used in such cases to calculate the necessary checksums. If someone wants to have such strict reproducibility guarantees, then I don’t really see any better alternative. If there’s someone with such a use case and they can’t live with that solution, they should suggest something else. We shouldn’t get hung up on this niche use case (niche as in nobody has produced tooling for this yet).

At any rate, I feel like users should be given a choice between varying levels of reproducibility, from simple version pinning (pip-tools), to relaxed checksumming based on sdists/vcs commit IDs/wheels, to strict guarantees (as discussed above, and which no tool provides yet).
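A rough sketch of how the first two levels could differ in practice, using only the standard library. The `Level` enum and `accept_artifact` function are hypothetical names made up for illustration, not part of pip-tools, Poetry or the PEP, and the strict level is left out because no tool provides it yet.

```python
import hashlib
from enum import Enum
from typing import Optional


class Level(Enum):
    PINNED = "pinned"  # name==version only, like a pinned requirements file
    HASHED = "hashed"  # version pin plus a checksum of the exact artifact


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the artifact bytes."""
    return hashlib.sha256(data).hexdigest()


def accept_artifact(
    level: Level,
    locked_version: str,
    found_version: str,
    data: bytes,
    locked_sha256: Optional[str] = None,
) -> bool:
    """Decide whether a candidate file satisfies the lock at the given level."""
    if found_version != locked_version:
        return False
    if level is Level.PINNED:
        return True
    # HASHED: the bytes must match the recorded digest; a missing digest fails.
    return locked_sha256 is not None and sha256_hex(data) == locked_sha256
```

With `Level.PINNED` any file carrying the right version is accepted, while `Level.HASHED` also rejects a file whose contents differ from what was locked.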

1 Like

As I see it, the biggest difference between pip-tools style requirements files and poetry-style lock files is the presence of checksums for files. I’m not 100% sure that we even need pip-tools style simple pinning. If we implement poetry-style lockfiles, with additional (opt-in) guarantees regarding absolute reproducibility, I don’t think anybody would be left missing anything, yes?

As for the definition of a lock file, what I’m thinking is something that can be used by an installer to deploy a consistent set of packages in a Python environment. The meaning of “consistent” would be that it installs the same exact packages every time, over time, on the same OS/architecture combination. Would this definition be agreeable?
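To make that definition a bit more concrete, here is a minimal sketch of the data shape I have in mind. The type aliases and the `packages_for` helper are hypothetical and only illustrate the idea; this is not a proposed file format.

```python
from typing import Dict, List, Tuple

Platform = Tuple[str, str]  # (os, architecture), e.g. ("linux", "x86_64")
Pin = Tuple[str, str, str]  # (name, version, sha256 of the exact file)


def packages_for(lock: Dict[Platform, List[Pin]], os_name: str, arch: str) -> List[Pin]:
    """Return the exact, ordered set of pins for one OS/architecture pair."""
    key = (os_name, arch)
    if key not in lock:
        # A platform that was never locked is an error, not a fresh resolve.
        raise KeyError(f"no locked packages for {os_name}/{arch}")
    # Sorting makes the result independent of the order entries were recorded in.
    return sorted(lock[key])
```

Under this reading, “consistent” means that calling `packages_for` with the same lock and the same platform always yields the same list, and that an installer never falls back to resolving on its own.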

1 Like