Structured, exchangeable lock file format (requirements.txt 2.0?)

This came up during an exchange on Twitter with @brettcannon (which later spun into the Big Picture thread). I figure I should open a dedicated thread about it so more people know about it, and to summarise my ideas.

The situation

Python recently saw the emergence of several project management tools that include dependency management among their features. The best known are Pipenv and Poetry. These tools operate around the idea of a lock file that can be used to replicate dependencies across environments, and they provide commands for the user to easily update and regenerate it.

The problem is that each tool has different ideas regarding the lock file format. These independently-created lock files are similar in nature, but different in structure and format, which creates a sort of vendor lock-in.

The idea

Given that the lock files generated by these tools contain more or less the same information, it should be possible to create a universal lock file structure, so that a lock file generated by one tool can be used by any other, including pip, to bootstrap an environment.

Considerations

  • Requirement installation via pip can already be expressed in the requirements file format (a.k.a. requirements.txt), but that format is not enough.
  • The format should be able to group requirements into different optional sections, similar to setup.py’s install_requires and extras_require.
  • The file is expected to be machine-generated, but it should remain possible for a user to inspect and modify it by hand.
  • The structure should represent a directed acyclic graph (but see the discussion of cycles further down the thread). A sketch of what such a file might contain follows this list.
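
To make these concrete, here is a purely hypothetical sketch of the kind of information such a file might carry, shown as parsed (Python) data to stay neutral about the eventual serialization (TOML, JSON, etc.); every key and package name below is made up for illustration:

# Hypothetical lock structure (illustration only; all names are made up).
lock = {
    # named requirement groups, in the spirit of install_requires/extras_require
    "groups": {
        "default": ["spam"],
        "tests": ["ham"],
    },
    # one fully pinned entry per package; "dependencies" are the graph edges
    "packages": {
        "spam": {
            "version": "1.0.2",
            "hashes": ["sha256:..."],  # verified against the artifact at install time
            "dependencies": ["eggs"],
        },
        "eggs": {"version": "0.14.1", "hashes": ["sha256:..."], "dependencies": []},
        "ham": {"version": "3.8.0", "hashes": ["sha256:..."], "dependencies": []},
    },
}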

I am personally of the opinion that we should not have such a file format yet. Such a file would be highly application-specific, aimed at application development at best rather than library development. (For example, I would like separate lint/format/test/run environments for both my applications and my libraries.) Pipenv, for one, somehow thinks a single environment is all you need, which I’m not on board with.

Maybe we do or maybe we don’t. But I feel like we’re rushing ahead. First we would need to clearly define how source trees/distributions communicate their requirements (both install and extras) to frontends/tools, which is something we still lack today.

So I would propose we start with a PEP that defines how a build backend provides this information. We could extend the backend API defined in https://www.python.org/dev/peps/pep-0517/ for this, and we can (and probably should) build on what we already have for wheels; see https://www.python.org/dev/peps/pep-0566/#id11.

Once we have agreed on how to provide library requirement information in general, we can start iterating on how to provide it for an application, which essentially needs to merge and pin a list of these.

@pf_moore @pradyunsg @pganssle @dstufft @ncoghlan @steve.dower @dustin are probably a few of the people who should be involved in this


Indeed this is very application-specific, and I would even say it is completely aimed at applications and has little to no use for library development (except perhaps to set up a test runner and build docs reliably).

I feel that we are kind of talking about different things, however. While it is indeed important to figure out how distributions communicate their requirements, this file is focused on how those requirements, once provided, should be stored. We already have specifications for what requirements look like once provided (PEP 508 et al.), so one is not necessarily blocked by the other.

While it is simplest to work on one thing at a time, the reality is that developers have applications to build as we speak, and they are already figuring out ways to pin things in the wild; right now that effort is not particularly well organised. I feel it would be more beneficial to offer what we can, tell people what the resulting list should look like, and let them do their thing. And who knows, maybe someone will have already figured it out by the time we come back.

FYI, I disagree. I desperately want usable locking for the libraries I develop. When it comes to library development, the problem with pipenv’s locking isn’t that it does too much, it’s that it doesn’t do enough. E.g., one library I work on has install_requires (constrained but not locked), then those are also manually copy-pasted into a test-requirements.in and an rtd-requirements.in with extra things added, and then we use pip-compile + dependabot to compile those into lock files. And then I use the test-requirements.txt lock file repeatedly to create multiple environments with different Python versions. And really this still isn’t enough lock files – when that project switches to black for formatting we’ll need to add another one (because black requires Python 3.6, but we still have 3.5 in our CI matrix, so we’ll need separate lock files for running black and for running tests).

Hey @techalchemy, remember how you asked me like 6 months ago on distutils-sig why pipenv’s hard-coded two environments aren’t enough, and I never answered? Better late than never I guess :slight_smile:

The pain of managing locking in library development is a major part of my motivation for this writeup: [Distutils] Notes from python core sprint on workflow tooling (Distutils-SIG archive, python.org)

I do agree though that we might want to wait to see how things develop before trying to standardize locking. Like, I think what I want is a conventional place to store install_requires (and probably build-system.requires too), and then to store their locks, and then also an arbitrary number of additional environments, and their locks, plus some way to incorporate the install_requires into those environments… it’s pretty complicated actually :slight_smile:

I guess I underestimated the usefulness of the proposal, then :grin:

I am under the impression we might not need to go that far in one shot, though. Like pyproject.toml, we can start with the absolute minimal case – “given an environment, how do I specify sets of locked dependencies for a tool to install into it” – and work from there. As long as we keep the format well structured and extensible (like pyproject.toml, unlike requirements.txt), new things can be added as needed.
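
To show how minimal the consumer side could be, here is a sketch that reuses the hypothetical structure from my earlier post; the lock layout is made up, but the pip flags (--no-deps, --require-hashes) and the hash-pinning requirements syntax are real, existing pip features:

# Sketch: flatten one named group of a hypothetical structured lock into
# pip's existing hash-checking requirements syntax, then install with no
# further resolution.
import subprocess
import tempfile

def install_group(lock, group):
    # Walk the graph from the group's roots; this is closure, not resolution.
    seen, stack = set(), list(lock["groups"][group])
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(lock["packages"][name]["dependencies"])
    lines = []
    for name in sorted(seen):
        entry = lock["packages"][name]
        hashes = " ".join("--hash=" + h for h in entry["hashes"])
        lines.append("{}=={} {}".format(name, entry["version"], hashes))
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("\n".join(lines) + "\n")
    subprocess.run(
        ["pip", "install", "--no-deps", "--require-hashes", "-r", f.name],
        check=True,
    )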


How is that different from the Pipfile.lock pipenv uses nowadays? I would rather solve the general problem than lock ourselves into a bad format that solves the issue for only a small segment of users.


Has each project using such a file documented its format? Can someone provide links to those formats so people can compare and get a sense of them? At the least, it seems one could see how easy it would be to combine just those formats while letting each project maintain its existing functionality.


In Poetry’s case, I think the lock file is considered an implementation detail in all cases and is not really documented beyond the current state of the code, plus backward compatibility: https://github.com/sdispater/poetry/blob/master/poetry.lock.
For Pipenv there are https://github.com/pypa/pipfile and https://pipenv.readthedocs.io/en/latest/basics/#example-pipfile-lock.


Not much, really. My main intention is not to introduce things for unsolved problems, but to standardise how the problem is already being solved, so the solution can be shared by different tools and become more useful to more people. For example, a SaaS platform wouldn’t need to implement both Poetry and Pipenv adapters to know how to bootstrap a website; one adapter could install things no matter which tool generated the lock file.


Unless I’m misunderstanding you, I don’t think pip has a problem with cycles in dependency graphs. To demonstrate, here’s a gist.
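
(For reference, the gist boils down to three trivial packages whose install_requires form a cycle: pkgc → pkga → pkgb → pkgc. Each setup.py looks roughly like the following, with the name and requirement rotated accordingly – a reconstruction inferred from the output below, not the gist’s literal contents.)

# pkga/setup.py (reconstruction; pkgb and pkgc are analogous)
from setuptools import setup

setup(
    name="pkga",
    version="0.1.0",
    install_requires=["pkgb"],  # pkgb requires pkgc, and pkgc requires pkga
)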

Putting each of those setup.py files in their own folders and adding a minimal pyproject.toml to each one, I did:

$ python -m pep517.build --binary pkga --out-dir $(realpath wheels)
...
$ python -m pep517.build --binary pkgb --out-dir $(realpath wheels)
...
$ python -m pep517.build --binary pkgc --out-dir $(realpath wheels)
...
$ pip install --no-index -f wheels -t tmp pkgc
Looking in links: wheels
Collecting pkgc
Collecting pkga (from pkgc)
Collecting pkgb (from pkga->pkgc)
Installing collected packages: pkgb, pkga, pkgc
Successfully installed pkga-0.1.0 pkgb-0.1.0 pkgc-0.1.0

Lock files shouldn’t require any dependency resolution to use, so realistically a lock file could just be a list of file URLs pointing directly to wheels or sdists (I think we consider non-packaged sources to be “unlocked”, right?)

The more interesting part is dependency resolution from constraints, particularly as we start seeing more packages that are “suitable” only in some contexts. For example, being able to prefer either a GPU build or a CPU build of a library based on the current hardware, or to depend on any one of a set of packages (whichever is easiest to install or best satisfies other requirements). Automatically assembling an environment is going to require this flexibility very soon, if it doesn’t already.

Just in terms of definitions, I think it makes a lot of things easier if we talk about “dependencies to be installed so I can use the library” as “application”, despite the slight mismatch in terminology. I consider all of your examples after this point to be applications, because you are resolving the constraints in order to apply them to an install.

People already assume that “library dependencies” are the constraints that a particular library brings to your personal dependency resolution task (a.k.a. install_requires). And it took a long time to get there, so let’s not spoil it now.

This is totally unrelated to certain tools defaulting to restrictive naming schemes :slight_smile: Those are just features the tool hasn’t implemented yet, or a signal that your case requires a different tool.

You are correct; I don’t know where I got the impression that pip can’t resolve this. I guess the acyclic requirement is not needed, but maybe @pradyunsg can provide some insight from the upcoming new resolver on whether keeping it would help.

There are other factors involved in deciding where to download a specific package from (PyPI mirrors to improve download speed, for example), so these need to be references rather than direct URLs. The actual URL can only be determined at installation time, and the reference should carry metadata (hashes) to ensure that what is found then matches the lock file’s expectation. :slight_smile:
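
To illustrate the difference with a made-up example (both layouts hypothetical): a URL-pinned entry fixes one artifact on one host, while a reference-pinned entry defers host selection to install time and relies on the hashes to guarantee that whatever gets served is the artifact that was locked:

# Both layouts are hypothetical illustrations.
url_pinned = {
    "spam": {"url": "https://files.example.com/spam-1.0.2-py3-none-any.whl"},
}
reference_pinned = {
    "spam": {
        "version": "1.0.2",
        "hashes": ["sha256:..."],  # must match whatever host/mirror serves it
    },
}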

pip’s documentation notes something along the lines of “pip can handle cycles in a dependency graph, breaking the cycle at an arbitrary point”. The new resolver will need to handle this situation in a manner compatible with that.

Edit: found two places with that:


@pradyunsg For this magic resolver we’ll need source trees/distributions to communicate their dependencies upfront, no? (That’s what I hinted at in my first post above.)

Well, it’s not necessary. We can build a wheel and extract dependency information out of it, FWIW.

But yes, having a direct way to get that information will save unnecessary work. IIRC, PEP 517 did have an optional hook to return the final list of install-time dependencies, but it was removed to avoid adding a hook that wouldn’t be used until an actual resolver exists.
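
For the record, the extraction itself is cheap once the wheel exists: a wheel is a zip archive whose *.dist-info/METADATA file carries one Requires-Dist header per dependency. A minimal sketch (the filename is just an example):

# Read install-time dependencies out of an already-built wheel.
import zipfile
from email.parser import Parser

def wheel_requires(path):
    with zipfile.ZipFile(path) as whl:
        # METADATA uses the email-header format from the core metadata spec
        name = next(n for n in whl.namelist() if n.endswith(".dist-info/METADATA"))
        meta = Parser().parsestr(whl.read(name).decode("utf-8"))
    return meta.get_all("Requires-Dist") or []

print(wheel_requires("pkga-0.1.0-py3-none-any.whl"))  # e.g. ['pkgb']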

I would say let’s try to push for this; even tox would want it, to be able to detect when it needs to re-create/update envs, and I would guess poetry/pipenv would benefit too. Building a wheel can be very expensive (especially for applications with C extensions).


I kind of feel like the lock file is something that makes sense to be tool-specific. I don’t think it’s an unreasonable burden to say that switching tools at the per-project level requires re-resolving the initial set of dependencies once to produce the new tool’s lock file syntax. You’re likely going to need to update the Pipfile or [tool.poetry] section anyway.

So the only real benefit here, in my opinion, is that maybe something like Heroku can learn to deploy from just that lock file. But it looks like the pipenv integration in Heroku will also work with just a Pipfile instead of a Pipfile.lock, and I expect other providers would do the same. So I don’t feel there’s a huge win in standardizing the lock format.

I think this is the real advantage of a standardized lock file, and the reason why a given service (say, Google App Engine or Google Cloud Functions) doesn’t currently support Pipfile/Pipfile.lock. As long as the lock file is an implementation detail of a given tool, service providers are not going to build support for N different lock files.

It’s great that Heroku works with just a Pipfile, but if you’re using a Pipfile without a lock file, you’re missing out on its most valuable feature (fully specified, deterministic dependencies), so what’s the point?


This is actually a key motivator for me as well. I can’t go to projects in Azure and say “standardize on Pipfile.lock”, since it’s tool-specific, it isn’t a standard, and it leaves out e.g. Poetry, who have said they would consider switching to a standard. And asking projects to standardize on requirements.txt is fine, but it isn’t the same thing unless you are very explicit about using pip freeze > requirements.txt. Plus this all assumes you froze into requirements.txt instead of e.g. requirements.lock.

IOW, there is a ton of divergent convention right now around providing an explicit list of packages you want installed, which makes tooling and universal support flat-out difficult.
