Structured, Exchangeable lock file format (requirements.txt 2.0?)

Not much, really. My main intention behind this is not to introduce things for unsolved problems, but to standardise how the problem is already being solved, so the solution can be shared by different tools and become useful to more people. For example, a SaaS provider won’t need to implement both Poetry and Pipenv adapters just to know how to bootstrap a website; one adapter can install things no matter which tool generated the lock file.


Unless I’m misunderstanding you, I don’t think pip has a problem with cycles in dependency graphs. To demonstrate, here’s a gist.

Putting each of those setup.py files in their own folders and adding a minimal pyproject.toml to each one, I did:

$ python -m pep517.build --binary pkga --out-dir $(realpath wheels)
...
$ python -m pep517.build --binary pkgb --out-dir $(realpath wheels)
...
$ python -m pep517.build --binary pkgc --out-dir $(realpath wheels)
...
$ pip install --no-index -f wheels -t tmp pkgc
Looking in links: wheels
Collecting pkgc
Collecting pkga (from pkgc)
Collecting pkgb (from pkga->pkgc)
Installing collected packages: pkgb, pkga, pkgc
Successfully installed pkga-0.1.0 pkgb-0.1.0 pkgc-0.1.0
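
The gist itself isn’t reproduced above, but a minimal reconstruction of the cycle might look something like this (hypothetical file contents; I’m assuming pkga and pkgb require each other and pkgc requires pkga, consistent with the install output):

# pkga/setup.py -- requires pkgb, which requires pkga back, forming the cycle
from setuptools import setup
setup(name="pkga", version="0.1.0", install_requires=["pkgb"])

# pkgb/setup.py
from setuptools import setup
setup(name="pkgb", version="0.1.0", install_requires=["pkga"])

# pkgc/setup.py -- the top-level requirement that pulls the cycle in
from setuptools import setup
setup(name="pkgc", version="0.1.0", install_requires=["pkga"])

# Each directory also gets a minimal pyproject.toml along the lines of:
#   [build-system]
#   requires = ["setuptools", "wheel"]
#   build-backend = "setuptools.build_meta"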

Lock files shouldn’t require any dependency resolution to use, so realistically a lock file could just be a list of file URLs pointing directly to wheels or sdists (I think we consider non-packaged sources to be “unlocked”, right?).

The more interesting part is dependency resolution from constraints, particularly as we start seeing more than one “suitable” package for some contexts. For example, being able to prefer either a GPU library or a CPU one based on the current hardware, or to depend on any one of a set of packages (whichever is easiest to install or satisfies other requirements). Automatically assembling an environment is going to require this flexibility very soon, if it doesn’t already.

Just in terms of definitions, I think it makes a lot of things easier if we talk about “dependencies to be installed so I can use the library” as the “application” case, despite the slight mismatch in terminology. I consider all of your examples after this point to be the application case, because you are resolving the constraints in order to apply them to an install.

People already assume that “library dependencies” are the constraints that a particular library will bring to your personal dependency resolution task (a.k.a. install_requires). And it took a long time to get there, so let’s not spoil it now.

This is totally unrelated to certain tools defaulting to restrictive naming schemes :-) Those are just features of the tool that haven’t been implemented yet, or a signal that your case requires a different tool.

You are correct; I don’t know where I got the impression that pip can’t resolve this. I guess the requirement is not needed, but maybe @pradyunsg can provide some insight into the upcoming new resolver, to decide whether it’d help to keep the acyclic requirement.

There are other things involved when deciding where to download a specific package (PyPI mirrors to improve download speed, for example), so these need to be references rather than direct URLs. The actual URL can only be determined at installation time, and the reference should contain metadata (hashes) to ensure that what is found then matches the lock file’s expectation. :-)
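
To illustrate what I mean by a reference, here’s a rough sketch (made-up names and a placeholder hash; this is not any existing tool’s format) of an installer resolving a lock entry against whichever mirror it is configured with and checking the result against the recorded hash:

import hashlib
import urllib.request

# A hypothetical lock entry: no URL, just enough metadata to find and verify
# the artifact at install time.
lock_entry = {
    "name": "pkga",
    "version": "0.1.0",
    "sha256": "<recorded at lock time>",  # placeholder, not a real digest
}

def fetch_and_verify(url, expected_sha256):
    # Download from whatever URL the configured index/mirror resolved to,
    # then refuse to proceed if it doesn't match the lock file's expectation.
    with urllib.request.urlopen(url) as response:
        data = response.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError("hash mismatch: lock file expects %s, got %s"
                           % (expected_sha256, digest))
    return data

# url = mirror.find_download_url(lock_entry["name"], lock_entry["version"])  # hypothetical resolution step
# artifact = fetch_and_verify(url, lock_entry["sha256"])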

pip’s documentation somewhere notes something along the lines of “pip can handle cycles in a dependency graph, breaking the cycle at an arbitrary point”. The new resolver will need to handle this situation in a manner compatible with that.

Edit: Found two places with that:


@pradyunsg For this magic resolver we’ll need source trees/distributions to communicate their dependencies upfront, no? (This is what I hinted at in my first post above.)

Well, it’s not necessary. We can build a wheel and extract dependency information out of it, FWIW.

But yes, having a direct way to get that information would save unnecessary work. IIRC, PEP 517 did have an optional hook to just give the final list of install-time dependencies, but it was decided to remove it to avoid adding a hook to the PEP that wouldn’t be used until an actual resolver existed.
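
For the record, the “build a wheel and read its dependencies” fallback is mechanically simple: the wheel’s *.dist-info/METADATA file carries the Requires-Dist entries. A rough sketch (the wheel path is just an example):

import zipfile
from email.parser import Parser

def requires_dist_from_wheel(wheel_path):
    # Find the METADATA file inside the wheel and return its Requires-Dist
    # entries (the install-time dependencies).
    with zipfile.ZipFile(wheel_path) as wheel:
        metadata_name = next(name for name in wheel.namelist()
                             if name.endswith(".dist-info/METADATA"))
        metadata = Parser().parsestr(wheel.read(metadata_name).decode("utf-8"))
    return metadata.get_all("Requires-Dist") or []

# e.g. requires_dist_from_wheel("wheels/pkgc-0.1.0-py3-none-any.whl")

The expensive part isn’t this extraction but the build that has to happen first, which is exactly the cost an upfront hook would avoid.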

I would say let’s try to push for this; even tox would want it, to be able to detect when it needs to re-create/update envs, and I would guess Poetry/Pipenv would also benefit. Building a wheel can be very expensive (especially for projects with C extensions).


I kind of feel like the lock file is something that makes sense to be tool-specific. I don’t think it’s an unreasonable burden to say that switching tools at a per-project level would require re-resolving the initial set of dependencies once, to switch to that tool’s lock file syntax. You’re likely going to need to switch to update the Pipfile or [tool.poetry] section anyway.

So the only real benefit here, in my opinion, is that maybe something like Heroku can learn to deploy from only that lock file? But it looks like the pipenv integration in Heroku will also work with just a Pipfile instead of a Pipfile.lock, and I expect other providers would do the same. So I don’t feel like there’s a huge win to standardizing the lock format.

I think this is the real advantage of a standardized lock file, and the reason why a given service (say, Google App Engine or Google Cloud Functions) doesn’t currently support Pipfile/Pipfile.lock. As long as the lock file is an implementation detail of a given tool, service providers are not going to build support for N different lock files.

Great to hear that Heroku works with just a Pipfile, but I feel like if you’re using a Pipfile without a lockfile, you’re missing out on the most valuable feature (fully specified, deterministic dependencies), so what’s the point?


This is actually a key motivator for me as well. I can’t go to projects in Azure and say “standardize on Pipfile.lock”, since it’s tool-specific, it isn’t a standard, and it leaves out e.g. Poetry, whose developers have said they could consider switching to a standard. And asking projects to standardize on requirements.txt is fine, but it isn’t the same thing unless you are very explicit about using pip freeze > requirements.txt. Plus, this all assumes you froze into requirements.txt instead of e.g. requirements.lock.

IOW, there is a ton of convention right now around the concept of providing an explicit list of packages you want installed, which makes tooling and universal support flat-out difficult.


So it sounds to me like what you need are lock files which are scoped like wheel files, which is an idea I’ve had in my head for this. That way you know the lock file is specified for a specific Python version, ABI, and platform, as appropriate.
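
As a sketch of what “scoped like wheel files” could mean in practice (this uses the packaging library; the lock-file naming scheme here is made up for illustration), a tool could pick the applicable lock file the same way pip picks a wheel, by walking the interpreter’s compatibility tags:

from packaging import tags

def matching_lock_file(available):
    # available maps a wheel-style tag string (e.g. "cp38-cp38-manylinux1_x86_64"
    # or "py3-none-any") to a lock file; return the most specific match for the
    # running interpreter.
    for tag in tags.sys_tags():
        key = "%s-%s-%s" % (tag.interpreter, tag.abi, tag.platform)
        if key in available:
            return available[key]
    raise LookupError("no lock file matches this interpreter")

# e.g. matching_lock_file({"py3-none-any": "lock.py3-none-any.toml"})  # hypothetical naming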

It sounds more to me like “distributing a Python application” (e.g. rtd, black) isn’t easy enough.

To install and use one (which is all Nathaniel is trying to do here) you make yourself into a system integrator, carefully locking the dependencies and runtime versions of each tool and setting up an independent venv, because that’s the best we have to offer for apps.

I assume if it looked more like “apt install black && black” then it’d be fine (because in this case, someone else has been the system integrator and you get to reap the benefits of their work).


It’s true that distributing Python applications should be easier. And @ambv has some choice words to say on the matter :-). But in my case, it’s a little more subtle: my goal is that contributors can easily run the same black that I do, and apt install black doesn’t help with that.

Like, hopefully black’s output doesn’t change from version to version that much (in fact this is an explicit project goal), but I assume that sooner or later they’ll fix some bug or another that affects the output. And then anyone who’s using the wrong version of black will be locked out from contributing to my library, because our CI checks that their formatting matches what we expect. (In fact we currently use yapf, and this is a real problem, because yapf regularly tweaks their formatting from release to release, and there’s even a bug where it produces different output depending on which version of the Python interpreter you use to run it.)

Or maybe a better example is running pytest: yeah, sure in some sense pytest is an app that you could apt install, but in practice when running tests I need to control the pytest version, which versions of which pytest-* plugin packages are installed, and which plugins aren’t installed, since variation in any of those things can and does create spurious test failures.


Absolutely. It is fine for services to support a manifest (e.g. Pipfile) without locking, but that should be treated as an exception. Most people using these services (CI, websites, etc.) only want to (and should!) install from a lock file, and a universal lock format would enable services to support this normal use case (installing from a lock file) without committing to a specific tool.


Sure, but now you’re arguing about specific tools (apt) rather than the problem. People collaborating via Excel also want the same versions (or have to trust that it produces the same output, which, as you point out, is guaranteed to fail eventually).

The problem here isn’t Python packaging but app packaging. If Python generated statically-linked standalone executables, and pytest had a plugin model that didn’t overlap with its own installation, this wouldn’t even have come up. It’s only because we’ve been conflating apps and their development environments for so long that “dev+test” requires six virtual environments rather than two environments and four apps.

And it is possible to do apps - the Azure CLI is a Python app that installs extensions using pip/wheels, but it’s packaged in a way that means people don’t have to think about that (I linked the install instructions). And they have a regular-looking dev environment, so it’s not like they’ve strayed far from any other Python project, apart from having enough resources to invest in making the app-ness work.

I think we need these categories to be able to sensibly talk about workflows. We’re lucky that Python can easily be used to write apps, libraries, plugins, and more, but trying to treat them all the same doesn’t help us make all of them better.

Sorry, Steve, I don’t understand what you’re saying at all.

My point is that project-specific apps like this need to be chosen per-project and pinned. So they have to be managed by a project management and pinning tool like pipenv, not a generic, project-oblivious, app installer like apt. It’s not a problem with apt per se; it’s a fundamentally different model.

Sphinx is another example of an app that requires a complex, project-specific environment. It frequently breaks backwards compatibility in new releases, it has a complex plugin architecture, and in many projects it has to import the project code while running (to read docstrings). Also, we need to be able to set up this environment on RTD, which has limitations that make it hard to blindly re-use a generic “dev” environment (e.g., it’s slow to update to new Python versions, and it sometimes requires specific package versions).

So I think we’re agreeing on the things that you’re saying, and (perhaps) disagreeing on the things that you aren’t (or are implying by naming specific tools).

Perhaps I can phrase this as two hypotheses (where I believe the first is true and the second is false):

Project-specific apps that are written in Python should be treated the same way as project-specific apps that are not written in Python.

Project-specific apps that are written in Python should be treated the same way as the project itself.

Your original post suggested that a failing of a single-environment lock file is that it can’t handle application environments, which suggests you’re trying to treat them the same way as your project.

My contention is that you ought to be treating them like any other app (e.g. say you need a particular version of gcc), in which case the single-environment lock file is not at fault - the fact that we don’t have a good way to treat them as totally standalone apps, separated from your development environment, is at fault.

(None of this is meant as a criticism of the way you’re developing your project, by the way - mine all look much the same. You just happened to bring up what I see as a good example of one of the impedance mismatches Python has right now.)

It’s much like all the discussions we had about setup.py vs. requirements.txt: if someone knows what category they are in, then we can tell them which one to use. But there’s no single answer that applies for all scenarios, and there are endless ways to “solve” a scenario by [mis]using the wrong tool. I believe once we have a categorization for “build tool” distinct from “build dependency”, we can start designing tools that properly target each of these (and some of this has been going on with the PEP 517 discussions already, so I don’t think I’m totally off in la-la-land here).

I have very little stake in this discussion (the work I do doesn’t tend to hit this type of problem) but this statement resonates strongly with me [1].

To put it another way, if a project depends on a particular version of black, why would it matter whether black were written in Python or in (say) rust?

Of course, given that I know of no good tools for setting up a project environment with the right versions of development tools like gcc (or black), I can see how it might be convenient to use Python’s dependency management to fill that gap for whatever part of your toolset is written in Python.

[1] Actually, on reflection, that’s not true. I do have a stake in this, but it’s not on the dependency management side. For me, the most frustrating thing is that Python development tools like black, pyflakes, mypy, tox, pytest, … are not available as standalone applications, but have to be installed in an environment somewhere. There’s no good reason for that, and a number of problems stem from it, in my experience.