Structured, Exchangeable lock file format (requirements.txt 2.0?)

So it sounds to me like what you need are lock files that are scoped like wheel files, which is an idea I’ve had in my head for this. That way you know that the lock file is specified for a specific version of Python, ABI, and platform, as appropriate.
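
A minimal sketch of what the consuming side of that could look like, assuming a hypothetical naming scheme where each lock file is named after a wheel-style tag (the file-name scheme is made up; the tag computation uses the third-party packaging library):

    # Hypothetical: pick the platform-scoped lock file for the running interpreter.
    from packaging import tags  # third-party "packaging" library

    # Most specific tag for this interpreter, e.g. cp37-cp37m-manylinux1_x86_64
    current_tag = next(iter(tags.sys_tags()))
    lock_file = f"requirements-{current_tag}.lock"  # made-up naming scheme
    print(f"would install from {lock_file}")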

It sounds more to me like “distributing a Python application” (e.g. rtd, black) isn’t easy enough.

To install and use one (which is all Nathaniel is trying to do here) you make yourself into a system integrator, carefully locking the dependencies and runtime versions of each tool and setting up an independent venv, because that’s the best we have to offer for apps.

I assume if it looked more like “apt install black && black” then it’d be fine (because in this case, someone else has been the system integrator and you get to reap the benefits of their work).


It’s true that distributing Python applications should be easier. And @ambv has some choice words to say on the matter :-). But in my case, it’s a little more subtle: my goal is that contributors can easily run the same black that I do, and apt install black doesn’t help with that.

Like, hopefully black’s output doesn’t change from version to version that much (in fact this is an explicit project goal), but I assume that sooner or later they’ll fix some bug or another that affects the output. And then anyone who’s using the wrong version of black will be locked out from contributing to my library, because our CI checks that their formatting matches what we expect. (In fact we currently use yapf, and this is a real problem, because yapf regularly tweaks their formatting from release to release, and there’s even a bug where it produces different output depending on which version of the Python interpreter you use to run it.)

Or maybe a better example is running pytest: yeah, sure in some sense pytest is an app that you could apt install, but in practice when running tests I need to control the pytest version, which versions of which pytest-* plugin packages are installed, and which plugins aren’t installed, since variation in any of those things can and does create spurious test failures.
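
For concreteness, the closest approximation today is a fully pinned dev requirements file; every version below is purely illustrative:

    # requirements-dev.txt (illustrative pins only)
    pytest==4.6.3
    pytest-cov==2.7.1
    pytest-xdist==1.29.0
    # note: pytest-randomly et al. deliberately *not* listed, since an extra
    # plugin can change test behaviour just as easily as a missing one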


Absolutely. It is fine for services to support a manifest (e.g. Pipfile) without locking, but that should be treated as an exception. Most people using these services (CI, websites, etc.) only want to (and should!) install from a lock file, and a universal lock format would enable services to support this normal use case (installing from a lock file) without committing to a specific tool.


Sure, but now you’re arguing about specific tools (apt) rather than the problem. People collaborating via Excel also want the same versions (or trust that it produces the same output, which as you point out is guaranteed to fail eventually).

The problem isn’t Python packaging here but app packaging. If Python generated statically-linked standalone executables, and Pytest had a plugin model that didn’t overlap with its own installation, this wouldn’t even have come up. It’s only because we’ve been conflating apps and their development environment for so long that “dev+test” requires six virtual environments rather than two environments and four apps.

And it is possible to do apps - the Azure CLI is a Python app that installs extensions using pip/wheels, but it’s packaged in a way that people don’t have to think about that (I linked the install instructions). And they have a regular looking dev environment, so it’s not like they’ve fallen far from any other Python project, apart from having enough resourcing to invest in making the app-ness work.

I think we need these categories to be able to sensibly talk about workflows. We’re lucky that Python can easily be used to write apps, libraries, plugins, and more, but trying to treat them all the same doesn’t help us make all of them better.

Sorry, Steve, I don’t understand what you’re saying at all.

My point is that project-specific apps like this need to be chosen per-project and pinned. So they have to be managed by a project management and pinning tool like pipenv, not a generic, project-oblivious app installer like apt. It’s not a problem with apt per se; it’s a fundamentally different model.

Sphinx is another example of an app that requires a complex, project-specific environment. It frequently breaks backcompat in new releases, it has a complex plugin architecture, and in many projects it has to import the project code while running (to read docstrings). Also, we need to be able to set up this environment on RTD, which has limitations that make it hard to blindly re-use a generic “dev” environment. (E.g., it’s slow to update to new Python versions, and it sometimes requires specific package versions.)

So I think we’re agreeing on the things that you’re saying, and (perhaps) disagreeing on the things that you aren’t (or are implying by naming specific tools).

Perhaps I can phrase this as two hypotheses (where I believe the first is true and the second is false):

  1. Project-specific apps that are written in Python should be treated the same way as project-specific apps that are not written in Python.
  2. Project-specific apps that are written in Python should be treated the same way as the project itself.

Your original post suggested that a failing of a single-environment lock file is that it can’t handle application environments, which suggests you’re trying to treat them in the same way as your project.

My contention is that you ought to be treating them like any other app (e.g. say you need a particular version of gcc), in which case the single-environment lock file is not at fault - the fact that we don’t have a good way to treat them as a totally standalone app separated from your development environment is at fault.

(None of this is meant as a criticism of the way you’re developing your project, by the way - mine all look much the same. You just happened to bring up what I see as a good example of one of the impedance mismatches Python has right now.)

It’s much like all the discussions we had about setup.py vs. requirements.txt: if someone knows what category they are in, then we can tell them which one to use. But there’s no single answer that applies for all scenarios, and there are endless ways to “solve” a scenario by [mis]using the wrong tool. I believe once we have a categorization for “build tool” distinct from “build dependency”, we can start designing tools that properly target each of these (and some of this has been going on with the PEP 517 discussions already, so I don’t think I’m totally off in la-la-land here).

I have very little stake in this discussion (the work I do doesn’t tend to hit this type of problem) but this statement resonates strongly with me [1].

To put it another way, if a project depends on a particular version of black, why would it matter whether black were written in Python or in (say) Rust?

Of course, given that I know of no good tools for setting up a project environment with the right versions of development tools like gcc (or black), I can see how it might be convenient to use Python’s dependency management to fill that gap for whatever part of your toolset is written in Python.

[1] Actually, on reflection, that’s not true. I do have a stake in this, but it’s not on the dependency management side. For me, the most frustrating thing is that Python development tools like black, pyflakes, mypy, tox, pytest, … are not available as standalone applications, but have to be installed in an environment somewhere. There’s no good reason for that (and a number of problems that stem from it) in my experience.

Maybe we need a zipapp that includes the interpreter itself, but how does one do that cross platform?
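
(For reference, the stdlib zipapp module already covers the “bundle the code” half; a minimal sketch with made-up package and entry-point names, showing that the interpreter itself is still assumed to exist on the target machine:)

    # Minimal zipapp sketch -- "mytool" and "mytool.cli:main" are hypothetical names.
    import zipapp

    zipapp.create_archive(
        "mytool/",               # directory containing the package
        target="mytool.pyz",     # single-file archive to produce
        main="mytool.cli:main",  # entry point to invoke
    )
    # The result still has to be run as "python mytool.pyz" -- the interpreter
    # is not bundled, which is exactly the cross-platform gap being discussed.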

Exactly. And doing it cross platform is the heart of the problem. Briefcase is the best I’m aware of, but even that has limitations and problems (at least on Windows, where I don’t think they can be solved by simple bugfixes). (Aside: BeeWare already have a project categorization kind of like what I’ve been talking about.)

(I have many more thoughts on this topic, which I’ll save for another thread if/when we start one here. bpo-22213 is where we’re at right now.)

I feel that not all apps are created equal. On one hand we have tools like Black etc. (even pip) that are only useful as an executable, and PyPI isn’t really the best way to distribute them, but on the other there are tools that sit in the middle, e.g. Pytest, that need to work both as a command and a library (I assume most Pytest users import pytest—at least I do). Those are still best managed by the project-level lock file IMO.

Would it be a good idea to split standalone app discussions into a dedicated thread? I feel that it is an important topic (and want to participate in it as well), but not mutually exclusive with the lock file.

Yeah, definitely a separate topic to go deeper into. The relevance here was identifying what I consider a non-problem for lock files (having one for each app used by a project) - in generalizing a format across tools, I don’t think we need to account for that.


I think there is a clear need that isn’t being met by requirements files alone, and I think that is demonstrated by tool adoption in Python and in other languages. Dependency resolution and environment reproduction, whether it is application-specific or whether it is being used by a library developer to determine if the specified constraints are even valid or need to be narrowed, is obviously meeting some important needs (like avoiding conflicts or broken environments).

IMO the question is whether it’s important to standardize on lockfiles specifically – which I’m not personally convinced of, I guess… I don’t really have a problem with it, but I don’t see a pressing need either.

I think we’re talking past each other here. Let me expand a bit, to explain how I see these different pieces fitting together. (This is partly inspired by these notes that some of you have seen before.)

Let’s assume we have some concept of an “isolated environment”. You know, the kind of thing you can install stuff into, and then later run the stuff, and it doesn’t interfere with the rest of your system. Maybe it’s a virtualenv, maybe it’s a conda environment, maybe it’s, I don’t know, a docker container using apt to manage packages. Whatever. But let’s say we have a system for describing environments like this, what’s installed into them (packages and versions), commands to run using these environments, and ways to store these descriptions alongside a project, and a smooth user interface for all of that.
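
(For concreteness, here’s a minimal sketch of that idea using nothing but the stdlib: create an environment, install a pinned package into it, and run a command with it. The package and version are just examples.)

    # Minimal "isolated environment" sketch: stdlib venv + pip.
    import subprocess
    import venv
    from pathlib import Path

    env_dir = Path(".isolated-env")
    venv.EnvBuilder(with_pip=True).create(env_dir)

    python = env_dir / "bin" / "python"  # env_dir / "Scripts" / "python.exe" on Windows
    subprocess.run([str(python), "-m", "pip", "install", "requests==2.19.1"], check=True)
    subprocess.run([str(python), "-c", "import requests; print(requests.__version__)"], check=True)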

This is really useful to a whole set of different users:

  • It gives beginners a simple way to run their scripts, or the python REPL, or jupyter, in an environment that they can control, and where it’s easy to install third-party libraries like requests without the problems caused by sudo pip.
  • It gives application developers a way to describe their dev and production environments, share them across the team, share them with deployment services like Heroku, etc.
  • It gives library developers like me a way to describe different test environments, associated services like RTD, tooling that new contributors need, etc.

Notice that everything I’ve said is true regardless of how the applications are packaged – if I can download Black as a single-file standalone binary, then that’s great for a lot of reasons, but in this context I still want a tool that can pick the correct version of that standalone binary and drop it into an isolated environment. Also, everything I said so far applies the same regardless of what kind of environments we’re talking about, whether it’s virtualenvs or conda or whatever. Installing a specific version of gcc into an isolated environment? Sure, conceptually it makes total sense. (And conda users actually do this all the time.)

But then on top of this core idea of course any particular implementation has to make some choices, and these add extra constraints and complications.

Digression: The core difference between pip and conda is that pip knows how to talk about PyPI packages, and conda knows how to talk about conda packages. This sounds inane when I write it, but it’s actually a deep and subtle issue. They have two different namespaces; they use the same words to mean different things: to pip, the string "numpy" means “the package at https://pypi.org/project/numpy, and the equivalent in other channels that share the pypi namespace”. To conda, that same string "numpy" means "the package in the conda channels under the name "numpy"". Which in this case is the same software, but our tooling doesn’t know that.

Another example: to pip, the string "gulp" means “a decorator to make debugging easier”, and to conda it means a JavaScript build system. These incommensurable namespaces are why there’s no way for wheels to declare dependencies on conda packages, or vice-versa, and why using pip and conda in the same environment screws everything up. Both sides find the resulting environment literally impossible to describe. They’re each missing some of the vocabulary they’d need.

So back to the core idea of pinning and project-specific environments. One way to implement it would be to make our isolated environments virtualenvs. That’s the natural thing if your environment descriptions are written using the PyPI package namespace. And if you’re, say, developing a library to be uploaded to PyPI, then this is a very convenient namespace to use, because (1) your project’s own dependencies have to be expressed in this namespace, and you want to re-use them in the environment descriptions, and (2) it means you can easily talk about all the versions of all the packages on PyPI.

Another way to implement the core idea would be to make the isolated environments be conda environments. This would be super awkward for me, since I write libraries that get uploaded to PyPI, and so I’d have to hand-maintain some mapping between my PyPI-namespace dependencies and my conda-namespace dependencies. For our other hypothetical users though – the beginners, the application developers – it’s really going to depend on the specific user whether a virtualenv-based or conda-based approach is more useful. They have different sets of packages available, so it just depends on whether the particular packages that you happen to use are better supported by virtualenv or conda.

Now, the folks working on the tools that use the pypi namespace mostly don’t talk to the folks working on the tools that use the conda namespace. Which is unsurprising: in a very literal sense, the two sides don’t have a common language. So, by default, Conway’s law will kick in: the pypi namespace folks will implement a pinning/environment manager that uses the pypi namespace to describe environments, and that will certainly be a thing that helps a lot of us solve our problems. And the conda namespace folks will do whatever they decide to do, which will probably also help people solve slightly different problems. And that’s not a terrible outcome. More things to help people solve problems are good!

But… there’s also a third possibility we might want to think about. The “original sin” that causes all these problems is that PyPI and conda use different namespaces. What if we invented a new meta-namespace, that included both? So e.g. "pypi:gulp" would mean “what pypi calls gulp”, and "conda:gulp" would mean “what conda calls gulp”, and now we can use both vocabularies at the same time without namespace collisions. And then:

  • We could describe the state of hybrid environments, on disk or in lock files: “the packages in this environment are: pypi:requests == 2.19.1, conda:python == 3.7.2, …”

  • A sufficiently clever package manager could do things like: when someone requests to install pypi:scikit-learn from the package source https://pypi.org, it downloads the wheel and discovers that it has metadata saying Install-Requires: numpy. Since this is in a wheel, our package manager knows that this really means pypi:numpy. Next it checks its package database, and sees that it already has a package called conda:numpy installed, and that the conda:numpy package has some metadata saying that it Provides: pypi:numpy. Therefore, it concludes, conda:numpy can satisfy this wheel’s dependency (a rough sketch of this check follows the list).

  • We could add wheel platform tags for conda, e.g. cp37-cp37m-conda_linux_x86_64. And then since we know this wheel only applies to conda, it would be fine if its metadata included direct dependencies on packages in the conda: namespace, like Install-Requires: conda:openssl==1.1.1.
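
To illustrate the second bullet, here’s a rough sketch of that cross-namespace check; the data shapes and the “provides” field are entirely hypothetical, not any existing metadata format:

    # Hypothetical package-manager state: installed packages keyed by
    # namespaced name, each declaring what it can stand in for.
    installed = {
        "conda:numpy": {"version": "1.16.2", "provides": ["pypi:numpy"]},
        "conda:python": {"version": "3.7.2", "provides": []},
    }

    def satisfied(requirement: str) -> bool:
        """True if a namespaced requirement (e.g. "pypi:numpy") is already met."""
        if requirement in installed:
            return True
        return any(requirement in meta["provides"] for meta in installed.values())

    # The wheel's dependency on "numpy" is known to mean "pypi:numpy", and the
    # installed conda:numpy declares that it provides pypi:numpy.
    assert satisfied("pypi:numpy")
    assert not satisfied("pypi:scikit-learn")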

CC: @pzwang


Fedora does that with RPM. I put some details in a new topic:


FYI I have not forgotten about this topic. It is pinned in a browser tab :slight_smile: Thank you very much for starting the discussion, and I owe you a thoughtful reply; but I am desperately firefighting a few things right now.


Did this ever happen? I feel like it did, but can’t find the thread now.

I’ve just hit this again, hard. VS Code has the immensely annoying default of expecting you to install pylint, black, etc. in every environment that you want to run code in. I know you can specify an explicit path to the tools, but to do that you need an exe somewhere - and that’s exactly the “standalone app discussions” issue that I’d like to discuss further. At the moment, I need to set up and manage some sort of “tools” virtualenv (or in practice a standalone copy of Python), and there’s nothing in the ecosystem to encourage the authors of tools like pylint and black to offer anything more user-friendly :frowning:


For what it’s worth, conda environment files are documented here.


As someone who uses conda for both python and R dependencies I think it desperately needs the concept of namespaces.

There’s a proposal to add such, but it hasn’t yet been a priority.

I don’t think the proposal there would implement the concept of a meta-namespace as you describe, but it might allow for the meta-namespace concept to be layered on top.

Hey Peter, any updates?