PEP 650: Specifying Installer Requirements for Python Projects

This is not the target scenario though, because these users wouldn’t be using a standardised lock file either.

In cases where a standardised lock file would make sense, this PEP substitutes a non-standardised lock file plus a standardised way to trigger restoring from it.


That’s not necessarily true. As I pointed out higher up in my answer, people will do pip install -r requirements.txt (or dev-requirements.txt or similar) into existing environments, and this PEP will make that more fragile.

Again not true: you are making assumptions about the “correct” way of using lock files without that being explicitly stated anywhere. Most users do not understand this stuff, and you will make workflows that you consider “not the target scenario” worse.

This proposal is strictly worse than a standardized lock file would be, not just an alternative to it.

I can be convinced my concerns aren’t warranted, if the PEP would state it’s for applications in fresh environments only, and would explicitly discourage projects like NumPy and others that are widely depended upon (and contributed to) from using it, because it isn’t aimed at their workflows. But if you want it to replace every *-requirements.txt everywhere, that’s not a good idea.


I think these are legitimate concerns. The PEP seems to have a fairly specific set of workflows in mind, which is perfectly fine in principle (every PEP needs a scope), but the lack of clarity over what those workflows are means that people will try to use the feature in inappropriate situations, and get in a mess.

This may be an appropriate thing to include in the “How to teach this” section (a standard PEP section that’s missing from PEP 650). Explaining when it’s appropriate to use this interface, and when it’s not, is clearly a non-trivial problem, just from the amount of discussion here, so documenting how the authors propose to address that is important. As an example, there’s a massive assumption throughout this discussion that people understand what a “lock file” is. I’m not sure that’s true - I suspect many people will presume it’s “like a requirements file” while not being clear where that analogy breaks down. I think that a lock file is equivalent to “a requirements file generated by pip freeze”, but as I almost never use pip freeze, because I find it doesn’t fit with my workflow, I could be wrong about that.


Your decision to encourage/discourage only applies to NumPy contributors, e.g. those who clone your repo and have to install dev dependencies to build it locally. And if you don’t want to use it, that’s fine - just describe in your dev docs how to set up an environment (presumably with a recommendation as to which environment tool is going to work best or has “supported” requirements files included).

Users of NumPy, e.g. those who “pip install numpy”, are in a different space. If they’ve found/created a requirements file, they need to know how to use it, and this PEP offers a way to encode that information in a way that a tool can offer a single button to “just do it for me”.

But yeah, it could obviously be clearer in the text.


Then that single button should include environment creation. We’ve had to teach users for years, over and over, not to mix installers - most importantly not to mix pip and conda, but it applies to any two installers. Having one installer potentially invoke another, raising the chance of mixing installers (silently, while the user only uses a single tool), is not really a good thing.


IMO this is an implementation detail of the “universal installer” and not part of PEP 650, which describes only the API that can then be used by a universal installer. I think it depends on the aim of the universal installer whether it would like to create a new venv or not.


This is a very accurate point, but I think it slightly misses the intention of this proposal.

I don’t think pip should be a universal installer. I think it should be explicitly stated in the PEP that any existing installers are probably not universal installers, nor are they likely to become one.

Virtualenv might become a universal installer. Tox probably should. GitHub Actions DEFINITELY ought to, and it sounds like VS Code will. These are better examples of what this is for.

None of them necessarily lead to mixing and matching installers any more than the current state of things, because there’s only one relevant pyproject.toml per environment for this, which implies only one inferred lockfile for one explicitly listed install tool.

This certainly shouldn’t be spreading the “trigger pip from conda” feature any further. It should spread “one environment per project” further, since tools will be able to make their one-click include that (if they want).


Wait, what :thinking: I’m not following what being a universal installer means, or why those tools’ maintainers should implement it. Can someone explain, with reference to how virtualenv/tox use pip/conda? Thanks!


Hang on - earlier in this thread, we established that pip isn’t even an installer backend, in terms of the concepts defined in this PEP. It’s a bit of infrastructure that an installer backend calls to do the installation. (There was a suggestion that shipping the backend with pip might be reasonable, but that’s as far as it went). So who’s now suggesting pip is a universal installer?

@rgommers is saying that we’ve had a hard time teaching people to not mix installers. Expecting people to use a tool branded as a “universal installer” alongside “installer backends” and “real installers” (like pip) is just going to confuse that whole message again. That’s the key point here, IMO - can we please change the terminology so that it makes clear the layers involved here? If nothing else, can we please have the term “installer” back so that I don’t have to keep referring to pip as a “low-level installer” or “real installer” or some other made up term? :slightly_frowning_face:


Yes, I totally agree (and argued the same in my very first reply :wink: ).

From the PEP abstract:

Note the last sentence in particular: it allows users to use an installer as if they were invoking it directly.

In other words, for a given clone, I might do git clone foo; python -m venv env; env/scripts/pip install -r requirements.txt. For another, I might do git clone bar; python -m poetry --whatever, or git clone baz; conda create -f environment.yml. Notice how I invoke each installer directly?

With this PEP, and some “universal installer” such as VS Code or GitHub Actions, I would do git clone ...; universal-installer. The universal installer tool looks at pyproject.toml in the current directory and chooses the right command that I used manually above.
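For concreteness, the marker the universal installer would look for is a new table in pyproject.toml, deliberately analogous to [build-system]. Per the PEP’s draft (the key names and the pipenv backend path are taken from the PEP’s own example; treat the details as illustrative), it looks roughly like:

```toml
[install-system]
requires = ["pipenv"]
install-backend = "pipenv.api:main"
```

The universal installer reads this table, installs the requirements listed in requires, and then hands off to the named backend - it never needs to understand the project’s lock file itself.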

Tox essentially offers the same thing for “universal way to run scripts based on a file in the project”. It could take advantage of PEP 650 by adding a “do install” command that finds and executes the command in the current project without the user having to write it out themselves in tox.ini.

It may seem like a small thing, but consider that every single deployment tool and IDE needs to do the same thing, and Tox is not a universal standard that offers a universal “just install packages into the current environment” command. That’s what this PEP defines.

(Edit: for virtualenv, it could offer a “create environment and install packages” command that doesn’t require any additional configuration if PEP 650 has already been used. It wouldn’t even require the developer to know that virtualenv might be used on it one day! It’d just work, because virtualenv could create the environment and then trigger the right installer.)


This won’t do, because it works only if you start from scratch every time, which would be really slow for iterative use. tox needs to know which installer was invoked and how, so it can be reasonably fast and cache operations (e.g. skip the dependency install if no new dependencies have been added). Unless we get to a world where universal installers are smart enough to figure out how to sync some config with their current state.


Something nags me with this PEP and I think it is because in my mind installing is essentially a solved problem: build a wheel using PEP 517 and then install the wheel and the dependencies it specifies in its metadata. All this can be done with well defined standards that we now have.

To put it another way, project authors have to explain how to build their project by declaring the build system; that is reasonable and well understood. But project authors should not be bothered with explaining how to install: it should be doable in whatever way installers decide to deploy the built wheel.

In the motivation section of PEP 650, all the use cases that stick in my mind seem to relate to specifying what to install rather than how, and specifically which dependencies, hence the concept of dependency groups.

Could these use cases be addressed by introducing the concept of dependency groups at the metadata level and letting build back-ends take care of populating them?

For instance, we could have freely named dependency groups (a variation on extras), and a well-known group name or naming scheme that refers to locked requirements. The build back-end would then be free to populate this dependency-group metadata from its lock file, with the lock file format remaining a back-end implementation detail.

So in a nutshell, I feel this PEP is introducing an additional component type in the packaging stack to address use cases that perhaps we could solve by slightly extending a layer we already have (i.e. the build/metadata layer).


That’s just a lock file, and it was difficult to try and make a standard work the last time it came up. Plus not all projects deploy as a wheel, so you can’t assume a METADATA file will exist to read from. And putting it in pyproject.toml still makes this idea a standardized lock file. :wink:

Having said that, some discussions are going on behind the scenes with some folks to see whether a lock file standard is doable, so we can present a concrete proposal to cut down on the bikeshedding and disagreements, instead of just saying “what do people think of a lock file?” and coming to no agreement about where to even start.


This approach seemed to work for PEP 621, so looking forward to this too.


Not really. It is the idea of putting pinned dependencies in metadata (lowercase). The fact that it will end up in a file (METADATA or other) is ancillary.

Why wouldn’t they? Assuming we have PEP 517 editables (we are very close), what use cases are not covered by “build an (editable) wheel and install it”? If project authors can be bothered with declaring an installer backend, can’t they be bothered with declaring a build backend instead?

Nothing in what I wrote implies putting it in pyproject.toml (which is not the canonical source of metadata anyway).

Also it occurs to me that having pinned dependencies in metadata (still lowercase) has value for tools that are distributed on PyPI. If I could have the option to pipx install tool and be sure that it comes with the dependency versions the author has tested, it would be a definite plus.

Anyway, looking forward to reading what comes out of behind the scenes.


As an example, Azure Functions can deploy using a zip file of your source code. It just happens to also support looking for a requirements.txt file in that source to get you your dependencies. Amazon Lambda also deploys using a zip file, so this isn’t an Azure-specific thing.

Then I’m afraid I’m not understanding what you’re proposing. Where are these pinned dependencies to be kept? And how are they to get there?


Ok. What I’m saying here is that, instead of asking users of such tools to declare a PEP 650 installer backend in their project, we could ask them to declare a PEP 517 build backend, and then let Azure Functions or Amazon Lambda build and install from that, possibly in editable mode.

That can be left for build backends to decide what is best for their users, as long as they can expose it to frontends via PEP 517’s prepare_metadata_for_build_wheel.

Of course that requires agreeing on a representation in metadata, but an MVP for that should be tractable (something like dependency groups in locked and open variants?). A minor challenge will be finding a representation that fits in RFC 822, but that sounds doable. And since that format will have to be neither authored nor read by humans, it might be easier to design on tool-independent neutral ground, while providing a level playing field for build backends to research and compete for the best UX.

That should meet all use cases via a reasonably cheap call to prepare_metadata_for_build_wheel (IDEs can discover dependency groups in metadata; Azure functions, Amazon lambdas, pipx, pre-commit can install locked dependencies; dependabot can obtain locked dependencies without installing, etc).
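To illustrate the kind of representation I mean (this is purely hypothetical, not an existing standard): one could reuse the extras-style marker syntax already in core metadata, with a reserved locked- naming scheme distinguishing the locked variant of each group. A back-end might then emit something like:

```
Name: myapp
Version: 1.0
Requires-Dist: requests (>=2.20) ; extra == "dev"
Requires-Dist: requests (==2.25.1) ; extra == "locked-dev"
Requires-Dist: pytest (==6.2.2) ; extra == "locked-dev"
```

The locked- prefix here is invented for the example; the point is only that the open and locked dependency groups can coexist in one metadata document that any frontend can read.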

I guess what worries me in the end is that installer backends are additional moving parts in the packaging machinery that will have a very high cost, if only in terms of teaching, while we might make do with an upgrade to a part we already have.


Except many (most?) of these deployments don’t look like a module - they look like a script. And hopefully, they’re just the script, and not even any imports.

So this would increase the minimum complexity of using Python from “provide a script file” to “choose/test/use a build backend that installs a module”. That is such a significant jump that we’d almost certainly continue to support a package-less deployment process, which would look like the status quo (i.e. you must use pip).

(Minor aside: editable mode is irrelevant here because it’s a production deployment. Nobody is editing or rebuilding this code, but someone has to ensure dependencies are installed and know how to launch it.)


With PEP 650, don’t you need to explicitly choose an installer backend? You could similarly choose a PEP 517 backend that spits out a metadata-only wheel from, say, requirements.txt (with very minimal metadata in the wheel, i.e. the locked dependencies). If the deployment environment has a convention for locating the main script, that convention could remain the same.

In a production context I consider editable installs useful for avoiding copying the project code to a somewhat obscure location (site-packages), with easier-to-understand stack traces as a result.


Unfortunately, yes. But you don’t have to restructure your code. And right now impacted users are explicitly choosing an installer anyway, so really it’s just a codified way of recording that choice (rather than all of these platforms going “we found a requirements.txt, guess they meant pip” or “we found a Pipfile, guess they meant Pipenv”, etc.).

Your production context doesn’t look like everyone’s, I guess. For users publishing to Azure Functions, they have an “app” directory that contains their code, and dependencies go into some other site-packages directory they don’t have to care about. But their code is not an importable module, it’s a script, so it was never going to be put into site-packages anyway.

If you’re deploying your application as a module, that’s up to you, but it’s neither a required nor universal approach. So (ab/re)using module metadata for environment creation is going to significantly complicate many users’ setups, while a simple declaration like this (or better yet, a standard lockfile so that any installer can be used) is not as invasive.
