PEP 650: Specifying Installer Requirements for Python Projects

uranusjr · January 20, 2021, 7:48am

This brings up the topic, how is the hook going to be called? Will it be called in-process, so the backend should use subprocesses to call the actual installer? Or will it always be called in a clean process (like PEP 517 currently guarantees) so the installer can be invoked as a Python function directly inside the hook? The PEP does not seem to clarify this at the current time (or I missed it).

(Personal opinion) I think the frontend should call the hooks in-process. This levels the playing field most for installers not implemented in Python (e.g. pyflow). If the hooks are always called in a clean process, non-Python installers will need one extra level of subprocessing.
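To make the in-process option concrete, here is a minimal sketch of a frontend that imports the backend module by name and calls a hook directly in its own interpreter. The module name `example_backend` and the `invoke_install(path, **kwargs)` call shape are assumptions for illustration. A backend wrapping a non-Python installer would shell out from inside the hook.

```python
import importlib
import sys
import types


def load_backend(module_path: str) -> types.ModuleType:
    """Import the install backend in the current process (no clean subprocess)."""
    return importlib.import_module(module_path)


def call_hook(module_path: str, hook_name: str, *args, **kwargs):
    """Look up a hook on the backend module and call it in-process."""
    backend = load_backend(module_path)
    hook = getattr(backend, hook_name)
    return hook(*args, **kwargs)


# Hypothetical backend registered in sys.modules so the sketch runs standalone.
fake_backend = types.ModuleType("example_backend")
fake_backend.invoke_install = lambda path, **kwargs: 0  # pretend the install succeeded
sys.modules["example_backend"] = fake_backend

print(call_hook("example_backend", "invoke_install", "."))  # prints 0
```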

bernatgabor · January 20, 2021, 8:21am

Overall I’m +0.3 on this PEP. I’m not in love with it, but see where it can help.

FWIW, big +1 on this. The topic does open the door to endless bikeshedding and disagreement over details, though. IMHO, if we look for universal approval we’ll never get there, and in this case lacking a solution hurts more than having an imperfect one. Now this might be controversial, but I think as long as the pip maintainers are happy to implement the new lock file spec, I’d be happy for the steering council to push it through (even if it’s a solution I don’t agree with). I’m certain other tools/projects will fall in line, even if their maintainers object to parts of it. If no one has done it by then, in a post-tox 4 world (read: Q3) I’d pick up that thread again.

brettcannon · January 20, 2021, 7:50pm

I don’t know @dustin’s view, but currently I’m assuming in-process, as there isn’t any other good way to specify what environment to install into ATM in general. Now the API might be able to grow to support such a thing, but once again that’s getting under-specified, as I’m not sure what to pass in without making this very virtual environment-specific.

I wrote the above before reading this, so glad we seem to be on the same page.

brettcannon · January 20, 2021, 7:55pm

Thanks for the feedback!

I have chatted with some people privately about this, and I think scoping is another concern. E.g. if I dev on macOS but deploy to Linux, do I need two lock files, or just one that’s “good enough” but still leaves some resolution to happen at install time? A fancier pip freeze for the current system and a hypothetical freeze as if I were on another type of system are very different things (and the latter gets harder in the face of pre-PEP 643/dynamic metadata in sdists, and thus of actually knowing what a project wants on various platforms).

I will be updating the PEP with the feedback that everyone has left so far, but I am also considering whether standardizing a lock file is worth the effort, instead of or in parallel to this.

bernatgabor · January 20, 2021, 10:14pm

My impression is that here you’d have a header within the requirements file that encodes the target, similar to how a wheel has a platform tag. Or it could even be in the requirements file name, similar to how we encode the platform tag in wheel file names. Some projects might be able to generate universal requirements files, similar to how it’s possible to generate a wheel that’s universally installable. The tool generating the requirements file can probably determine the type of file it generates, similar to how the wheel project can determine the wheel tag at build time. This would follow the precedent of how we handle wheel generation, and requirements file generation would benefit from the lessons learned there. Tools generating requirements files could do a two-phase operation: first, parse the dependency tree to determine the requirements file tag (perhaps it can be universal, perhaps it’s Linux-only because finding out what’s needed on Windows/macOS requires a Windows/macOS machine); then, based on the tag, generate or omit some platform-specific dependencies (by using the environment markers already introduced in PEP 496). You get the idea. E.g.: requirements-dev-py3.txt, requirements-dev-cp36-cp36m-manylinux2014_aarch64.txt.
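A minimal sketch of how a tool might read such tagged file names back, assuming the hypothetical scheme `requirements-<group>[-<python>[-<abi>-<platform>]].txt` implied by the two examples, with group names that contain no hyphen (wheel file names have a similar restriction on their name component). Nothing here is a standardized format.

```python
from typing import NamedTuple, Optional


class RequirementsFileTag(NamedTuple):
    group: str
    python: Optional[str]
    abi: Optional[str]
    platform: Optional[str]


def parse_requirements_filename(name: str) -> RequirementsFileTag:
    """Split a hypothetical tagged requirements file name into its parts."""
    prefix, suffix = "requirements-", ".txt"
    if not (name.startswith(prefix) and name.endswith(suffix)):
        raise ValueError(f"not a tagged requirements file: {name!r}")
    parts = name[len(prefix):-len(suffix)].split("-")
    group, tags = parts[0], parts[1:]
    if len(tags) == 0:
        return RequirementsFileTag(group, None, None, None)  # no tag at all
    if len(tags) == 1:
        return RequirementsFileTag(group, tags[0], None, None)  # e.g. universal "py3"
    if len(tags) == 3:
        return RequirementsFileTag(group, *tags)  # python-abi-platform triple
    raise ValueError(f"unexpected tag layout in {name!r}")
```

An installer could then keep only the files whose tag matches the running interpreter, much as it already filters wheels.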

But this is off-topic and probably more relevant in another thread…

brettcannon · January 20, 2021, 10:32pm

That was what I was thinking as well.

Yep.

brettcannon · January 23, 2021, 1:15am

I have updated the PEP based on the feedback in this topic. Mostly clarification, a loosening of the return types, and an “Open Issues” section.

rgommers · January 30, 2021, 9:35pm

Hi,

It would be very useful if this PEP stated explicitly that it is not meant to be used by widely used Python packages to specify runtime dependencies for their users. I had to read this PEP twice to be sure, the title and mentions of PEP 517/518 made me think otherwise at first.

Right now we have in a project like, for example, SciPy:

install_requires in setup.py for runtime dependencies
[build-system] requires in pyproject.toml for build dependencies
various [dev|doc|test]_requirements.txt files for specifying development dependencies

Plus CI config files, a Dockerfile and an environment.yml (for conda/mamba) where those lists of dependencies may get repeated. Given that we don’t want to prescribe use of a certain installer to either users or contributors/maintainers, the only thing that’s possibly relevant is perhaps some streamlining of CI config files?

It looks to me like the comparison with PEP 517 is misleading too. PEP 517 made sense mainly for Python packages (and not for most of the stakeholders of this PEP), because a package invariably needs one particular build system. For installers, on the other hand, a package author doesn’t want to know: they want both their users and contributors to use whatever they like.

Apologies if I’m misunderstanding something.

Cheers,
Ralf

bernatgabor · January 30, 2021, 10:11pm

@brettcannon pointed out earlier that in his experience this isn’t true: for example, from what you described, you’re using requirements.txt files, which nowadays imply pip. What other installers would people be able to use? conda?

rgommers · January 30, 2021, 11:15pm

“we want you to use pip” is definitely not why most package authors add a requirements.txt. It’s simply that runtime dependencies must go somewhere, so contributors can find them. Buried in a complex setup.py isn’t great, and requirements.txt is the most obvious choice. That by no means implies the maintainers prefer pip over conda, poetry, pipenv or other tools.

Standardizing requirements.txt (e.g. as [install-requires] in pyproject.toml), which apparently is hard, would be very relevant to improve that situation. This PEP does not seem to be.

bernatgabor · January 30, 2021, 11:57pm

Each of those tools has its own way to define dependencies, and it’s not requirements.txt. This is the whole point of this PEP. Standardizing the requirements file failed, so instead we standardize how we interact with the installer, and each project can keep using its own customary format (which is where we are today: poetry uses TOML, pipenv uses lock files, pip uses requirements files, etc.).

rgommers · January 31, 2021, 5:50am

@bernatgabor I think you’re missing my point. Let’s make this concrete. Say I’m developing a machine learning project, for which I need numpy, pandas, scikit-learn, scipy and matplotlib. And I like to use Poetry. Then:

I, as the ML project dev, have a use for this PEP. It allows me to add an [install-system] entry in pyproject.toml, I can check in only poetry.lock (which I was doing anyway already probably) which lists these five requirements, and the upside is now VS Code and my cloud provider of choice will support that better.
The maintainers of numpy, pandas et al. have to make no changes. This PEP is not relevant to them.

I think a key point I was missing in the text is that it’s not possible, and not intended by this PEP, to mix multiple installers for installing different packages into a single environment. It’s not possible because installers have registries, make assumptions about environments, etc. So only the person (or team) in charge of the final application is the intended user here.

Another point: this PEP does assume that the installer can be installed from PyPI I believe. It mentions mamba, for which this is not true - mamba must be installed from conda-forge or with the Minimamba/Micromamba installers.

Here are some suggestions in addition to my first one to make the PEP clearer:

Change the “Python Projects” in the title to “Python End User Projects”, “Python Applications” or similar. As a package maintainer I understand “Python project” to refer to my projects (NumPy, SciPy), which is not meant here.
Remove the mention of mamba, since it does not apply.
Mention that installers must be able to be installed from PyPI.
Add a clear example like the ML project one I give above.
Mention that specifying dependencies for Python packages is out of scope or a non-goal.
Mention that it is not intended that one mixes multiple installers for installing multiple packages into a single environment.
Clarify or remove the “Developers working with other developers” section. It seems incorrect, since you cannot mix installers like that and even if that did work it’s not supported by a single [install-system] section in a shared project.

kpfleming · January 31, 2021, 1:07pm

or already installed in the target environment?

dustin · February 3, 2021, 6:16am

Sorry I’m late to this thread, lots of replies below (thanks @brettcannon for fielding most of these comments!)

Thanks for the suggestion. There is now an example in the PEP, but I’ve also added experimental PEP 650 support to the pip-api project. Pip is not really the best example here though, because it can only currently support two of the five API methods in the PEP (install and uninstall).
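For a rough idea of what the two hooks pip can support might look like, here is a sketch that shells out to `python -m pip`. It is not the pip-api implementation; the one-`requirements-<group>.txt`-per-group convention and the `"default"` group name are assumptions. The command builders are split out so they can be inspected without running pip.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def _requirements_file(path: str, dependency_group: str) -> Path:
    # Hypothetical convention: one requirements-<group>.txt per dependency group.
    return Path(path) / f"requirements-{dependency_group}.txt"


def build_install_cmd(path: str, dependency_group: str = "default") -> List[str]:
    return [sys.executable, "-m", "pip", "install", "-r",
            str(_requirements_file(path, dependency_group))]


def build_uninstall_cmd(path: str, dependency_group: str = "default") -> List[str]:
    # -y skips pip's interactive confirmation prompt.
    return [sys.executable, "-m", "pip", "uninstall", "-y", "-r",
            str(_requirements_file(path, dependency_group))]


def invoke_install(path: str, *, dependency_group: str = "default", **kwargs) -> int:
    # pip's exit code is returned unchanged.
    return subprocess.run(build_install_cmd(path, dependency_group)).returncode


def invoke_uninstall(path: str, *, dependency_group: str = "default", **kwargs) -> int:
    return subprocess.run(build_uninstall_cmd(path, dependency_group)).returncode
```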

Do you think the PEP needs to change to allow this? The original intention was that a given tool could be both a universal installer and an install backend just by calling itself. At the very least, we could add something saying this is permitted (or encouraged, in the case of pip).

I’d agree on both points. I don’t see us coming to an agreement on any form of standardized lock file that will work for all tools (and that format then getting adopted by all tools).

brettcannon:

uranusjr:

This brings up the topic, how is the hook going to be called? Will it be called in-process, so the backend should use subprocesses to call the actual installer? Or will it always be called in a clean process (like PEP 517 currently guarantees) so the installer can be invoked as a Python function directly inside the hook? The PEP does not seem to clarify this at the current time (or I missed it).

I don’t know @dustin’s view, but currently I’m assuming in-process, as there isn’t any other good way to specify what environment to install into ATM in general. Now the API might be able to grow to support such a thing, but once again that’s getting under-specified, as I’m not sure what to pass in without making this very virtual environment-specific.

uranusjr:

(Personal opinion) I think the frontend should call the hooks in-process.

I wrote the above before reading this, so glad we seem to be on the same page.

I think we’re in agreement here.

At the end of the day, most installers are command-line tools, and the exit codes have meaning, which I think is worth preserving. We’d also need to define a class of exceptions as part of this interface, and do that in a way that works for all tools.
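Preserving exit codes is cheap when a hook wraps a command-line installer: the hook can return the child process's return code unchanged instead of translating failures into a shared exception hierarchy. A minimal sketch, using a throwaway Python one-liner as a stand-in for a real installer CLI:

```python
import subprocess
import sys
from typing import Sequence


def run_installer(cmd: Sequence[str]) -> int:
    """Run an installer CLI and hand its exit code straight back to the frontend.

    No exception mapping: 0 means success, and any other value keeps the
    installer's own meaning, so the frontend can report it as-is.
    """
    completed = subprocess.run(list(cmd), check=False)
    return completed.returncode


# Stand-in "installer" that fails with exit code 3.
fake_installer = [sys.executable, "-c", "import sys; sys.exit(3)"]
print(run_installer(fake_installer))  # prints 3
```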

What would you expect to be the return type if multiple groups could be specified?

Hey @rgommers, thanks for taking a pass at this PEP! I think that the mechanisms that this PEP is providing are meant to be used widely to specify runtime dependencies (via whatever dependency specification the maintainer wants), but I think what you mean is that the section in the pyproject.toml file that this PEP describes doesn’t specify runtime dependencies themselves (as this would basically be a solution to the “standardized lockfile” problem if that were true).

Correct - also because you can only specify one install-backend in pyproject.toml.

The PEP means to indicate that you could write a backend around mamba (that executes it as a subprocess). This backend could also ensure that mamba is available (either by bundling it or knowing how to install it).

This is not really true. An application could have an internal/private installer requirement if the universal installer is configured to satisfy dependencies from a non-PyPI repository.

This section needs clarifying: it isn’t about mixing install backends, but rather about being able to use any universal installer the user wants. E.g. if both pip and poetry become universal installers, it doesn’t matter what lockfile is in use or what is in pyproject.toml: user A can pip install and user B can poetry install, and the result will be the same.

It’s up to the universal installer to determine whether the install requirements are satisfied, but the idea here is that a universal installer should be the only thing necessary to satisfy all install requirements (it either provides them or installs them). I think having install requirements that need to exist before the universal installer is invoked would probably be an anti-pattern here.
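One way a universal installer could decide what it still has to install before calling the backend is sketched below with `importlib.metadata`. The simplification is loud: each `requires` entry is treated as a bare distribution name, whereas real entries are PEP 508 strings with versions and markers that would need a proper parser.

```python
from importlib import metadata
from typing import Iterable, List


def missing_install_requirements(requires: Iterable[str]) -> List[str]:
    """Return the [install-system] requires entries not installed in this environment.

    Simplification: each entry is a bare distribution name; version
    specifiers and environment markers are not handled.
    """
    missing = []
    for name in requires:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

The universal installer would install whatever this returns and only then import the backend, so the user never has to pre-install anything.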

rgommers · February 3, 2021, 11:13am

Thanks for the replies @dustin!

No, I meant it broader. This PEP solves a much narrower problem than standardized lockfiles would, and from the way the PEP is written and your replies that’s not all that obvious.

Your reply hints at one of the main issues I see. You say “maintainer” - singular. This indeed can work as long as there’s either one maintainer, or a team where behaviour can be agreed upon or dictated. The latter may be true for example in corporate teams with a single tech lead. It is certainly not true in open source projects with multiple maintainers and a community.

From what I can tell, to use this PEP I would have to propose to my project: let’s add an [install-system] section with our dev requirements, and let’s put poetry in it. And the replies will be:

I prefer conda, this won’t work with conda envs
That’s nonstandard, we want pip as the default
Why do we have to agree on a tool at all?

Mamba cannot install into virtualenvs though, only into conda envs. So you can write some wrapper to install mamba somehow, but what are you then going to do with it? Or is there some plan to bridge the conda-PyPI gap that’s not described in this PEP?

This really needs a worked out example of what will happen. Let’s see if I understand it:

0. All packaging tools add support for this PEP (I’m not too interested in hooks and mechanisms for how that happens).
1. The maintainer of some package adds

       [install-system]
       requires = ["poetry"]

   and adds a poetry.lock file to their repo.
2. A user or potential contributor types pip install . or pip install this_package.
3. pip sees what’s in pyproject.toml and (not sure) calls some API for a universal installer.
4. This call gets rerouted to poetry; poetry is the tool that ends up installing the packages into the active environment.

This will work, as long as users create fresh environments whenever they do things. If they go install different packages into the same env, then they may end up invoking a mix of installers, which is not going to end well.
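Mixed environments can at least be detected after the fact: installers may record their name in an `INSTALLER` file in each distribution's metadata directory, readable through `importlib.metadata`. A minimal sketch that groups installed distributions by that recorded installer (the file is optional, so missing entries are reported as `"unknown"`):

```python
from collections import defaultdict
from importlib import metadata
from typing import Dict, Iterable, List, Optional


def installers_in_env(dists: Optional[Iterable] = None) -> Dict[str, List[str]]:
    """Group distributions by the tool recorded in their INSTALLER file.

    More than one key in the result means the environment was populated
    by a mix of installers.
    """
    if dists is None:
        dists = metadata.distributions()
    by_installer: Dict[str, List[str]] = defaultdict(list)
    for dist in dists:
        installer = (dist.read_text("INSTALLER") or "unknown").strip()
        by_installer[installer].append(dist.metadata["Name"])
    return dict(by_installer)
```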

Disclaimer: I am not at all sure I got this right, but the PEP desperately lacks such worked out use cases / workflows. When trying to fit in other concepts hinted at in this PEP like “dependency groups”, it gets even worse trying to work out what will happen.

uranusjr · February 3, 2021, 5:16pm

I’d expect the function signature to be

from os import PathLike
from typing import Collection, Optional, Union

def invoke_install(
    path: Union[str, bytes, PathLike[str]],
    *,
    dependency_groups: Optional[Collection[str]] = None,
    **kwargs
) -> int:
    ...

The return code should be non-zero if any of the installations fail, or if any of the specified groups does not exist. The universal installer can always call this multiple times, once per group, if it needs more fine-grained reporting, but the installer backend can (and, from my understanding, backends already do) optimise the installation process if you provide all the groups in one go.
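A backend-side sketch of those semantics: reject unknown groups with a non-zero code, otherwise merge every requested group and install them in a single batch. The `GROUPS` table, the `"default"` fallback when no groups are given, and the `_install` placeholder are all hypothetical; `path` is typed as `str` for brevity.

```python
from typing import Collection, Dict, List, Optional

# Hypothetical groups a backend might have parsed from its own lock file.
GROUPS: Dict[str, List[str]] = {
    "default": ["numpy", "scipy"],
    "test": ["pytest"],
    "doc": ["sphinx"],
}

installed_batches: List[List[str]] = []


def _install(requirements: List[str]) -> int:
    # Placeholder for the real installer call; it only records the batch.
    installed_batches.append(list(requirements))
    return 0


def invoke_install(
    path: str,
    *,
    dependency_groups: Optional[Collection[str]] = None,
    **kwargs
) -> int:
    """Install all requested groups in one batch; return non-zero on any failure."""
    groups = list(dependency_groups) if dependency_groups else ["default"]
    if any(group not in GROUPS for group in groups):
        return 1  # at least one requested group does not exist
    requirements = [req for group in groups for req in GROUPS[group]]
    # A single call lets the installer resolve and download everything together.
    return _install(requirements)
```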

dustin · February 3, 2021, 5:55pm

uranusjr:

dustin:
uranusjr:

IMO dependency_group should be a list of groups, like PEP 508 allows multiple extras to be specified and installed together.

What would you expect to be the return type if multiple groups could be specified?

I was referring to the quoted discussion about get_dependencies_to_install, which currently takes a single dependency group and returns a list of dependencies in that group. If dependency_group is instead a list of groups, what would you expect the return value of get_dependencies_to_install to be?

I’ll agree that this PEP requires that the project maintainer(s) still need to make some choice about what installer backend they want to use. The advantage is that the end-user doesn’t need to know what that choice was, and can use any PEP-650 compliant universal installer to install the project’s dependencies in whatever manner the project’s maintainer(s) have decided.

I think a project that is unable to make that choice isn’t a project this PEP can support. I think the only thing that would work for that project would be a standardized, installer-agnostic lockfile, and since the motivation for this PEP is that such a standard is impossible to create, this is the tradeoff.

That project can either select an installer backend and support the subset of installers that are PEP 650-compliant, or continue with the status quo (no standardized lockfile, no single way to install the project’s dependencies, and it’s the end user’s job to figure out how to do it).

Would a “disadvantages over a standardized lock file” section in the PEP which includes this satisfy your concerns here?

rgommers · February 3, 2021, 9:45pm

I think it’s important to emphasize that the end user now does need to know they can only use this install method if this is the only package they intend to install into this environment. Which I think is a major limitation?

In my experience, most users (in the scientific / data science realm) don’t do this: they either don’t use virtual environments to begin with, or they create environments and then install packages into them one by one over time (and may even mix pip install pkgname and pip install -r requirements.txt for different packages, which basically works fine today). If you advertise the new universal installer to users and it breaks those ways of working, that may be a lot worse than where we are today.

That would be helpful indeed to be able to digest the PEP. It’s still important to also include workflows / use cases. Even just the one I provided in my last reply (steps 0 - 4) would help a lot.

I’m not sure it fully addresses my concerns about the idea in the PEP. I see little upside for any use case I have as either a project maintainer or a developer, and I do see the risk of more users mixing installers into one env and running into issues.

uranusjr · February 4, 2021, 5:59am

Ah, I see, thanks for the clarification. My suggestion is specific to invoke_install (and maybe can be applied to invoke_uninstall). IMO get_dependencies_to_install can keep accepting one singular group at a time. But if I must make it accept multiple groups for consistency, I would make the backend return one merged list of requirements from all specified groups.
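The "one merged list" option could look like the sketch below: groups are visited in the order given, and a requirement listed in several groups appears once. `GROUPS` is hypothetical lock data; de-duplication is by exact string, so differing specifiers for the same package would not be merged.

```python
from typing import Dict, List, Sequence

# Hypothetical per-group requirements, as a backend might read them from its lock file.
GROUPS: Dict[str, List[str]] = {
    "test": ["pytest", "hypothesis"],
    "doc": ["sphinx", "pytest"],  # overlaps with "test" on purpose
}


def get_dependencies_to_install(path: str, *, dependency_groups: Sequence[str]) -> List[str]:
    """Return one merged, de-duplicated list for all requested groups.

    Order is preserved (groups as given, requirements in file order).
    An unknown group raises KeyError.
    """
    merged: Dict[str, None] = {}  # dicts keep insertion order
    for group in dependency_groups:
        for requirement in GROUPS[group]:
            merged.setdefault(requirement, None)
    return list(merged)


print(get_dependencies_to_install(".", dependency_groups=["test", "doc"]))
# prints ['pytest', 'hypothesis', 'sphinx']
```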

finswimmer · February 4, 2021, 6:13am

Hello,

I’m in favor of this PEP.

Just one question: especially in CI/CD pipelines, it is useful to be able to decide whether dependency groups should be installed together with the project (and its dependencies) or without the project itself. Is this something that should be decided via the **kwargs in the invoke_install hook?
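If it did go through `**kwargs`, a poetry-based backend could map a flag onto poetry's existing `--no-root` option, which installs the dependencies but not the project itself. The `install_project` keyword is hypothetical, not part of the PEP. Because it is backend-specific, a universal installer would have to know about it, while backends that don't recognize it would silently absorb it in `**kwargs`.

```python
import subprocess
from typing import List


def build_poetry_install_cmd(install_project: bool = True) -> List[str]:
    cmd = ["poetry", "install"]
    if not install_project:
        # Poetry's --no-root installs the dependencies but skips the project itself.
        cmd.append("--no-root")
    return cmd


def invoke_install(path: str, *, install_project: bool = True, **kwargs) -> int:
    """`install_project` is a hypothetical, backend-specific keyword argument."""
    return subprocess.run(build_poetry_install_cmd(install_project), cwd=path).returncode
```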