Pip plans to introduce an alternative (zipapp) deployment method

ofek · July 22, 2022, 11:38pm

Is there a way to achieve this behavior currently? I’d be quite interested in Hatch depending on pip and just invoking it with a --python option.

CAM-Gerlach · July 23, 2022, 1:27am

Yeah, I know…but I fear that without some sort of small reference module to handle this, for many smaller projects that make use of pip like this, it is the “easiest” solution that doesn’t require a fair bit of platform-specific hackery, manual user effort or simply breaking. Furthermore, unless developers are specifically aware of this change (which inevitably many if not most won’t be until something breaks), they may not immediately realize what’s happening or what to do about it when they get user bug reports about the change, and their first reaction is likely to just tell people to install pip.

Large applications like IPython and Spyder can usually afford to implement one or another workaround, so long as someone understands the problem and is willing to spend a bit of time doing it. In fact this actually could be a significant boon for Spyder at least since we can just bundle the latest pip zipapp in our standalone installers and use that instead of relying on whatever versions users may have scattered through their various working environments (though this isn’t really the same for IPython, since it must be installed in each working environment it runs in and isn’t a standalone application).

However, I maintain or am involved with several smaller projects (off the top of my head, Mjolnir/Brokkr, Lektor, Pyroma, etc.) that also rely on pip being present in different types of environments to do various things, and as they are generally pretty lightweight, often structured more as libraries than applications, and may be deployed without an internet connection, pretty much any approach imposes some sort of cost:

Adding a dependency on pip prevents using pip standalone, but otherwise is the least cost to return to the status quo
Not supporting pip in pip-less environments breaks the functionality or requires tedious user action, which negates many of the benefits
Downloading and installing a zipapp is not trivial and requires an internet connection, which is not always available in the contexts in which many of the above packages run
Adding a config option requires some form of config system and manual intervention from each user (unless built into automated deployment)
Attempting to find the user’s installed zipapp pip requires a non-trivial amount of platform-specific hacky code that may or may not be reliable, and may still not work in many cases

I can’t speak for Mattias & co, but while the overall idea may be somewhat similar, it is potentially a lot more complex since conda environments are highly structured and centralized, and conda will be installed in either the environment corresponding to the current Python executable that IPython is running on (if it’s base or the user has an environment-specific conda/mamba install, which is not common) or in the base environment, which is not that difficult to deal with. By contrast, a standalone pip could be just about anywhere, and could be in a conda env, virtualenv, venv, pipenv, pipx, or somewhere on the path.

For Spyder, though, since it already needs to find virtualenvwrapper and conda envs on the system and the Python executable within them, it isn’t necessarily that much more complicated, though still not as reliable, unless you use some form of standardized install method rather than just handing people a standalone zipapp they can do anything they want with. And of course, it’s fairly easy if we just bundle it in our installers ourselves (though we still have to deal with it when Spyder is installed via pip/conda/Anaconda/WinPython/MacPorts/Linux distros/source/etc…).

We (Spyder) could assume sys.executable -m pip (i.e. that pip is installed in whatever user working env we were operating on), or even just use our zipapp copy if running from our standalone installers, etc., which would be more or less true for a while, and just issue a very clear error message if it wasn’t with instructions on how to configure the path/etc. in our preferences. Most likely, we’d at least try to find it via the above approach first, which should cover most cases and minimize user complaints to an acceptable level…hopefully.
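As a rough illustration of the lookup order described above, a minimal sketch might look like the following. The names `pip_available` and `resolve_pip_command` and the `bundled_zipapp` parameter are hypothetical and not taken from Spyder; the sketch also assumes the zipapp is run by passing it to the target interpreter.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Optional


def pip_available(python_exe: str) -> bool:
    """Return True if pip can be imported by the given interpreter."""
    result = subprocess.run(
        [python_exe, "-c", "import pip"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def resolve_pip_command(
    python_exe: str,
    bundled_zipapp: Optional[Path] = None,
    has_pip: Callable[[str], bool] = pip_available,
) -> List[str]:
    """Pick a pip command for the target interpreter (hypothetical helper)."""
    # 1. Prefer pip installed in the target environment.
    if has_pip(python_exe):
        return [python_exe, "-m", "pip"]
    # 2. Fall back to a zipapp copy shipped with the application, if any.
    if bundled_zipapp is not None and Path(bundled_zipapp).is_file():
        return [python_exe, str(bundled_zipapp)]
    # 3. Otherwise fail loudly with instructions instead of guessing.
    raise RuntimeError(
        f"No pip found for {python_exe}; set the pip path in the preferences."
    )
```

The `has_pip` parameter is only there so the probing step can be swapped out, for example in tests or when the answer is already cached.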

However, for IPython it may not be as nice; they are likely not going to bundle a zipapp pip just for a few magics, and I’m not sure what the obvious way would be to set and store the user’s desired pip path (particularly persistently across kernels and environments); it’s probably possible but AFAIK they don’t really store that much persistent cross-environment state like we do. Perhaps setting an env variable, but going to that effort defeats much of the convenience and ease of use of the %pip magic. But that’s really up to them. For other projects, things may be even worse, as noted above.

Also, to note, just executing pip without calling it with the sys.executable of the target environment could very well run pip on the wrong environment, since there is no guarantee it will point to the same Python executable we’re running, particularly with conda environments where activation is quite non-trivial.

At least as I understand it, you have to pass the zipapp to Python, you can’t pass Python to the zipapp as it isn’t executable on its own.

pf_moore · July 23, 2022, 7:29am

Yes, it is. See GitHub - pfmoore/runpip for a script that builds a pip zipapp.

But please don’t make hatch depend on pip. As I said, pip isn’t set up to work as a dependency of other tools, and you will get bug reports from people who try to upgrade hatch and it fails because pip can’t update itself (typically because the script wrapper upgrade fails).

Probably the biggest thing that has come out of this discussion so far is that I want to explicitly add a note that pip does not support being quoted as a dependency of other tools (because it has no supported Python API).

pradyunsg · July 23, 2022, 7:30am

Actually, it would, and pip install -U ipython won’t attempt to upgrade pip if pip is a new-enough version!

But, the other slightly-better reason is that the fundamental assumptions haven’t changed right now, so we don’t need to make this change.

pf_moore · July 23, 2022, 7:37am

C.A.M. Gerlach:

However, I maintain or am involved with several smaller projects (off the top of my head, Mjolnir/Brokkr, Lektor, Pyroma, etc.) that also rely on pip being present in different types of environments to do various things, and as they are generally pretty lightweight, often structured more as libraries than applications, and may be deployed without an internet connection, pretty much any approach imposes some sort of cost:

Adding a dependency on pip prevents using pip standalone, but otherwise is the least cost to return to the status quo

Not supporting pip in pip-less environments breaks the functionality or requires tedious user action, which negates many of the benefits

Downloading and installing a zipapp is not trivial and requires an internet connection, which is not always available in the contexts in which many of the above packages run

Adding a config option requires some form of config system and manual intervention from each user (unless built into automated deployment)

Attempting to find the user’s installed zipapp pip requires a non-trivial amount of platform-specific hacky code that may or may not be reliable, and may still not work in many cases

Thank you. This is precisely the sort of feedback I was looking for. One further question, then. If you got a bug report today from someone trying to use these tools in an environment they had created using python -m venv --without-pip, what would you say to them? Because whether you are aware of it or not, you do currently have some policy on such environments, it’s just unlikely that it’s ever happened.
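For tools that want an explicit policy for such environments, one option is to check for pip up front and fail with a clear message. The following is a minimal sketch that only inspects the interpreter it runs in; `module_is_importable` and `describe_pip_status` are hypothetical names, and the message wording is illustrative.

```python
import importlib.util


def module_is_importable(name: str) -> bool:
    """Return True if the running interpreter can find a top-level module."""
    return importlib.util.find_spec(name) is not None


def describe_pip_status() -> str:
    """Summarize whether `python -m pip` can be expected to work here."""
    if module_is_importable("pip"):
        return "pip is importable in this environment"
    # Typical cause: the environment was created with `python -m venv --without-pip`.
    return "pip is not installed in this environment; `python -m pip` will fail"
```

Checking a different interpreter than the running one would need a subprocess call instead of `importlib`.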

Yes, I’ve been using the zipapp recently for testing, and I find the need to remember to activate the environment before calling pip to be mildly frustrating. It’s a shame Python doesn’t have a “search for the script to run on PATH” option, as python --use-path pip.pyz would be a pretty convenient replacement for python -m pip, but python /absolute/path/to/pip.pyz not so much…
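Since Python has no such option today, a wrapper would have to do the PATH search itself. The sketch below shows one way this might look; `find_script_on_path` and `zipapp_command` are hypothetical helpers, and the search is done by hand because the zipapp file is not necessarily marked executable.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_script_on_path(name: str, search_path: Optional[str] = None) -> Optional[Path]:
    """Return the first file called `name` in the PATH-style string, or None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for entry in search_path.split(os.pathsep):
        if not entry:
            continue  # skip empty PATH entries
        candidate = Path(entry) / name
        if candidate.is_file():
            return candidate
    return None


def zipapp_command(
    python_exe: str, name: str = "pip.pyz", search_path: Optional[str] = None
) -> List[str]:
    """Build `python /found/path/to/pip.pyz` for the chosen interpreter."""
    script = find_script_on_path(name, search_path)
    if script is None:
        raise FileNotFoundError(f"{name} was not found on PATH")
    return [python_exe, str(script)]
```

The returned list can be passed to `subprocess.run` with any pip arguments appended.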

pf_moore · July 23, 2022, 8:02am

Correct, I was over-simplifying, as we also recommend python -m pip install -U ipython. The command that does fail would be (from memory) pip install -U --upgrade-strategy=eager ipython.

But my main point was that I don’t think we want to start to see people adding pip to their dependency metadata (and certainly not just because of concerns that we might start encouraging people to use environments without pip). But let’s take that one to the pip tracker - I’ve added Having a runtime dependency on pip: make it explicit whether this is supported or not · Issue #11290 · pypa/pip · GitHub for us to have a more focused discussion on this one point.

pf_moore · July 23, 2022, 8:40am

Just to put this into context, if pip had originally been developed as a standalone app, we would never have had the situation where pip was installed into every Python environment, and people would simply have run pip via the pip command, simply assuming that command was present. Exactly the same as other commands like git.

I strongly believe that people shouldn’t be viewing an end goal of Python environments not needing pip installed as a problem. It’s the transition that’s the issue, and IMO we should be doing whatever we can to make the transition as painless as possible, not trying to avoid the change.

Edit: Just to be clear, this is very much my personal view. I don’t speak for the pip developers when I say this.

pradyunsg · July 23, 2022, 8:43am

That should work too – we check if pip’s a part of the requirement set for triggering that protection logic:

pypa/pip/blob/0d4e9eb72253c008f2790482e664ce92198c5240/src/pip/_internal/commands/install.py#L400


      
                  for r in requirement_set.requirements_to_install
              )
              if would_install_items:
                  write_output(
                      "Would install %s",
                      " ".join("-".join(item) for item in would_install_items),
                  )
              return SUCCESS
          
          try:
              pip_req = requirement_set.get_requirement("pip")
          except KeyError:
              modifying_pip = False
          else:
              # If we're not replacing an already installed pip,
              # we're not modifying it.
              modifying_pip = pip_req.satisfied_by is None
          protect_pip_from_modification_on_windows(modifying_pip=modifying_pip)
          
          check_binary_allowed = get_check_binary_allowed(finder.format_control)

If it’s not working somehow, then I’d rather we fix this. I’ll flag that this is something that’s a risk-factor for Windows users only though.

pf_moore · July 23, 2022, 8:48am

Oh wow, I didn’t know we’d fixed that - cool. Maybe we should halt this sub-thread, as it’s degenerated into me just showing my ignorance at this point.

benji-york · July 23, 2022, 11:55am

Communication is hard. : )

Keep up the good work!

mbussonn · July 23, 2022, 1:15pm

From the IPython side of things, we provide the %pip (and %conda) magic mostly because many newcomers don’t make the distinction between what needs to be run inside the Python interpreter and outside. In a growing number of cases users don’t even have access to the outside shell as they start spyder/jupyter from a Desktop icon. Note that the % is optional if there is no variable of the same name, so many users just copy and paste commands which “just work”.

On the Jupyter/IPython side we do not recommend using the magics, as in many cases it does not work immediately (need to restart the kernel), and the installation is non-interactive, so you can’t prompt users for Yes/No.

I want to point out that napari also goes to great lengths to have a GUI to let users install/uninstall w/o the CLI.

I believe all of these are really just workarounds for the fact that there is no programmatic API to manipulate what is in an env and install/remove/update. Shelling out is not that hard, but getting interactive feedback on install progress is not that straightforward. I’d love for something stable that works regardless of pip as a zipapp or not.

Personally as I mostly use conda, and packages are hardlinked, having pip installed 1 or 200 times does not change much.

Finally, I believe that on the Jupyter side the pip magic mostly targets less advanced users than Spyder does. So the only reasonable option I see is to add more heuristics, as the other alternatives require user interaction and decisions, and novices will have no clue what to answer.

I’ll need to re-read this thread, but has it been considered to have the “pip” module be much smaller/shallower, and have python -m pip either call into, say, “pip-core” or the pip zipapp, depending on which one is installed?

CAM-Gerlach · July 23, 2022, 2:07pm

Great, glad it was useful!

We ourselves have for a while now been generally recommending running Spyder standalone rather than installing it into users’ working environments, so perhaps the challenges we’ve faced and our solutions to them might also be helpful here:

Most of the initial struggles have related to more users having to specify the Python environment they want to use, if not the one running Spyder (like you have to in other IDEs that aren’t themselves written in Python). We’ve made this much easier by having Spyder autodetect conda envs and mkvirtualenvs on the system, so users simply select the one they want.

For pip, the equivalent is having to find, or the user specify, their pip zipapp. Providing a small library to do this could similarly help alleviate this concern.
Another big challenge is that there’s no real way to install Spyder plugins in the runtime env without bundling them in the initial installer, since the environment is frozen. We plan to solve it by providing micromamba with the installers (not sure of the specific details atm).

As far as I know, this isn’t a big issue with pip right now as it really doesn’t have plugins (right?), but you might want to at least consider what you’d do here.
It makes updating to a new version far more painful than just running conda update spyder or pip install --upgrade spyder; naively, users have to redownload and re-install it to update, which is a big step backward in usability. We are addressing this by adding an updater to the standalone installers.

This could be a potential issue for pip, especially if it displays the update nag on every run. While perhaps not as heavyweight as downloading and running a new Spyder installer, if pip is just going to be providing standalone zipapps and people are doing their own hacks to “install” them, they could have multiple copies floating around in random places on their machine or simply not bother to take the time and run the risk of breaking their setup by doing it all again.
Finally, the long-term continuing challenge has been getting users to actually discover and use the standalone installers, as opposed to just continuing to use Spyder pre-installed with Anaconda/WinPython/etc, or just doing pip/conda install spyder as they always have. We’ve definitely stepped up our efforts to educate users on this and make it as painless as possible, but there’s a large population of people we simply cannot reach (without an intrusive message in Spyder itself, which would upset users) and probably never will, which we’ve had to accept.

This could be something to think about with pip as well, perhaps even more so, since right now it comes by default with nearly every Python environment, so the vast majority of users don’t even think about it, and probably never will even know it’s an option, unless you display a message on first use of an in-environment pip (which may provoke user backlash). And if pip starts not being installed by default with Python environments, which would actually get people to notice, you could face user anger en masse for making them have to go to the effort to download and install it themselves, especially if the UX is as non-smooth as it is now.

I’m really not sure, since AFAIK it’s never happened, especially for Spyder since the majority of our users use it with conda envs that get pip installed when they install the python package, so there’s no non-hacky way I know of to get Python without pip there. Right now, the case is so incredibly rare that it would be hard to justify anything but simply failing in that case, but depending on how common it may become, that could of course change. However, it is really hard to say as the percentage of users affected is a major factor in the calculus for what approach would be favored.

Have you thought about opening a CPython issue?

One difference, though, is that they act on somewhat different contexts. For Git, all it needs is to be in the working directory of the repository in question, which given many Git operations require specifying paths anyway, is already generally required to be known. By contrast, with pip, you have to ensure the correct environment is activated for it to work as intended, which could be by a variety of different mechanisms that may range from trivial to highly complex, and whose details are out of the scope of pip itself.

Also, this environment is not just the context that pip works within, but is also the runtime environment for pip itself, and thus must have a compatible Python version and facilities, which is not something that is true of git.

pf_moore · July 23, 2022, 3:59pm

That’s certainly a possibility that’s crossed my mind as a result of this thread. It may be that for purposes of a simpler transition, we’d have something like an installable pip-run package that provided an API that wraps a subprocess call to python -m pip and will be extended to auto-detect other ways pip can be available. But reorganising to a small pip and a larger pip-core might be possible longer term.
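To make the idea of a thin wrapper API concrete, here is a minimal sketch of what such a package might expose today. Nothing here is an existing pip API: `pip_command` and `run_pip` are hypothetical names, and the only behavior assumed is that `python -m pip` works for the chosen interpreter.

```python
import subprocess
import sys
from typing import Callable, List, Optional, Sequence


def pip_command(args: Sequence[str], python: Optional[str] = None) -> List[str]:
    """Build the command line for running pip under a given interpreter."""
    # Default to the interpreter running this code; a later version could
    # auto-detect other ways pip is available before falling back to this.
    return [python or sys.executable, "-m", "pip", *args]


def run_pip(
    args: Sequence[str],
    python: Optional[str] = None,
    runner: Callable = subprocess.run,
):
    """Run pip in a subprocess and return the runner's result object."""
    return runner(
        pip_command(args, python),
        capture_output=True,
        text=True,
        check=False,  # leave return-code handling to the caller
    )
```

Callers would then write something like `run_pip(["install", "some-package"], python=target_python)` and inspect `returncode` and `stderr` on the result, without caring how pip was located.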

Yes, I have. I’ve a suspicion this has been requested before and rejected, though, so I’d want to do some research before trying again. The big problem is that it’s hard to get traction, because tools that work on the Python environment that runs them are rare, and on Unix most other use cases are fine with just a file with a shebang. Windows is mildly tricky in that regard, as the user needs to configure things right for .pyz files to work as executables (and even then there are edge cases, but not enough to make it critical). So, long story short, most people have a solution, and pip’s need is rare enough that it might not justify a CPython change.
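For readers unfamiliar with the shebang approach, the standard library's `zipapp` module can write one when it builds an archive. The sketch below builds a throwaway zipapp, not pip itself; `build_demo_zipapp`, the `demo_app` directory and the interpreter line are illustrative assumptions.

```python
import zipapp
from pathlib import Path


def build_demo_zipapp(workdir: Path, interpreter: str = "/usr/bin/env python3") -> Path:
    """Create a tiny zipapp with a shebang line and return its path."""
    source = workdir / "demo_app"
    source.mkdir()
    # A zipapp needs a __main__.py as its entry point.
    (source / "__main__.py").write_text('print("hello from a zipapp")\n')
    target = workdir / "demo.pyz"
    # `interpreter` becomes the "#!" line at the start of the archive.
    zipapp.create_archive(source, target, interpreter=interpreter)
    return target
```

On Unix the resulting file can be run directly, which uses whichever `python3` is first on PATH; running it as `python demo.pyz` instead selects the interpreter explicitly, which is the distinction that matters for a tool like pip that acts on the environment running it.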

One other thing we’d like to do is to add a --python option to pip that allows it to work on an arbitrary environment. That might remove this issue, as well as the need to run pip with the “right interpreter”.

Lots of options in the air here, and the feedback is very helpful, so thanks.

ofek · July 23, 2022, 4:03pm

If pip install adds a --python option why not? Isn’t that the entire use case?

pf_moore · July 23, 2022, 4:13pm

Because if hatch depends on pip, then pip gets installed in any environment hatch is installed in. And that makes it impossible for people to have a single global pip executable that they use for all their environments (because the environment-local copy of pip will shadow it).

Regardless, I’ll repeat what @pradyunsg said - please don’t pre-emptively do anything here. Given the feedback we’ve received here, there are quite likely going to be other, better options.

ofek · July 23, 2022, 4:16pm

Hatch is also global, like tox, Poetry, etc.

edit: Basically, I don’t want users to have to install pip separately by default to get this feature.

pf_moore · July 23, 2022, 5:04pm

The hatch documentation suggests pip install hatch (as well as pipx, but pip install is first). So I hadn’t appreciated that. Installing pip in a pipx-managed environment would by default manage that environment, so I now understand what you meant by wanting a --python option.

I’d strongly suggest you wait. Presumably at the moment hatch just runs subprocess.run([target_python_exe, "-m", "pip"]). That’s going to continue working for a long while yet. I’m hoping that before that becomes a problem, we’ll have something that you can depend on [1] which will let you do what you want without requiring your users to manage the pip installation that gets installed with hatch. In my experience, it’s not trivial to upgrade dependencies installed as part of a pipx-managed application. Do you really want to tell your users that when they get the “you should upgrade pip” message, they need to run pipx runpip hatch install -U pip?

Also, if you want to install pip in a pipx-managed venv, you should probably test if it works as you expect. pipx uses a “shared pip” installation to avoid having pip installed in individual environments, and I have no idea which pip would take priority, or what issues might arise, if there was also a pip in the environment.

I think this makes me feel that adding a --python option to pip might be a bad idea until we’ve worked out a clearer long-term strategy on how we want pip to be installed/managed. It’ll just result in people experimenting in ways that we might subsequently have to break, which won’t be a good experience for packaging end users.

[1] Making sure of that is basically the whole point of this thread.

takluyver · July 24, 2022, 8:23am

Just to add another data point, I’ve used the some/path/to/python -m pip technique to install a package in a specified environment in Flit (e.g. here and here). I may have used it in other projects too, but that’s the only concrete example I can quickly recall.

I’ve also recommended that many times as the way to be sure which environment a package will be installed in - it’s not hard to get in a muddle about what running plain pip will actually do. Obviously recommendations that people have taken up are much harder to go back and change than mere code.

So far this has never come up, so I’d be inclined to say it’s an unsupported corner case. If it became much more common that python -m pip didn’t work, I guess I’d have to find some way of dealing with it. But it would be with a degree of frustration, because I’m used to relying on that.

oscarbenjamin · July 24, 2022, 11:21am

I see a lot of novice users having difficulty with pip installing into the wrong environment or in general not understanding what Python installations they have in their computer. If this proposal can help with that then that would be great. There is a risk though that everything just becomes even more confusing: with the proposal here even if you’re in the right environment you might not be using the right pip depending on whether your environment does or does not have pip and whether or not there is also a separate global install of pip that may or may not have been added to PATH etc.

One thing that would help a lot would be if the standalone pip itself had a way to show what Python installations and/or environments exist to help selecting the right one when choosing to install something. I’m not sure if that’s something that’s even possible though.

A natural next step if pip itself exists outside of any particular user Python installation would be to want pip to be able to install Python itself which would actually be very useful although I can imagine it is probably deemed out of scope.

pf_moore · July 24, 2022, 11:42am

I suspect it won’t - if you use the zipapp you’ll always install into the currently active environment, which is good, but the downside is that it’s a nuisance to install into an explicit environment (that isn’t activated). And forgetting to activate the environment you want is certainly something I do a lot. I definitely wouldn’t recommend a zipapp for a novice at this point.

Things are already confusing, and adding more options will always make that worse, unfortunately. We’re still a long way from a simple, easily understandable solution, I’m afraid. Maybe making pip’s output better could help here. One option would be to clearly state at the top what environment we were installing into. That wouldn’t prevent errors, but it would help people work out what happened after the fact, I guess.
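As a sketch of what such a message could be built from, the running interpreter already exposes the relevant facts. This is not pip's output or code; `in_virtual_environment` and `target_environment_banner` are hypothetical names, and the check only recognizes venv-style environments (a conda environment would be reported as a plain interpreter).

```python
import sys


def in_virtual_environment() -> bool:
    """True when running inside a venv-style virtual environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


def target_environment_banner() -> str:
    """One line saying where packages are about to be installed."""
    kind = "virtual environment" if in_virtual_environment() else "interpreter"
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return (
        f"Installing into {kind} at {sys.prefix} "
        f"(Python {version}, {sys.executable})"
    )
```

Printing this once at the top of an install would not prevent mistakes, but it would make a wrong-environment install visible in the log.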

But I suck at UI design, so don’t trust anything I say on that matter.

I’m pretty sure it’s not. You can create an environment anywhere and it’s not recorded in any sort of registry.
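The closest thing available is a heuristic: environments created by venv contain a pyvenv.cfg file at their root, so a tool can scan directories it is told about. The sketch below shows this under that assumption; `looks_like_venv` and `find_venvs` are hypothetical names, and the approach cannot find environments in locations it was not asked to search.

```python
from pathlib import Path
from typing import Iterable, List


def looks_like_venv(path: Path) -> bool:
    """Heuristic: a venv has a pyvenv.cfg file at its root."""
    return (path / "pyvenv.cfg").is_file()


def find_venvs(roots: Iterable[Path]) -> List[Path]:
    """Return immediate subdirectories of each root that look like venvs."""
    found: List[Path] = []
    for root in roots:
        if not root.is_dir():
            continue  # ignore roots that do not exist
        for child in sorted(root.iterdir()):
            if child.is_dir() and looks_like_venv(child):
                found.append(child)
    return found
```

Which roots to search (a projects folder, a tool's own environment directory, and so on) is exactly the part that no registry answers.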

Very much so, for pip at least. I could imagine a “Python manager” tool along the lines of something like rustup, that handled installing Python, managing environments, and installing packages (for which I’d assume it would use pip under the hood). But isn’t that what conda does (for people comfortable with the conda ecosystem)? I’m not sure there’s much incentive to write “conda for people who don’t like conda” - but maybe someone might want to have a go.