Pip plans to introduce an alternative (zipapp) deployment method

pf_moore · July 17, 2022, 12:32pm

The pip developers are currently exploring options for a new deployment method for pip, probably in the form of a zipapp or similar standalone mechanism. This will not (at least in the short term) replace the existing approach of installing pip in your environment, but it will offer an alternative for people who prefer not to have copies of pip installed everywhere.

The new approach won’t appear before the 22.3 release of pip (October 2022), but we wanted to give advance notice, as this change has the potential to affect what people (and tools) can expect - at the moment, it’s reasonable to assume that pip is present in any Python environment, and python -m pip will run pip anywhere. With a standalone pip application, this will no longer be the case.

So we’d be interested in any feedback on how this could affect people’s workflows, or tools. To reiterate, we’re not expecting to change the official deployment method in the short term, but we will be offering (and supporting) other approaches, and we’d like to get a better feel of the impact so that we can determine how to plan the rollout and how to frame the announcements.

To be clear, we’re not asking for views on whether this would be a good idea [1] - that’s going to be a decision we will make ourselves. But we do want to know about any examples of workflows that at the moment are tied to pip being installed in the environment, so that we can ensure that they are considered in any change that we do make. We’re also not planning at this point on any changes to the stdlib ensurepip module, nor are we expecting to change the stdlib venv module to stop installing pip by default. That sort of change may happen later, if the “standalone pip” approach proves popular and useful, but it’s not in our plans now.

[1] In particular, responses along the lines of “I don’t agree with this approach” with no explanation don’t really offer anything actionable that we can use.

daniele · July 17, 2022, 1:14pm

I don’t know much about zipapps, so it is possible that I don’t completely understand the idea behind this. However, I infer that the idea is to have a single zipapp pip serve as the installer for more than one Python environment. Given the complexity of the tool and its non-trivial dependencies, this will definitely make individual venvs leaner. However, how would the zipapp pip know which Python environment to operate on? Will pip gain a new command line argument to specify it? If so, I can see a little bit of trouble in adapting installation instructions for many projects to the two different cases (environment-specific pip and zipapp pip). Most likely this already came up in discussions among the pip maintainers. Can you point to any relevant proposal/discussion/PR?

pf_moore · July 17, 2022, 2:28pm

The idea with a zipapp is that you run the .pyz file with the Python interpreter you choose. In much the same way that you currently run pip via python -m pip (py -m pip on Windows), with the zipapp you would run python /path/to/pip.pyz. Or, if pip.pyz is on your PATH and marked as executable, you can just run pip.pyz and it will be executed with the currently active Python interpreter.

Virtualenv has been distributing a zipapp for some time now, if you want to see how this works in practice.

Some OS-specific details:

On Unix, you can rename pip.pyz to pip and you’ll effectively have a standalone pip command that works in whatever environment you have active.
On Windows, you can add .pyz to your PATHEXT variable to get the same effect.
Unfortunately, if you want to run the zipapp with a specific Python executable, you have to use the full path to the zipapp, as Python doesn’t search your PATH for you.

Aliases, shell functions, and shell or Python scripts can wrap some of these details up for you, if you want. Ideally, core Python would provide more guidance on using zipapps, but at the moment it doesn’t.
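
As a rough illustration of the "full path" point, a small Python wrapper could build the command itself. This is only a sketch, not something pip ships: the function name zipapp_command and the tools/pip.pyz location are hypothetical, chosen for the example.

```python
import os
import sys
from pathlib import Path


def zipapp_command(zipapp, args, python=sys.executable):
    """Build the argv for running a zipapp with a chosen interpreter.

    Python does not search PATH for its script argument, so the zipapp
    path is made absolute before being passed to the interpreter.
    """
    zipapp_path = Path(zipapp).resolve()
    return [os.fspath(python), os.fspath(zipapp_path), *args]


# Hypothetical location; adjust to wherever pip.pyz was saved.
cmd = zipapp_command("tools/pip.pyz", ["--version"])
print(cmd)
```

Passing cmd to subprocess.run(cmd, check=True) would then run the zipapp with whichever interpreter was given as python.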

EpicWink · July 17, 2022, 10:37pm

Our workflow is to install the system Python in a bare Ubuntu Docker image, then install pip via curl https://bootstrap.pypa.io/get-pip.py | python3. It sounds like I’ll need to add a symbolic link ln -s pip.pyz ~/.local/bin/pip.

oscarbenjamin · July 17, 2022, 11:05pm

If this thread is not the one to discuss whether this is a good idea then where is the discussion about that happening?

It’s not so much that I would want to contribute to that discussion but I would like to at least read what has already been discussed. I find it hard to see how this idea would impact anything without really understanding what anyone would use it for or how they might use it.

pf_moore · July 18, 2022, 8:10am

Sorry, maybe I came off too strong with this. There’s really not been such a discussion - most of the conversation happened in pypa/pip issue #11243 (“Ship pip as a standalone application”), which built off some comments in a previous PR, which are rooted in a long-term feeling that there’s no real need for pip to be so closely tied to the target environment. So this is more of an ongoing goal which we’ve recently discovered is more practical right now than we thought.

What I was trying (badly) to say was simply that we’re not looking for votes here - we won’t make a decision based on popularity, and we’re not planning on taking away the old version, so people won’t be forced to change anything if they don’t want to.

To give an example of the sort of impact we’re interested in, Jupyter has (I believe) a %pip magic command, that installs packages into the running notebook. I presume it does this by running pip in a subprocess, but I don’t know how you tell it to find the right “pip” command. If it simply assumes that pip is installed in the notebook environment, and runs “python -m pip”, then that will no longer work if pip is installed standalone, and the notebook environment was created without pip. That’s equally possible now, but much less likely, so it may be that Jupyter has a solution already, or it may be that they discount that possibility. There are many ways to address this:

Ask users to install pip in any environment where they want to use %pip (but will this work for hosted environments?)
Simply don’t support %pip in environments without pip.
Download and install the pip zipapp on first use of %pip.
Add a config option to allow the user to specify how to invoke pip.
Look in “obvious” places (python -m pip, a standalone pip or pip.pyz command on PATH) and use the first one found, or report an error if none exist.
Probably others.
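
The “look in obvious places” option could be sketched roughly as below. This is an assumption about how a tool might do it, not how Jupyter or any other tool works today; find_pip_command is a hypothetical name, and the lookup order is just one plausible choice.

```python
import importlib.util
import shutil
import sys


def find_pip_command():
    """Return an argv prefix for invoking pip, or None if none is found.

    Checks, in order: pip importable by the running interpreter, then a
    standalone `pip` command on PATH, then an executable `pip.pyz` on PATH.
    """
    # 1. pip installed in the same environment as this interpreter.
    if importlib.util.find_spec("pip") is not None:
        return [sys.executable, "-m", "pip"]
    # 2. A standalone `pip` command. Caveat: it may target another environment.
    standalone = shutil.which("pip")
    if standalone is not None:
        return [standalone]
    # 3. A zipapp on PATH, run with the current interpreter.
    zipapp = shutil.which("pip.pyz")
    if zipapp is not None:
        return [sys.executable, zipapp]
    return None


print(find_pip_command())
```

A caller would append the pip arguments (for example "install", "requests") to the returned list, and report an error when the result is None.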

I don’t know which of these are reasonable for Jupyter, or whether this is even likely to be an issue for them at all. Hence the request for feedback.

J-M0 · July 18, 2022, 5:17pm

I’m pretty sure this change would not be compatible with pip-tools, which is a popular project for locking dependencies in an environment.

pf_moore · July 18, 2022, 6:46pm

Can you clarify what the problem would be? Would it not be sufficient for pip-tools to depend on pip, so that if pip-tools is installed, pip will be?

J-M0 · July 18, 2022, 9:21pm

Oh, yeah, that would probably still work. I was coming at this from the angle of someone who only wants one pip for their whole system, and assumed there was some mechanism in place to prevent more copies of pip from being installed.

merwok · July 18, 2022, 10:57pm

As a pip-tools user, I expect to try one global install of pip-tools and pip! It should be non-trivial but not too hard.

It seems interesting to remove pip and pip-tools from each project’s dev requirements; I would manage their install and update with my usual global devtools virtualenv, but would need to think about the impact on coworkers, especially in the cases when it’s not appropriate to update pip-tools as soon as a new release is out (some of us pip-tools users recently had a period of pinning an older version when the output format changed and we were waiting for an option to mitigate the diff to output files).

pitrou · July 22, 2022, 12:20pm

I’m curious, do people actually deliberately create pip-less environments?

pf_moore · July 22, 2022, 12:45pm

At the moment, almost certainly not, because you can’t install anything into an environment without pip. But with the new option to deploy a single shared copy of pip, and use it in any environment, it becomes a lot more attractive to not install pip in all your environments:

Only one copy of pip to keep up to date.
Environment creation is a lot faster if you don’t install pip (on my PC, it’s the difference between 6 sec and <100ms with the stdlib venv module).

As you suggest, pip-less environments are currently very uncommon, and we’re trying to judge the impact if we make a change that results in them becoming a lot more common.
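
For anyone who wants to see what a pip-less environment looks like, here is a minimal sketch using the stdlib venv module. The timing it prints will vary by machine and is not meant to reproduce the figures above; the temporary directory is used only so the example cleans up after itself.

```python
import tempfile
import time
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "env"

    start = time.perf_counter()
    # with_pip=False skips the ensurepip bootstrap step entirely.
    venv.create(env_dir, with_pip=False)
    elapsed = time.perf_counter() - start

    # A venv is marked by pyvenv.cfg; nothing named "pip" should exist in it.
    has_cfg = (env_dir / "pyvenv.cfg").exists()
    has_pip = any(env_dir.rglob("pip"))

print(f"created pip-less venv in {elapsed:.2f}s (pip present: {has_pip})")
```

The command-line equivalent is python -m venv --without-pip followed by the target directory.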

zware · July 22, 2022, 1:13pm

Yes. We use a non-PyPI index that wants a huge variety of dependencies for auth. We create a pip-less venv in the project and use pip from a venv shared between projects to manage dependencies in each project.

It has some rough edges, but it keeps our project dependencies out of conflict with our index auth dependencies. It has also shown us that there are several popular projects out there that implicitly depend on setuptools, usually for pkg_resources.

benji-york · July 22, 2022, 7:01pm

If I’m understanding correctly, this would mean that pip becomes something managed outside of Python.

This would slightly increase work required to ensure pip is available in a cross-platform way. A build step can currently run “python -m ensurepip” and get the “standard” version of pip for the Python in use; in this future scenario the build would have to test to see if pip is available and if not, either stop and punt to the user to get it installed or know how to install it. That seems less attractive than the current situation.
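
The "test to see if pip is available, and bootstrap it if not" build step might look something like the sketch below. It assumes ensurepip is still available in the Python being used; the helper names pip_available and ensure_pip are made up for this example.

```python
import importlib.util
import subprocess
import sys


def pip_available():
    """Report whether pip is importable by the running interpreter."""
    return importlib.util.find_spec("pip") is not None


def ensure_pip(run=subprocess.run):
    """Bootstrap pip with the stdlib ensurepip module if it is missing.

    Returns True if a bootstrap was attempted, False if pip was already
    present. `run` is injectable so the step can be tested without
    touching the environment.
    """
    if pip_available():
        return False
    run([sys.executable, "-m", "ensurepip", "--upgrade"], check=True)
    return True
```

A build script would call ensure_pip() once, before its first python -m pip invocation.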

davidism · July 22, 2022, 7:03pm

No one is saying that ensurepip or virtualenv with pip will stop working.

zware · July 22, 2022, 7:31pm

It means pip can become something managed on its own, not that it must. Paul has noted a couple of times already here that nothing is changing for any existing uses (yet, though it’s not been ruled out as a probably-distant-future possibility).

benji-york · July 22, 2022, 7:32pm

It seems that replacing the existing approach is on the table. That’s the situation I was trying to address.

pf_moore · July 22, 2022, 9:18pm

Sigh. Let me try again.

There are no plans at this point to desupport the existing options for installing pip. This is an alternative we’re considering offering. If you, personally or as an organisation, want to continue installing pip in all of your environments, exactly as you do now, then there is nothing stopping you, and no plans to prevent that. However, for people (like me!) who prefer not to install pip in all of their environments, they will now have another option, and so are more likely to create environments that don’t have pip installed (and will probably be unhappy with other tools if those tools prevent them from working that way).

That’s the only impact we’re talking about right now. If people with environments that you have no control over stop installing pip in those environments, will that cause you problems? And if so, why, exactly? Our assumption is that very few tools will be affected by this possibility, but we’ve heard nothing back from the developers of such tools yet. We expect that very few if any people will have workflows or local automation that will be affected - specifically because you’re in control of your workflows, and making a local rule that all environments must have pip installed is entirely possible in such a situation.

I understand the concern that at some time in the (as yet undefined) future the pip developers might decide to make the “standalone application” option the only supported way of using pip. But we’ve been as clear as I know how that we have no plans currently to do that. And if we did make such a plan at some point, we wouldn’t do it without consulting the community and discussing the impact. Exactly like we’re doing with this change.

In reality, we could have simply added the zipapp to the available install options, and not worried about the impact [1] - precisely because we aren’t removing anything. So honestly, I feel a bit let down if, as a result of trying to do more than we needed to, we end up with people assuming there’s some sort of nefarious plan behind it. There really isn’t.

[1] And frankly, I’m starting to wish we had…

CAM-Gerlach · July 22, 2022, 9:46pm

Just to be clear, I am not really involved in the IPython project (or Jupyter) directly and don’t claim to speak with any authority; @mbussonn would be the one to ask about that. However, I do want to clear up some misconceptions and provide a bit of insight from the Spyder side of things, since this is relevant to us as we seek to add more package management functionality to Spyder itself, to avoid many user pitfalls in this department.

Not exactly — like all magics, %pip is an IPython command to install packages in the environment of the current running IPython kernel; it isn’t anything specific to Jupyter or notebooks (though it can be used in those that are running in Python and use IPython as their kernel, just like in Spyder, QtConsole, and other editors and IDEs that can use it, as well as through the IPython interpreter directly), and Jupyter can run other kernels and other languages that naturally lack the magic.

To make a long story short, the conjecture above is mostly correct, other than that we’re talking about the IPython interpreter and the environment in which it is installed and runs, rather than Jupyter and notebook environments (which could easily differ from one another, same as in an IDE like Spyder that uses IPython for its interpreters).

The specific call is just the equivalent of subprocess.run([sys.executable, "-m", "pip", ...]) to ensure pip is executing in the same environment as the kernel, and it simply assumes pip is present; setuptools is an explicit dependency, but not pip, since pip is currently guaranteed (by venv and conda) to be present in any environment in which IPython is currently likely to run.

The code (https://github.com/ipython/ipython/blob/0f4a73c91db71cde174275337fc5e226acc9dd7d/IPython/core/magics/packaging.py#L75):

        Usage:
          %pip install [pkgs]
        """
        python = sys.executable
        if sys.platform == "win32":
            python = '"' + python + '"'
        else:
            python = shlex.quote(python)

        self.shell.system(" ".join([python, "-m", "pip", line]))

        print("Note: you may need to restart the kernel to use updated packages.")

    @line_magic
    def conda(self, line):
        """Run the conda package manager within the current kernel.

        Usage:
          %conda install [pkgs]
        """

To note, the handling of conda and conda envs is more complex, since conda is typically only installed in the base env and given the need to activate conda envs, actually getting the conda executable is non-trivial.

It seems the simplest solution for this basic use case would be to declare an explicit dependency upon pip in the environment (or produce an informative error message if one is not found).
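
The "informative error message" variant might be sketched as follows. To be clear, this is not IPython code: PipNotFoundError and pip_argv are hypothetical names, and the pip_present parameter exists only so the check can be exercised without changing the environment.

```python
import importlib.util
import shlex
import sys


class PipNotFoundError(RuntimeError):
    """Hypothetical error type for a missing pip (not an IPython class)."""


def pip_argv(line, pip_present=None):
    """Build the argv for running pip in the current interpreter's environment.

    Raises PipNotFoundError with an explanatory message instead of letting
    `python -m pip` fail with a bare "No module named pip".
    """
    if pip_present is None:
        pip_present = importlib.util.find_spec("pip") is not None
    if not pip_present:
        raise PipNotFoundError(
            f"pip is not installed in the environment of {sys.executable}. "
            "Install pip into that environment, or run a standalone pip "
            "with this interpreter."
        )
    # Split the magic's argument line the way a POSIX shell would.
    return [sys.executable, "-m", "pip", *shlex.split(line)]
```

The resulting list can be handed to subprocess.run directly, which also sidesteps the platform-specific quoting of the interpreter path.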

A more complex alternative might be possible, since IPython (and we, at the Spyder level) already uses similarly complex logic to find the conda executable. But while a popular project like IPython could conceivably implement something like that, it seems a non-trivial burden on the average developer, who will likely just break, raise an error message or require pip as an explicit dependency, none of which are ideal. Perhaps a small library, vendorable module or even (horrors…) a copyable standard snippet could help reduce the possible pain point here?

Maybe @mbussonn can comment further from the IPython side…

pf_moore · July 22, 2022, 10:33pm

That would be a shame, because it would defeat the whole purpose of having a standalone pip. But the point of this thread is to gather feedback on what tools would choose to do, so if that’s how IPython prefers to address this, we’d take that into account. But please don’t pre-emptively add a dependency on pip to anything at this point, as that would cause other issues (we have a bunch of special-case code in pip to ensure that pip install -U pip either works or at least fails gracefully, and that code wouldn’t work if pip install -U ipython started upgrading pip…)

Given that IPython has logic to find the conda executable, wouldn’t finding pip just be similar? Or couldn’t you assume that one of python -m pip or pip would work, and let the user manually override if not? (To be clear, I’m not trying to tell you how to solve the issue here, just trying to understand why my assumptions that doing it that way “would be easy” could be wrong).