Make Pip detect Pip/Python mismatch

kknechtel · October 1, 2023, 9:08am

There’s a common piece of advice to run python -m pip rather than just pip to avoid various problems, in particular the issue whereby python finds a different Python installation than pip does. However, the Pip scripts are all written with the idea that pip on the command line should “just work” (upgrading Pip with itself on Windows notwithstanding). For example, in the Linux implementation, the shebang lines for the wrapper scripts specify the corresponding Python explicitly. Further, people will try it anyway, as long as it’s advertised anywhere that it could work.

Assuming “deprecate and remove the ability to use pip as such at the command line” is not an option, I propose that Pip should be able to detect when it is running under a different Python than the one that python would invoke in the current environment, and then issue a warning and a [y/N] prompt to continue. It could do this on current versions of Python with something like:

from pathlib import Path
from subprocess import run
import sys

def warn_python(xdesc, xpath, ydesc, ypath):
    ok = input(f'{xdesc} (at {xpath}) does not match {ydesc} (at {ypath}). Continue? [y/N]')
    if ok != 'y':
        sys.exit(1)

here = Path(__file__).parent.resolve()
running_python = Path(sys.executable)
# This needs to be adjusted depending on the OS
corresponding_python = here / 'python'
# This might also need adjustment to account for `pythonx.y` commands?
# Wrap in Path so the comparisons below are Path-to-Path, not str-to-Path
found_python = Path(run(('which', 'python'), capture_output=True).stdout.decode().strip())
if found_python != running_python:
    warn_python(
        'Python found by the `python` command', found_python,
        'Python currently being used by Pip', running_python
    )
elif found_python != corresponding_python:
    warn_python(
        'Python found by the `python` command', found_python,
        'Python installation that contains this Pip script', corresponding_python
    )

pf_moore · October 1, 2023, 10:19am

This isn’t an idea for Python as such, it’s a feature request for pip. If you want to see this, I suggest you raise it on the pip tracker. But it’s nowhere near as simple as you suggest:

You need to cater for the zipapp build, so you can’t use __file__ like this.
Just assuming that you can find python relative to pip is unlikely to work in general.
You can’t assume the Python executable is called python or is on PATH. The Windows launcher is an example where this won’t be the case.
There’s no guarantee the user has a which command - you should use shutil.which.

So if you want this to go anywhere, you should create a PR that implements the functionality you want. Or if you just raise a feature request, I’ll copy this comment into it and mark the request as “awaiting PR”. But I doubt anyone else is likely to tackle it - we’ve had this issue/question for years, and no-one has been motivated to do anything about it (or if they have, they’ve failed).
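A minimal sketch of a check that avoids some of these pitfalls, for illustration only and not pip's actual behaviour: it uses shutil.which instead of an external which command, does not rely on __file__, and compares environments by asking the interpreter found on PATH for its own sys.prefix and version rather than comparing file paths. The names describe and path_python_mismatch are hypothetical, and the check stays silent whenever no python command is found (for example when only py or python3 exists).

```python
import shutil
import subprocess
import sys

# Code run inside the *other* interpreter; written to work on old Pythons too.
_PROBE = "import sys; print(sys.prefix); print('%d.%d' % sys.version_info[:2])"


def describe(python):
    """Ask the interpreter at `python` for its (sys.prefix, 'X.Y') pair."""
    out = subprocess.run(
        [python, "-c", _PROBE], capture_output=True, text=True, check=True
    ).stdout
    prefix, version = out.splitlines()[:2]
    return prefix, version


def path_python_mismatch(command="python"):
    """Return a warning string if `command` on PATH belongs to a different
    environment than the one running this code; otherwise return None."""
    found = shutil.which(command)
    if found is None:
        return None  # nothing to compare against
    try:
        theirs = describe(found)
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None  # could not query it; stay silent rather than guess
    ours = (sys.prefix, "%d.%d" % sys.version_info[:2])
    if theirs != ours:
        return (
            f"`{command}` on PATH is {found} {theirs}, "
            f"but this code runs under {sys.executable} {ours}"
        )
    return None
```

Comparing sys.prefix rather than executable paths is a design choice for this sketch: a virtual environment's python is often a symlink to the base interpreter, so path comparison alone can treat two different environments as the same one.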

Rosuav · October 1, 2023, 10:41am

The user may have a python/pip and a python3/pip3 separately. That’ll be less common now that Py2 isn’t as common, but due to different installation sources, it can definitely still happen (even if the two are both Python 3.x). No idea how you’d properly determine this.

kknechtel · October 1, 2023, 11:25pm

Thanks for the heads-up that it’s considered separately. I agree this isn’t anywhere near fully baked, so I’ve moved the thread back to the Help section.

I shouldn’t have written code to demonstrate the desired functionality, because I was mostly aware it isn’t really that simple. But this point surprises me. Is it not the point that each Pip is associated with a particular version of Python; exists because it was either installed along with Python or set up by ensurepip, and is therefore supposed to be in a specific place relative to the Python for which it will install modules?

I guess another option is to just have Pip always warn about which Python it’s about to install for (this will be the one described by sys.executable, right?) and give a [Y/n] prompt to back out of that and/or a command-line flag (which can then be baked in with a shell alias) to skip that check (and of course that would also be implied by running the code as a module, since Python would know that that’s happening… I think? That shouldn’t require any special-casing in runpy.py, right?)

This does partially point at the motivation for solving the problem, too, though. (Does Pip on Windows advise using py -m pip now?)

More to the point: I’ve been trying to figure out the primary causes why the commands get out of sync. It seems like a lot of people report this situation, especially on Windows - somehow they do have both a valid python on the PATH and a valid pip on the PATH, but they don’t correspond to each other.

(One other monkey wrench I’ve noticed here is people cargo-culting about the use of sudo and about the --user flag to Pip. It probably isn’t readily understood that using sudo runs in a separate environment with its own PATH - and that there’s a shell gotcha involved in verifying that - which will e.g. overrule any virtual environment activation.)

petersuter · October 2, 2023, 5:08am

On Windows, in a normal Python installation python.exe and pip.exe are not in the same location.
python.exe is directly in [PythonInstallDir]\.
pip.exe is in a subfolder [PythonInstallDir]\Scripts\.
This seems to be the root cause of the problem.

In a venv, python.exe and pip.exe are both in the same location, [.venv]\Scripts\.

Would it not eliminate or at least reduce the problem almost completely if that was the case also for the normal Python installation?
Preferably, the Scripts subdirectory would be eliminated and all executables would live in the toplevel install / venv directory.
If backward compatibility is an issue, duplicating them in both locations could maybe work.
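The layout difference described here can be inspected from inside any interpreter. The following is a small sketch using the standard sysconfig module; the function name layout is made up for this example, and the result depends entirely on the platform and on whether the interpreter is a virtual environment.

```python
import os
import sys
import sysconfig


def _norm(path):
    """Normalise a directory path so two spellings of it compare equal."""
    return os.path.normcase(os.path.normpath(path))


def layout():
    """Report where the interpreter lives and where its scripts are installed."""
    python_dir = os.path.dirname(sys.executable)
    scripts_dir = sysconfig.get_path("scripts")  # where wrappers like pip go
    return {
        "python_dir": python_dir,
        "scripts_dir": scripts_dir,
        "same_dir": _norm(python_dir) == _norm(scripts_dir),
    }


if __name__ == "__main__":
    for key, value in layout().items():
        print(f"{key}: {value}")
```

When same_dir is false, a single PATH entry cannot expose both python and pip from that installation, which is the situation this post describes for a normal Windows install.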

pf_moore · October 2, 2023, 11:24am

No. Pip can be run from one environment and “pointed at” another. The --python option does this, but this is relatively new. It’s used by the zipapp version of pip, which is also fairly new. There’s also the --prefix and --target options, which have been round forever.

These days it’s entirely possible to have an environment that doesn’t contain pip, and manage it using pip.

Making pip require interaction by default on every use is a massive change, and is never going to happen. Far too many tools run pip internally to ask every one of them to change.

That’s not correct. It is an option, it’s just that it would cause too much disruption to do it. Less disruption than making pip ask the user on every invocation, though!

Rest assured, the pip maintainers are highly aware of these issues. The current situation isn’t ideal, but it’s the best we have been able to manage given the constraints (which include existing uses, transition and compatibility costs, resource availability, etc.)

hansgeunsmeyer · October 2, 2023, 12:56pm

As I read it, this is exactly one of the things that can cause problems – I was bitten by this recently when I was careless while running an install and ended up with a library that should have been installed in a different venv – and that motivates Karl’s proposal! Currently pip does tell where it’s installing stuff, so simply reflecting which python it is using and whether the internal python version of pip and the env are the same (in those cases where that info can be easily found) could already be beneficial imo. And I assume this could be placed under the verbose flag, so it would not disrupt any ‘quiet’ use of pip.
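As a rough sketch of what such a non-interactive notice could contain (the wording and the function name install_target_notice are hypothetical, not something pip prints today), the relevant facts are all available from the running interpreter:

```python
import sys
import sysconfig


def install_target_notice():
    """Build a one-line, non-interactive description of the install target."""
    # In a venv, sys.prefix points at the environment and sys.base_prefix
    # at the interpreter it was created from.
    in_venv = sys.prefix != sys.base_prefix
    return "Installing for Python {}.{} at {} ({}); packages go to {}".format(
        sys.version_info[0],
        sys.version_info[1],
        sys.executable,
        "virtual environment" if in_venv else "not a virtual environment",
        sysconfig.get_path("purelib"),
    )


if __name__ == "__main__":
    # Written to stderr so that scripted use of stdout is not disturbed.
    print(install_target_notice(), file=sys.stderr)
```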

I do agree that any interactive use (requiring user input) should never be the default and should be avoided, unless there is a special command line option for that.

kknechtel · October 5, 2023, 4:52pm

I’m aware of this aspect, but I thought the installer adds both paths to PATH when the option is checked.

But if it isn’t interactive, surely it’s too late at that point? Or at least, waiting for an installation and then uninstalling again is annoying, and then it still has to be redone properly.

hansgeunsmeyer · October 5, 2023, 5:06pm

That’s true - but if this is a new default, then that seems like backwards-incompatible behavior. I think I would in that case prefer to have it under a new command-line option (which I actually would always be using if it was there).

petersuter · October 5, 2023, 5:17pm

I don’t think merging the paths would even help much.

A simple scenario:
Install Python 3.X. Add it to PATH.
Install Python 3.X-1. Add it to PATH.
pip probably refers to 3.X-1 because it was added to PATH last.
py probably refers to 3.X because it prefers the latest version.

Probably not everyone uses that. Nothing enforces them to remain in sync. The order of the paths in PATH also matters.
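To see how PATH order decides which command wins, it can help to list every match rather than only the first. A minimal sketch, assuming the hypothetical helper name all_on_path; it calls shutil.which once per PATH entry so that Windows extensions such as .exe are still honoured:

```python
import os
import shutil


def all_on_path(name, path=None):
    """Return every executable called `name` on PATH, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # Searching one directory at a time keeps the PATH order visible.
        hit = shutil.which(name, path=directory)
        if hit and hit not in hits:
            hits.append(hit)
    return hits


if __name__ == "__main__":
    for command in ("python", "pip", "py"):
        print(command, "->", all_on_path(command) or "not found")
```

The first entry in each list is the one a shell would normally run. If the first python and the first pip come from different installations, the two commands are out of sync.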

Some more (complex) reasons for a mismatch are mentioned here:

kknechtel · January 22, 2024, 9:09pm

Returning to this thread because a more recent post reminded me about it.

Sure, but my confusion is generally about users who are trying to use python, not py - after all, they deliberately had the Python installer modify the PATH (twice!). I’m fairly sure this is the common case - people running into the issue are unaware of py, and they find that pip list shows something that was just installed, but python myscript.py (or python followed by trying an appropriate import statement at the REPL) results in an ImportError. I’m pretty sure that the reports I’ve seen generally involve people trying python rather than py.
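One way to diagnose this symptom is to compare what pip --version reports with the interpreter that python starts. The sketch below parses the usual one-line output, which looks like "pip 23.2.1 from /some/path/site-packages/pip (python 3.11)"; the function names are hypothetical, and the sample path in the comment is only illustrative.

```python
import re
import sys

# Typical output: pip 23.2.1 from /some/path/site-packages/pip (python 3.11)
_PIP_VERSION_RE = re.compile(r"^pip (\S+) from (.+) \(python (\d+)\.(\d+)\)\s*$")


def parse_pip_version(line):
    """Split `pip --version` output into (pip_version, location, (major, minor)).

    Returns None if the line does not have the expected shape.
    """
    match = _PIP_VERSION_RE.match(line.strip())
    if match is None:
        return None
    pip_version, location, major, minor = match.groups()
    return pip_version, location, (int(major), int(minor))


def pip_targets_this_python_version(line):
    """True/False if the reported X.Y matches the running interpreter,
    or None if the line could not be parsed."""
    parsed = parse_pip_version(line)
    if parsed is None:
        return None
    return parsed[2] == tuple(sys.version_info[:2])
```

A matching version number does not prove that both commands belong to the same installation. The reported location is the more telling part, since it shows which site-packages directory that pip lives in.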

This is interesting, but I’m not sure I understand the exact conditions under which the “App Paths” key is used - i.e., what conditions would cause the mentioned API functions to be used. For my purposes I’m only interested in things that could happen at a command line (but if Powershell works differently from CMD then that’s definitely noteworthy), not e.g. using subprocess.

I just want to point out the magnitude of the problem while I’m here:

Dozens of other questions have been closed as a duplicate of this, because it seems to be the best available option for answering the question. And it’s… not good. The top/accepted answer there is still some cargo-culting about permissions and sudo - and other highly-voted answers are also less than ideal. There are even suggestions to hack sys.path to reach into the site-packages of other Python installations!

Other slightly-less-popular candidates are at least as bad. For example, I found a highly-voted answer on one of them suggesting that BeautifulSoup4 (specifically) should be installed using Pip for 2.x, but the system package manager for 3.x - using sudo either way.

Hardly anyone understands it properly, and those who do understand it properly have a rough time getting their voices heard.

Nevertheless, new Python installations ordinarily include Pip, even if there is another Python somewhere with its own Pip; and the new Pip will default to installing for the Python it came with - i.e., the wrapper executable has such a shebang on Linux, and to my understanding will choose the corresponding Python explicitly on Windows using whatever Windows API tools (since it’s a compiled program rather than a .bat… IIRC). That’s what I was getting at.
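On POSIX systems, the interpreter a pip wrapper is tied to can be read straight from its first line. A minimal sketch follows (read_shebang is a made-up helper name); it returns None for files that do not start with a shebang, which includes compiled Windows launchers such as pip.exe, so it says nothing about how those pick their Python.

```python
def read_shebang(script_path):
    """Return the interpreter line of a script (without '#!'), or None."""
    with open(script_path, "rb") as f:
        first_line = f.readline()
    if not first_line.startswith(b"#!"):
        return None  # not a shebang script (e.g. a binary launcher)
    return first_line[2:].decode("utf-8", "replace").strip()


if __name__ == "__main__":
    import shutil

    pip_path = shutil.which("pip")
    if pip_path is None:
        print("no pip command on PATH")
    else:
        print(pip_path, "->", read_shebang(pip_path))
```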

Thanks for the references for command-line options to Pip. However, it wasn’t clear to me from the documentation what the exact semantics of --prefix and --target are, how one would choose one over the other, or why one would want either of them rather than --python now that it exists (i.e., why would I want Pip to run using a different Python than the one for which I want it to install the package?). It’s also a little harder to find out about --prefix and --target because they’re specific to pip install - although I guess that doesn’t matter here.

Is there a central piece of documentation somewhere that explains what constraints are relevant and why the situation is as it is? If not, I’d be interested in doing the necessary research etc. for it. But hopefully there would at least be a good place to put it within the Pip documentation… ?

CAM-Gerlach · January 22, 2024, 10:39pm

To this point, I recall the pip maintainers specifically proposing having a single standalone version of pip as a zipapp, which the user would invoke to install into their different Python installations and environments. But that idea, as I understand it, got shelved at least for the time being due to a number of issues and concerns raised by the community. It seems it may partially address some of both the constraints and concerns that are relevant here, perhaps?

pf_moore · January 22, 2024, 11:20pm

Basically, all 3 of the options you mention do different things, and you need to choose the appropriate one for your requirements. If you have no use for anything other than --python, just use that. But --target is often useful for installing packages into a directory that is not the site-packages of any Python installation (for example, a library directory of an app that vendors its dependencies) and --prefix is useful to people who build Python installations and move them in place after building (system distributors, for example).
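To make the distinction concrete, here is a sketch of the three command lines built as argument lists for subprocess. The helper names and every path passed to them are hypothetical; only the option names come from the discussion above, and nothing here actually runs pip.

```python
import sys


def pip_cmd(*args, python=sys.executable):
    """Build a `python -m pip ...` argument list for subprocess.run."""
    return [python, "-m", "pip", *args]


def install_into_other_env(package, env_python):
    # --python: run this pip, but manage the environment of another interpreter.
    return pip_cmd("--python", env_python, "install", package)


def install_into_dir(package, directory):
    # --target: install into a plain directory, e.g. an app's vendored libs.
    return pip_cmd("install", "--target", directory, package)


def install_under_prefix(package, prefix):
    # --prefix: install under a prefix tree that is moved into place later.
    return pip_cmd("install", "--prefix", prefix, package)
```

Each list could be passed to subprocess.run(cmd, check=True). Note that --python is given before the install subcommand, while --target and --prefix are options of pip install itself.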

The docs aren’t wonderful, and they certainly don’t take the form of a how-to guide discussing “what option is best for me”. As usual, PRs improving things are welcome, but you’ll have to read the source to find out how things currently work…

No. But all I was thinking of was things like:

People use the existing functionality, so we can’t simply remove it (backward compatibility, basically)
If we offer a replacement, we’d need a plan on how to communicate the change, how to determine when “enough” users have transitioned so that we can remove the old form, and what to do about people who didn’t transition for whatever reason.
Why is this even worth doing when we have other, far more useful (IMO) changes that we don’t have enough maintainers to do (or even to review PRs for in some cases)?

So basically standard project management questions like you’d need to consider for any significant change to a widely-used open source project running on volunteer resource, but used heavily by commercial users.

It didn’t get shelved, no. In fact, the zipapp has been available (at https://bootstrap.pypa.io/pip/pip.pyz) for a while now, and the --python option was added at the same time. What we didn’t do is pursue the idea of promoting that distribution over installing pip in your Python environment. Too many workflows and tools assume that pip is available. But not having pip installed in your environment is an option that’s fully supported by the pip maintainers - we just don’t comment on whether other tools support it.

kknechtel · January 23, 2024, 12:02am

That’s unfortunate, but I guess the second sentence largely explains the first. Nevertheless, under the new “diataxis” framework, this seems like a clearly missing piece of documentation. I’d like to contribute here - the source code doesn’t seem too difficult to navigate, and I’m keenly interested in the overall problem. (It looks like there’s also a --root option that does something subtly different yet again…) I’d just like to make sure of a few things first:

When Pip runs in Python (whether due to a shebang line or explicitly using python -m), by default it will install to a location appropriate for the running Python, correct?
When pip.exe is used on Windows, does that simply spawn a Python process that imports a Pip module and proceeds as normal? Or is it somehow directly implementing the business logic? Or just what?
The documentation for the --user flag makes it sound as if a single user install directory is used - separate per-user, but common to all Python installations. Is that the case? If not, how does it disambiguate which packages go with which Python? (I don’t use this feature, because virtual environments already provide all the organization I want. It sort of looks like it’s intended to allow for distinguishing multiple versions of Python, but not multiple installations of the same version - such as with virtual environments.)
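Part of the --user question can be explored from the standard library alone, since the site module reports the per-user directories that the running interpreter would use. The sketch below only prints those locations (install_locations is a hypothetical name) and makes no claim about pip's internal logic. On the CPython builds I know of, the user site path embeds the X.Y version but not the installation, so different versions get separate directories while two installs of the same version share one; virtual environments normally disable the user site.

```python
import site
import sys
import sysconfig


def install_locations():
    """Collect the default and per-user package locations of this interpreter."""
    return {
        "executable": sys.executable,
        "default_site_packages": sysconfig.get_path("purelib"),
        "user_base": site.getuserbase(),
        "user_site": site.getusersitepackages(),
        # True, False, or None depending on flags and environment
        "user_site_enabled": site.ENABLE_USER_SITE,
    }


if __name__ == "__main__":
    for key, value in install_locations().items():
        print(f"{key}: {value}")
```

Running this under each interpreter of interest (for example with python, python3, and py) and comparing the output shows which ones share a user site directory.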

Okay, so that should be easy enough to describe. But under this heading, I was also thinking of more straightforward technical issues, e.g. “why is there a separate subdirectory for ‘scripts’?”, or “why don’t Windows installations put python.exe in the same directory as pip.exe, when python generally lives in the same directory as pip under POSIX?”. I.e., the original motivations for these decisions (which I’m pretty sure actually are Python decisions rather than Pip ones), rather than the backwards-compatibility burden of changing them.

Good to know, and definitely worth going over in the sort of guide I’m envisioning.

As an aside: a while back (over a year, apparently) I created a “package installation test” module on PyPI, and then promptly more-or-less forgot about it. Originally I was going to reference it in a self-answered Stack Overflow question, but I don’t write those any more (although I plan to do many of those on Codidact going forward). Subject to the appropriate auditing etc. and reworking the documentation, this seems like a good thing to mention in the guide, so that people can test their understanding of the material and verify their environment configuration.

pf_moore · January 23, 2024, 7:43am

Yes, these all seem like the sorts of things someone will have to do research to find out. I don’t have the time or inclination to do that research myself, though…