Pip/conda compatibility

Hmm – to some extent this is simply unavoidable – newbies will get confused, and of course, more so when things change.

Look at the number of REALLY basic questions based on misunderstandings that we get over and over again on “help” fora.

But I think this could also be helped by a better – ironically, less informative – help message, with “how to override” right at the top:

“”"
This environment python environment is externally managed.

If the package is not available in your system package manager, pass the --force flag:

python -m pip install --force package_name
‘’'"

Though from the question linked, it looks like that’s a Linux-specific error message, so it’s up to them what they do, and conda could do something different.

(and they should put “THIS IS NOT PYTHON’S FAULT” up front :slight_smile:)

[Another lesson – requiring virtual environments is going to cause a lot of questions!]

NOTE: Is there a force flag? I can’t seem to find the documentation for EXTERNALLY_MANAGED.

As pointed out, the big difference between Linux distros and conda is that conda provides its own environment system. But even so, I think pip should do as little as possible that is conda-specific – it should just get out of the way and let the conda folks figure it out.

HMM – I just realized that Linux distros and conda might want to do exactly the opposite with a flag:

  • Linux distros want to enforce the use of a virtual environment
  • conda wants to prevent (discourage) the use of virtual environments

Yet another flag? ALREADY_IN_AN_ENVIRONMENT?

sigh.

This is true, and mostly conda’s problem to manage, but it would be nice if there was an easy way to tell pip to let an external system handle it.

My advice to conda using folks that need a package is:

  • try to find a conda package first – you ALWAYS want to install something from conda if possible
  • if it’s really not available as a conda package [*]
    • pip install it with --no-deps
    • try to use it – if there are missing dependencies, lather, rinse, repeat. (You could probably be even smarter and do a dry run to see what pip wants to install – see the rough sketch below.)
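
Here’s a rough sketch of that loop as shell commands – the package name is a placeholder and conda-forge is only an example channel, so treat it as illustrative rather than a blessed script:

    # 1. prefer a conda package if one exists
    conda install -c conda-forge some_package

    # 2. if there is no conda package, install just the package itself, no dependencies
    python -m pip install --no-deps some_package

    # 3. see what's now missing, install those from conda, and repeat
    python -m pip check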

Is there a way to automate that? It’s a major challenge, but a first step would be for pip to know that it shouldn’t just do the normal thing in this situation.

A note about the --no-deps: the real challenge with folks installing a single package into a conda environment with pip is that pip will, of course, try to resolve the dependencies. conda package builders do their best to make sure conda-installed Python packages are pip-compliant, so if the dep is there, pip will find it. But if the dep is not there, pip will install it, and that might lead to a whole stack of dependencies, many of which may be available via conda – but, of course, pip doesn’t know that (nor should it).

All this is particularly challenging because the conda and pip namespaces are distinct: pip has the PyPI namespace – so package names on PyPI are guaranteed to be unique. conda has the conda-forge namespace (which is coordinated with defaults), so if you use those channels, you’re OK. But there is no clear match-up between PyPI and conda-forge packages.

Many packages have the same names, e.g. numpy, requests, etc., but some do not. And some have the same name but it means different things, e.g. the PyPI name may be a Python wrapper around a C lib, and the conda name is the C lib itself.

And packages with the same names could be built quite differently: PyPI wheels generally have the libs vendored, conda packages rely on another package for the libs, and conda packages may include more or fewer optional features (does pyopengl-accelerate come with pyopengl?).

This may be an intractable problem that can’t be solved without some human intervention. Would a solution that mostly works be better than no system? I’m not sure about that.

[*] Ideally I encourage people to contribute a recipe – but that is a big lift.

[quote=“Ralf Gommers, post:207, topic:21141, username:rgommers”]
A large fraction of packages on PyPI is pure Python, and those packages are compatible with Python itself and packages with extension modules from other package managers.
[/quote]

Yes, but: it’s not just the package itself – it’s also its dependencies. In a conda environment you don’t want to just pip install even a pure-Python package if it has any dependencies that may be available in conda – or any of the full dependency stack.

grayskull makes it pretty easy to wrap a pure-Python package in a conda recipe – and that could be done on-the-fly automatically, but that’s not foolproof without a human eye on it – dependencies don’t always have the same names in the two systems :frowning:
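
For reference, the basic invocation is a one-liner – the package name is a placeholder, and the generated recipe usually still needs a human pass over its requirements section:

    # generate a conda recipe (meta.yaml) from a package published on PyPI
    grayskull pypi some_pure_python_package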

Sorry for being thick – what is a “DIY” environment?

Also – while conda was created due to problems that the scientific computing community has – it can be very useful for a lot of other use cases as well [*]. There is a general impression that “conda is for data science, and only data science” – which I would rather not propagate in PyPA docs.

In fact, due to some heroic efforts, there are now binary wheels available for the bulk of the “scipy stack”, so:

“wheels are readily available and reliable for the projects beginners are most likely to install” for data science as well.

The problem is that they hit a wall later on, when they want to use something that isn’t pip-installable, and now they have to learn a whole new system :frowning:

[*] – In my shop, we use conda for pretty much everything – we find the deployment of web apps to work well with conda – even those that don’t have a shred of the scipy stack in them. I did have to spend a couple days getting the Pyramid stack up on conda-forge, but it’s now been a couple years since we’ve had to pip install anything.

3 Likes

I think this has been covered, but to put it simply in one place:

conda is different because it provides isolated environments.

1 Like

I saw you mentioned that on the conda forum, but maybe it’s better to reply here as I think the issue is more about what pip (or other non-conda tools) should do in that situation.

Basically, I agree that that is a valid thing to want to do, but I don’t agree that it’s necessary to have that work without the user supplying some kind of --go-ahead-and-break-things flag, and it’s not clear to me (neither from your comment here nor on the conda forum) whether you’re saying such a flag would need to be given.

To me the normal, non-breaking workflow sequence is “a package is always installed after all its dependencies”. If you’re deviating from that (e.g., by swapping in a dev numpy version “out from under” other things that depend on numpy) you’re making an explicit choice to violate the dependency guarantees, and in that case you should pass a flag explicitly saying so.

That would just be gratuitous breakage of what works today, plus very annoying UX. Requiring --break-system-scary-flag for a regular development workflow is a terrible idea.

Not at all. That development version still gets built as a wheel, with dependency metadata. If that metadata is in conflict with what is installed, then pip will refuse to install the wheel.

1 Like

There are two changes that I think pip (and other installers) could make that would ease this.

The first one is that pip could provide a way to show the plan for what to install without actually doing anything. As far as I can tell there isn’t any way to do this. It looks like --dry-run now exists, but that will download a bunch of stuff and only at the end tell you what it was going to install, whereas conda will show you its whole plan before doing anything. I suppose this is gated on separating the metadata from the actual package, which conda does and PyPI doesn’t, and, hey, guess what, that’s another case where I think the conda way is objectively better. (Just to forestall any further debates about this, I’m not saying we should immediately dump PyPI and switch to conda-forge, but that on a conceptual level it is better to be able to get the metadata without the package than not to be able to do so.)
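
For what it’s worth, the closest pip gets today is dumping whatever plan it finally arrives at in machine-readable form, even though it may still download or build things along the way – the package name here is a placeholder:

    # resolve without installing; write the would-be install set as JSON
    python -m pip install --dry-run --report plan.json some_package

The “install” key of that report lists each distribution pip would have installed.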

The second one is that pip (and probably most other installers, including conda) would need to grow some substantial abilities to communicate more effectively with other installers. That means things like providing detailed output about their internal logic in some form like JSON, so that they could pass this back and forth to delegate as appropriate. Conda has a --json option for many operations (e.g., for installation); as far as I know pip only has it for showing the current environment state. But both would need to be enhanced.

What I’m envisioning is development of some sort of meta-protocol where a command like conda install obscurepackage would look for the package in conda channels, and then if not found, could delegate to something like pip getdeps --machine-readable to get pip to give it a machine-readable list (in some defined format) of the deps pip would install, so that conda could install those itself before delegating back to pip.
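
A very crude approximation of that hand-off is arguably possible from the conda side already, using the same report output – this is only a sketch of the idea, not a worked-out protocol, and the package name and jq incantation are illustrative:

    # ask pip what it would install, without installing anything
    python -m pip install --dry-run --report plan.json obscurepackage

    # pull the distribution names out of the report
    jq -r '.install[].metadata.name' plan.json

    # a wrapper could then try to satisfy those names from conda channels first,
    # and only hand the leftovers back to pip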

Obviously this would require a lot of work from many projects, but in terms of “better installer interop” as discussed way back earlier in this thread, I think it would be the most thoroughgoing approach. The current system basically has installers communicating through info files that are left on disk in various places, which is a lot more limited than actually letting the installers have a live dialogue with one another about who should install what.

Let me back up a bit here. Let’s go to the main Python docs page. There are two links from there relevant to packaging, one to a page about installing stuff and one to a page about distributing stuff. The “Python packaging user guide” isn’t even linked from there! And neither of those two pages that are linked mentions conda, nor poetry, nor many of the other tools that people use.

If you go into the “distributing” docs page, then way down at the bottom, there are some links, not actually to the packaging guide main page itself, but directly into its sub-pages. Some of those pages mention conda or poetry or other tools, usually way down at the bottom in a section that says “you can also try this”.

As you say, packaging is hard. I get the impression that most people here do not think it is an achievable goal to have Python provide, by default, a solution that is clearly superior to all existing alternatives. All I’m saying here is that, if indeed we can’t do that, then at least what we should do is have it so that, when someone looks for packaging-related info in the official Python docs, the very first thing they see is something like: “There is no single solution, process, or toolset for Python packaging. A wide variety of tools and package repositories exist in parallel, maintained by independent groups. This documentation describes one particular workflow that is widely used. If you are going to be working on a project already begun by others, they may be using a different workflow, and you should check with them first about that. In any case, you may want to explore other options that are also widely-used, such as. . . [list of poetry/conda/PDM/etc.].”

Own it, loud and proud! :slight_smile: Ideally, of course, there would actually be some information there about why someone might choose one tool or another, but even just a statement of the situation would alert people to the fact that they should gird their loins before embarking on a Python packaging expedition.

1 Like

That’s fine for the things that your dev version depends on, but what about the things that depend on your dev version? You’re installing a version of numpy that is not known/declared to be valid for SciPy (or whatever else you have in the env) to depend on, so there’s potential breakage. (Or am I misunderstanding your example?)

I think it might be less scary if it were --break-environment rather than --break-system. Surely there’s some expectation when testing a development version that you might break something, so that doesn’t seem crazy to me.

That’s a good idea for other reasons as well, so +1

Perhaps surmountable, but a key problem here is that different packaging systems are, well, different. So there is no way to know that the “foo” package on PyPI is the same as the “foo” package on conda-forge – even if the conda package is a packaging up of the PyPI one, it might be different in various subtle and not-so-subtle ways.

This would be a lot easier if conda-forge had set up a namespace system: e.g. pypi-foo would be defined as “the foo package on PyPI”.

(there has been discussion about this a lot in the past and present – currently for Julia packages)

Maybe that could be addressed in the future, or there could be some registry of names, or … but it’s a bit of a mess.

or just “--force” or “--yes_i_want_to_do_this” or “--Im_managing_my_own_environment_thank_you” :slight_smile:

I understand the reason behind using a scary sounding flag, but really – as long as folks aren’t using the default behavior, they shouldn’t need to be scared off.

As you know, pip cannot currently show the whole plan before downloading anything because:

  1. Metadata is stored within distributions (we have standards that fix this, but they still need to be implemented, as you point out)
  2. Sdist metadata can be computed at build time (we have standards that fix this as far as we can, but they still need to be implemented)
  3. Some sdists compute metadata at build time in a way that cannot be avoided - conda (I believe) “fixes” this by being a binary-only install system. We can’t go quite that far without major disruption to the ecosystem, but we are going as far as we can to encourage publishing binaries and static metadata.

The only ways we won’t achieve this are in places where pip (and PyPA tools in general) are addressing situations that conda doesn’t support. And unlike conda, we can’t ignore those cases, because our user base (unlike conda’s) relies on those features. At least to some extent, and until we understand how critical the requirement is, we have to continue supporting it. We are trying to establish what flexibility we have in this area, but it’s neither easy nor quick.

So basically, everything needed for your first suggestion is in progress.

1 Like

Thanks for the support, but you do realise that you’re voting +1 for something that we’ve all already agreed will happen, and it’s only blocked because we don’t have the resources to implement it, don’t you? (I’m not being sarcastic here, I feel that it’s a genuine problem that people sometimes don’t realise how much better than they imagine the situation will be once all of our existing agreed standards have been fully implemented and rolled out - even if we never made any further progress).

4 Likes

Yes, I think you are misunderstanding. If for example the metadata of the conda-installed scipy says numpy>=1.21.3,<1.27.0 and I am building for example 1.25.0.dev0 in a conda env with the conda compilers, then there is no potential breakage beyond the effect of the changes that I’m actually intending to test.
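
Concretely, the kind of invocation I mean is something like this, run inside an activated conda env that already has the build and runtime dependencies – the path and exact flags are illustrative:

    cd numpy/          # local checkout, e.g. 1.25.0.dev0
    python -m pip install --no-build-isolation --no-deps .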

I’m still -1 on that, no matter what the name is. I really have zero interest in seeing anything like that. It’s just churn and breaking existing workflows, and making the UX of the installer tool worse.

Is there really a problem with uninstalling numpy from the env before you pip install the dev build? Conda already has a flag to uninstall just the single package and not dependencies/dependants (I think it’s just the --force flag, but would have to double check that).

If someone is at a level where they can modify numpy and get it to build, uninstalling a package carefully is going to be a comparatively easy ask.

1 Like

I guess another way to look at it is that maybe it’s just called --no-build-isolation so you’re already passing it? I just need to think about what other situations someone might use that in without intending to do something that might break the environment.

The main thing to me is this: there is a range of commands that people are going to type to install a released, known-working version of numpy (most basically pip install numpy). What I have zero interest in is increasing (or even leaving constant) the risk that those will break their environment. Packages are installed far more often than they’re developed. I just see it as a perfectly fine tradeoff to disrupt some development workflows if by doing so we will reduce the likelihood of breakage for the colossally larger number of situations that are just innocent attempts to install a working package.

In my view, for a default package installer provided with Python, “install a working released package and get a working non-broken environment” is the base use case to which all others must give way. I recognize that there are myriad situations in which people need to do myriad other things; it’s just not clear to me that “the normal behavior of the default tool provided with Python for installing working released packages” needs to be the way to do them.

But I’m coming around to the idea that --no-build-isolation is already enough of an admission of advanced-ness, even if its name doesn’t say that so obviously. (I suppose in some sense, the more mysterious the name is, the less likely people will use it without intending to do so. :slight_smile: ) It might help if the listing of options in pip install --help made a clearer division between options intended for everyday end-user use, and those that are only for development and thus inherently carry a higher-level risk.

2 Likes

The docs say so, but it doesn’t actually work. I just tried in an env, and it wants to change a ton of stuff. If I upgrade everything and then try again, it still wants to uninstall matplotlib and a few other packages if I type mamba uninstall --force numpy. Regardless, it would make the UX 2 lines instead of 1 if I remember – and probably fail to install after a long wait for a build to complete if I don’t remember. It also leaves the environment in a broken state, so if the build fails or you simply get distracted and go do something else, you must now go reinstall numpy. At which point other packages in your environment may also change. It’d be incredibly poor UX.

The crux of the matter is that pip is used as a developer tool, in multiple ways, and not only in virtual envs. pip simply has multiple use cases that are badly separated. Breaking the UX for one use case because of some assumed breakage in another - which is nothing like that in a Linux distro, neither in probability nor in impact - is not a good idea.

There are probably ways to improve things, but this ain’t it. Another illustration of that: when I explained PEP 704 on the conda chat, a quick answer was (paraphrasing) “I don’t know what this is all about and I don’t really understand the PEP process, but pip no longer installing into conda envs is a complete nonstarter”.

I’m sorry, but you’re just making things up that sounds scary at this point. This does not happen. If the package is not already installed, it will be installed and it will work. If it is already installed, then pip will immediately exit with Requirement already satisfied.

If you try hard enough then sure, you can break an environment. But that’s true anywhere. The options for tweaking conda-pip interaction have now been laid out, multiple times. It looks like EXTERNALLY_MANAGED for the base env will be rolled out. And the rest is not appropriate (cure worse than the disease). It looks like the conda community is aiming for smoother integration (a lot of work already went into that over the years) + end user education. Attempts to force the opposite to happen, by making pip not work at all by default in conda envs for example, by folks uninvolved in the conda community are not all that helpful imho.

3 Likes

“Do it yourself” environments. As noted in The Python Packaging Ecosystem | Curious Efficiency, conda covers managing Python packages and the Python runtime and many external dependencies. The DIY path involves providing your own solutions for the latter two layers.

As far as the “data science or not?” question goes, the relevance is that outside that domain, the more general tooling that doesn’t much care how you obtained your Python runtime is already up to the task of providing a good user experience, since the relatively small number of common external dependencies are covered by projects publishing relevant wheels. Within that domain though, the probability that you will want NumPy/SciPy/pandas/matplotlib/etc becomes a near certainty, massively increasing the likelihood that you will run into the problems that conda solves.

However, it also isn’t PyPA’s (or even CPython’s) role to dictate to people how they obtain their Python runtime – we can only indicate which approaches may create more or less difficulty in particular scenarios, and provide suggestions to both new and experienced users that allow them to find a workflow that is suitable for them.

As @BrenBarn has noted, there are places that new users will potentially hit early in their user journey that aren’t asking them that “Is designing your own approach to managing external dependencies really something you want or need to be doing?” question, contributing to the risk that they head down a path of avoidable frustration. (e.g. I don’t think anyone has really touched the in-tree CPython packaging docs since shortly after the creation of packaging.python.org).

1 Like

Yeah, pip’s done pretty much everything it realistically can here on not having to pre-download packages to present a package plan. However, IMO that’s actually quite a minor issue from the perspective of providing similar UX benefits to Conda, as in my experience a typical Conda-Forge repodata download and requirements solve (at least without Mamba) takes substantially longer than pip downloading and inspecting any uncached PyPI packages anyway.

What I think would be a huge UX improvement for pip in this regard, something I’ve always really missed from having it in Conda, would be pip displaying the package plan from --dry-run by default before giving the user the y/n option to install (with a config option, env var and CLI flag to proceed without asking—or vice versa, if you don’t want to make this the default at least initially)—basically, the same thing it already does when uninstalling.

This would have been a real lifesaver, saving me from so many typos and mistakes over the years – messing up my virtual/conda, user or base env by not having an env activated (or the right env, or it being deactivated due to a bug), accidentally installing the wrong package or version, and other issues.

I can crudely emulate this by running the install command once with --dry-run, checking the output, and then re-running the install command without the --dry-run flag, but it’s a significant hassle remembering (and bothering) to do it every time, wastefully inefficient as it requires two runs of everything but newly-cached downloads and the final install, and it can suffer from mistakes, typos and race conditions. And of course, this doesn’t help the 99% of users that won’t know (or bother) to use it—of course, many won’t actually check the package plan anyway, but at least many will once they get a little more experience (especially if it bites them once).
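
A throwaway shell wrapper along these lines is roughly what I mean by “crudely emulate” – the function name is made up and it inherits all the caveats above about double work and race conditions:

    # hypothetical helper: show pip's plan, ask, then run the real install
    pip_confirm() {
        python -m pip install --dry-run "$@" || return
        read -p "Proceed with the real install? [y/N] " answer
        [ "$answer" = "y" ] && python -m pip install "$@"
    }

    # usage: pip_confirm some_package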

What I’m baffled by (and I’d wager you are too) is why the “powers that be” are apparently spending the PSF’s limited financial resources trying to come up with a master plan for improving the packaging neighborhood and attempting to herd cats to agree to it (or really, just poking them and expecting them to work it out among themselves), when their very own town hall on which the entire neighborhood depends is in desperate need of some long-delayed basic renovations and maintenance, without which successful execution of any master plan would be impossible.

3 Likes