Pip/conda compatibility

Yes, this would be a great benefit. I too am always surprised that it doesn’t do this, and I think it’s something that contributes to conda/pip messes, as accidentally running pip will immediately install stuff without giving the user a chance to go “oh wait I should have tried conda first”.

No, I didn’t know, but I did have a vague memory of this being talked about, and didn’t go and do any research. Call it a +1 for getting these good plans implemented :slight_smile:

But I don’t think it’s just resources – there’s also the balance between backward compatibility and moving forward. The community has strongly emphasized backward compatibility – maybe too much, I think.

But this conversation is about “vision” – we can ignore that conflict for now …

Got it, thanks.

True (ish) – but I think that continuing to talk about conda solving “data science” problems without the context you provided does everyone a disservice – it’s not that conda is about data science, it’s that conda is about providing all sorts of non-Python components. The Python runtime is an important part of this, but I actually think that’s the least impactful for most users. What’s impactful are all the other pieces, and these fall into (at least) two categories:

  1. non-pure-Python extensions – hard-to-build wrappers around C libs and the like

  2. Non-Python components that are needed to work with your project.

I think only (1) gets much attention, and it’s the one most tied to the “data science” world.

Funny that you should say: “you will want NumPy/SciPy/pandas/matplotlib/etc” – those were an issue when conda started, but now there are wheels for all of the core SciPy stack (everything you mentioned) and many / most related packages. The major exception I know of is the geospatial stuff: notably GDAL. (Is geospatial software “data science”?)

So why use conda anymore? Other than the geospatial stack, I use it because it makes things easier for package authors if your package depends on other libs. For example, I wrote the py_gd package (GitHub - NOAA-ORR-ERD/py_gd: python wrappers for libgd graphics drawing lib): it’s a Pythonic wrapper around the venerable libgd – so to make it easy for my users, I need to compile libgd, which depends on libpng, libjpeg, libtiff (and a few others, I think). I really didn’t want to deal with that on three platforms – and conda-forge came to the rescue.

But I think it’s (2) that isn’t well recognised. Non-Python components that are needed to work with your project – as a rule, these are easy to get on Linux, require homebrew or the like on a Mac, and can be a real pain on Windows.

Examples: (all using conda-forge)

  • Many web frameworks use redis for session management (and other things):
    conda install redis

(and you can conda install mongodb, too)

  • Need to use node?
    conda install nodejs

I even use it for git sometimes.

Another example that I’ve been wrestling with lately:

pytype uses the ninja build tool – a non-Python command-line program. As it happens, there is a “ninja” package on PyPI (I don’t know if the pytype folks made it, or someone else) – a wrapper around ninja that provides a ninja command via an entry point. So PyPI / pip solved this problem, but wouldn’t conda install ninja make more sense?
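
For comparison, here’s what the two routes look like side by side (a sketch – both routes exist today, and both end with the same ninja executable on PATH):

    # via the PyPI wrapper package, into the active environment
    pip install ninja
    ninja --version

    # via the native conda-forge package
    conda install -c conda-forge ninja
    ninja --version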

This also illustrates the namespace problem:

The PyPI package for pytype depends on the PyPI “ninja” package. A conda-forge package for pytype was created (with grayskull? not sure) that depends on the conda-forge ninja package – which, while it provides the same ninja tool, is not the same package, and the conda-forge pytype package has been broken for ages. (PRs were recently submitted, so this should be fixed soon.) This can all work, but pip really isn’t the best tool for this sort of problem.

2 Likes

This is essentially the same workflow I described, except each command is run manually. I don’t see why “do a conda remove --force before pip install” is not a viable solution. It is explicit, arguably makes things clearer to new users learning the workflow, and keeps subsequent Conda invocations saner.
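
Concretely, with numpy as the illustrative package, the manual hand-over is just:

    # hand the package over from conda to pip
    conda remove --force numpy   # drop conda’s copy, leaving dependents in place
    pip install -e .             # install the in-development version with pip
                                 # (run from the package’s source checkout)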

My impression does not change. This is not particularly disruptive IMO. And I even feel those disruptive scenarios are good, since they currently rely on implicit behaviours that should be made explicit.

1 Like

I have already explained the downsides in my last reply a few posts up. That should be more than convincing enough. I can add yet another reason: it changes a local development workflow into one that requires network access, so working offline (e.g., during travel) is no longer possible.

Here are the docs for this option:

  --force-remove, --force
      Forces removal of a package without removing packages that depend on it.
      Using this option will usually leave your environment in a broken and
      inconsistent state.

That is not something one wants to teach as a recommended workflow. It’s just not. I’m hearing “it’s fine for this other tool that I don’t use IMO”, and I can’t say I like it much.

Surely that’s just a conda bug to be fixed?

Given that what you are doing is transferring ownership of a package from one manager to another, it’s hard to see how either manager can do the whole job here. If running two commands is problematic, a utility script to run them both, with proper protections and rollback, seems like a reasonable thing to create.
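
As a sketch of the shape such a script might take (hypothetical – a deliberately naive version, without all the protections a real tool would need):

    #!/bin/sh
    # handover.sh PACKAGE: transfer PACKAGE from conda to a pip dev install,
    # restoring the conda package if the pip step fails.
    set -eu
    pkg="$1"

    # remember the exact conda spec (name=version=build) for rollback
    old_spec=$(conda list --export "^${pkg}$" | grep -v '^#')

    conda remove --force --yes "$pkg"
    if ! pip install -e .; then
        echo "pip install failed; restoring $old_spec" >&2
        conda install --yes "$old_spec"   # naive rollback
        exit 1
    fi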

Sure - I’m not going to disagree here. But it’s a reality that we have to deal with and can’t wish away. In the long term, if someone were to write a new tool that supported just one of pip’s use cases, I’d be happy (although I’m sure it would prompt yet more “too many tools, too confusing” comments). But no-one is doing that, so we have to deal with what we have.

1 Like

Oh, they can keep up. That is, they can keep up with their paying customers. I don’t know about Anaconda specifically, but I’m pretty sure that if you pay them enough and request that they include a package for you, they will do it.
Of course they can’t keep up with supporting their non-paying customers. IMO, a self-serve community playground (Anaconda’s conda-forge, Red Hat’s Fedora or EPEL, etc.) is the best solution here: you can still use stuff if you don’t pay to outsource all the maintenance, and the company gets users familiar with the technology so they’ll reach for it when they do have the money and start looking to pay for maintenance.
(Unfortunately, capitalism demands that there’s an evil-sounding business justification for anything nice.
Business-wise, community package repositories are a lot like free licenses for students. But unlike those they’re, IMO, OK ethically – if they weren’t I wouldn’t be working at Red Hat.)

2 Likes

Yup, $FOO install redis is useful.

FWIW, it’s becoming easier to get Linux on Windows – either WSL for development, or containers for production-like environment. And on the Linux side, there are efforts to use, essentially, entire OSes as conda/venv-style isolated environments.
In my neck of the woods there’s toolbox. GitHub/VS Code has dev containers.

Frustrated rather than baffled. Ultimately, sponsors can choose what their money gets spent on, whether we like it or not. What we can do is add things to the “funding opportunities” list. Unfortunately, implementing PEPs 643 and 658 wasn’t on that list until recently[1].


  1. I created the PR, but it got delayed and I forgot to chase it. My bad. ↩︎

1 Like

I also don’t use conda so you could accuse me of the same thing. And maybe you will. But I know that I’m not trying to say that, what I’m trying to do is explore the options here, and what I’m hearing is getting worryingly close to “your tool has to change because I won’t change my workflow”.

I don’t think that either impression is correct, and I don’t think that either attitude would be a reasonable one to hold. But I do think both sides need to accept the need for some flexibility.

As I said above, I think we need to look at what we’re trying to do here. Your workflow sounds to me as though you’re trying to switch the management of numpy from conda (which manages the production version) to pip (which manages the in-development version). I don’t think that either tool can do both parts of that role, so having a dedicated tool/script that does it makes sense to me. Even if it’s only a per-project utility, rather than a standalone tool.

How? Nothing changes on the pip side, so I assume conda remove --force is the thing that needs network access. And I don’t understand why it needs that, so I can’t really comment, beyond saying “isn’t this something conda could fix?”

1 Like

Windows has scoop install redis or probably choco install redis. It’s incredibly convenient, but it doesn’t particularly need to be handled by the same tool that installs your Python packages, unless you like the “one tool to manage everything” model. So I’d count that as a case of “conda is useful for this if you like conda”.

Trying frantically to bring this back on topic(!), a tool that installs everything (Python packages, native libraries, standalone utilities) is very convenient and useful, and many people like such tools. But many such package managers either don’t cover Python packages (just like they don’t cover node packages, rust crates, and a bunch of other things) or they do, but they only have limited coverage.

Python package managers like pip need to work with such environments, to fill in the gaps, but they also need to work in environments where users choose to use a variety of tools, each to install a subset of what they use (e.g., rustup for rust, a dedicated installer for Visual C, scoop for git, Windows Store for Python, …).

Solving the “unified environment” problem is without a doubt simpler, and solves the needs of many users. It probably would work for a lot of people who currently don’t use a “unified manager” for whatever reason[1]. But historically, there have always been Python packaging tools whose remit is to support everyone (with varying degrees of success – there’s not much we can do to help an S/390 user with no C compiler who wants to install Pillow, for example!). Unless we explicitly say we’re going to stop doing that, that is the role the “PyPA tools” play. And that brings a very different set of challenges that we’re trying to deal with here.

We can learn from conda (or apt, or spack, …) and we can interoperate better with them, but we can’t limit our scope to the point where their solutions “just work”. Otherwise which manager you use really would be just a matter of personal preference. And then “I’m sorry, I just don’t like conda” would be an absolutely valid reason for using pip :wink:


  1. Including reasons as superficially weak as “I can never remember the right apt command” or “I don’t trust Anaconda”. ↩︎

1 Like

Not really, quite the opposite. I think the Conda community would much prefer stability and pip not to change in ways that are regressions for conda users. And I’ve heard the exact same thing from the Spack team.

Good question, what are we trying to do here? I’m failing to see the point of the last N messages on conda-pip interaction. It’s not directly related to the “singular packaging tool/vision”. It looks to me like it’s mostly non-conda users suggesting changes to conda workflows, either to fix some problem that they don’t quite understand, or to argue for changes in conda so they can make breaking changes in tools like pip.

The dynamic is extra weird because after the suggestion is made, and I explain the downsides and my assessment that it won’t fly, the next person who also doesn’t know conda or its workflows comes along with a “surely you’re wrong and this is logical to change”.

@steve.dower asked the right question I think, way up in this thread, which is for the conda community to write down what they’d like to see. I think that’s a much more fruitful way of spending time.

Exactly. It needs to download metadata for the channel it’s using (~15 MB or so for conda-forge - that gets cached of course, but the cache invalidation time is short), to be able to do a new package solve. Separately downloading metadata is inherent to the whole design, not something to be fixed.

Thank you. That’s a (sort of) useful observation. The reason I say “sort of” is that everyone wants pip to not break their workflow, and if we accept every such point, pip is paralyzed and we cannot make any progress. Hence my point that we need some flexibility on both sides.

I checked the thread history - it started trying to establish how (or if) conda can make use of existing mechanisms rather than asking for yet more interoperability features to allow other tools to “work with” conda. And if they can’t, work out why not and how to avoid going round this loop in the future (because we tried to design these features to work for all package management systems, not just for Linux distros).

Cool. I agree. But in addition I think we need to understand what we should have done differently to get conda’s input on the existing features. Personally, I’m unclear why we’re now seeing so much pushback from conda users, when we’ve always had so much trouble in the past getting any engagement on questions like “how would this affect conda”? I don’t want to debate who’s to blame, just to make sure it doesn’t happen any more. But I will admit that I’m getting quite frustrated at being told now that what we have in place won’t work, when it would have been so much easier to handle such feedback when the features were being designed.

For what it’s worth, this sub-discussion appears to have started from a comment by @ncoghlan that ironically enough started with

I think we should probably either drop this sub-discussion or wait until someone creates such a new thread (either by waiting for the admins, or by manually posting a new thread).

5 Likes

Well, there is no single “conda users” voice to speak for, so that’s a bit tricky. For example, I think @rgommers and I have different ideas about how easy it should be to use pip inside a conda environment… he’s started a thread on the conda discourse about this – it hasn’t gotten much traction, unfortunately.

But I think I’ve (as the opinion of one person fairly invested in the conda community from early on, but not a conda developer) offered a TL;DR that apparently wasn’t clear enough:

Don’t do anything specifically for conda, rather:

Don’t make the assumption that pip is being used as the primary package/environment manager

That’s it – and most of this is not about capabilities, but rather defaults:

In conda-build, conda can control the settings, so pip behaves as conda wants – notably, it doesn’t allow pip to install dependencies inside a conda-build script.

But outside of conda-build, pip’s defaults are sub-optimal for use within conda: notably, it installs the full dependency stack without any warning. This is often fine, as conda packages make a point of installing Python packages in a way that pip will recognize, but if a dependency is not already installed by conda, pip will install it, and perhaps make a bit of a mess of the conda environment.
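
For what it’s worth, some of these defaults can already be flipped from the outside via pip’s PIP_* environment variables (every long option has one, as I understand it) – a sketch of the kind of knobs a wrapper like conda-build can set (the package names here are placeholders):

    # any pip long option maps to a PIP_* environment variable
    PIP_NO_DEPS=1 pip install some-package   # don’t pull in the dependency stack
    PIP_NO_INDEX=1 pip install ./local.whl   # never reach out to PyPI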

This is what concerns me about PEP 704 – it makes a major assumption that Python package managers are always being used within the Python package management / environment world. [*] With an easy way to turn it off / change the defaults, that would be fine.

I’m not entirely sure (I haven’t dug deep enough yet) what pip does if EXTERNALLY-MANAGED is there – I think it may be too focused on the *nix distro use case – but I do think that’s not a bad way to go: have a way for conda and the like to change the default behavior of pip (and other package builders and installers). So that may be a solution to many of these issues.
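
From what I understand of PEP 668, the mechanism is just a marker file named EXTERNALLY-MANAGED placed next to the stdlib: when it’s present, pip refuses to install into that environment and prints the message from the file. A sketch of how a distributor like conda could ship one (the error text here is made up):

    # write the PEP 668 marker into the environment’s stdlib directory
    stdlib=$(python -c 'import sysconfig; print(sysconfig.get_path("stdlib"))')
    cat > "$stdlib/EXTERNALLY-MANAGED" <<'EOF'
    [externally-managed]
    Error=This environment is managed by conda.
     Prefer `conda install`; pass --break-system-packages to pip to override.
    EOF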

Other than defaults, the other issue – and this is very much out of the PyPA’s hands – is the documentation / tutorials / etc. scattered about the internet. Many of them assume pip-with-venv-without-conda (the “standard” setup) – so they write docs that show that workflow.

When a conda (or spack, or ???) user follows those instructions, they may get a mess :-(. (or it won’t work, if my vision is realized).

Reasonable enough.

[*] As I’ve mentioned in that thread, I’m not thrilled with the assumption that virtual environments are a best practice for all use cases, either, but that’s not this topic.

Part of the challenge is that pip does a lot: it manages packages, it acts as a front end for package builders, and it does all of that at once transparently (building from an sdist on PyPI) – I’m not sure what else.

To be fair, it does a lot less than setuptools did :slight_smile: – but it’s still a bit intermingled.

1 Like

No one is asking for that though? Certainly not Conda or Conda-forge maintainers as far as I can tell.

Again, pushback on what? I’m only pushing back on two things: ill-informed suggestions to break existing workflows, and on the breaking part of PEP 704 (which I believe you also agreed in the other thread is no longer needed, PEP 668 is fine).

I think what you are seeing is lots of conda users participating in general. Which isn’t strange - the results of the survey say 25-30% of all users use conda, and this set of threads tries to engage a fairly broad set of participants. Hence, lots of conda users around. And they have opinions, and share things like “that works differently in conda” or “virtual envs are not the only user-managed envs”. That’s not pushback imho.

I’ll also note that I did review PEP 668 (and most other recent packaging PEPs), asked for some clarifications, and then was fine with it - seemed either harmless or a useful knob to have access to. And it indeed turns out to be. The other features are not “won’t work, we need something new” - they’re just not that relevant.

Fair enough.

… configuration files?

OK, I’ll use more words: is there any reason this changing of default behaviours is something that pip needs to infer and do, rather than something those various tools do themselves? Conda has multiple knobs available today that could protect users from the workflow issues that come from mixing package managers, and it doesn’t use them. Is it the responsibility of pip maintainers to deal with that?

PS: The PyPA is too loose a group of volunteers to ascribe a single motive/direction to, at least as things stand today.

2 Likes

I’ll use some more words too, and some code, since I suspect this is a fairly unknown feature: pip/src/pip/_internal/configuration.py at 56e5fa3c0fd0544e7b5b9b89d9d7854b82d51242 · pypa/pip · GitHub [1]

Yes, pip looks in sys.prefix (and sys.base_prefix[2]) for a pip.ini or pip.conf file. This means that if you control the Python install, you control the default settings for pip.

I use this in the Windows Store install of Python to make pip use user site packages by default, since the system site packages are very read-only.

Any Conda package at all could install this file as well (i.e. the pip package could). It’s fully intended to be used by the one who controls the runtime (users have their own locations, as well as environment variables and command-line options).

So there’s certainly no need for pip to detect a Conda install, just as there isn’t really a need for pip to detect a Linux distro or “isolated” environment. The only obligation is for pip to ensure that configuration files can be used to override any defaults, which I believe is their standard policy anyway.
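
To make that concrete, a minimal sketch of the file a Conda package could drop into an environment (locations per the code linked above; the option shown is just one example of a default a distributor might want):

    # Unix: $CONDA_PREFIX/pip.conf    Windows: %CONDA_PREFIX%\pip.ini
    cat > "$CONDA_PREFIX/pip.conf" <<'EOF'
    [install]
    no-deps = true
    EOF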

I assume there’s just very limited awareness of this feature, which is why people believe they have to influence the default setting. So here’s some more awareness :slight_smile:


  1. I link to the code rather than the docs, because the docs make it seem like a venv-only feature. It’s not - it was deliberately intended to work for any environment. ↩︎

  2. Though personally I think it shouldn’t do this, which is why I only contributed sys.prefix originally. But I guess whoever added support for base_prefix had a good reason. ↩︎

1 Like

I was trying to make the point that conda in particular (and others in general) are useful well outside the “data science” world, so I think documentation should reflect that. That’s it. Whether you want that is an entirely different question.

Well, how about “conda is useful for this if you like a cross platform way to manage your whole stack”?

Bringing this back on topic:

One thing I don’t think I’ve seen in the thread (forgive me if I missed it – it’s 244 posts long now!) is an articulation of the problems we want pip (or whatever) to solve in the long run.

As has been mentioned, conda was developed partly because the core Python folks said that they were not interested in solving the problems some folks had.

Since then, pip+friends has evolved – adding binary wheels, adding manylinux, going to great pains to make functional binaries available (the whole SciPy stack) – and there are proposals to do more (I can’t find it now – a recent discussion about including shared libs).

In some ways, I think expanding the scope of pip / PyPI is a detriment to the community, at least if it’s not done deliberately, with a plan.

Awesome, thanks! – I was certainly clueless – now off to the conda community to discuss making better use of this feature!

1 Like

FYI I’ve just recovered from COVID-19, so I don’t have the energy to try and split things out carefully. Best I can do is take Wanting a singular packaging tool/vision - #196 and break out every reply to that comment and on down into a separate thread on conda/pip interoperability. If people are okay with that then I can do the thread split.

7 Likes