How can we help people migrate from pip to conda?

There are two ways to solve migration problems like this: reduce the friction, or provide a carrot (meaningful benefit).

Conda has provided the carrot already, and it works - generally well enough to overcome the friction (though not always; see the quotes I posted above).

What we’re actually seeing is frustration among non-conda users who don’t have that carrot - but not enough frustration to overcome the friction. And the closer people get to isolated, working package installs without needing conda, the more friction is created.

In short, the “grow up story” from venv-based tools is weak, because you have nowhere to go without restarting. It’s not on any other package manager to solve this - they’re totally free to say “yeah we told you the mistake was right at the start when you relied on the system install, but we’re glad you’re here now, let’s get to work”. They don’t in any way have to pick up where venv-based tools give up and go from there if their argument is that venv was the original problem.

(conda and other system package managers are all interchangeable here. Nobody expects apt/yum to install packages that will Just Work into your highly customised venv, right?)

Well, there are people working on that - but the original focus of this discussion was “drawing a line” to the scope of venv-based tools, which basically means putting an artificial limit on that work (artificial, because it’s based on a principle, rather than on a lack of people willing to do the work). And yet, you’re now saying that if we do that, we have to tell people that there’s no way to go other than restarting?

I think in that case, I’d rather we continue working on solving the issues people ask us to work on, as long as they prefer to continue with venv-based tools over restarting with a toolset like conda.

Either that or just be really clear upfront: “if you start here you have these responsibilities that the tools won’t handle for you”. Along with the other recommendations for users who want those things handled by their tools.

But we’ve come a long way with venv-based tools and they’re still good for most cases. Personally I’d like to see more investment in new libraries with better dependencies that can easily work with venvs, and any C API and performance work on CPython that can also alleviate the need for extension modules everywhere. That’s not packaging, but packaging has the pressure it does because it’s our only way to work around the real problems.

So let’s think about what gets in the way of people migrating from pip to conda. Off the top of my head, the ones people would most likely bump up against are:

  1. Lack of shared standard around specifying dependencies
  2. Missing packages (versions) between PyPI and conda-forge
  3. Different deployment story

For 1. there are two solutions. One is that someone comes up with a requirements.txt → environment.yml translator. The other is something like Structured, Exchangeable lock file format (requirements.txt 2.0?), where we agree on a file format that everyone can work with.
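For the translator idea, a minimal sketch might look something like this. Note the assumptions: it only handles simple “name==version” pins (real requirement specifiers with extras, markers or URLs would need a proper parser), and the names still have to exist on the chosen conda channel:

```python
# Hypothetical sketch of a requirements.txt -> environment.yml translator.
# Assumes simple "name==version" pins; anything fancier needs real parsing.
def requirements_to_environment_yml(req_path, env_name="migrated"):
    deps = []
    with open(req_path) as f:
        for raw in f:
            line = raw.split("#", 1)[0].strip()  # drop comments and blanks
            if line:
                deps.append(line.replace("==", "="))  # conda pins use "="
    out = [f"name: {env_name}", "channels:", "  - conda-forge", "dependencies:"]
    out += [f"  - {dep}" for dep in deps]
    return "\n".join(out) + "\n"

print(requirements_to_environment_yml("requirements.txt"))
```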

For 2., I think that’s up to conda to fill in the holes in their package offering.

For 3., that’s basically just the way it is as conda is a shift in how you want things managed for you.

So to me it seems like the next potential step towards harmony would be figuring out an agreed-upon dependency declaration format that everyone can work from (while still letting each tool add anything specific to its own needs, so it isn’t only a common-denominator thing).
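Purely to illustrate the shape such a format might take (every field name here is made up, not a proposal): a single entry could carry the shared core, plus a per-tool extension area so it isn’t limited to the common denominator.

```python
# Illustrative only: one possible shape for a tool-agnostic dependency
# entry with a per-tool extension area. All field names are hypothetical.
import json

entry = {
    "name": "numpy",
    "version": "1.16.2",
    "markers": "python_version >= '3.5'",
    "tool": {
        "pip": {"index": "https://pypi.org/simple"},
        "conda": {"channel": "conda-forge"},
    },
}
print(json.dumps(entry, indent=2))
```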


I thought a distinction was made earlier in the other discussion between conda and other package managers, e.g.

Or as @pitrou described it, an “end-user package manager”:

From a personal perspective, what stops me is:

  1. Lack of “community” information about how to use conda (what I mean by this is not so much formal docs, as things like hits on google, StackOverflow questions, etc). This is a critical mass issue.
  2. Packages on conda lagging behind PyPI (as you mentioned).
  3. Existing projects I work on don’t use conda (sure, those are things like pip, pew and virtualenv, which obviously won’t use conda, but even so, for anyone working on more than just personal projects there will always be a “what about the stuff I work on but don’t control” factor at play).

But there’s still the underlying feeling that I’m being asked to switch. And I don’t want to, I’m 99% happy with my existing tools and workflow. I just want the odd extra package that apparently I can’t have without buying into the whole conda toolset…

For me, the first step needs to be deciding what we are trying to achieve here. Do we want a set of mostly disjoint toolsets, with means of making it easier for people to switch between them, or do we want an integrated story allowing people to use what suits them, without their choice being dictated by incidental issues like the availability of some package or other?

Personally, I’d rather see an integrated solution. But if that’s not possible, I guess I’ll go back to the venv-based side of the fence and work to make that as attractive as possible for people using it. I really have no interest in trying to make it easy for people to migrate to a competitor of the tools I’ve spent so much time and effort developing :frowning_face:

Same here, and I think potentially standardizing how we specify dependencies flows into that. Hopefully we can have that conversation, since it seems we are all converging on wanting it, but we haven’t actually started that specific discussion.

One more thing that gets in the way: many pip users really rely on a “developer workflow” that works with their chosen build environment (usually relying on system package managers or other mechanisms to install compilers and other build tools).

Conda’s in-place developer workflow is actually pretty weak; conda is focused on a different use-case. You can use conda to build almost any software quite well, but you have to use conda-build and the right compiler combinations, and there is not yet community support for a wide range of compilers and configurations.

When you run conda-build, a new environment is created and everything needed for the build (based on the recipe) is installed into it. Then the build takes place (with errors if the recipe is not accurate). Then conda-build takes the resulting package and installs it into a test environment, where everything is installed and the tests run. This process helps ensure that the package actually depends on what it says it depends on, and that everything needed to build it is listed.

That can be a pretty heavyweight process for a developer-centric workflow, where the user is more used to an in-place “make” or “python setup.py build” approach. The incremental approach is hugely valuable during development, when you have to iterate quickly and don’t want the heavyweight conda-build cycle. People who use conda-build are also pip users during this development process.

I don’t think conda should try to do a better job of this, actually - pip works pretty well for that use-case. But when people start using binary wheels with pip, it results in two very real problems that can’t really be solved unless you become a general-purpose package manager:

  1. The vendoring of “non-Python” dependencies. NumPy wheels, for example, do this today (they embed OpenBLAS and Fortran run-times in the wheel; see the sketch after this list). This is not ideal, as other packages that also need these things have no way to really describe their dependencies, and there is no software that can negotiate which versions should be installed so as to satisfy both NumPy and those modules. It makes it harder for the ecosystem to evolve, as the most popular package to vendor the dependency “wins” and everyone else has to deal with the installation headaches.

  2. You are defining a “platform” when you make binary choices. Which flags did you compile with? Which runtime libraries are you using? Which versions of compilers have you selected? There is a combinatorial explosion of possibilities. With wheels, you are basically having a chaotic conversation about this, with every package author contributing to the specification based on what they choose to do. On Windows this can work (though even there I would say it is sub-optimal) because of the lack of developer heterogeneity. I don’t believe you can “spec” your way out of this problem. You have to have a “distribution” that makes these choices, and then use that.
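To make the vendoring point in item 1 concrete: a wheel is just a zip archive, so you can see the bundled libraries directly. This is only a sketch - the wheel filename below is hypothetical, though manylinux NumPy wheels do ship a private copy of OpenBLAS this way.

```python
# Sketch only: list the shared libraries a wheel carries along with it.
# The wheel filename here is hypothetical.
import zipfile

def bundled_libs(wheel_path):
    with zipfile.ZipFile(wheel_path) as whl:
        return [name for name in whl.namelist()
                if ".so" in name or name.endswith(".dylib")]

# On a manylinux NumPy wheel this prints entries like numpy/.libs/libopenblas*.so
for lib in bundled_libs("numpy-1.16.2-cp37-cp37m-manylinux1_x86_64.whl"):
    print(lib)
```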

What if pip just made --no-binary the default on Linux and macOS? Then people on Linux and macOS would look elsewhere for easy installation of binaries, instead of slowly forcing pip to become a general-purpose package manager, and python.org plus “pip installable wheels” to become a hard-to-maintain de facto “binary distribution”.


I mean, that’s how it used to work. People remember what it was like. If we tried to go back there would be riots in the streets. (Or at least in the twitters.)


Nathaniel is 100% correct here, but I would like to see it be made easier to set up (and use) more specific platform package indexes.

Basically, imagine many conda-forges, but for a range of platforms: third-party builds of packages that can be consistent within a given index, where each index is tied to a far more specific platform than what PyPI allows. Distros could then pre-configure their platform’s pip to check a specific index before PyPI for binaries, while PyPI remains canonical for sdists.

As we’ve seen with conda-forge, there’s plenty of infrastructure available for this. But like conda-forge, it requires people and, most importantly, endorsement as something that’s okay for users to do. PyPI is very firmly seen as the official source of packages, and anything else is suspect. Changing the culture around this to have a “best effort” PyPI and a “better effort for platform X” index may well provide a way around having to try to solve all the problems with a single macOS wheel and two Linux wheels.

A distro/user-specified platform tag (rather than always inferring it from environmental information) would go a long way here. If Ubuntu could just set “ubuntu_1804” as the platform tag in their Python/pip package, they can have the best wheels available for their platform.
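For what it’s worth, the pieces to express this already exist in the packaging library. Here’s a rough sketch of the idea - the “ubuntu_1804” tag and the preference logic are hypothetical, not anything pip supports today:

```python
# Rough sketch: put a distro-specific platform tag (the hypothetical
# "ubuntu_1804") ahead of the tags the interpreter already supports, so
# wheels built for that exact platform win over generic manylinux ones.
from packaging.tags import Tag, sys_tags

distro_tag = Tag("cp37", "cp37m", "ubuntu_1804")  # hypothetical distro tag
preferred = [distro_tag, *sys_tags()]             # distro tag ranks first

def pick_tag(available_tags):
    # Return the most-preferred tag that some available wheel provides.
    for tag in preferred:
        if tag in available_tags:
            return tag
    return None

print(pick_tag({Tag("cp37", "cp37m", "ubuntu_1804"),
                Tag("cp37", "cp37m", "manylinux1_x86_64")}))
```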

(And now I’m off-topic for this new thread… do we really have to split out things that are not really side discussions? It makes things harder to follow for me.)


Yes, the PyPA needs to acknowledge more strongly that there are other communities of packagers out there, and avoid the implicit (and, I believe, dangerous) messaging to users that if you aren’t doing “pip install” you are somehow doing it wrong. Couple that with the sometimes over-used mantra of “there should be only one way to do it”, and you have the problem.

That would be a useful feature, yes. At least, then, pip install for a conda-user would have a hope of getting a compatible binary.

I think an easy starting point for this would be for one or more people to begin proposing PRs with the type of language you’re looking for to the Python Packaging User Guide repo: https://github.com/pypa/packaging.python.org

In a comment of mine in one of the other threads, I documented that site’s current references to conda and Anaconda.

I even tried to start that process myself by fixing the two broken links to conda tutorials I mentioned in that comment, but I couldn’t find the new location of those tutorials, if they still exist. (I tried Googling the titles as well as searching around the new location of conda’s docs for those same tutorials, but didn’t find them.)

I’m not aware of cases where pip is installing incompatible binaries… Do you have any links for the bugs?

I’m a little unsure what that means in practice. As @cjerdonek said, if someone proposes a PR to the packaging guide, I doubt it would be rejected - does that count as “PyPA acknowledgement”? But expecting someone from the PyPA to write such a PR is optimistic at best - pretty much by definition, we don’t know enough about alternatives like conda to do a good job.

Hmm, are there any projects from “other communities” (the obvious example being conda and related projects) that would like to be part of the PyPA? I don’t see why that couldn’t happen - and if a project like conda were a member of the PyPA, it probably counts as a fairly strong “acknowledgement” :slightly_smiling_face:

I’m with Paul here, I don’t actually know what this looks like either. Despite being one who keeps saying it’s necessary :slight_smile:

I think more what I have in mind is actually drawing an explicit scope around what tools are meant to do and who they’re meant for.

For example “tool A is for users who want consistent cross-platform environments at some performance/size cost, while tools B+C+D together are for producing a platform-specific environment at greater effort to compile and version-match things” (it’s so hard to be generic here, without making it just sound like a list of pitfalls! This is far from the best example)

But I’m talking about project statements/vision/scope, not documentation. The kind of thing that makes it okay to say “we don’t have to put that in X, people who need it should use Y”, but based on deliberate design choices and user analysis. We do some of that between PyPA-blessed tools - which is the right approach, even though it’s hard to rely on tools outside that set - but I think the Linux distros, Homebrew and Anaconda have proven themselves well enough here.

(Sorry for getting off topic by going back to the topic before this was split out…)

Surely that’s up to the tool developers themselves?

Of course, but they’re mostly here in the discussions, and “PyPA” tends to be used as a shorthand (and “Python” is used as the target of complaints).

When people complain that “Python can’t install my package” what they actually mean is either “a third-party developer didn’t consider my needs” or “my tool selection was incorrect”. But that’s hard to explain, and meanwhile people criticize “Python” for not working and avoid the language based solely on rumours of this issue. As a representative of Python, I’m trying to approach this in some way other than simply passing on the blame and claiming that we can’t help.

If that means I have to start telling people that pip is the wrong solution for their packaging problem (which I already do quite often, tbh), then that’s what I’ll do. And this isn’t a pip vs conda thing, since conda is often the wrong solution too. Often wheels are the wrong solution, and sometimes letting end-users use PyPI directly is the wrong solution.

There’s a whole lot of nuance here that people have to learn for themselves, and meanwhile we do or say things that make it seem like certain tools ought to be the complete solution, and when people find out that they aren’t, they blame the tools.

Defining the scope of scenarios that always ought to work is an important piece of this, which is where the conversation started. Defining ambitions separately from reality is critical for real users to understand what they’re getting. Defining personas is important for helping users see themselves in our scenarios and make better choices up front. And owning this ourselves is essential to keeping it up to date as we make ecosystem-wide improvements.

Have you seen a guide like this you like that’s been published anywhere on the internet? I know I’ve seen various blog posts comparing, evaluating, and recommending different packaging tools (pipenv, poetry, conda, etc).

Guides like this can be useful no matter where they’re published. And it could help discussions here so we’re not starting from scratch.

I think the Overview on packaging.python.org does cover some of this ground.

https://packaging.python.org/overview/

The tool recommendations page could perhaps also be improved to discuss this.
