Drawing a line to the scope of Python packaging

Wheels became an agreed standard without sufficient input, IMHO. The format was emerging at the same time as conda packaging and was “blessed” too quickly, before there had been enough time for discussion and discovery. This is something concrete that the PyPA can do: be patient before “officially” blessing new packages and approaches. There was a little bit of discussion around making conda the “official” format very early on, but that was too early as well.

My approach would have been to wait on both counts and only define meta-data standards for what information is provided on PyPI (and possibly a few blessed general-purpose installers). In this way people could have been shipping dpkg, conda-pkgs, brew packages, and many other binary formats before choosing one for every Python user.

Wheels are not a general-purpose binary packaging mechanism at this point, but because of Python’s general-purpose nature and the fact that it is used to glue together so many other systems, the wheel format will be “backed into” that use case and will encounter all the same problems that conda has already encountered and is still encountering.

My message is really to recognize the impact you have when you encourage a solution and either take on that whole solution or define interfaces and not solutions.

The wheel and conda binary package formats are solving distinct technical problems. Both are necessary.

I don’t think there is such a thing. Conda attempts to be universal on some axes (the breadth of software that could be distributed in conda packages; the breadth of platforms where you can create a conda env). But you can’t use it to install system software, that’s out of scope. If you define “universal” as “supports the full universe of python packages and versions”, then conda is not a universal package manager, and AFAICT the only way it could become one is if it gained first-class support for wheels.

I think there’s a basic misunderstanding here. People want features in pip because they have decided that pip is already the best option for them, and they want those features. You’re not going to convince those people to stop wanting those features by telling them they should use some other solution that doesn’t solve their problems as well as pip does. There’s no global plan for pip to solve some predetermined set of packaging problems; it’s just individuals rationally doing what’s best for themselves (and/or their users), as and when they can.

The language summit is about the interpreter itself. Packaging is a separate group of people anyway :slight_smile:

This is a side note. But for Anaconda folks reading this thread, when doing background reading yesterday I noticed that this Anaconda blog post from Nov. 2018 called “Understanding Conda and Pip” doesn’t acknowledge that pip can install packages from locations other than PyPI: Anaconda | Understanding Conda and Pip
For example, the last line of the chart at the bottom titled “Comparison of conda and pip” says the possible package sources for conda are “Anaconda repo and cloud” whereas for pip it is just “PyPI.”

Not sure if I am on the same page as you all, but I think the point is not to convince anyone not to want those features. The same applies to conda: there are a lot of people who see a lot of benefits in conda, as already mentioned.

At this moment conda-forge has 6657 packages. I think that is a reasonable number to take into account.

Also, a lot of users don’t care whether they are using wheels, pip, or conda. They just use what is described in the documentation, and they are happy if it works.

A PEP would probably help to move this conversation forward. I think the community would always be happy with a better packaging workflow.

This is an assumption that can be tested (e.g. with a simple poll, “which of these packaging tools did you consider before selecting pip”), but my suspicion is that most chose pip because it’s included in default distros and for no other reason.

Without a better understanding of our user base, we should try and avoid making assumptions about their motivations.

I think I’m seeing at least three distinct threads in this discussion:

  1. What tools should Python / PSF / PyPA (Python) be helping to publicize, point people towards, and/or recommend, etc? (And more generally, what is Python’s role / responsibility here?)

  2. What additional packaging standards are missing that could help Python users and the packaging community? (By standards I mean things that would be adopted through the PEP process.)

  3. Should PyPA be limiting the scope of any tools it maintains like pip (either functionally or in terms of what it advertises that it supports)?

For (1), I think it would be helpful if someone could, as background, find places in Python’s / PyPA’s docs where it currently does that. Does PyPA currently recommend or otherwise publicize any tools not maintained by PyPA?

For (3), I’d be curious if anyone thinks any of pip’s current functionality is out of bounds and shouldn’t be developed further. In other words, is this more a concern about current functionality or possible future expansion of pip’s functionality?

It might be worth breaking out some of these as separate threads, but I hesitate to do that myself now.

Also, to add to @njs’s recent comment, there is a topic here about when / where to discuss packaging at PyCon: PyCon US Packaging Mini-Summit 2019

Or they should join the other general-purpose packaging communities (Fedora, Debian, Arch, Homebrew, Chocolatey, pkgconfig…). Each is special in its own way.
I definitely agree that PyPA’s solutions (pip, pipenv, …) should be easier to integrate with general-purpose package managers, but before pointing people to a specific one, there should be a discussion about why the other ones are inferior.

To start answering my own question, I searched for “conda” in the search box of the Python Packaging User Guide to get a sense of how much conda is being publicized in PyPA’s own docs. I found the following four references:

Conda is also discussed in two of the “Guides”:

Agreed. If there’s one obvious way in which the PyPA tools are special, it’s precisely that they are not one of the many options in the list of “general purpose package manager” solutions. And I do agree that we shouldn’t be trying to solve all of the problems that such solutions address. What’s hard is that our users want us to solve those problems, because there are so few good ways for those users to integrate their use of PyPA tools with the general purpose package manager they use for other software (software that needs the heavyweight solutions that general purpose managers bring to the table). If we had a better story for such users (“use yum/apt-get/conda to install tensorflow, because it has requirements that pip cannot handle, and then continue using your normal tools with the following config so that they can see the package manager owned copy of tensorflow”) then I suspect there would be less pressure for PyPA tools to “compete” with the system-level tools.

I’d much rather frame this discussion in terms of how could PyPA tools integrate better with all of those general purpose tools instead of seeing PyPA tools as “competing” with them.

I think it would also be useful to research how conda promote themselves. Are they framing themselves as a general package manager, working in the same sort of area as homebrew, Chocolatey, Linux distros, etc? Or are they placing themselves as an alternative to pip? Because I see people getting the impression that pip and conda are competitors, but @teoliphant is very strongly arguing here that pip isn’t even addressing the same scope as conda.

To put that another way, if the conda community don’t see pip as a viable alternative to conda, what do they see as alternatives? I’m genuinely confused as to where the conda community see themselves on the spectrum between “language package manager” and “system package manager”.

And yes, it is a spectrum. Pip and wheels are a perfectly viable solution for Python-only packages, where there are no shared library issues. So there’s certainly a potential niche for package managers that don’t support C extensions. And the question here is where on the line from there to “system package manager” various tools sit.

Perhaps what’s confusing you is that you see this as a spectrum with two end points :wink: Conda is neither. It doesn’t act (nor depend, for the most part) on system packages – though you can have a system-wide install of Anaconda, I suspect it’s not the dominant use case, at least among community users (perhaps it’s different among “enterprise” users?). It’s also not language-specific – I think you already understood that.

I’m not sure if there’s a catchy phrase to describe Conda, but perhaps it’s an end-user package manager. You don’t need to be a systems administrator (and you don’t risk hosing your system), and you don’t need to be a developer / integrator either.

By the way, Conda is probably farther from the system than Pip is. On Unix systems at least, Pip generally relies on a system-wide Python install (perhaps symlinked in a virtual environment) that links to many system-provided libraries (e.g. compression and encryption libraries). Conda doesn’t; it keeps the system dependencies extremely minimal in order to offer a reliable execution environment regardless of the system it’s deployed on.

Thanks, that clarifies things (for me, at least :slightly_smiling_face:). So I guess the question then becomes, how does the conda community describe that different role/axis/whatever when describing how conda fits alongside other tools like pip/pipenv/virtualenv and system package managers? And how can PyPA help express that message in a way that lets Python users make an informed choice of what toolset to use?

I think that my outsider’s perspective would be that conda, like ActivePython, is an alternative Python distribution (with the proviso that conda also includes other languages as well). So people make the choice between conda, ActivePython, PythonXY, python.org python or whatever, and then use the tools provided by that distribution to manage their software stack (pip for python.org, conda for conda, PyPM for ActivePython, …). That seems fine to me, but it does imply that people who have chosen python.org python still need pip-based solutions to their dependency issues, so saying “PyPA shouldn’t be trying to address these issues, but should be directing people to conda” is wrong - if anyone should be pointing people at conda, it would be python.org, as they are the “competitor” to conda here.

Given that conda and pip have different scopes and so aren’t direct competitors, what prevents a tool like pip from being able to serve the same purpose relative to a conda environment that pip does to a “python.org” Python environment? Then they would be complementary rather than mutually exclusive. Is conda’s current design a monolith in which conda must perform all functions? It seems like there must be some duplication of functionality. Or is there a good reason why it doesn’t make sense for non-conda tools like pip to interact with a conda environment?

It does, but because they don’t totally align on metadata or package sources, it’s a great way to get into trouble.

For example, if you conda install a version of scipy, it will pull down a known compatible version of numpy. This may not be the latest version that’s on PyPI, though, which means if you “pip install --upgrade” something that depends on numpy you’ll upgrade it to a version that is no longer compatible with the version of scipy you have installed.
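The upgrade scenario above can be sketched in a few lines. This is a toy model, not how conda or pip actually resolve anything: the version numbers and the numpy range are invented for illustration, and `parse` and `satisfies` are hypothetical helpers.

```python
# Toy illustration only: all versions and constraints below are made up.

def parse(version):
    """Turn '1.16.2' into (1, 16, 2) so versions compare as tuples."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, lower, upper):
    """True if lower <= version < upper."""
    return parse(lower) <= parse(version) < parse(upper)


# Invented constraint: the installed scipy build is known to work
# with numpy >=1.14,<1.17.
scipy_numpy_range = ("1.14", "1.17")

# The curated resolver picked a numpy inside that range.
installed = {"numpy": "1.16.2"}
ok_before = satisfies(installed["numpy"], *scipy_numpy_range)

# An eager upgrade from another index only looks at "newest available".
installed["numpy"] = "1.18.0"
ok_after = satisfies(installed["numpy"], *scipy_numpy_range)

print("compatible before upgrade:", ok_before)
print("compatible after upgrade: ", ok_after)
```

The second check fails because the upgrade never consulted the constraint recorded for scipy, which is the mismatch described above.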

Some of the work that’s happened on both sides has aligned metadata for installed packages (previously pip would just dump files all over a package installed by conda) and reduced the aggressiveness of pip upgrade.

Ultimately, conda’s indexes are like views over PyPI that are restricted to ensure cross-package compatibility. If you’re using it, you want to prefer this view over any other source, and only fall back to PyPI when you have nothing available. That’s a thing for conda to implement (since they need to feed the PyPI package’s dependencies back into their own resolution process), which is why it appears to completely replace pip.
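As a rough sketch of the "prefer the curated view, fall back only when nothing is available" idea: the dictionaries below stand in for package indexes, the package names and versions are invented, and `pick_source` is a hypothetical function, not part of conda or pip.

```python
def pick_source(name, curated, fallback):
    """Return (source, version), preferring the curated view.

    Hypothetical helper: a real resolver would also feed the fallback
    package's dependencies back into its own resolution process.
    """
    if name in curated:
        return ("curated", curated[name])
    if name in fallback:
        return ("fallback", fallback[name])
    raise LookupError(f"{name} not found in any index")


# Invented data: the curated view is older but known to be consistent.
curated_view = {"numpy": "1.16.2", "scipy": "1.2.1"}
general_index = {"numpy": "1.18.0", "scipy": "1.4.0", "somepkg": "0.3.0"}

for pkg in ("numpy", "somepkg"):
    print(pkg, pick_source(pkg, curated_view, general_index))
```

Note that the curated version wins even when the general index has something newer; the fallback is only consulted for names the curated view lacks.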

(As I understand it, this is exactly the same reason you shouldn’t use pip to upgrade system packages provided by your OS.)

This captures the sentiment extraordinarily well - the tension between users pushing pip to be much more than a python package manager, and on the flip side, the package managers that are more general are not as good in some ways at being python package managers.

My hopes for the “external package spec” stuff are centered around facilitating these integrations: helping users understand ways to get what they need, even when it isn’t provided by pip, rather than trying to provide everything with pip. To me, it’s not about pip and conda, per se, but about pip and arbitrary external providers of things that python libraries may need.

I’d much rather frame this discussion in terms of how could PyPA tools integrate better with all of those general purpose tools instead of seeing PyPA tools as “competing” with them

I agree completely with this framing, and I’m glad we’ve gotten to this shared understanding.

One thing that I (as a pip user) would like, is a way to use packages that (for valid reasons) need the extra management of something like conda, without having to switch totally over to a conda-based distribution. Maybe that’s something as simple as pip install conda; conda install tensorflow and have conda sort out all of the details to make tensorflow work in my existing Python interpreter (which might require reinstalling other packages like numpy, to replace them with conda-managed versions).

At the moment, there seems to be support (to what level, I don’t know, but something) for pip install from a conda-managed system, but no-one is looking at the converse case. And the people on the pip/PyPA side of the fence don’t know how to even start doing something like that, so we work out our own solutions. Maybe conda could help us work on that scenario?

At the moment, there seems to be support (to what level, I don’t know, but something) for pip install from a conda-managed system, but no-one is looking at the converse case. And the people on the pip/PyPA side of the fence don’t know how to even start doing something like that, so we work out our own solutions. Maybe conda could help us work on that scenario?

This gets pretty complicated and fragile when the pip in use is not managed by conda (as it is a dependency for whatever conda will do). For example, let’s say you have pip provided by a system python, which then installs conda in the user-local .local folder. In order to satisfy, say, tensorflow’s dependencies into a non-conda user environment, the solver would have to act correctly not only on things provided by pip, but possibly also provided by the system installation. It’s easy to envision and provide conda as a foundation for virtualenvs, but having conda act in spaces where it does not “own” the python interpreter will be tricky. I think we’re open to exploring how this kind of thing might work, but there’s a boatload of complexity in there…
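To make the "who owns the interpreter" question a bit more concrete, here is a minimal heuristic sketch. It assumes that a conda environment can be recognised by a `conda-meta` directory under its prefix and that a venv can be recognised by `sys.prefix` differing from `sys.base_prefix`; `describe_interpreter` is a hypothetical helper, and it does not attempt to detect a system package manager at all.

```python
import os
import sys
import tempfile


def describe_interpreter(prefix=None, base_prefix=None):
    """Guess who "owns" a Python prefix. Heuristic only."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    # conda environments keep their package records in <prefix>/conda-meta.
    if os.path.isdir(os.path.join(prefix, "conda-meta")):
        return "conda"
    # In a venv, sys.prefix is the environment and sys.base_prefix
    # is the installation it was created from.
    if prefix != base_prefix:
        return "venv"
    return "other"


# Demonstrate the conda branch with a throwaway directory.
with tempfile.TemporaryDirectory() as fake_env:
    os.mkdir(os.path.join(fake_env, "conda-meta"))
    fake_result = describe_interpreter(fake_env, fake_env)

print("fake conda prefix ->", fake_result)
print("this interpreter  ->", describe_interpreter())
```

A check like this only says which tool created the prefix; it says nothing about which other tools have since installed files into it, which is where the complexity described above comes from.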

So I tested this quickly yesterday in one of our Python channels internally:

Quick survey: what Python package manager do you prefer (e.g. pip, conda, etc.) and how and why did you decide to use it? (e.g. colleague recommended it for me, internet recommended it, seemed reliable, seemed correct, etc.)

The first response came within a minute: “pip, since it comes with Python” (and unfortunately, since everyone can see all the responses, it probably discouraged more people from replying the same).

Other interesting replies:

pip. I started with Conda because it was how my data science prof had us set up our environments. Then I kept having bloat issues and things not being updated properly…

pip. I use Anaconda Python, so conda was my first obvious choice. But I pretty fast switched to pip … sometimes I need to recreate my environment somewhere (say Azure app service) where installing Anaconda is not feasible, so I need to use vanilla Python and pip anyway

Pip, because I am still learning

Pip, because that is what a lot of web instructions specify

Pip. This is what I started with and that’s what felt simplest

There were a couple of condas and a very positive response for poetry, but for the most part pip is just where people started and so they stayed there. I find it very hard to say that people are actively choosing it from a range of options (the actual hypothesis I was testing).
