Drawing a line to the scope of Python packaging

My perspective is that users need to be able to install packages easily (and projects need that too - it’s not like a project that can’t be installed is much use!). For better or worse, a lot of users (particularly in Windows environments [1]) don’t have the tools needed to build the projects they want to use - and, to be honest, it’s not reasonable to expect them to.

So, I don’t think it’s at all unreasonable for users to expect prebuilt packages. And let’s be honest, conda and Anaconda are proof of that: their business is built on supplying prebuilt packages.

I’m not sure there is an expectation that prebuilt packages means wheels. Rather, I think that the people complaining know that there are other options available (conda, distribution supplied packages, …) but that they are unsuitable for them - and I think we should be asking why that is, rather than simply dismissing the problem as just being one of inaccurate user expectations.

I’m not saying that there isn’t a perception that prebuilt binary = wheel, but rather I’m saying that characterising it as nothing more than a perception issue doesn’t give us a useful way of addressing it. Understanding the user requirements behind that expectation might do so - and it’s the projects that can’t ship wheels that have access to users who can answer those questions, so maybe they could do more to understand the problem and communicate it to the people designing the installation tools and standards? After all, the “packaging community” by definition includes the people producing actual packages, so it’s not like their views would be unwelcome (at least I’d hope not!)

[1] I’d actually love to see a breakdown of how badly the “shared library” issue hits users, based on platform. My (biased) perception is that Windows mostly doesn’t have an issue, but Linux has huge problems. But that’s as a Windows user for whom everything just works fine, seeing bug reports from Linux users but never success reports from them. So I’m pretty sure my perspective is inaccurate - as, I suspect, would be any individual’s - so we need more objective measures to help us get data here.

One thing that would be very helpful for conda would be to change the default channel to conda-forge instead of the Anaconda channels. The burden of adding the community-maintained conda-forge should not be placed on the end user.

The difference is that conda builds up a complete integrated environment (imperfectly, but it tries) whereas pip is far more like extending an existing environment (like installing 3rd party mods into a game).

Because of its different dependency model (builds are pinned to builds, not version ranges) and consistent build and runtime environment, the amount of integration the end user has to worry about is greatly reduced. It’s not magic, it’s just work that is pushed onto the package distributor rather than the installer.

I believe it’s in there by default now, but the bigger problem is that the only reliable way to install conda right now is to use Anaconda’s distributions. pip install conda was not supported last time I asked, nor do any system package managers that I’m aware of support it (and for the record, I’d settle for an install that only supports conda create and doesn’t actually include a global conda environment).

I think it’s totally fine for an Anaconda-provided distribution to point at their channels by default. The lack of an alternative conda installer is the problem here, but it’s not at all clear who should provide it.

From my point of view, it’s completely the opposite. I recently asked about packaging an external library on distutils-sig and there seems to be no tooling at all for packaging Windows shared libraries (DLLs) in a wheel. OS X has delocate, Linux has auditwheel but for Windows there is nothing like that.

But maybe I’m totally missing something which makes it work on Windows (which would be great news).

For my use cases, conda works well, as it takes care of non-Python dependencies for me.

Windows resolves DLLs using search paths and names by default, so there’s no need to do anything special other than put it in the right place and it will be found. (There are other complexities, but for a properly-integrated environment these can be made to work. But this is one of the core problems when you cobble together an environment with pip but without actually doing the system integration work. Conda helps by making properly built packages work together, but some effort is still required when custom-built dependencies conflict.)
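
To make the “search paths and names” point concrete, here is a minimal sketch of name-based lookup over an ordered list of directories. It is a deliberately simplified model and not the actual Windows loader algorithm, which has more rules than this; `find_library`, the `pkg_a`/`pkg_b` directories and `example.dll` are all hypothetical names made up for illustration.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def find_library(name: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first file called `name` found in `search_dirs`, in order."""
    for directory in search_dirs:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


# Two hypothetical packages that each bundle their own copy of the same DLL.
with tempfile.TemporaryDirectory() as tmp:
    pkg_a = Path(tmp) / "pkg_a"
    pkg_b = Path(tmp) / "pkg_b"
    for package_dir in (pkg_a, pkg_b):
        package_dir.mkdir()
        (package_dir / "example.dll").write_text(f"built for {package_dir.name}")

    # Whichever directory comes first in the search order wins.
    a_first = find_library("example.dll", [pkg_a, pkg_b])
    b_first = find_library("example.dll", [pkg_b, pkg_a])
    missing = find_library("other.dll", [pkg_a, pkg_b])

print(a_first.parent.name, b_first.parent.name, missing)
```

Under this model, putting a DLL “in the right place” is enough for it to be found, but it also means that two packages shipping the same file name can shadow each other depending on search order, which is the integration work mentioned above.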

Ah, sorry - I was talking about the user side of things (people installing the packages) not the packager side. While I appreciate it’s something that might need some work, I don’t really worry about the packager side, as it’s a relatively small number of instances. I’d be more worried if you’d confirmed that it wasn’t possible to package DLLs in a wheel - but “there are no tools” is relatively easily solvable (by someone willing to automate the process). My impression is that auditwheel exists because the problem is so hard that it needs a tool to address it.

(Having said the above, I don’t build packages that have complex dependencies myself, and I don’t know how much experience you have with building on Windows, so apologies if I’m assuming a lack of knowledge when you’re trying to explain that there’s a technical issue that I’m unaware of - @steve.dower seems to be confirming my understanding, though).

I’m pretty sure that conda defaults include channels from repos.anaconda.com which are always given priority unless the user adds conda-forge as the highest priority channel in .condarc. Often the Anaconda channels serve a less recent version of a package than what is on conda-forge.
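
As a rough illustration of why channel order matters, the sketch below models channel priority as “the first channel in the list that has the package wins, even if a later channel has a newer version”. This is only a toy model of the behaviour described here, not conda’s solver; `pick_package`, the `examplepkg` name and the version numbers are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, ...]


def pick_package(
    name: str,
    channels: List[str],
    index: Dict[str, Dict[str, Version]],
) -> Optional[Tuple[str, Version]]:
    """Return (channel, version) from the highest-priority channel that has `name`."""
    for channel in channels:
        version = index.get(channel, {}).get(name)
        if version is not None:
            return channel, version
    return None


# Hypothetical index: the vendor channel lags behind the community channel.
index = {
    "defaults": {"examplepkg": (1, 2, 0)},
    "conda-forge": {"examplepkg": (1, 4, 0)},
}

defaults_first = pick_package("examplepkg", ["defaults", "conda-forge"], index)
forge_first = pick_package("examplepkg", ["conda-forge", "defaults"], index)
print(defaults_first, forge_first)
```

With the vendor channel listed first, the older build is selected even though a newer one exists elsewhere; reordering the channel list is what changes the outcome.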

For it to be a completely open community solution, conda should default to conda-forge. Otherwise, it defaults to a vendor-specific channel. Yes, today it serves open source packages, but in the future, will these packages have additional Anaconda niceties added?

PS We have run into cases where things that worked before a conda upgrade no longer work as expected afterwards.

Yeah, that’s what I said. I also said that because it’s Anaconda’s distribution, I think that’s fine. We could offer another way to obtain conda that has different defaults (for example, Intel does this with their distribution).

I’m unsure why “working with pipenv” specifically is a packaging problem. Perhaps “providing a functionality similar to pipenv” is a packaging problem, and I honestly don’t know if there’s a similar enough answer in the conda ecosystem, because I’ve never had such a need.

It’s fine to say that you like pipenv’s user interface and don’t want to learn another one. It’s less fine to dismiss conda as a full-fledged package manager just because it doesn’t implement your user interface of choice, IMHO.

We are providing conda as an .RPM and as a .DEB now. I think it’s more of the “conda create” only kind of tool.

pip install conda is hard for the reasons you’d expect: it should manage its own space, not the python installation that was used to run pip to install it. That’s a hard concept to communicate to people, and we’re hesitant to wade into it.

But if people include DLLs in their package, and more than one package in the env has those DLLs, there’s a pretty good chance of DLL hell. Centralizing, either via conda, system package manager, or whatever, is helpful.

@willingc

For it to be a completely open community solution, conda should default to conda-forge. Otherwise, it defaults to a vendor-specific channel. Yes, today it serves open source packages, but in the future, will these packages have additional Anaconda niceties added?

This is pretty hard to do because of the implied support burden. I definitely think we should make it easier to add conda-forge, but we get a lot of issues from conda-forge users on the conda/conda issue tracker right now. How many more would we get if the line between miniconda/anaconda and conda-forge were less visible?

If conda-forge were a default channel, where should its priority be? Again, the packages have a very different level of integration effort behind them. If we make conda-forge the top priority, maybe things won’t work as well. Is that a worthwhile tradeoff for “community” perception?

@pitrou

It’s fine to say that you like pipenv’s user interface and don’t want to learn another one. It’s less fine to dismiss conda as a full-fledged package manager just because it doesn’t implement your user interface of choice, IMHO.

FWIW, I see these tools as important user interfaces. We will be putting effort into supporting their input files in conda, so that end users won’t have to re-learn as much to use conda. People should be able to use pipenv from conda environments if they care to, and perhaps even use conda as a backend for creating envs with pipenv. Conda should work to support PyPI as a source of packages. That’s going to take a lot of effort because of gaps in metadata, but it’s definitely worthwhile.

@pf_moore

I see your point, but you could equally say why wasn’t pipenv told to explain how it would work with Ubuntu’s packager, Fedora’s, Arch’s, with ActiveState’s package manager, etc. Multiply that by all the Python tools out there and you have a combinatoric explosion.

I think the task here is for PyPA to describe a standard for an external tool, and then put the onus on the external tools to register themselves as options. I think it’s completely unreasonable to ask PyPA to maintain all of those connections, but the ability to specify an external dependency, and to leave it up to any external (user-choosable) tool to resolve that would be very powerful. The hard part here is describing the external dependency well enough to get it right. For a shared library, what version of the library is needed? Are there symbols associated with things like glibc that necessitate a particular build? The details here are going to be tricky, but I think the problem is tractable.
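
To give a feel for what “describe the external dependency, let a user-chosen tool resolve it” could look like, here is a minimal sketch. Nothing in it is an existing PyPA or conda API: `ExternalDependency`, `register_resolver`, `resolve`, the `fake-conda` tool name and the `libexample` versions are all hypothetical, and a real standard would need to capture far more (ABI, symbols, platform tags).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Version = Tuple[int, ...]


@dataclass(frozen=True)
class ExternalDependency:
    """Hypothetical description of a non-Python dependency such as a shared library."""
    name: str
    min_version: Version                    # inclusive lower bound
    max_version: Optional[Version] = None   # exclusive upper bound, if any

    def accepts(self, version: Version) -> bool:
        if version < self.min_version:
            return False
        return self.max_version is None or version < self.max_version


# An external tool is modelled as a function that reports the version it can provide.
Resolver = Callable[[ExternalDependency], Optional[Version]]
_RESOLVERS: Dict[str, Resolver] = {}


def register_resolver(tool: str, resolver: Resolver) -> None:
    """External tools register themselves; the standard does not list them."""
    _RESOLVERS[tool] = resolver


def resolve(dep: ExternalDependency, tool: str) -> Optional[Version]:
    """Ask the user-chosen tool for `dep`; return a version only if it satisfies the bounds."""
    if tool not in _RESOLVERS:
        raise LookupError(f"no resolver registered for {tool!r}")
    found = _RESOLVERS[tool](dep)
    if found is not None and dep.accepts(found):
        return found
    return None


def fake_conda_resolver(dep: ExternalDependency) -> Optional[Version]:
    # Stand-in for a real package manager query.
    available = {"libexample": (1, 2, 11)}
    return available.get(dep.name)


register_resolver("fake-conda", fake_conda_resolver)

satisfied = resolve(ExternalDependency("libexample", (1, 2, 0), (2, 0, 0)), "fake-conda")
too_old = resolve(ExternalDependency("libexample", (1, 3, 0)), "fake-conda")
print(satisfied, too_old)
```

The point of the registry is the division of labour described above: the package only states what it needs, and whichever tool the user has picked is responsible for answering.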

Thanks Mike. I’m not sure what the priority should be. I would say for reproducibility perhaps conda-forge first followed by the defaults. I’ve had my .condarc set this way for 2ish years and rarely see unexpected stuff.

You and the greater conda team have done a great job supporting the science community. Personally, I find that conda and pip work very well together when I create a conda environment from an environment.yml file that has pip install anything that isn’t in conda-forge or the defaults.

I also typically use miniconda rather than conda with the full suite of Anaconda packages, so that I only work with the minimal dependencies needed.

Probably because the people who developed the tools weren’t conda users to begin with (for whatever reasons they had).

This seems to be the crux of what you want to see changed: have packaging.python.org acknowledge conda more. Is that a fair assessment? All the other technological discussions are going to be hard to square away with the differing use-cases between pip and conda and it doesn’t seem to me to be that critical to square away based on what you seem to be asking for.

That’s obviously an opinion, but you unfortunately worded it like a fact. And I’m not sure what you mean by your “community-focused” packaging statement, since everything everyone here is doing is being done for the community. I know you well enough personally, Travis, to know you probably don’t mean for this to come off as confrontational, but it does, at least to me.

How can we move forward?

The problem I’m hearing is that this thread has become a “conda versus pip” discussion, and that has never turned out well no matter how many times I’ve heard it. Both tools have benefits and drawbacks and neither solves everyone’s problems perfectly (and this is speaking as someone who manages a team that has to support both tools in a code editor, so I see the issues both beginners and advanced users have with both tools).

I personally think that the only way we will ever move past this issue is to get the stakeholders in a room and have a discussion about how we would want packaging to work if we were to all start from scratch based on what we know now. How do we layer it, handle external dependencies, etc.? Then we can talk about how to move the community towards that idealized goal (and I believe there are plans to have such a discussion at PyCon US this year).

But from my view, arguing either side should move to the other isn’t going to get us anywhere.

Please be wary of using the term “standard”. Why? Because for many of us the only viable interpretation of the word w.r.t. Python packaging is the PyPA/pip/PyPI/Warehouse system, owned and operated by the Python trademark owner (the PSF).

Conda is a separate system run by different entities. It has laudable, useful goals and is seemingly open, but it is not PSF-run and thus not “blessed” as a Python standard: its ecosystem is a fork rather than a superset, and (my assumption) it is not automatically connected to PyPI, so publishing a package on PyPI doesn’t make it available on conda. https://stackoverflow.com/questions/29286624/how-to-install-pypi-packages-using-anacaonda-conda-command

The reason I write this is just framing. When discussing what is standard, people won’t read it the same way, which leads to disagreement because everyone takes a different meaning from the same text.

Apologies, I was simply trying to find an example. Maybe pipenv wasn’t a good example, but my intention was to point out that managing shared libraries isn’t the only “packaging problem” we have to deal with. IMO, interoperability between tools is the biggest thing we should be addressing - and framing the discussion as “conda or pip” doesn’t serve that goal - whether it’s conda users/developers or pip users/developers doing it.

Specifically regarding pipenv, I don’t think we should be asking any of the parties (pip, conda or pipenv) to “work with the others”. Rather, we should be defining standards that all parties buy into, and then everyone can interoperate. Going back to binary distribution formats, wheel is the agreed standard, but it doesn’t serve the needs of the conda community (or some of the packaging community’s needs, like tensorflow AIUI). So maybe we need an improved standard - but we won’t get one unless the parties for whom wheel isn’t sufficient give us some insight into the issues (there is a lot of that going on, from the “scientific” community, but I don’t know how much overlap there is between that group and conda).

The other option, of course, is to simply accept that the 2 groups serve different user bases, and move on. But personally, I’m not willing to give up and let the community fragment just yet.

As a binary format, I’m pretty sure wheel is totally fine (and if conda hadn’t invented their format before wheel was invented, they likely would have gone with it).

But the conventions around metadata (how to pin library dependencies), layout (in-tree or out-of-tree), and front end (internal installer vs external installer, and all the rest) vary so much that even if the packages looked like the same format, you wouldn’t get the same results.

One specific thing that conda does is build all binaries against the same dependencies and then pin those dependencies to each other. This is like an sdist having a version range for each dependency but the wheel of that sdist requiring a very particular build. If you lose that aspect, you lose a central piece of what makes conda conda, but if you take it then you lose a piece of what makes pip pip.
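
The contrast between a version range and a build pin can be sketched roughly as follows. This is an illustration of the idea only, not conda’s or pip’s actual matching logic; `RangeRequirement`, `BuildPin`, the `libexample` name and the build strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Version = Tuple[int, ...]
Candidate = Tuple[Version, str]  # (version, build string)


@dataclass(frozen=True)
class RangeRequirement:
    """sdist-style requirement: any build of any version inside the range will do."""
    name: str
    lower: Version  # inclusive
    upper: Version  # exclusive

    def matches(self, version: Version, build: str) -> bool:
        return self.lower <= version < self.upper


@dataclass(frozen=True)
class BuildPin:
    """Pinned requirement: exactly one version and one specific build of it."""
    name: str
    version: Version
    build: str

    def matches(self, version: Version, build: str) -> bool:
        return version == self.version and build == self.build


# Hypothetical builds of the same library available from a package source.
candidates: List[Candidate] = [
    ((1, 1, 0), "h111_0"),
    ((1, 1, 1), "h222_0"),
    ((1, 1, 1), "h333_1"),
]

loose = RangeRequirement("libexample", (1, 1, 0), (1, 2, 0))
pinned = BuildPin("libexample", (1, 1, 1), "h222_0")

range_matches = [c for c in candidates if loose.matches(*c)]
pin_matches = [c for c in candidates if pinned.matches(*c)]
print(len(range_matches), pin_matches)
```

The range leaves the installer free to pick among several builds, which is the flexibility pip relies on; the pin removes that freedom, which is what lets the distributor guarantee that the binaries were built against each other.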

At some point, people just have to choose a world to live in and a source for their packages that fits their need (and many people prefer Anaconda’s packages - not conda-forge - for very legitimate reasons). By only promoting pip and PyPI, many believe there is no need to look further, as “the standard” way ought to be the best. But we also don’t promote the extra effort necessary to make those tools work - workflows vary, yes, but people coming to our community see us arguing about them (or see people taking sides) more than they see valid reasons to choose one or another. We don’t promote conda, but we also don’t promote distro packages or Homebrew either, or many of the prebuilt distros for Windows.

And though I’ve said it a few times, I don’t think promoting more things is the answer. I think workflows are the answer. Pipenv and Poetry showed a workflow, which is why they’re popular. We don’t show workflows for pip - we promote what it can do, not what it should do. And that makes sense, as it’s a different kind of documentation and both are needed. But we don’t have anyone on the teams analysing who the users are, figuring out what they need to do, and preparing actual introductory docs that clearly let them know how to do what they need to, rather than how to use a hammer to hit anything that might be a nail.

There’s been some talk of defining personas to represent our users, and hopefully the steering council will provide some guidance on which of those we should prioritise. I’m looking forward to being able to have a concrete discussion about actual users and not just hypotheticals and strawmen.

Also, given the topic of standards and PyPA’s role and the fact that this discussion was sparked by requests from the conda community, another possible path forward in the absence of getting everyone together in a room would be for folks from that community to draft one or more PEPs with more specific proposals. That could help to focus the discussion on concrete steps. (Though I do very much think it’s still important to get everyone together, at least metaphorically.)

Yes, we’d be happy to work on a PEP that captures the metadata necessary to allow external providers of shared libraries. We will be at PyCon, though we were told that there was not enough room for us at the language summit. Hopefully we can sort that out in person at the conference.
