Pip/conda compatibility

That’s not what I mean when I talk about an environment manager (for Python).

To be honest, I think that as long as this is the case, Python packaging is not going to meet users’ needs. That comes with my usual proviso that I’m not talking about PyPA or pip specifically, but something more like: “As long as the people who make decisions about what happens on Python.org and pypi.org are not willing to have Python.org/pypi.org provide solutions that manage non-Python dependencies, there will be a very large subset of Python users whose needs are not met by what’s available via Python.org/pypi.org.”

And in some sense that’s fine. But as I mentioned on another thread, I think if that is the case it needs to be stated much more loudly and forcefully in much more obvious places on Python.org (in the docs, on PyPI, etc.). It needs to be stated that the Python builds obtained from Python.org, the documentation on Python.org, and the package management system implemented by pip and accessible via pypi.org are not enough, and that users must consider, right from the beginning, looking elsewhere or they will encounter pain.

Why do you think anyone must treat people as if they can’t do their own research and think for themselves?

Python is a programming language. PyPI is a Python ecosystem package repository. The purpose of their respective documentation is to tell readers what they do and how to use them, not to try to convince them to use something else.

That said, I already agreed earlier in the thread that there are almost certainly more opportunities to add improved off-ramps to PyPUG in particular, reminding readers that they don’t need to build their own working environments from scratch and may want to consider pre-integrated options like conda if those are better suited to their goals.

The removal of distutils from the Python 3.12 standard library is also an opportunity to improve the way the main CPython docs direct readers towards environment setup & management resources.

The specifics of those changes are best considered via concrete PRs though, rather than via forum discussion.

This is an area where I sometimes wonder if a “_manylinux” style override module might make sense.

Consider if there was a standard “platformintegration” module that packaging tools looked for (ignoring it if it was installed as a regular Python package, so only the underlying platform could provide it).

Via this hypothetical module, tools would be able to request:

  • the default index URL to use
  • any supplementary index URLs to use
  • a list of packages that must be installed via the platform rather than via Python-specific tools
  • the version constraints for those platform provided packages
  • installation of a platform-provided version of a package

Designing such an interface wouldn’t be easy, but it’s the missing piece to let platforms and packaging tools genuinely play nice with each other, rather than platforms only being able to say “hands off, this is mine to manage”.
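To make that concrete, here is a purely illustrative sketch of what such a module’s surface might look like. Every name below is hypothetical – nothing like this exists or has been formally proposed:

# Hypothetical "platformintegration" module, provided by the platform
# itself (a Linux distro, conda, etc.) and ignored by tools if it was
# installed as a regular Python package. All names are illustrative.

def default_index_url() -> str:
    # The index the platform wants tools to use instead of pypi.org.
    return "https://packages.example-platform.org/simple"

def supplementary_index_urls() -> list[str]:
    # Extra indexes to consult in addition to the default.
    return []

def platform_managed_packages() -> dict[str, str]:
    # Packages that must come from the platform, mapped to the
    # version constraints the platform can satisfy.
    return {"openssl": ">=3,<4", "numpy": ">=1.24,<2"}

def install_platform_package(name: str) -> None:
    # Delegate installation of a platform-provided package to the
    # platform's own package manager.
    raise NotImplementedError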


hmm – I wonder if it even needs to be that complicated – the hardest part (I think) is mapping of names (and versions) between different package providers.

And at least with the Python stuff, PyPI is the only namespace that matters, and more and more, conda-forge is the primary namespace for conda packages.

So that takes care of:

  • the default index URL to use

  • any supplementary index URLs to use
    (not that there’s any harm in specifying the URL)

  • a list of packages that must be installed via the platform rather than via Python-specific tools

That’s tricky – I suppose there are some, but at least in the pip/conda world, it’s a question of “use the conda package if there is one” – not that anything in particular needs to be the conda one.

  • the version constraints for those platform provided packages

This ties in with the naming – you need names and versions.

  • installation of a platform-provided version of a package

Hmm – so, e.g., pip would install a conda package if there is one? I suppose that could be done, but maybe it would be better if conda (or whatever environment manager) simply did it all. E.g. conda could provide a wrapper around pip that would use pip to figure out the dependencies of a package, look for conda packages to resolve them, and then install anything that couldn’t be found in conda.
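A rough sketch of that wrapper idea (assuming pip >= 22.2 for --dry-run plus the JSON --report, a conda on PATH, and – naively – that PyPI names match conda names, which, as noted above, is the hardest part):

import json
import subprocess
import sys

def conda_first_install(requirement: str) -> None:
    # Ask pip what it *would* install, without installing anything.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", "--dry-run",
         "--quiet", "--report", "-", requirement],
        check=True, capture_output=True, text=True)
    planned = [item["metadata"]["name"]
               for item in json.loads(result.stdout).get("install", [])]

    # Try conda for each planned package; a real tool would batch these
    # into one transaction and map package names properly.
    leftover = [name for name in planned
                if subprocess.run(["conda", "install", "--yes", name]).returncode != 0]

    # Whatever conda couldn't provide still comes from pip.
    if leftover:
        subprocess.run(
            [sys.executable, "-m", "pip", "install", *leftover], check=True)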

However: maybe we could just focus on making it easier, and maybe even automatic in the easy cases, to make conda packages. If we had the name mapping, it wouldn’t be too hard for any pure-Python (or any self-contained) package.

Thinking out loud here:

It should be doable to build a service with an API to request a conda package for a given PyPI package. With grayskull, that can “just work” for the simple cases.
(in fact, such a service already exists (almost): Marcelo Duarte Trevisani)

I’m not sure if it comes with grayskull, but you can right now combine grayskull and conda-build and get a package ready to go. So what to do with that package?
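For a simple pure-Python project, that combination is just two commands today (“somepackage” is a placeholder name; grayskull writes a recipe directory named after the package, which conda-build then builds):

grayskull pypi somepackage
conda build somepackage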

So why not build that service into conda-forge? One reason is that conda-forge is curated, at least to a point – there are many ways that auto-generating a conda package can lead to problems – it’s good to put some eyes on it first.

But: a “built_from_pypi” channel could be created, and the system could auto-generate a package from PyPI and put it on the built_from_pypi channel. Then any user could add the built_from_pypi channel to their channel list, at a lower priority than conda-forge, and they would get access to all those packages.

And it could be used as a test bed for packages to be added to conda-forge later on.

Would this work for all the packages on PyPI? No, not all of them, but it could work for a lot, and I think it would be about as reliable as pip installing within conda.

The primary missing piece is the name mapping – many PyPI names match conda-forge names, but not all, and there are some fatal clashes.

Maybe a first draft namespace mapping could be done by scanning all of conda-forge’s feedstocks, looking in the recipe, and determining what it is built from – if it’s a PyPI source download – we know the PyPI name.
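As a first-draft heuristic, that scan could look something like this (hedged sketch: conda-forge recipes are Jinja-templated YAML, so this searches the raw text rather than parsing it, and plenty of recipes would need special-casing):

import re
from pathlib import Path

SET_NAME = re.compile(r'{%\s*set\s+name\s*=\s*"([^"]+)"')
PYPI_SOURCE = re.compile(r"pypi\.io/packages/source|files\.pythonhosted\.org")

def pypi_name_for(feedstock_dir: str) -> str | None:
    # Return the PyPI name a feedstock appears to be built from, or
    # None if the recipe doesn't use a PyPI source download.
    recipe = Path(feedstock_dir, "recipe", "meta.yaml")
    if not recipe.exists():
        return None
    text = recipe.read_text()
    if not PYPI_SOURCE.search(text):
        return None
    match = SET_NAME.search(text)
    return match.group(1) if match else None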

Anyway, I know ideas like this have been bouncing around the conda world for a while – so I’m sure there are reasons it wouldn’t work – or just no one has stepped up to do it.

Same here - I think it’s more confusing than helpful to label package managers that happen to provide Python as a package as environment managers. The classification at Build & package management concepts and terminology - pypackaging-native seems more natural and useful to me.

I’ll note that I am actively taking a stab at this particular mess right now :) So thank you for this list of potential issues! Even if that works out, something like your classification here seems useful though. In particular it’d be great to distinguish pure Python from non-pure packages (Root-Is-Purelib: false in wheel metadata is kind of a poor man’s version of this). And also this:

I’d like to make that even stronger. From here: “Update PyPI so the three purposes are better separated. E.g., allow upload of sdist’s for archival and “source code to distributors flow” without making them available for direct installation.”

The classifications involving binary packages seem very tricky to me. Binary packages are always provided only for a subset of supported platforms and interpreters, may be uploaded at a later date, and so on. So I’d suggest that any such classification should only look at characteristics of the project, like “needs a C compiler” or “has an external dependency”, rather than including anything about availability of wheels.

It’s an interesting piece, luckily it’s not missing: cf-graph-countyfair/mappings/pypi/grayskull_pypi_mapping.yaml at master · regro/cf-graph-countyfair · GitHub
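A minimal sketch of consuming that file (assuming each entry carries pypi_name and conda_name fields – check the file itself for the current schema before relying on this):

import yaml  # third-party: pip install pyyaml

with open("grayskull_pypi_mapping.yaml") as f:
    mapping = yaml.safe_load(f)

# Build a PyPI-name -> conda-name lookup table.
pypi_to_conda = {entry["pypi_name"]: entry["conda_name"]
                 for entry in mapping.values()}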

+1 to the idea – I’m not sure “not making them available” is the goal – it’s really on the other end: installers should not try to build from source by default.

Slowly but surely, the multiple functions of pip are being teased out – so we’re getting there.

And as it is in fact pretty darn easy to build wheels for pure-Python packages, that could be done automatically on PyPI or by some external wheel-building service (e.g. conda-forge).

Awesome! thanks!

Indeed, I always do (and tell students to do) a pip install --dry-run, then install everything possible from conda before doing the final pip install. It’s tedious, but it’s the only way to not almost surely break your environment over time. Anything that simplifies this would be an immediate improvement to quality of life.

To not break your existing environment I often recommend people do:

pip freeze > constraints.txt
pip install newpackage -c constraints.txt

Pip then installs the minimal amount required while not breaking anything you already have installed, and since it’s a small list of newly installed packages, it’s also usually easier to uninstall them if it didn’t work out.

After many years of using mixed conda/pip environments I have firmly landed on using conda to bootstrap my non-python package dependencies (CPython, OpenSSL, git, rust, nodejs, etc.) and pip to do everything else with careful use of constraints to reproduce and share my environment.

If I am having to use conda to manage my Python packages I build conda packages for any missing dependencies to completely take Pip out of the loop.

I know this is against the theme of the thread but general compatibility solutions almost always depend on package authors implementing best practices and it seems unlikely at the scale of the Python ecosystem that one can depend on that. I am all for specific targeted solutions though (like externally managed for the base environment)!


I wonder how much mileage could be gotten just from a tool that runs pip install --dry-run, parses the output, looks up every package pip wants to install, and then tries to install it with conda instead. It would be nicer if pip had a clearly documented format for this information (like a JSON version of the --dry-run output).

The various threads about externally-managed and whatnot have made me ponder this issue a bit. Most of the discussion of interop seems to focus on things like INSTALLER and RECORD files, but I have doubts about whether that kind of static information is going to be adequate in the long run. If there are multiple installers that exist because people want different behavior from them, then the relevant differences may not be in what is installed before you try to install something new, but in what the installer is actually going to do next when you actually do the install. That means what we need is not so much a record of what an installer already did as a way for installers to telegraph their moves ahead of time. Then Installer X can just ask Installer Y “what would you do in this situation” and get back a machine-readable answer that Installer X can then use to figure out what it can do on its own and what it genuinely needs Installer Y to handle.
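Entirely hypothetically, the exchange could be as simple as installer X shelling out to installer Y and getting a structured plan back. The “installer-y” command, its “plan” subcommand, and the JSON shape below are all imaginary – the point is only to illustrate the idea:

import json
import subprocess

# Imaginary convention: "plan --json" prints the actions the installer
# *would* take for a requirement, without taking any of them.
result = subprocess.run(
    ["installer-y", "plan", "--json", "scipy>=1.10"],
    check=True, capture_output=True, text=True)

for action in json.loads(result.stdout)["actions"]:
    # e.g. {"operation": "install", "package": "scipy", "version": "1.11.4"}
    print(action["operation"], action["package"], action["version"])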

There’s a --report output format (documented here) that you could use.
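For reference, a minimal sketch of using it (the flags are real pip >= 22.2 options, and the top-level “install” list is part of pip’s documented installation-report format; “somepackage” is a placeholder):

python -m pip install --dry-run --quiet --report report.json somepackage

Then the report can be read like any other JSON file:

import json

with open("report.json") as f:
    report = json.load(f)

for item in report["install"]:
    print(item["metadata"]["name"], item["metadata"]["version"])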

That works in the use case you describe, but conda will still happily install over the pip packages you pick up this way.

But that’s not what they’re doing. It’s fine for the documentation for PyPI to tell people how to use PyPI, and the documentation for pip to tell people how to use pip. What I have an issue with is the documentation for Python telling people that they should use pip and PyPI, and for the most part not even telling them that alternatives exist.

Here again is the main documentation page for Python. It links to this page, which says:

pip is the preferred installer program

“The”. “Preferred”. This is not explaining how to use pip. It is telling you that you should use pip, without even telling you that a large chunk of the Python world uses and prefers other things.

Later it mentions PyPI. PyPI is the only package repository mentioned.

It does not mention poetry. It does not mention PDM. It does not mention conda. It does not mention conda-forge. Not only does it not mention any of these, it does not even mention the possibility that other tools exist, except indirectly in some subsections towards the bottom. And even then, it fails to mention obvious alternatives in some cases, such as failing to mention pyenv or conda as ways of managing multiple Python versions. (This is leaving aside the issue of conda being shunted only under “scientific” headings, although as @PythonCHB and others have mentioned, that too is misleading.)

It is hard to overstate how grossly this misrepresents the way the Python packaging world actually works. If you go into places like Python IRC rooms, StackOverflow, reddit, blogs, etc., you will be deluged with alternatives and debate about those alternatives and questions about how to use pip that are answered with “you should use poetry instead” and all sorts of complications. There are no clear consensus solutions to be found anywhere.

Now, yes, if a reader digs around in the docs they can eventually maybe find a link to something like this that at least mentions some other tools. But what I’m saying is that the primary feature of Python packaging is its fragmentation, and so that should be boldly thrust to the fore in the documentation.

It has nothing to do with “doing your own research”. The documentation makes it seem as if there is one and only one obvious way to do Python packaging and that is not the case.

This is all without getting into the question of why these other tools exist, which is what a lot of these other discussions have been about. The reason all these other tools exist is because the default ones aren’t good enough. All I’m saying is that if Python isn’t going to provide a better packaging experience as an “included battery”, it should not present the “standard” solutions as the best, default, or only options.

That is (in effect) a statement by the Python core developers and the Steering Council, and is based on PEP 453. Neither the PyPA nor the packaging community has any power to change that position, except by proposing a new PEP and submitting it to the SC.

We can talk all we like here, but PEP 453 is accepted Python policy, and only the SC can change it.

Then that should be the first goal, prior to any of these PEPs about adding more complications to the packaging system.

Edit: Looking at PEP 453, it’s not clear to me that it would actually preclude modifying the docs to give greater mention of other tools. It seems the recommendation aspect was largely about un-recommending setup.py:

This PEP proposes that the Installing Python Modules guide be updated to officially recommend the use of pip as the default installer for Python packages, rather than the current approach of recommending the direct invocation of the setup.py install command.

Documentation that pip is the default could still be maintained, but it could be augmented with a warning that many other tools are in widespread use to overcome pip’s limitations. As I’ve mentioned on some of these threads, I still think we need to take an axe to more of the docs than that, and ideally do an even more drastic rethink of the whole packaging apparatus, but some modifications might be possible in the interim if a PEP is required for more sweeping changes.

Submit a PEP or a PR to fix those docs to your liking. The PyPA doesn’t own the Python docs, the Python devs own that.


(quoting the stdlib docs)

I’m pretty sure I wrote that back when we were still trying to get people away from running “./setup.py install” directly (and to stop using “easy_install”) and before much of the subsequent work that really let alternate installer frontends play nice with each other.

Changing it now to something like “pip is the installer provided by default with python.org Python releases (other Python distributions may provide or recommend other options, and some users choose to adopt their own preferred tools, but pip sets a baseline for commonly expected package installation related functionality)” would be a reasonable thing to do.

“When was this last updated?” is often a question well worth asking about packaging related documentation - sometimes it says inaccurate things simply because it was literally last touched years before.


Unfortunately, I believe that’s giving the current state of the stdlib packaging docs more credit than they deserve. As far as I am aware, much of it is still the near placeholder text that I wrote back when ensurepip was first added and we needed something to take the place of the old distutils based docs (which I shuffled off to one side rather than deleting, since they included a few bits of technical info that weren’t documented anywhere else).

While changing ensurepip itself would still require a PEP, changing the related documentation can be done through the regular CPython issue management and pull request processes.

Hence my comment earlier in the thread about the distutils removal in 3.12 being a good time to consider updates - independently of the evolution of the wider Python packaging ecosystem, they could likely use a review simply to check for any lingering distutils references.


Yup, that’s more or less the case as far as I’m aware – they’re very basic and have hardly been touched in years, with most of the interest having moved to the PyPUG (which itself has been rather neglected) or, more practically, the individual tools’ docs.

I was working on this when the original distutils removal happened, but paused it to wait until around the beta phase to see whether the removal would be reverted or not, as well as for some other feedback on the proposed approach. Here are the relevant issues…

Overall issues (not totally sure of the difference in scopes):

Distutils removal docs-specific
