Virtual provides for cross-ecosystem dependencies


(Petr Viktorin) #1

Continuing the discussion from Structured, Exchangeable lock file format (requirements.txt 2.0?):

Nathaniel wrote:

What if we invented a new meta-namespace, that included both? So e.g. "pypi:gulp" would mean “what pypi calls gulp”, and "conda:gulp" would mean “what conda calls gulp”, and now we can use both vocabularies at the same time without namespace collisions.

And we’ll also need a meta-meta-namespace to tell us what the pypi: and conda: mean!

More seriously, the idea is almost exactly what Fedora (and probably other RPM-based distros) does now. Let me summarize to avoid re-inventing eggs.

Fedora uses “virtual provides”. Those are typically used for “aliases”: for example, python2-requests package (still) provides python-requests for backwards compatibility, so an old package requiring python-requests doesn’t break (yet).
But it’s not just for aliases. For example, names of installed files are included as virtual provides. If I’m missing the http command, I can install it directly with dnf install /usr/bin/http, and forget that it’s from the project httpie. Or was it httppie? Or python3-httpie in Fedora? (I don’t mean to pick on the author here; naming is hard.)
Virtual provides do complicate the system, though – requirements are “searched for” rather than just “looked up”.
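To make the “searched for” versus “looked up” distinction concrete, here is a minimal Python sketch. The `Package` class, the `resolve()` function and the two-entry repository are hypothetical illustrations, not how dnf or libsolv actually store or query metadata.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Package:
    name: str
    # Virtual provides: extra names this package also answers to.
    provides: Set[str] = field(default_factory=set)


def resolve(requirement: str, repo: List[Package]) -> List[Package]:
    """Return every package whose own name or virtual provides match."""
    # A plain lookup would only compare against pkg.name; with virtual
    # provides, every package's provides set has to be searched too.
    return [pkg for pkg in repo
            if requirement == pkg.name or requirement in pkg.provides]


# Illustrative repository contents (hypothetical).
repo = [
    Package("python2-requests", {"python-requests"}),
    Package("httpie", {"/usr/bin/http"}),
]

print([p.name for p in resolve("python-requests", repo)])  # ['python2-requests']
print([p.name for p in resolve("/usr/bin/http", repo)])    # ['httpie']
```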

For Python packages:

  • A package like python-requests (from the Fedora namespace) can specify that it provides python3dist(requests), i.e. requests from the PyPI namespace.
  • Other packages can Require either name.
  • A Provides generator automatically looks for egg-info/dist-info metadata when any package is built, and adds proper python3dist Provides to the package.
  • And a requirement generator now also automatically adds the corresponding Requires. (This was turned on by default about a year ago.)
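A rough sketch of what such a generator pair does, assuming a dist-info METADATA file as input. The function names are hypothetical; Fedora’s real generators handle versions, extras and markers far more carefully, and their exact output format may differ. This sketch drops version constraints and simply skips any requirement guarded by an environment marker.

```python
import re
from email.parser import Parser
from typing import List, Tuple


def normalize(name: str) -> str:
    # PEP 503-style name normalization (assumed; Fedora's generator may differ).
    return re.sub(r"[-_.]+", "-", name).lower()


def generate(metadata_text: str) -> Tuple[List[str], List[str]]:
    """Return (provides, requires) derived from a dist-info METADATA file."""
    msg = Parser().parsestr(metadata_text)
    name, version = normalize(msg["Name"]), msg["Version"]
    provides = [f"python3dist({name}) = {version}"]
    requires = []
    for req in msg.get_all("Requires-Dist") or []:
        if ";" in req:
            # Simplification: skip anything behind an environment marker
            # (extras, platform checks) instead of evaluating it.
            continue
        # Keep only the project name; version constraints are dropped here.
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
        if match:
            requires.append(f"python3dist({normalize(match.group(1))})")
    return provides, requires


# Abbreviated, illustrative METADATA.
METADATA = """\
Metadata-Version: 2.1
Name: requests
Version: 2.21.0
Requires-Dist: chardet (<3.1.0,>=3.0.2)
Requires-Dist: idna (<2.9,>=2.5)
Requires-Dist: PySocks (!=1.5.7,>=1.5.6) ; extra == 'socks'
"""

provides, requires = generate(METADATA)
print(provides)  # ['python3dist(requests) = 2.21.0']
print(requires)  # ['python3dist(chardet)', 'python3dist(idna)']
```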

It’s taken a while to gradually get this implemented (and shepherd package maintainers to use it), but it works quite nicely.
Obviously, it’s not cross-platform, but it’s cross-ecosystem: in the same way, you can for example require headers for a C/C++ library, say pkgconfig(Qt5) – and if you build an extension with that, another dependency generator scans the extension’s symbols and adds a Require to the right version of libQt5Core.so.5.
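The namespace(name) spelling is effectively the meta-namespace Nathaniel described: "pypi:gulp" would roughly correspond to python3dist(gulp). A small sketch that splits such capability strings; the "file" and "package" labels for the non-parenthesized forms are invented for this example.

```python
import re
from typing import Tuple


def split_capability(cap: str) -> Tuple[str, str]:
    """Split an RPM-style capability string into (namespace, name)."""
    match = re.fullmatch(r"([\w.]+)\((.+)\)", cap)
    if match:                  # e.g. python3dist(requests), pkgconfig(Qt5)
        return match.group(1), match.group(2)
    if cap.startswith("/"):    # file provides, e.g. /usr/bin/http
        return "file", cap
    return "package", cap      # a plain package name or alias


for cap in ["python3dist(requests)", "pkgconfig(Qt5)", "/usr/bin/http", "python-requests"]:
    print(cap, "->", split_capability(cap))
```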

It still doesn’t make it trivial (=automatable) to build Fedora-quality RPMs from PyPI packages (which often lack tests and licences, have different conventions for descriptions, etc.), but it’s the first few steps :)

Please steal ideas as appropriate. (Most likely, though, this is not appropriate for Pip now.)


Digression: Distribution Package is the PyPA term for the stuff you download from PyPI, but I avoided that term here since I also discuss packages from a Linux distribution. The terms “project” and “package” and “module” are similarly confusing. (Thankfully, “distro” is always an OS distribution (where OS is Operating System, not OpenStack.)) Hah! Namespaces! Let’s do more of those.


(Steve Dower) #2

This sounds most interesting for building sdists if we had a namespace that could resolve native libraries by required header filenames (perhaps using vcpkg, which is x-plat, despite the docs still being Windows centric).

Header names seem general enough as keys/search terms to integrate into private code repositories as well, such as the Linux distros or corporate build environments. Manually resolving name collisions (with more names?) seems easier all round than the current approaches.
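A minimal sketch of header-based resolution with manual overrides for collisions. The HEADER_TO_PORTS table and resolve_headers() are hypothetical; the port names are illustrative and would need checking against the actual vcpkg registry.

```python
from typing import Dict, List, Optional

# Hypothetical, hand-written header -> candidate vcpkg port table.
# A real table would be generated and much larger.
HEADER_TO_PORTS: Dict[str, List[str]] = {
    "zlib.h": ["zlib"],
    "openssl/ssl.h": ["openssl"],
    "jpeglib.h": ["libjpeg-turbo", "ijg-libjpeg"],  # a collision to resolve
}


def resolve_headers(headers: List[str],
                    overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Pick one port per required header; overrides settle ambiguous ones."""
    overrides = overrides or {}
    chosen: Dict[str, str] = {}
    for header in headers:
        if header in overrides:              # a manual choice always wins
            chosen[header] = overrides[header]
            continue
        candidates = HEADER_TO_PORTS.get(header, [])
        if len(candidates) != 1:             # unknown or ambiguous header
            raise LookupError(f"{header}: candidates {candidates}; add an override")
        chosen[header] = candidates[0]
    return chosen


print(resolve_headers(["zlib.h", "jpeglib.h"], overrides={"jpeglib.h": "libjpeg-turbo"}))
```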

It might also be interesting for dev environments to be able to request by command and let those be fulfilled “however” without necessarily having to be integrated into a single Python environment. I was thinking about how to build a tool that essentially generates a “bin” directory for your project with the commands you need (gcc, make, black, pytest, etc.), symlinking where possible or setting up a Python environment for ones where pip is the only feasible installer right now, but essentially making all the tools just look like tools to the user, rather than making them look like dependency conflicts with your app’s dependencies :blush: Being able to satisfy tool requests from multiple sources would be handy.
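The “bin directory” idea could start as simply as the sketch below, which symlinks each requested command into a project-local directory and reports the ones it could not find (those would be candidates for a pip/pipx-style install). build_bin_dir() is a hypothetical name, and creating symlinks may need extra privileges on Windows.

```python
import os
import shutil
from pathlib import Path
from typing import Dict, List, Optional


def build_bin_dir(bin_dir: Path, tools: Dict[str, Optional[str]]) -> List[str]:
    """Symlink requested commands into bin_dir; return the ones not found.

    `tools` maps a command name to an explicit path, or to None to search
    the current PATH for it.
    """
    bin_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    for command, source in tools.items():
        target = source or shutil.which(command)
        if target is None:
            missing.append(command)  # hand these to some other installer
            continue
        link = bin_dir / command
        if link.is_symlink() or link.exists():
            link.unlink()            # replace any stale link
        os.symlink(target, link)
    return missing
```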

That leaves the deployment stage (which for me is the final deployment stage, not pushing to PyPI or building a wheel, the point where files are being laid down in exactly the place they’re going to be used), which is the only place I want a lock file. And actually here, I don’t want cross-ecosystem dependencies anymore unless I’m given them as independent lockfiles for each ecosystem - there shouldn’t be dependencies in a lockfile, as they should have been resolved already. (And of course, at this point, I don’t want tools being installed at all. Though if they’re defined separately, it may be easy to install and then remove them cleanly?)
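A sketch of the “independent lockfile per ecosystem” shape, assuming a lock is just a flat list of exact pins with no dependency edges. Pin and split_lock() are hypothetical, and the exact-pin check is a crude character test rather than real version-specifier parsing.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Pin(NamedTuple):
    ecosystem: str  # e.g. "pypi" or "conda"
    name: str
    version: str    # an exact version: no ranges, no dependency edges


def split_lock(pins: List[Pin]) -> Dict[str, List[Pin]]:
    """Split an already-resolved set of pins into one lock per ecosystem."""
    locks: Dict[str, List[Pin]] = defaultdict(list)
    for pin in pins:
        # Crude check: anything that looks like a range is not resolved yet.
        if any(ch in pin.version for ch in "<>=~*,"):
            raise ValueError(f"{pin.name}: {pin.version!r} is not an exact pin")
        locks[pin.ecosystem].append(pin)
    return dict(locks)
```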

All of that to say:

  • a build backend that knows how to find native packages would be great
  • a dev environment configurator that can find tools from multiple ecosystems would be great
  • lock files are already pretty great (when used properly)

(Petr Viktorin) #3

The parallel for this in Python ecosystem would be the ability to resolve importable module names to PyPI package names. AFAIK, we still don’t have that.

Anyway, Fedora (RPM) does have this: dnf install '/usr/include/*/Python.h'. Cross-ecosystem, but not cross-platform…
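File provides with a glob boil down to pattern matching over a file index (in RPM repositories, for example, the filelists metadata). A sketch with a hypothetical, hand-written index; note that fnmatch’s * also matches /, unlike a shell glob.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical file index: package name -> files it ships.
FILE_INDEX: Dict[str, List[str]] = {
    "python3-devel": ["/usr/include/python3.7m/Python.h"],
    "python2-devel": ["/usr/include/python2.7/Python.h"],
    "python3-libs": ["/usr/lib64/libpython3.7m.so.1.0"],
}


def packages_providing(pattern: str) -> List[str]:
    """Return (sorted) packages shipping at least one file matching pattern."""
    return sorted(pkg for pkg, files in FILE_INDEX.items()
                  if any(fnmatchcase(path, pattern) for path in files))


print(packages_providing("/usr/include/*/Python.h"))  # ['python2-devel', 'python3-devel']
```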


(Steve Dower) #4

Right, that was what triggered the idea :slight_smile: And then I realised that we have a cross-platform repository of code, and just need the mapping from filenames. Which could all live in (a component of) a PEP 517 backend and probably be fairly static, tbh.

I’m in a real brainstorming mode on this right now, so ideas are kind of flying everywhere. I’ll try and let them bubble away before proposing anything major, but I always like to at least get bits and pieces out in case it triggers other people’s imaginations.


(Brett Cannon) #5

For this, I think a rustup/pipsi solution might work well, but in both cases there’s no versioning of tools, and I know that in Nathaniel’s motivating example (in some topic somewhere) he wanted different versions of tools. At that point you’re starting down a homebrew-style solution where you directly link the newest version into your bin/ but you have other versions sitting in a browsable location if you need to grab something directly.
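A homebrew-style layout could look roughly like this: every version lives under a per-tool store directory, and only the newest gets linked into bin/. The store/<tool>/<version>/bin/<tool> layout and link_newest() are hypothetical, and the version key assumes plain dotted integers rather than full PEP 440 versions.

```python
import os
from pathlib import Path
from typing import Tuple


def version_key(version: str) -> Tuple[int, ...]:
    # Assumes plain dotted-integer versions, so "1.10" sorts after "1.9".
    return tuple(int(part) for part in version.split("."))


def link_newest(store: Path, bin_dir: Path, tool: str) -> Path:
    """Link the newest store/<tool>/<version>/bin/<tool> into bin_dir."""
    versions = [p.name for p in (store / tool).iterdir() if p.is_dir()]
    newest = max(versions, key=version_key)
    target = store / tool / newest / "bin" / tool
    bin_dir.mkdir(parents=True, exist_ok=True)
    link = bin_dir / tool
    if link.is_symlink() or link.exists():
        link.unlink()  # older versions stay in the store, just unlinked
    os.symlink(target, link)
    return target
```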


(Steve Dower) #6

Actually, it looks like Chad Smith (https://github.com/cs01, https://twitter.com/grassfedcode) is someone we should be talking with, as he’s going out and just doing these already :slight_smile:

  • pipx does exactly what I was describing here for tools
  • pythonloc implements PEP 582 (__pypackages__)
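Roughly, PEP 582 has the interpreter look in a __pypackages__/X.Y/lib directory next to the script. A sketch of doing that lookup by hand; the function names are hypothetical, and pythonloc’s actual implementation may differ.

```python
import sys
from pathlib import Path


def pypackages_lib(project_dir: Path) -> Path:
    """Return __pypackages__/<major>.<minor>/lib for the running interpreter."""
    major, minor = sys.version_info[:2]
    return project_dir / "__pypackages__" / f"{major}.{minor}" / "lib"


def enable_pypackages(project_dir: Path) -> None:
    """Put the project's __pypackages__ lib dir on sys.path, if it exists."""
    lib = pypackages_lib(project_dir)
    if lib.is_dir() and str(lib) not in sys.path:
        # Insert just after sys.path[0] (the script's directory).
        sys.path.insert(1, str(lib))
```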

(Chris Jerdonek) #7

On the subject of “cross-ecosystem,” I just thought I’d let people on this thread know about this issue (“Option to look for pip.conf in conda environment”): https://github.com/pypa/pip/issues/5060
(Not for discussion here though – just FYI.)


(Steve Dower) #8

That issue is more about a conda environment not being a venv/virtualenv environment and pip having no configuration file path within a “normal” environment. It affects Windows in a lot of cases, where the install path of Python is much more flexible than on other platforms (which is why I was interested enough to work on it :wink: )


(Nick Coghlan) #9

Thea and I have been talking to Chad on the pipx guide he wrote for packaging.python.org: https://github.com/pypa/python-packaging-user-guide/pull/594 (I just gave a +1 to the latest draft, so I expect it will go live once Thea has had a chance to take a look at it).

It doesn’t solve the “install multiple versions of the same tool” problem though, since they collide on the unqualified command name.
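The collision itself is easy to detect mechanically; one hypothetical fix is to additionally expose version-qualified command names. A sketch under that assumption (the function names and the command-version naming scheme are invented for illustration):

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_collisions(installed: List[Tuple[str, str, List[str]]]) -> Dict[str, List[str]]:
    """Given (tool, version, commands) entries, return commands with >1 owner."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for tool, version, commands in installed:
        for command in commands:
            owners[command].append(f"{tool}=={version}")
    return {cmd: who for cmd, who in owners.items() if len(who) > 1}


def qualified_command(command: str, version: str) -> str:
    # Hypothetical naming scheme for side-by-side versions.
    return f"{command}-{version}"


installed = [
    ("black", "19.3b0", ["black"]),
    ("black", "18.9b0", ["black"]),
    ("httpie", "1.0.2", ["http"]),
]
print(find_collisions(installed))  # {'black': ['black==19.3b0', 'black==18.9b0']}
```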