How to vendor a package into a Python distribution?

Over at PyPy, we vendor cffi, hpy, greenlet, and readline into the distribution we ship[1]. This causes problems for package managers like poetry. When importlib.metadata.distributions() reports them as “installed packages” (which is true), package managers will sometimes try to remove them, either to create a clean environment or to pin to a different version. If PyPy removed the metadata, I think the situation would be worse: pip install cffi would then try to install cffi from PyPI, since it would not know there is a version already installed. Is there any appetite for adding a mechanism to mark packages as vendored in the INSTALLER file, which would indicate to package managers that they should not remove the package, and that the Python distribution supports only that version of the package?
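For context, the INSTALLER file inside a package’s .dist-info directory today holds a single line naming the tool that installed the package. One hypothetical shape for this proposal (the second line below is invented for illustration; it is not an existing convention) would be:

$ cat lib/pypy3.11/cffi-1.18.0.dev0.dist-info/INSTALLER
PyPy
vendored

A package manager that understood the extra marker would refuse to uninstall or replace the package; one that did not would at least see an installer name other than its own.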


  1. PyPy vendors cffi, greenlet, and hpy because there is a version-specific backend built into PyPy to support these packages. readline is vendored to support pyrepl, which could be avoided by tucking the vendored file inside of pyrepl, which is what CPython does. ↩︎

3 Likes

You will not make friends with the distros that forbid vendoring.
Vendoring is seen in distro world as preventing bug fixes and security fixes from being applied to a dependency.

How do you propose that the package resolution phase should go?

Is the idea that this proceeds as normal, potentially choosing a version other than the vendored version? Then at install time the installer should decide that PyPy knows better, and ignore what was resolved?

1 Like

You will not make friends with the distros that forbid vendoring.
Vendoring is seen in distro world as preventing bug fixes and security fixes from being applied to a dependency.

If distros want to redistribute PyPy, won’t they just devendor it? That’s what they do with pip’s vendored packages. Don’t they also effectively do this with CPython by “devendoring” venv?

I think there is some logic to a more fine-grained version of “EXTERNALLY-MANAGED”, e.g. an “EXTERNALLY-PROVIDED” file that tells Python packaging tools not to uninstall a specific package, nor attempt to install a package with that name.

I can imagine this being used by tools that provide packages via some other mechanism (conda? other distributors?), but I would like to see if there is interest from some tooling providers other than PyPy.
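To make the idea concrete: PEP 668’s EXTERNALLY-MANAGED is an INI-style marker file placed in the interpreter’s stdlib sysconfig directory. A sibling file could work similarly; the file name, section, and keys below are invented for illustration, not an existing spec:

[externally-provided]
Packages = cffi, greenlet, hpy
Error = These packages ship with this interpreter; do not remove or replace them.

Tools that recognized the file would refuse to touch those names; tools that did not would behave as they do today.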

2 Likes

PyPy vendors the packages, and has for many years. We have worked things out with distros: they allow PyPy to vendor the packages. Problems with those libraries are treated like problems with any stdlib module, and the very rare security problems can trigger new releases of PyPy. This proposal is aimed at package managers (pip, poetry, uv, etc.), to align them with the policy that a vendored package should be reported but not removed or updated.

In my opinion package managers would not allow choosing a version other than the vendored one, and would treat the vendored package as any other stdlib package for the purposes of creating virtual environments (as virtualenv and venv already do, since the vendored packages reside in lib, not in site-packages).
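A rough sketch of how a tool might detect that situation today (a heuristic only, not what pip actually does internally):

import sysconfig
from importlib import metadata

def shipped_with_interpreter(name: str) -> bool:
    # Heuristic: a distribution whose install root lies outside the
    # environment's purelib (site-packages) was probably shipped with
    # the interpreter itself rather than installed by a tool.
    dist = metadata.distribution(name)
    install_root = str(dist.locate_file(""))
    purelib = sysconfig.get_paths()["purelib"]
    return not install_root.startswith(purelib)

# True for cffi on PyPy (it lives under lib/); raises
# PackageNotFoundError on interpreters where cffi is absent.
print(shipped_with_interpreter("cffi"))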

I am not locked onto the vendored name; externally-provided is fine too.

The discussion for other Python implementations and for conda is rather theoretical at this point, since PyPy is no longer provided via conda [1] and, as far as I know, GraalPython has not released wheels on PyPI. I am willing to do the legwork to push this proposal through to completion once there is consensus here, including working with conda-forge on the conda side and working out the details directly with any package manager that wishes to support the new spec.


  1. PyPy had a conda distribution, but it was decided to sunset it ↩︎

Just to provide a bit of background, cffi, greenlet and hpy are all deeply integrated into the PyPy VM; they need tight JIT and GC integration to be correct (and fast). They also have parts that are implemented completely differently from what is packaged for CPython on pypi.org. So there is no way at all to not vendor them, short of changing how PyPy works fairly fundamentally. They are part of PyPy’s standard library, and some other parts of PyPy’s standard library depend on them. (Also, as a historical note, these libraries were originally written, or at least started, by PyPy core maintainers.)

3 Likes

This does not seem practical for cross-platform locking tools (poetry, pdm, uv, …).

Currently they perform resolution by checking what is available on PyPI (or other indexes). In the case of cffi this will always result in a solution that is not compatible with the unreleased version that PyPy vendors, and for the other packages it sometimes will.

Do you intend that such tools should carry hard-coded lists of PyPy-vendored packages (and their dependencies), and create different solutions for every PyPy version? This does not sound attractive.

Even then: what should be the install-time experience for packages / lockfiles / requirements.txt that do require a version not matching the vendored one? The only reasonable answer seems to be that this installation must fail: but now you have just made a lot of things uninstallable on PyPy.

(We are all getting away with this today because of compensating bugs: uv fails to notice the vendored packages; poetry and pip try and fail to remove them, but do not error out…)

It seems to me that the PyPy position is most unfortunate here, making it impossible for users to install the versions that they want of certain packages. This is more like a bug - maybe a difficult one to solve - than something that wants standardising.

Anyway I would like to see it spelled out what tools are supposed to do when encountering such requirements, and why that is okay.

Or if we all keep quiet and hope that none of the tools fixes their bugs then perhaps we can continue to blunder through.

2 Likes

Yes, I expect that the installation will fail with an error as if the user had specified conflicting installation requirements. A random version of greenlet, cffi, or hpy cannot be used with PyPy, so it is better to fail at installation rather than at runtime. No, this will not make “a lot of things uninstallable on PyPy”. There might be a vanishingly small number of packages that will not install on PyPy, and over the years we have engaged with such packages to suggest solutions when the problems are reported.

3 Likes

If you are not seeing error reports, it is likely through some combination of: the tools currently silently failing to install what users ask for, and PyPy not being much used.

For example, here are 147,000 things that should fail to install on PyPy. Here are 50,000 more.

1 Like

Yes, those 147,000 + 50,000 “projects” may fail to run on PyPy. The whole point of this thread is to try to understand the constraints that lead to failure, and to propose a standards-based solution. A standards body saying “it is your fault, you fix it, I can’t do anything to help” is not what I expected when I started this thread. Note this is a very different experience from what I had with conda-forge, previous packaging Discourse discussions, and the many, many discussions the PyPy team has had with package maintainers, although a few indeed have said they will not support PyPy. I guess that is the prerogative of the poetry team, but it would be a shame if we cannot reach a solution.

Real-world restrictions tend to be messy. They present challenges that standards groups like this one are supposed to be open to discussing and solving. PyPy developers initiated CFFI and HPy as part of a solution to real-world problems, and the end result was apparently successful, since at least 147,000 projects use CFFI. Indeed, those projects that pin to old versions of CFFI might fail to run properly on PyPy. Wouldn’t it be a better user experience if, when they attempt installation on PyPy, they received an error message rather than a runtime failure? I tried to point to a path that might lead to a solution. What is your proposal, given the constraints?

6 Likes

I am not a standards body, just a person on the internet. Please do not hold anyone else responsible for my opinions.

Perhaps our mismatch is that I did not imagine that PyPy could be happy with an outcome in which it is not compatible with such large numbers of projects. But apparently you are fine with this.

But in that case, I am not sure that any new standard is required. pip already knows that something is going wrong here:

$ pip install cffi==1.17.1
...
  Attempting uninstall: cffi
    Found existing installation: cffi 1.18.0.dev0
    Not uninstalling cffi at /home/dch/pypy3.11-v7.3.18-linux64/lib/pypy3.11, outside environment /home/dch/foo/.venv
    Can't uninstall 'cffi'. No files were found to uninstall.

(There is a different warning when not in a virtual environment, but again pip already spots that there is a problem.)

So how about: submit a pull request so that this becomes a hard error?

If and when folk complain that they now cannot install things, tell them to update their cffi requirement (or whatever) with a marker platform_python_implementation != "PyPy".
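Concretely, a requirements line like this one (the pin is illustrative, taken from the pip example above) would apply the pin everywhere except PyPy:

cffi==1.17.1; platform_python_implementation != "PyPy"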

1 Like

It’s hardly a standards body saying that, it’s mostly just interested community members voicing their opinions.

As far as your suggestion of using the INSTALLER file goes, that seems plausible. At the moment, INSTALLER is purely informational, but we could do more - for example, add a flag saying that only the installer recorded in this file is allowed to uninstall the project.

But I don’t know enough about the use case to do much more than speculate. I’d be willing to sponsor a PEP along the lines I suggested, but you’d need to thrash out the details to get it into a state where it could be a PEP.

1 Like

Curious onlooker here…

Shouldn’t vendored dependencies be hidden? In other words, I assumed that if I were to vendor a dependency, I would rewrite the import paths and make it so that its metadata is not available to the usual machinery. For example, if I maintain foo and vendor bar, I would make it so that bar is only importable as something like import foo._vendored.bar, and I would make sure that importlib.metadata.metadata("bar") finds nothing.
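For reference, a minimal sketch of that hiding pattern, using the hypothetical foo and bar from above (roughly what pip does with its own vendored dependencies):

foo/
├── __init__.py
└── _vendored/
    ├── __init__.py
    └── bar/            # copied source tree, deliberately shipped without bar-*.dist-info
        └── __init__.py

# inside foo's own modules, imports are rewritten:
from foo._vendored import bar   # instead of: import bar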

But maybe the requirements here are vastly different than what I pictured… Should I instead understand that pip-installed 3rd party dependencies have to use the versions of cffi, greenlet, and hpy that are vendored by PyPy?

2 Likes

Currently they perform resolution by checking what is available on PyPI (or other indexes). In the case of cffi this will always result in a solution that is not compatible with the unreleased version that PyPy vendors, and for the other packages it sometimes will.

Cross-platform resolvers already have to make simplifying assumptions that aren’t part of the spec, e.g. uv assumes all dependencies for a given release are the same. And I know uv adds information that isn’t defined by the spec but is implicitly true, e.g. https://github.com/astral-sh/uv/pull/9949. Why could cross-platform resolvers not also add known PyPy dependency information?
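For instance, a resolver could carry a small, updatable table of interpreter-pinned versions and fold it into the solve as extra constraints. A sketch (only the cffi version below appears earlier in this thread; the greenlet and hpy entries are invented for illustration):

# Hypothetical data a cross-platform resolver could ship and keep updated.
PYPY_VENDORED = {
    "7.3.18": {  # PyPy release -> exact vendored versions
        "cffi": "1.18.0.dev0",
        "greenlet": "0.4.13",   # illustrative
        "hpy": "0.9",           # illustrative
    },
}

def vendored_pins(pypy_version: str) -> list[str]:
    # Express the vendored packages as exact pins that the resolver
    # adds as constraints when targeting this PyPy version.
    return [
        f"{name}=={version}"
        for name, version in PYPY_VENDORED.get(pypy_version, {}).items()
    ]

print(vendored_pins("7.3.18"))
# ['cffi==1.18.0.dev0', 'greenlet==0.4.13', 'hpy==0.9']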

And resolution, in general, isn’t defined by the spec, so I would want to hear specifically from such tool authors that it wouldn’t work for them, that they couldn’t encode that information into their resolution tools, and why not.

Exactly. Projects like pymunk depend on CFFI at runtime. In order to run on CPython, they install CFFI. In order to run on PyPy they use the vendored CFFI. For this reason PyPy cannot hide its vendored CFFI, but must advertise that CFFI is already installed.

3 Likes

I think a much cleaner approach is my subsequent suggestion that packagers should be expected to provide correct requirements in the first place (via a marker checking the python implementation).

This is true, in an ideal world. And I think it’s what PyPy should be encouraging projects to do. But the reality is that there will be people who specify a cffi (for example) dependency, even on PyPy.

IMO the way to frame this is not about vendoring, but rather that PyPy ships an “extended” stdlib that includes some packages normally found on PyPI. If we look at it that way, maybe there could be a file in site-packages that recorded “additional stdlib” packages: cffi==1.18.0.dev0 etc. Then installers would be expected to read that file and treat it as a constraints file[1].
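A sketch of what that could look like (the manifest name and the greenlet/hpy versions are invented; cffi’s version comes from the pip output earlier in the thread):

# Hypothetical "additional stdlib" manifest shipped by the interpreter:
#
#   cffi==1.18.0.dev0
#   greenlet==0.4.13
#   hpy==0.9
#
# A minimal reader an installer might use to load it as pins:
from pathlib import Path

def read_additional_stdlib(path: Path) -> dict[str, str]:
    pins = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            name, _, version = line.partition("==")
            pins[name] = version
    return pins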


  1. It would have to be more than a constraints file in pip’s sense, because it would need to block uninstalling, for example - but the principle is the same ↩︎

2 Likes

I think a much cleaner approach is my subsequent suggestion that packagers should be expected to provide correct requirements in the first place (via a marker checking the python implementation).

There are two apparent issues with this as a suggestion:

  • Asking 1000s of package authors to correctly define dependency information for a platform many of them might not be aware of will be more error-prone than asking a handful of cross-platform resolver maintainers, who are experts on the Python ecosystem, to encode the information when they do cross-platform resolution
  • Dependency information can’t be changed once uploaded to PyPI, whereas cross-platform resolution tools can continue to update this information over time, so old packages can be resolved against the correct vendored packages of PyPy even if those packages never specified it in their dependency information

But again, I would want to hear directly from the authors of cross platform resolvers here, to see if they would agree or not.

The problem with proposals that rely on the currently running Python implementation is that they do not help tools that are trying to generate lockfiles for use elsewhere.

At install time, the evidence from pip is that installers can already recognise that they should not remove the vendored packages - so I do not currently see the need for a new mechanism to tell them this.

FWIW I am a regular poetry contributor; though there, as here, my opinion carries only its own weight.