How to vendor a package into a Python distribution?

I can’t comment on that. Hopefully the maintainers of such tools can offer some insight.

Apart from the fact that the message is pretty unhelpful for the end user, it wouldn’t help with flags like --ignore-installed.

But ignoring that for now, the original post specifically mentioned poetry, not pip. Even if pip works OK (by accident, rather than design), there is a problem here that needs fixing. At the very least, having a list of exactly what “additional stdlib” files are included in a given distribution would help tools that need to fix something know what they have to fix.

I’m a strong -1 on anything that requires hard-coding the knowledge of what packages PyPy ships into tools. Reading information from the target environment is a standard way of getting such information - you’re right, it doesn’t work well when the target environment isn’t available at resolve time, but there are lots of things that don’t work well in that situation. As @notatallshaw noted, tools already have to make simplifying assumptions in those cases, so I don’t think we should block finding a solution on solving the cross-platform case.

2 Likes

At install time, the evidence from pip is that installers can already recognise that they should not remove the vendored packages - so I do not currently see the need for a new mechanism to tell them this.

This is a misinterpretation of pip’s behavior. Pip’s default upgrade strategy will not uninstall an already-installed package (see https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-upgrade-strategy). But:

  • This behavior is controlled by the user, not the package or platform
  • Pip will uninstall a package if the installed version doesn’t satisfy the resolution

The original post explains the need for this, as for PyPy’s vendored packages:

package managers will sometimes try to remove them

Pip isn’t a package manager, but I do imagine that if these packages’ metadata is exposed, there will be situations where pip tries to remove them.

poetry shells out to pip for uninstall, so it directly inherits pip’s behaviour. (Hopefully one day this will not be true, but no-one has yet shown up to make that so.)

This is a misinterpretation of pip’s behavior …

I think you are mistaken, pip explicitly refuses to uninstall things that it thinks are in the standard library.
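
(For illustration, a minimal sketch - my own, not pip’s actual logic - of how a tool might decide that a file belongs to the interpreter’s stdlib directory:)

import json
import sysconfig
from pathlib import Path

def looks_like_stdlib(file_path: str) -> bool:
    # Rough heuristic, not pip's real check: is the file under the stdlib dir?
    stdlib = Path(sysconfig.get_paths()["stdlib"]).resolve()
    return Path(file_path).resolve().is_relative_to(stdlib)

print(looks_like_stdlib(json.__file__))  # True: json lives in the stdlib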

I think you are mistaken, pip explicitly refuses to uninstall things that it thinks are in the standard library.

Ah, so at the moment Poetry tries to uninstall and fails because pip throws an error?

I don’t see that specific error in the linked thread, but I do see the error above it, saying that the package is outside the environment.

Is this a bug in Poetry then? If it’s using pip to uninstall, shouldn’t it implement the same guards as pip for uninstalling? Does Poetry consider these guards reasonable?

No. You have misunderstood something. pip does not throw an error, and poetry does not fail.

There is no real error today, because of a confluence of mishaps:

  • packagers wrongly declare unconditional requirements on cffi
  • installers (certainly both poetry and pip) then try to remove vendored packages
    • imo this part is correct behaviour, given the requirement for a different version
  • that attempted removal fails (also correct)
  • but that failure is treated only as a warning by pip, and that warning is ignored altogether by poetry

The result - both with poetry and pip - is that e.g. cffi==1.17.1 is installed, but the vendored cffi is not removed and “wins” anyway by being earlier on the path.

The only actual symptom is some output claiming that a vendored package is being removed.

As I said some time ago, we could just not fix any of these bugs and blunder on.

The clean way to achieve much the same effect would be for the failed uninstall to be a hard error, which in turn would force packages to provide correct requirements.

2 Likes

Thanks for explaining in detail. Importantly, I think you are right that, in this case at least, these packages can already be identified.

So perhaps another solution is simply that package installers should throw an error when trying to upgrade or reinstall a package that can’t be uninstalled? And package/environment managers should be careful not to upgrade or reinstall these packages? And cross-platform resolvers will have to encode this information?

I’m not strongly familiar with the history of this on pip’s side; I’m curious why it only warns on uninstalling when it’s in the middle of an upgrade or reinstall. Given that the information is all available ahead of time, it could assert this after resolution but before it attempts to upgrade or reinstall any package.
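
Something like this hypothetical post-resolution guard is what I have in mind (the names and data are made up; this is not pip code):

def check_vendored_pins(resolution: dict[str, str], vendored: dict[str, str]) -> None:
    # Fail before touching the environment if the resolution would
    # replace a package that cannot be uninstalled.
    for name, version in resolution.items():
        pinned = vendored.get(name)
        if pinned is not None and pinned != version:
            raise RuntimeError(
                f"{name} is vendored at {pinned} by this distribution "
                f"and cannot be replaced with {version}"
            )

check_vendored_pins({"cffi": "1.17.1"}, {"cffi": "1.18.0.dev0"})  # raises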

1 Like

As @mattip pointed out, arguing over who’s to blame for the current situation isn’t productive. Packages (apparently) exist that are affected by this issue, so claiming that they have incorrect requirements doesn’t really help - that ship has sailed…

So can we be clear on what the actual problem is? I see two different situations which would cause issues. There may be others, but I don’t think they have been mentioned in this thread.

  1. A project that depends on (say) cffi, but explicitly requires a version that’s not the one shipped by PyPy. That project will never work on PyPy. Forcing the resolver to use the version shipped by PyPy (whether via a constraint, or by marking it as “not uninstallable”) would result in an error message explaining the issue, which is the best we can do. Current behaviour is suboptimal because it’s tool-dependent (pip issues a warning, poetry silently installs the required version, no-one has said what uv or other installers do).
  2. A tool that wants to uninstall “unnecessary” packages from an environment. This is (I believe) the issue @mattip originally referred to, in relation to the poetry sync command. Giving tools information about what packages should be considered as “part of the standard library” allows them to make informed decisions. Current behaviour is suboptimal because the only real option is “try, and assume it’s OK if you fail”. Worse, if you try and succeed, you may have uninstalled a key system component.

Constraints on a solution are:

  1. The list of “additional stdlib packages” needs to be maintained by the distribution, not by packaging tool maintainers, so that it’s accurate and up to date.
  2. Cross-platform resolves, where a tool is resolving for an environment not currently available to query directly, will be an issue, as there’s currently no standard solution for tools to get information about an arbitrary distribution, apart from querying it directly. Current state of the art here is tool-specific approaches, often involving simplifying assumptions that aren’t standard-compliant.

Upgrade (or downgrade) maybe. But not on reinstall - we should simply leave the existing copy alone and note that a reinstall wasn’t needed, and isn’t possible. I’m concerned how this would interact with tool-specific options, though. If the user chose the “eager” upgrade strategy in pip, that might trigger an error where the correct solution would be to not upgrade cffi, while still being eager about everything else.

I’m not that familiar with the history, either, but it may be due to packages in the CPython stdlib having installation metadata in the past. Or the weird pre-PEP 517 status of setuptools/pkg_resources as “necessary for bootstrapping packaging”. I’d be concerned about breakage if we simply changed it, just because no-one really knows any more why it’s there, so we’d need a suitable deprecation period at least.

And of course this doesn’t fix the problem in general. Every tool would have to implement the same fix. At which point why aren’t we making it a standard?

That is not what I am doing. I am proposing a possible path:

  • installers start to treat disallowed uninstalls as a hard error
  • in reaction to that, packagers start to fix their requirements

So far as I can see it is exactly the same proposal as “another solution is simply that package installers should throw an error when trying to upgrade or reinstall a package that can’t be uninstalled”.

So can we be clear on what the actual problem is?

Big plus one on this.

The only real symptom that I know of is some misleading output during poetry install type commands.

That may be more through good luck than good design, but if that is the whole of the problem then it will not be worth much effort and transitional pain to get to a better place.

1 Like

I see two problems:

  1. installers are trying to uninstall packages that should not be uninstalled, and
  2. resolvers should recognize packages installed in the stdlib that contain metadata.

I am not sure (2) is actually a problem once (1) is solved. I agree that if a user actively requests (1) to happen, it should be a hard error. But if I understand correctly, the issue in poetry is that the installer has somehow decided to remove installed packages even though the user did not explicitly request they be removed.

Edit: qualified my understanding of the root cause of the Poetry issue.

1 Like

This thread turns out to be more or less irrelevant to the issue in poetry - per this, the real problem that henryiii reports is only tangentially related to what we are discussing here.

If that misunderstanding is the basis for this conversation then I apologise for my part in not spotting that sooner.

AFAIK, the standard way to prevent this is removing the RECORD file. Keep the rest of .dist-info to make the package count as “installed” for dependency resolution.
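
You can see the effect with importlib.metadata: the distribution still resolves for dependency checks, but there is no RECORD to drive an uninstall (a sketch, assuming a vendored cffi whose RECORD has been removed):

from importlib.metadata import distribution

dist = distribution("cffi")
print(dist.version)                      # still visible to resolvers
print(dist.read_text("RECORD") is None)  # True once RECORD is gone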

5 Likes

I get the impression uv is similar to poetry here, so let me run you through an example and show where uv’s assumptions and PyPy’s assumptions clash.

In uv, we produce a cross-platform lockfile: the idea is that there is one lockfile that can be installed on all platforms and compatible Python versions, assuming there’s a wheel or source dist available for each package. This lockfile contains the hashes for each distribution, both to guarantee reproducibility and as a security measure. (Sidenote: this all only applies to uv lock/uv sync/uv run, but not to the uv pip interface.)

For example, we could have brotlicffi:

pyproject.toml
[project]
name = "foo"
version = "0.1.0"
requires-python = "==3.11.*"
dependencies = [
    "brotlicffi>=1.1.0.0",
]
uv.lock
version = 1
requires-python = "==3.11.*"

[[package]]
name = "brotlicffi"
version = "1.1.0.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
    { name = "cffi" },
]
sdist = { url = "https://files.pythonhosted.org/packages/95/9d/70caa61192f570fcf0352766331b735afa931b4c6bc9a348a0925cc13288/brotlicffi-1.1.0.0.tar.gz", hash = "sha256:b77827a689905143f87915310b93b273ab17888fd43ef350d4832c4a71083c13", size = 465192 }
wheels = [
    { url = "https://files.pythonhosted.org/packages/a2/11/7b96009d3dcc2c931e828ce1e157f03824a69fb728d06bfd7b2fc6f93718/brotlicffi-1.1.0.0-cp37-abi3-macosx_10_9_x86_64.whl", hash = "sha256:9b7ae6bd1a3f0df532b6d67ff674099a96d22bc0948955cb338488c31bfb8851", size = 453786 },
    { url = "https://files.pythonhosted.org/packages/d6/e6/a8f46f4a4ee7856fbd6ac0c6fb0dc65ed181ba46cd77875b8d9bbe494d9e/brotlicffi-1.1.0.0-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:19ffc919fa4fc6ace69286e0a23b3789b4219058313cf9b45625016bf7ff996b", size = 2911165 },
    { url = "https://files.pythonhosted.org/packages/be/20/201559dff14e83ba345a5ec03335607e47467b6633c210607e693aefac40/brotlicffi-1.1.0.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9feb210d932ffe7798ee62e6145d3a757eb6233aa9a4e7db78dd3690d7755814", size = 2927895 },
    { url = "https://files.pythonhosted.org/packages/cd/15/695b1409264143be3c933f708a3f81d53c4a1e1ebbc06f46331decbf6563/brotlicffi-1.1.0.0-cp37-abi3-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:84763dbdef5dd5c24b75597a77e1b30c66604725707565188ba54bab4f114820", size = 2851834 },
    { url = "https://files.pythonhosted.org/packages/b4/40/b961a702463b6005baf952794c2e9e0099bde657d0d7e007f923883b907f/brotlicffi-1.1.0.0-cp37-abi3-win32.whl", hash = "sha256:1b12b50e07c3911e1efa3a8971543e7648100713d4e0971b13631cce22c587eb", size = 341731 },
    { url = "https://files.pythonhosted.org/packages/1c/fa/5408a03c041114ceab628ce21766a4ea882aa6f6f0a800e04ee3a30ec6b9/brotlicffi-1.1.0.0-cp37-abi3-win_amd64.whl", hash = "sha256:994a4f0681bb6c6c3b0925530a1926b7a189d878e6e5e38fae8efa47c5d9c613", size = 366783 },
]

[[package]]
name = "cffi"
version = "1.17.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
    { name = "pycparser" },
]
sdist = { url = "https://files.pythonhosted.org/packages/fc/97/c783634659c2920c3fc70419e3af40972dbaf758daa229a7d6ea6135c90d/cffi-1.17.1.tar.gz", hash = "sha256:1c39c6016c32bc48dd54561950ebd6836e1670f2ae46128f67cf49e789c52824", size = 516621 }
wheels = [
    { url = "https://files.pythonhosted.org/packages/6b/f4/927e3a8899e52a27fa57a48607ff7dc91a9ebe97399b357b85a0c7892e00/cffi-1.17.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:a45e3c6913c5b87b3ff120dcdc03f6131fa0065027d0ed7ee6190736a74cd401", size = 182264 },
    { url = "https://files.pythonhosted.org/packages/6c/f5/6c3a8efe5f503175aaddcbea6ad0d2c96dad6f5abb205750d1b3df44ef29/cffi-1.17.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:30c5e0cb5ae493c04c8b42916e52ca38079f1b235c2f8ae5f4527b963c401caf", size = 178651 },
    { url = "https://files.pythonhosted.org/packages/94/dd/a3f0118e688d1b1a57553da23b16bdade96d2f9bcda4d32e7d2838047ff7/cffi-1.17.1-cp311-cp311-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:f75c7ab1f9e4aca5414ed4d8e5c0e303a34f4421f8a0d47a4d019ceff0ab6af4", size = 445259 },
    { url = "https://files.pythonhosted.org/packages/2e/ea/70ce63780f096e16ce8588efe039d3c4f91deb1dc01e9c73a287939c79a6/cffi-1.17.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a1ed2dd2972641495a3ec98445e09766f077aee98a1c896dcb4ad0d303628e41", size = 469200 },
    { url = "https://files.pythonhosted.org/packages/1c/a0/a4fa9f4f781bda074c3ddd57a572b060fa0df7655d2a4247bbe277200146/cffi-1.17.1-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:46bf43160c1a35f7ec506d254e5c890f3c03648a4dbac12d624e4490a7046cd1", size = 477235 },
    { url = "https://files.pythonhosted.org/packages/62/12/ce8710b5b8affbcdd5c6e367217c242524ad17a02fe5beec3ee339f69f85/cffi-1.17.1-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:a24ed04c8ffd54b0729c07cee15a81d964e6fee0e3d4d342a27b020d22959dc6", size = 459721 },
    { url = "https://files.pythonhosted.org/packages/ff/6b/d45873c5e0242196f042d555526f92aa9e0c32355a1be1ff8c27f077fd37/cffi-1.17.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:610faea79c43e44c71e1ec53a554553fa22321b65fae24889706c0a84d4ad86d", size = 467242 },
    { url = "https://files.pythonhosted.org/packages/1a/52/d9a0e523a572fbccf2955f5abe883cfa8bcc570d7faeee06336fbd50c9fc/cffi-1.17.1-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:a9b15d491f3ad5d692e11f6b71f7857e7835eb677955c00cc0aefcd0669adaf6", size = 477999 },
    { url = "https://files.pythonhosted.org/packages/44/74/f2a2460684a1a2d00ca799ad880d54652841a780c4c97b87754f660c7603/cffi-1.17.1-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:de2ea4b5833625383e464549fec1bc395c1bdeeb5f25c4a3a82b5a8c756ec22f", size = 454242 },
    { url = "https://files.pythonhosted.org/packages/f8/4a/34599cac7dfcd888ff54e801afe06a19c17787dfd94495ab0c8d35fe99fb/cffi-1.17.1-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:fc48c783f9c87e60831201f2cce7f3b2e4846bf4d8728eabe54d60700b318a0b", size = 478604 },
    { url = "https://files.pythonhosted.org/packages/34/33/e1b8a1ba29025adbdcda5fb3a36f94c03d771c1b7b12f726ff7fef2ebe36/cffi-1.17.1-cp311-cp311-win32.whl", hash = "sha256:85a950a4ac9c359340d5963966e3e0a94a676bd6245a4b55bc43949eee26a655", size = 171727 },
    { url = "https://files.pythonhosted.org/packages/3d/97/50228be003bb2802627d28ec0627837ac0bf35c90cf769812056f235b2d1/cffi-1.17.1-cp311-cp311-win_amd64.whl", hash = "sha256:caaf0640ef5f5517f49bc275eca1406b0ffa6aa184892812030f04c2abf589a0", size = 181400 },
]

[[package]]
name = "foo"
version = "0.1.0"
source = { virtual = "." }
dependencies = [
    { name = "brotlicffi" },
]

[package.metadata]
requires-dist = [{ name = "brotlicffi", specifier = ">=1.1.0.0" }]

[[package]]
name = "pycparser"
version = "2.22"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/1d/b2/31537cf4b1ca988837256c910a668b553fceb8f069bedc4b1c826024b52c/pycparser-2.22.tar.gz", hash = "sha256:491c8be9c040f5390f5bf44a5b07752bd07f56edf992381b05c701439eec10f6", size = 172736 }
wheels = [
    { url = "https://files.pythonhosted.org/packages/13/a3/a812df4e2dd5696d1f351d58b8fe16a405b234ad2886a0dab9183fb78109/pycparser-2.22-py3-none-any.whl", hash = "sha256:c3702b6d3dd8c7abc1afa565d7e63d53a1d0bd86cdc24edd75470f4de499cfcc", size = 117552 },
]

This locks a simple dependency tree:

foo v0.1.0
└── brotlicffi v1.1.0.0
    └── cffi v1.17.1
        └── pycparser v2.22

This model now clashes with PyPy: the lockfile says 1.17.1, but PyPy wants the vendored 1.18.0.dev0.

Usually, the lockfile should guarantee that cffi 1.17.1 gets installed, but there’s already a cffi 1.18.0.dev0, and that distribution can’t be removed (no RECORD). But even if we could remove it, installing 1.17.1 would potentially be broken if PyPy needs the aforementioned tight integration through the 1.18 prerelease (Edit: at least brotlicffi seems to work with 1.17, too). On the other hand, the lockfile is built so that we can install from just the lockfile, with details of the requirements elided, so we can no longer tell whether 1.18.0.dev0 satisfies brotlicffi 1.1.0.0. There could even be an upper bound, and we would have to discard the lockfile and do an entirely new resolution.
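
The clash in miniature, assuming this runs on a PyPy environment with the vendored cffi (illustrative only, not uv’s actual check):

from importlib.metadata import version
from packaging.version import Version

locked = Version("1.17.1")            # the pin recorded in uv.lock
installed = Version(version("cffi"))  # 1.18.0.dev0 on PyPy
print(installed == locked)            # False: the lock can't be satisfied as-is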

2 Likes

:+1: I have one project where I tried to apply the correct markers on the user side, and quickly gave up.

Alternatively, could cffi and the other packages mark themselves on PyPI as having no installable candidates for PyPy?[1]


  1. Small detail: I don’t think there’s a method for doing so, outside of depending on a nonexistent package under PyPy? ↩︎

Thanks for the concrete example.

The suggestion of this thread is that the “correct” fix is that brotlicffi’s requirement should not be cffi >= 1.0.0 but rather cffi >= 1.0.0 ; platform_python_implementation != "PyPy".
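
For anyone wanting to check what such a marker does, the packaging library can evaluate it against the running interpreter (a quick illustration, not part of the proposed fix itself):

from packaging.markers import Marker

marker = Marker('platform_python_implementation != "PyPy"')
# False on PyPy (the requirement is skipped), True on CPython
print(marker.evaluate())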

Perhaps it would be interesting to see how a pull request to that effect went.

IMO making packages explicitly handle PyPy as a platform is the wrong solution: it pushes the work away from the small number of people who have a good chance of understanding the issue (installer/locker/PyPy maintainers) towards a larger number of people who are going to cargo-cult some half-baked solution or have no solution at all.

Also, just removing the dependency on cffi is going to cause issues when the version does actually matter, since people can then very easily end up with a version mismatch, and they are going to file unnecessary bug reports.

I believe the ideal user-facing solution is a list, maintained by PyPy, of which PyPI packages the distribution “vendors”. This should be seen as immutable, like the Python version itself. If no resolution is found involving these versions, the install should fail.

That cross-platform lockers can’t deal with this IMO doesn’t change the fact that this is the correct solution; it just means they need an extra mechanism. This could maybe be as simple as an official file provided by PyPy somewhere that clearly defines which versions of PyPy bundle which versions of third-party libraries.
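
To be concrete, I imagine something like the following - a purely hypothetical format, nothing like this exists today - that a cross-platform locker could fetch and treat as immutable pins:

import json

# Invented contents of a PyPy-published "vendored packages" file.
vendored = json.loads("""
{
  "implementation": "PyPy",
  "implementation_version": "7.3.17",
  "vendored": {"cffi": "1.18.0.dev0"}
}
""")
print(vendored["vendored"])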

I also say this with potentially other Python distributions in mind, although that is a bit more theoretical (i.e. I don’t know any real examples). But there could e.g. be a scientific Python distribution that includes CPython as well as a wide variety of third-party packages and various GUI/CLI applications. In this case the versions used might tightly depend on each other, or they might be special builds of packages. Here it would also be useful to tell installers “don’t touch these, work with what you have”.

(sorry for the terse response, I haven’t read the entire thread)

Can’t comment on the appetite aspects, but here’s something you can do today that’ll make things work in the manner you’d like: Remove the RECORD file from these vendored packages’ .dist-info folder.

That’ll make it so that installers can’t uninstall the package, and we can probably improve the error messages in pip/uv etc. to better guide users in the right direction if necessary.
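
A minimal sketch of what that could look like in a distribution’s build scripts (the path layout and package name are illustrative):

import sysconfig
from pathlib import Path

# Strip RECORD from the vendored package's .dist-info so installers
# refuse to uninstall it, while its metadata stays visible.
purelib = Path(sysconfig.get_paths()["purelib"])
for record in purelib.glob("cffi-*.dist-info/RECORD"):
    record.unlink()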

2 Likes

Or we can improve resolvers to treat those already-installed packages as fixed, and to solve for a dependency graph that uses them. That would be vastly more useful for most users - adding a package to an environment shouldn’t have to change the rest of the environment if there’s a solution that fits.

1 Like

Can you explain how you imagine that working? I suspect that whether it’s possible would be closely tied to the details of the resolution algorithm. Pip explores the dependency graph starting from the user’s stated requirements, adding edges as defined by package metadata, and pruning branches when they lead to failure. If we encountered one of these “fixed” packages, we’d have to apply a new global constraint (that the installed version of the fixed package must be used), and that might affect the graph we’ve already built, so we’d need to rebuild it (effectively restarting the resolve). That could easily be very costly on large resolves.

I don’t know how the PubGrub algorithm used by uv (and Poetry, I believe) would handle this situation, but it seems like they would have the same problem - discovering a new constraint that affects the whole dependency graph part way through the resolve.

Regrettably, resolution algorithms are often exercises in compromise (because we’re fighting exponential complexity) and so “fail with a somewhat more helpful error” is quite likely the best we can do here.

Wouldn’t it be identical to having a global constraint of cffi==1.18.1 or whatever as part of the initial requirements? We aren’t really discovering new information - we are confirming that information that we already know is actually relevant.

If it helps the resolver, we can just always add cffi==1.18.1 as part of the initial list of required “dependencies”.

2 Likes

If the packages and versions can be identified ahead of time, which it seems like they can, I don’t see any issue with the resolution.

Pip already supports constraints; these packages and versions could be appended to any user-provided constraints.
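
Mechanically, that could look something like this sketch using the packaging library (the names and pins are illustrative, not pip internals):

from packaging.requirements import Requirement

user_constraints = [Requirement("requests<3")]  # e.g. from the user's -c file
vendored_pins = {"cffi": "1.18.0.dev0"}         # supplied by the distribution

constraints = user_constraints + [
    Requirement(f"{name}=={version}")
    for name, version in vendored_pins.items()
]
print([str(c) for c in constraints])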

What might make it a non-trivial add to pip is the UX. When it fails because of one of these implicitly added vendored constraints, how do you communicate to the user that the issue is with the vendored constraints, not their own requirements or constraints?

Pip does already identify constraints vs. requirements during a ResolutionImpossible (unlike uv; I don’t know about other resolvers), so the machinery is there; I just think it might be non-trivial.

2 Likes