Can vendoring dependencies in a build be officially supported?

Vendoring packages is complicated. It often involves rewriting imports or pulling other tricks. Patches sometimes need to be applied.
Usually, the source of vendored libraries is included in the source tree and committed, as is the case for pip’s vendored dependencies. I’m also aware that the vendoring tool has been extracted from pip, but it doesn’t recommend third-party usage.
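
To make "rewriting imports" concrete, here is a minimal sketch of the kind of transformation a vendoring step has to perform. It is not how pip or any existing tool does it; the `mypackage._vendor` namespace and the `rewrite_imports` helper are assumptions for illustration, and a regex like this only covers the simplest import forms.

```python
import re

# Hypothetical namespace that the vendored copies live under.
VENDOR_NAMESPACE = "mypackage._vendor"


def rewrite_imports(source: str, package: str, namespace: str = VENDOR_NAMESPACE) -> str:
    """Point simple top-level imports of `package` at a vendored copy."""
    name = re.escape(package)
    # "from requests.adapters import X"
    #   -> "from mypackage._vendor.requests.adapters import X"
    source = re.sub(
        rf"^([ \t]*)from\s+{name}(\.[\w.]+)?\s+import\s+",
        lambda m: f"{m.group(1)}from {namespace}.{package}{m.group(2) or ''} import ",
        source,
        flags=re.MULTILINE,
    )
    # "import requests" -> "from mypackage._vendor import requests"
    source = re.sub(
        rf"^([ \t]*)import\s+{name}[ \t]*$",
        lambda m: f"{m.group(1)}from {namespace} import {package}",
        source,
        flags=re.MULTILINE,
    )
    return source


example = (
    "import requests\n"
    "from requests.adapters import HTTPAdapter\n"
    "import requests_toolbelt\n"
)
print(rewrite_imports(example, "requests"))
```

Forms such as `import requests.adapters`, `import requests as r`, or dynamic imports are left untouched by this sketch, which is part of why real vendoring gets complicated.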

It seems that publishing tools like setuptools could be involved in helping sort this out and make this a standard part of Python packaging. The idea I’ve been thinking about is to make vendoring at build time a standard and supported behavior.

  • Can packages better accommodate this if they expect to be vendored? Could packages declare themselves as vendor_safe=True? What would this imply/require? (Pure Python only? Relative imports only?)
  • How would packaging with vendored libraries be declared? Would a package need to declare a namespace for its vendored dependencies?
  • What is the bare minimum vendoring capability which would be useful? Does it need to support patching files?

I’m imagining that one day I could write a setup.cfg like so:

[options]
package_vendors =
    mypackage._vendor:requests
    mypackage._vendor:pyjwt[crypto]==2.3.0

and this means that when I run python -m build

  • I get a vendored copy of the latest requests in-tree, in the sdist
  • I get a vendored copy of pyjwt==2.3.0 as well
  • my package’s install_requires list is extended with the dependencies of my requests version and the dependencies of pyjwt[crypto]
  • if requests doesn’t publish a version which sets vendor_safe=True, then the build would fail instead
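
As a rough sketch of the first step a build backend would need, the proposed option can already be read with the standard library. To be clear, `package_vendors` is the hypothetical option from this post and is not something setuptools understands today; `parse_package_vendors` is likewise a made-up helper.

```python
import configparser

# The hypothetical setup.cfg from above.
SETUP_CFG = """
[options]
package_vendors =
    mypackage._vendor:requests
    mypackage._vendor:pyjwt[crypto]==2.3.0
"""


def parse_package_vendors(cfg_text: str) -> list:
    """Return (namespace, requirement) pairs from the hypothetical option."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get("options", "package_vendors", fallback="")
    entries = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only: namespace on the left,
        # requirement specifier on the right.
        namespace, _, requirement = line.partition(":")
        entries.append((namespace.strip(), requirement.strip()))
    return entries


for namespace, requirement in parse_package_vendors(SETUP_CFG):
    print(f"vendor {requirement!r} under {namespace!r}")
```

Everything after this point (downloading, checking vendor_safe, rewriting imports, extending install_requires) is the hard part and is not sketched here.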

Has this been discussed in the past? I’d be curious to read about past thoughts on this topic and to better understand the situation.

2 Likes

Why do you want to vendor packages? Generally, I’d consider it an anti-pattern. Much better to simply depend on the packages you need. For an application, you should typically be installing in an isolated environment (maybe using a tool like pipx) anyway, so there’s no need to worry about conflicts with other applications.

For the rare cases like pip where vendoring is necessary, it’s IMO acceptable for it to be a little bit difficult - it acts as a reminder that you should only be doing this if you need to.

7 Likes

I’ve been out of the loop for years now, but I think policy hasn’t changed much since then. Vendoring also makes it much harder for downstreams such as Linux distros to adopt and upgrade packages. Debian and Ubuntu have (had?) policies against vendoring because it makes it much more difficult to apply security patches for example, and back in the day we went to great lengths to unvendor packages.

2 Likes

You could write a build backend that does that… but please don’t do that. As Paul pointed out, this is an anti-pattern and there is only a small number of cases where it makes sense.

If you have a specific use-case where you think that would make sense, please let us know so that we can advise if this would indeed be the best option.

1 Like

I have two motivating cases from my work. In both cases we have considered and decided against vendoring – the complexity of doing so being one of the reasons.
(Maybe that’s fine. You could say that our decision not to vendor things was a successful outcome of this not being easier.)

Case 1: I maintain a package which is a building-block for many of our internal applications and some of our customers’ applications, so maximal compatibility is a goal. This leads to us not wanting to use many libraries, as there are conflicts this would or could introduce with the downstream applications. e.g. I don’t want to use jsonschema because we have applications which use specific versions of it.

Case 2: We depend on a package, call it weird_dep, which has many updates in its main branch but which does releases very infrequently and has not released bugfixes which we need. The main branch of weird_dep is stable, but we don’t want the semantics of a VCS dependency on a branch in our package metadata. For example, users could install a version which we might never be able to fetch again. We’d like to release a version of our package which vendors weird_dep@main.


I’ve seen threads about allowing different versions of a package to be installed in the past, and they run aground on the issue that import semantics would need to change (and how!).

I started this thread because vendoring seems like the only feasible way to support having multiple versions of a package installed. It’s actually pretty similar – in some ways – to the way that npm installs packages.

That makes sense, but unless you depend on packages with very constrained dependencies, or you yourself depend on a very constrained version of that package, it should not be an issue.
The example you gave, jsonschema, does not have any runtime dependencies, so unless you need a very constrained version of it, depending on it should not really cause any issues.

Well, the correct answer here would be to not depend on the main branch, or if it is indeed stable, have a discussion with the upstream to figure out how to have more frequent releases. If it’s an internal package, you should likely be in a position to do that; if it’s an open source project, perhaps consider funding the upstream and asking for more frequent releases. This should not be a problem if the main branch really is stable as you say.
But I understand that sometimes that might be complicated, though I don’t think manually updating the vendored dependency in those cases should be much work. If you are running into this often, I’d say that it generally means something is probably wrong.

You don’t actually have to use import to import packages; you could write your own importer that supports multiple versions, though you might run into issues because packages don’t expect that to happen (for example, global locks that would end up duplicated).
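
A minimal sketch of what "your own importer" could look like, using only `importlib`. The `load_as` helper, the `somelib` module, and the `somelib_v1`/`somelib_v2` aliases are all made up for the example; two tiny throwaway files stand in for two versions of a real library.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_as(alias: str, path: Path):
    """Load the module file at `path` and register it under `alias`."""
    spec = importlib.util.spec_from_file_location(alias, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[alias] = module
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    v1 = Path(tmp, "v1", "somelib.py")
    v2 = Path(tmp, "v2", "somelib.py")
    # Write two stand-in "versions" of the same library.
    for path, version in ((v1, "1.0"), (v2, "2.0")):
        path.parent.mkdir()
        path.write_text(f"__version__ = {version!r}\n")
    old = load_as("somelib_v1", v1)
    new = load_as("somelib_v2", v2)

print(old.__version__, new.__version__)
```

This only works cleanly for single-file modules. A real package that does `import somelib` internally would still resolve that name through the normal import system, which is exactly where the duplicated-global-state problems start.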

I don’t think these issues you are facing are common enough in the ecosystem to justify either supporting multiple version importing or adding a feature to normal build backends, like setuptools, to support automatic vendoring.
Personally, I think that would actually create a lot of issues, which would be a net negative. But that is just my opinion!

If these issues result from your specific corporate environment, these approaches could actually make sense for you. And they are something you could actually implement yourself.

Please let me know if I misunderstood something :sweat_smile:

1 Like

Even if I don’t need a constrained version of it today, any time it does a breaking change (major release if you use semver), I need to consider it. jsonschema recently released 4.0, so using it would put us in the position of having to support jsonschema 3.x and 4.x in the same codebase.
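
For what it’s worth, the "easy" version of supporting two major versions usually looks something like the sketch below: detect the installed major version and branch. `parse_major` and `installed_major` are hypothetical helpers, and the naive version parsing assumes a plain `X.Y.Z` style version string.

```python
from importlib import metadata


def parse_major(version: str) -> int:
    """Return the leading release number, e.g. 4 for "4.0.1"."""
    return int(version.split(".")[0])


def installed_major(dist_name: str):
    """Major version of an installed distribution, or None if it is absent."""
    try:
        return parse_major(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return None


major = installed_major("jsonschema")
if major is None:
    print("jsonschema is not installed")
elif major >= 4:
    print("use the 4.x code path")
else:
    print("use the 3.x code path")
```

The detection is trivial; the cost is in keeping both code paths tested and working for as long as both versions are supported.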

Sometimes supporting multiple versions of a library is easy or near-trivial, and sometimes it is not. And maybe it’s worth bumping your own library’s major version when you update the library dependency, maintaining the old release line, and avoiding support for both versions of the upstream library in a single version of your own package.

But it demonstrates that if you care a lot about compatibility, even trivial seeming dependencies can trip you up.

I agree with most of your points here. I’d like to be donating some of that most precious resource – developer time – to this upstream package, but it’s not in the cards. The situation is unfortunately complicated by organizational politics. Suffice it to say: it’s even weirder than the name weird_dep suggests. :wink:

But more than anything, I agree that this case should be extremely rare.

My goal in raising this thread was not so much to solve problems I’m seeing, but to see if this is a viable approach to a class of problems that users have raised over the years, my own included.

Although rare, Python users have asked in the past for built-in support, as a feature of the language, for installing multiple versions of the same package side-by-side. Some examples:

I wonder if such use-cases wouldn’t be satisfied if users could write:

[options]
package_vendors =
    mypackage._vendor.requests:requests
    mypackage._vendor.requests200:requests==2.0.0

As far as I’m concerned, the asks for support for multiple versions of a package in the language are dead on arrival. The use-cases are too rare and the “solutions” are much too expensive.
But vendoring is doable without a single commit on CPython.

All of that said, I respect any opinion that it is not worth the effort, as supporting this is no small ask.

The answer here is isolating each of the applications from each other. Don’t make all of them use the same set of Python packages. Use a separate venv/virtualenv for each of them, or utilise a higher level of isolation for each application (Overview of Python Packaging - Python Packaging User Guide talks about this, in terms of “depending on …”).

+1 to what @FFY00 said. That said, I can understand how sometimes the needs of the upstream package can be different from your project.

Even in those cases, you can depend on weird_dep@main… the following work:

pip install git+https://github.com/pypa/pip@main
pip install git+https://github.com/pypa/pip@ec8edbf5df977bb88e1c777dd44e26664d81e216

See VCS Support - pip documentation v23.3.2


Mixing multiple versions of a single package, in a single Python process, is usually a really bad idea. See the following post, from one of the threads you’ve mentioned:

2 Likes

If this were a ticket in a bug-tracker, I’d probably self-close at this point. I was curious if this was an interesting area to explore, but it clearly doesn’t appeal to people nearly as much as I was expecting.

There is one thing which I have failed to communicate accurately, for which I am sorry. Just to clarify:

I’m not maintaining the various applications I’m talking about. I’m a library maintainer who supports these applications. So my library libfoo is used by appbar. And even if you pipx install appbar, it doesn’t solve the problem that appbar and libfoo might have conflicting dependencies.

Yes, and VCS support is a great feature of pip! We’re quite likely to use a commit hash for the dependency, as it turns out. The difference between having a vendoring script pointed at weird_dep@main and having package metadata which declares a specific commit is not enormous. But there are differences, e.g. if the git repo goes down but pypi stays up.

We’ve explored (and often attempted to use) it thoroughly in the past. So it’s less that the idea isn’t “appealing”, and more that we’ve tried and decided it isn’t worth it.

We don’t communicate very well that Python application developers have to take responsibility for handling their dependency tree. None of the packaging tools we have come anywhere close to doing this magically for you, in large part because most packages have insufficient metadata to allow anything to be done automatically (and usually because there’s no way to determine it ahead of time).

I get to have this argument a lot at work. pip is by far the best tool we have for obtaining code (and often builds) of packages from a variety of sources, and it does a good job of also obtaining the dependencies. But conflict resolution can only be done by the application developer (or it can be outsourced, and then the application developer must use the versions they are given [1]).

Vendoring is a perfectly valid way for an application developer to carry their expected dependencies (though it’s also perfectly valid for an application distributor to choose never to use it). But the complexities add up very quickly when you do it in libraries - it’s almost like copy-pasting the code into your own (and in many senses, it is literally copy-pasting the code into yours), but without carefully adjusting to isolate from other instances of itself, things get worse rather than better.

[1]: FTR, I consider a conda environment created without any version constraints an example of “outsourcing”, because they’ve added enough metadata to give you things that have been built/tested together. As soon as you start curating specific versions, you have to curate everything if you want it to work. And if you don’t specify constraints with pip, you’ll get latest of everything, which almost by definition hasn’t been tested until you test it yourself.

11 Likes

I fully support this. Diamond dependencies and diamond dependency conflicts are very common in my experience. This is unavoidable: I can control my direct dependencies, but I can’t control what they depend on. A lot of resources are wasted trying to resolve these issues.

The recent release of Pydantic 2 is a case in point. It’s possible to make a Pydantic-dependent application compatible with either 1 or 2, but it’s hard. A massive number of the packages we depend on depend on Pydantic (and they depend on packages that depend on Pydantic…). When one of our upstreams decides to move to Pydantic 2, this blocks us from using any versions of their package from that point on (including critical bug fixes), unless we can negotiate with the maintainers of all the other packages we depend on to either upgrade or do the work to make their codebase compatible with either Pydantic version. In practice, we do this: we negotiate with upstream package developers, give them PRs, etc., but this is a lot of work.

I don’t think this is uncommon. I can point to well-regarded Python developers who have had to resort to vendoring, at least as a short term measure until diamond dependencies are resolved.

Yes, vendoring is an anti-pattern, but the alternatives are worse, bar full shading support, as in Java.

I can’t figure out why so many people are saying that vendoring dependencies is an anti-pattern. On the contrary, I think it’s a great way to make sure that an application built in a CI tool that generates an immutable artifact will be deployed in a non-production environment and in production in exactly the same way.

I can’t figure out why so many people are saying that vendoring
dependencies is an anti-pattern. On the contrary, I think it’s a
great way to make sure that an application built in a CI tool that
generates an immutable artifact will be deployed in a
non-production environment and in production in exactly the same
way.

The main reason is that, if one of those dependencies has a security
vulnerability, a new or patched version of that dependency needs to
be re-vendored and a new version of the “immutable artifact”
published/installed even if your actual project has no new changes
at all. SBoM efforts go some way toward alleviating this concern,
but it’s still an added challenge and possible delay for the end
user who is otherwise left with a steaming pile of compromised
systems. Multiply it by all of the different things you’ve installed
each of which contains vendored dependencies and the situation can
quickly become unmanageable.

Another reason is that, if multiple projects have a dependency in
common, you’re unnecessarily installing multiple copies of the same
dependency. If each one ships a slightly different version, then a
program incorporating them into a greater whole may cease to
function as expected due to different parts accessing the “wrong”
version of that dependency.

Yet another reason, related to the last, is that when an upstream
project decides to start vendoring copies of their dependencies,
they may also decide to only care about those exact versions of
their dependencies rather than trying to be interoperable with
reasonable ranges of versions. This creates significant challenges
for downstream (re)packagers of the software in curated
distributions which have policies against embedding vendored copies
of things (for the above reasons or others).

The container image ecosystem has taken this approach to the
greatest possible extreme, leaving end users as the maintainers of
their own individual Frankendistros. It’s become popular to pretend
these various concerns are paranoid fantasy, but orgs that actually
care about security are left needing to rebuild any images they use
from scratch with automated testing to make sure it all still works
as intended, rendering much of the promise of “vendored”
dependencies moot. If you’re just running the convenient “immutable
artifact” upstream supplied, you either have a toy use case or
you’re sipping tea in the middle of a minefield of your own
construction.

2 Likes

I’m in the “vendoring is an anti-pattern” camp for all the reasons @fungi mentions.

However, the pervasive desire to vendor to me points to a deficiency in Python packaging, and that would be interesting to understand better. I’m not too optimistic that there’s a better solution out there though.

6 Likes

Personally, I think it’s less of a deficiency in packaging, and more one in environment management. With virtual environments, the normal approach is to put each app in a virtualenv (e.g., pipx). That’s logically no different than vendoring your dependencies, or container-based approaches. We don’t have environment management solutions that encourage sharing of dependencies, so we’re not getting the advantages of shared dependencies and from there, vendoring seems like a small extra step.

Pip, and in general our package management, is perfectly able to manage dependencies for multiple apps[1]. But the ecosystem isn’t set up to take advantage of that.


  1. assuming they don’t conflict, of course

2 Likes

I’d argue it’s a deficiency in managing compatibility.

In a magical utopia where packages never break compatibility, you wouldn’t need to vendor because you’d be able to trust that the world won’t break underneath you.

In many ecosystems, compatibility breaks are rare enough or designed well enough that you don’t worry about it and just trust that your code will work against any likely version (e.g. we have a single platform tag for all versions of Windows).

Unfortunately, in Python, we’ve got a culture where things change (in breaking ways) often enough between releases that vendoring becomes one of the best ways to trust that your code will continue to work. (And yes, CPython is one of the worst offenders, which is why I regularly recommend people vendor the whole runtime.)

Once you accept that and start working under that assumption, then yeah, there are ways to make environment or package management make things simpler, or language features (e.g. relative imports) that can help libraries make themselves more vendorable, but I don’t think we can point to a cause anywhere other than how often compatibility breaks really occur.

2 Likes

I think it is, because as the “end user” of the venv, you can always update your dependencies, and you have control over them. As the end user of a library/application that vendors, you really have no control over those vendored libraries at all.

assuming they don’t conflict

That’s the problem though, isn’t it? You vendor because you can never guarantee they won’t conflict with whatever else is in the environment.

Which is a human problem, and that’s why I’m not optimistic there’s a technical solution waiting for us to discover.

Another aspect of the problem is that, vendoring aside, there must be one and only one version of a library in the environment. And even when you vendor you have to be sure that objects imported through the vendored library don’t leak out or get compared to env-global objects, etc., because they are different objects. I recently helped a colleague debug a problem where they couldn’t understand why they had two instances of what they thought was the same class. Turns out they were imported through different module paths.
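
The "two instances of the same class" problem is easy to reproduce. This is a small sketch, not my colleague’s actual code: `shapes`, `Point`, `load_as`, and the `mypackage._vendor.shapes` path are invented for the demonstration, and the same source file is simply loaded under two module names.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_as(alias: str, path: Path):
    """Execute the module file at `path` as a module named `alias`."""
    spec = importlib.util.spec_from_file_location(alias, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "shapes.py")
    source.write_text("class Point:\n    pass\n")
    # Same file, two module paths: one env-global, one "vendored".
    global_copy = load_as("shapes", source)
    vendored_copy = load_as("mypackage._vendor.shapes", source)

point = vendored_copy.Point()
same_class = global_copy.Point is vendored_copy.Point
passes_isinstance = isinstance(point, global_copy.Point)
print(same_class, passes_isinstance)  # False False
```

Each load executes the class statement again, so there are two distinct `Point` classes with identical source. Anything relying on `isinstance`, `except SomeError`, or identity checks breaks as soon as an object from one copy reaches code written against the other.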

3 Likes

Perhaps the deficiency is that Python packaging and Python environment management are not well enough integrated.

It is unfortunate. In my experience the situation with Python in this regard is not as bad as some other ecosystems. But overall I think the trend towards breaking things is accelerating, in many cases justified by a need for “security fixes” which are not obtainable separately from behavior changes. It is also in some cases a communication problem where users rely on undocumented behavior and maintainers are then reluctant to break that behavior, even though users were never “allowed” to rely on it; this is exacerbated if the documentation is out of date or poor to begin with.

Which is all to say that, insofar as we can minimize the need or desire to vendor dependencies, I’d say part of it involves keeping documentation good and up to date, which can include not releasing behavior changes until the documentation is ready.

Indeed. Compatibility is hard, and requires resources. OSS maintainers for projects without foundations and endowments are routinely stretched to their limits attempting to supply what is demanded of them. For my own part, after decades of open source contributions, I prioritize my own personal time, health, and happiness. If users have to change a line of code now and then, consider that the price of free software.

4 Likes

I disagree with every previously stated reason for the popularity of vendoring; instead, I think it’s because the Python interpreter can only use one version of a package at any moment, which often makes resolution tricky. However, I don’t agree with the viewpoint of, for example, Armin that supporting that is in the best interest of the ecosystem.

5 Likes