Can vendoring dependencies in a build be officially supported?

Vendoring packages is complicated. It often involves rewriting imports or pulling other tricks. Patches sometimes need to be applied.
Usually, the source of vendored libraries is included in the source tree and committed, as is the case for pip’s vendored dependencies. I’m also aware that the vendoring tool has been extracted from pip, but it doesn’t recommend third-party usage.
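To make “rewriting imports” concrete: the usual trick is to rewrite the library’s absolute imports (and imports of its own dependencies) to point at the vendored namespace, similar to what pip does with pip._vendor. A very rough sketch of that step, assuming a hypothetical mypackage._vendor namespace and a deliberately simplistic regex-based rewriter:

import re

VENDORED = ("requests", "urllib3", "idna", "certifi")

def rewrite_imports(source: str, namespace: str = "mypackage._vendor") -> str:
    # Rewrite "import requests" / "from requests ..." to point at the vendored
    # namespace. Simplistic on purpose: aliased imports ("import requests as r")
    # and submodule imports ("import requests.adapters") are not handled.
    for name in VENDORED:
        source = re.sub(
            rf"^import {name}$",
            f"from {namespace} import {name}",
            source,
            flags=re.MULTILINE,
        )
        source = re.sub(
            rf"^from {name}(\.|\s)",
            rf"from {namespace}.{name}\1",
            source,
            flags=re.MULTILINE,
        )
    return source

Real vendoring setups also have to deal with patching, licenses, and data files, which is part of why this gets complicated.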

It seems that publishing tools like setuptools could be involved in sorting this out and making this a standard part of Python packaging. The idea I’ve been thinking about is to make vendoring at build time a standard, supported behavior.

  • Can packages better accommodate this if they expect to be vendored? Could packages declare themselves as vendor_safe=True? What would this imply/require? (Pure python only? Relative imports only?)
  • How would packaging with vendored libraries be declared? Would a package need to declare a namespace for its vendored dependencies?
  • What is the bare minimum vendoring capability which would be useful? Does it need to support patching files?

I’m imagining that one day I could write a setup.cfg like so:

[options]
package_vendors =
    mypackage._vendor:requests
    mypackage._vendor:pyjwt[crypto]==2.3.0

and this means that when I run python -m build

  • I get a vendored copy of the latest requests in-tree, in the sdist
  • I get a vendored copy of pyjwt==2.3.0 as well
  • my package’s install_requires list is extended with the dependencies of my vendored requests version and the dependencies of pyjwt[crypto]
  • if requests doesn’t publish a version which sets vendor_safe=True, then the build would fail instead

Has this been discussed in the past? I’d be curious to read about past thoughts on this topic and to better understand the situation.

Why do you want to vendor packages? Generally, I’d consider it an anti-pattern. Much better to simply depend on the packages you need. For an application, you should typically be installing in an isolated environment (maybe using a tool like pipx) anyway, so there’s no need to worry about conflicts with other applications.

For the rare cases like pip where vendoring is necessary, it’s IMO acceptable for it to be a little bit difficult - it acts as a reminder that you should only be doing this if you need to.


I’ve been out of the loop for years now, but I think policy hasn’t changed much since then. Vendoring also makes it much harder for downstreams such as Linux distros to adopt and upgrade packages. Debian and Ubuntu have (had?) policies against vendoring because it makes it much more difficult to apply security patches for example, and back in the day we went to great lengths to unvendor packages.


You could write a build backend that does that… but please don’t. As Paul pointed out, this is an anti-pattern, and there are only a small number of cases where it makes sense.

If you have a specific use-case where you think that would make sense, please let us know so that we can advise if this would indeed be the best option.
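Just so it’s concrete what “a build backend that does that” would mean in practice, here is a rough sketch of an in-tree PEP 517 backend wrapping setuptools. It is purely illustrative (the names, paths and dependency list are made up), not a recommendation:

# pyproject.toml would point at the in-tree backend:
#
#   [build-system]
#   requires = ["setuptools", "pip"]
#   build-backend = "vendor_backend"
#   backend-path = ["backend"]

# backend/vendor_backend.py
import subprocess
import sys

from setuptools import build_meta as _orig
from setuptools.build_meta import *  # re-export the hooks we don't override

VENDORED = ["requests", "pyjwt[crypto]==2.3.0"]   # hypothetical
VENDOR_DIR = "src/mypackage/_vendor"              # hypothetical layout

def _vendor():
    # Copy the vendored requirements (and their dependencies) into the tree.
    subprocess.check_call([
        sys.executable, "-m", "pip", "install",
        "--target", VENDOR_DIR, "--no-compile", *VENDORED,
    ])

def build_sdist(sdist_directory, config_settings=None):
    _vendor()
    return _orig.build_sdist(sdist_directory, config_settings)

def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    _vendor()
    return _orig.build_wheel(wheel_directory, config_settings, metadata_directory)

Note that this skips the hard parts (rewriting the vendored packages’ own imports, merging their dependencies into your metadata, patching), which is exactly where the complexity lives.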

I have two motivating cases from my work. In both cases we have considered and decided against vendoring – the complexity of doing so being one of the reasons.
(Maybe that’s fine. You could say that our decision not to vendor things was a successful outcome of this not being easier.)

Case 1: I maintain a package which is a building block for many of our internal applications and some of our customers’ applications, so maximal compatibility is a goal. This leads us to avoid depending on many libraries, because doing so would or could introduce conflicts with the downstream applications. e.g. I don’t want to use jsonschema because we have applications which use specific versions of it.

Case 2: We depend on a package, call it weird_dep, which has many updates in its main branch but which does releases very infrequently and has not released bugfixes which we need. The main branch of weird_dep is stable, but we don’t want the semantics of a VCS dependency on a branch in our package metadata. e.g. Users can get a version when installing which we might never be able to fetch again. We’d like to release a version of our package which vendors weird_dep@main.


I’ve seen threads about allowing different versions of a package to be installed in the past, and they run aground on the issue that import semantics would need to change (and how!).

I started this thread because vendoring seems like the only feasible way to support having multiple versions of a package installed. It’s actually pretty similar – in some ways – to the way that npm installs packages.

That makes sense, but unless you depend on packages with very constrained dependencies, or you yourself depend on a very constrained version of that package, it should not be an issue.
The example you gave, jsonschema, does not have any runtime dependencies, so unless you need a very constrained version of it, depending on it should not really cause any issues.

Well, the correct answer here would be to not depend on the main branch, or, if it is indeed stable, to have a discussion with the upstream to figure out how to get more frequent releases. If it’s an internal package, you should likely be in a position to do that; if it’s an open source project, perhaps consider funding the upstream and asking for more frequent releases. This should not be a problem if the main branch really is stable as you say.
But I understand that sometimes that might be complicated, though I don’t think manually updating the vendored dependency in those cases should be much work. If you are running into this often, I’d say that it generally means something is probably wrong.

You don’t actually have to use import to import packages; you could totally write your own importer that supports multiple versions, though you might run into issues due to packages not expecting that to happen (like global locks or other module-level state that would end up duplicated).
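As a minimal sketch of that idea (the path and names are made up): load a copy from an explicit location under an alias, so it can coexist with the normally installed version:

import importlib.util
import sys

def load_as(alias, path):
    # Load the module at `path` and register it in sys.modules under `alias`.
    spec = importlib.util.spec_from_file_location(alias, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[alias] = module  # register first so nested imports can resolve
    spec.loader.exec_module(module)
    return module

# Hypothetical: a second copy of a single-file dependency kept on disk.
weird_dep_main = load_as("weird_dep_main", "/opt/checkouts/weird_dep/weird_dep.py")

This only works cleanly for self-contained modules, though: a package whose internals import itself by its absolute name will still pick up whichever copy is installed, which is the kind of surprise I mean.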

I don’t think these issues you are facing are common enough in the ecosystem to justify either supporting multiple version importing or adding a feature to normal build backends, like setuptools, to support automatic vendoring.
Personally, I think that would actually create a lot of issues, which would be a net negative. But that is just my opinion!

If these issues result from your specific corporate environment, these approaches could make sense for you, and they are something you could implement yourself.

Please let me know if I misunderstood something :sweat_smile:

Even if I don’t need a constrained version of it today, any time it makes a breaking change (a major release, if you use semver), I need to consider it. jsonschema recently released 4.0, so using it would put us in the position of having to support jsonschema 3.x and 4.x in the same codebase.
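(The “easy” end of that support burden usually looks like a version check plus a conditional branch or two, roughly as sketched below; the branch body is a placeholder, not a claim about what actually differs between jsonschema 3.x and 4.x.)

from importlib.metadata import version

_JSONSCHEMA_4 = int(version("jsonschema").split(".")[0]) >= 4

def validate(instance, schema):
    import jsonschema

    if _JSONSCHEMA_4:
        # any 4.x-only adjustments would go here
        pass
    return jsonschema.validate(instance, schema)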

Sometimes supporting multiple versions of a library is easy or near-trivial, and sometimes it is not. And maybe it’s worth bumping your own library’s major version when you update the library dependency, maintaining the old release line, and avoiding support for both versions of the upstream library in a single version of your own package.

But it demonstrates that if you care a lot about compatibility, even trivial seeming dependencies can trip you up.

I agree with most of your points here. I’d like to be donating some of that most precious resource – developer time – to this upstream package, but it’s not in the cards. The situation is unfortunately complicated by organizational politics. Suffice it to say: it’s even weirder than the name weird_dep suggests. :wink:

But more than anything, I agree that this case should be extremely rare.

My goal in raising this thread was not so much to solve problems I’m seeing, but to see if this is a viable approach to a class of problems that users have raised over the years, my own included.

Although the need is rare, Python users have asked in the past for built-in support, as a feature of the language, for installing multiple versions of the same package side by side. Some examples:

I wonder if such use-cases wouldn’t be satisfied if users could write:

[options]
package_vendors =
    mypackage._vendor.requests:requests
    mypackage._vendor.requests200:requests==2.0.0
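
i.e. the hypothetical result (none of this exists today, and the import layout is my assumption about how the declared namespaces would map to modules) would be two copies of requests importable side by side in one process:

from mypackage._vendor.requests import requests as requests_latest
from mypackage._vendor.requests200 import requests as requests_v200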

As far as I’m concerned, the asks for support for multiple versions of a package in the language are dead on arrival. The use-cases are too rare and the “solutions” are much too expensive.
But vendoring is doable without a single commit to CPython.

All of that said, I respect any opinion that it is not worth the effort, as supporting this is no small ask.

The answer here is isolating each of the applications from each other. Don’t make all of them use the same set of Python packages. Use a separate venv/virtualenv for each of them, or utilise a higher level of isolation for each application (An Overview of Packaging for Python — Python Packaging User Guide talks about this, in terms of “depending on …”).

+1 to what @FFY00 said. That said, I can understand how sometimes the needs of the upstream package can be different from your project.

Even in those cases, you can depend on weird_dep@main… both of the following work:

pip install git+https://github.com/pypa/pip@main
pip install git+https://github.com/pypa/pip@ec8edbf5df977bb88e1c777dd44e26664d81e216

See VCS Support - pip documentation v22.0.3


Mixing multiple versions of a single package, in a single Python process, is usually a really bad idea. See the following post, from one of the threads you’ve mentioned:

If this were a ticket in a bug-tracker, I’d probably self-close at this point. I was curious if this was an interesting area to explore, but it clearly doesn’t appeal to people nearly as much as I was expecting.

There is one thing which I have failed to communicate accurately, for which I am sorry. Just to clarify:

I’m not maintaining the various applications I’m talking about. I’m a library maintainer who supports these applications. So my library libfoo is used by appbar. And even if you pipx install appbar, it doesn’t solve the problem that appbar and libfoo might have conflicting dependencies.

Yes, and VCS support is a great feature of pip! We’re quite likely to use a commit hash for the dependency, as it turns out. The difference between having a vendoring script pointed at weird_dep@main and having package metadata which declares a specific commit is not enormous. But there are differences, e.g. if the git repo goes down but PyPI stays up.
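(For reference, declaring the commit in metadata would be a PEP 508 direct reference, roughly as below; the repository URL is invented, and the commit hash just echoes the pip example above. PyPI also rejects uploads whose dependencies are direct URL references, which limits this to internally distributed packages.)

from setuptools import setup

setup(
    name="mypackage",
    install_requires=[
        # pin weird_dep to an exact commit via a PEP 508 direct reference
        "weird_dep @ git+https://github.com/example/weird_dep"
        "@ec8edbf5df977bb88e1c777dd44e26664d81e216",
    ],
)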

We’ve explored (and often attempted to use) it thoroughly in the past. So it’s less that the idea isn’t “appealing”, and more that we’ve tried and decided it isn’t worth it.

We don’t communicate very well that Python application developers have to take responsibility for handling their dependency tree. None of the packaging tools we have come anywhere close to doing this magically for you, in large part because most packages have insufficient metadata to allow anything to be done automatically (and usually because there’s no way to determine it ahead of time).

I get to have this argument a lot at work. pip is by far the best tool we have for obtaining code (and often builds) of packages from a variety of sources, and it does a good job of also obtaining the dependencies. But conflict resolution can only be done by the application developer (or it can be outsourced, and then the application developer must use the versions they are given [1]).

Vendoring is a perfectly valid way for an application developer to carry their expected dependencies (though it’s also perfectly valid for an application distributor to choose never to use it). But the complexities add up very quickly when you do it in libraries: it’s almost like copy-pasting the code into your own (and in many senses, it is literally copy-pasting the code into yours), and unless you carefully adjust it to isolate it from other instances of itself, things get worse rather than better.

[1]: FTR, I consider a conda environment created without any version constraints an example of “outsourcing”, because they’ve added enough metadata to give you things that have been built/tested together. As soon as you start curating specific versions, you have to curate everything if you want it to work. And if you don’t specify constraints with pip, you’ll get the latest of everything, which almost by definition hasn’t been tested until you test it yourself.
