Can vendoring dependencies in a build be officially supported?

That’s not totally true, or at least it depends on what you mean by “version of a package”. As was mentioned (maybe on another thread), you can import the same package (or two versions of the same package) if it’s accessed via two different import paths. So it can be the same package in the sense of being two versions of the same conceptual library, but the importing code has to somehow specify which version to import.

Current tools don’t provide an easy way to install the same package under two import paths, but you can do it by moving things around if you want to get crazy, so you could have import version1.package and import version2.package. This is essentially how people import their vendored versions in practice, which is why I’m a bit unsure what you mean about the interpreter only being able to use one version.[1]
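For concreteness, here’s a minimal sketch of that “moving stuff around” approach (all the names are made up); the trick is that each copy lives under a distinct import path, so both can be loaded at once:

    # Hypothetical layout:
    #
    # mypkg/
    #     __init__.py
    #     _vendor/
    #         __init__.py
    #         somelib/        # a copied (vendored) release of somelib
    #             __init__.py

    import somelib                                   # whatever is installed in the environment
    from mypkg._vendor import somelib as vendored    # the copied version

    # Both are live at once, cached under different keys:
    import sys
    assert "somelib" in sys.modules
    assert "mypkg._vendor.somelib" in sys.modules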

Trying to use multiple versions in the same running code has the potential for headaches in just about any language, I think. Having different versions installed in the same environment, with one chosen on a particular run, is conceivable, but might get confusing. (And as you push toward looser notions of “two versions of the same library”, you get closer and closer to things like venvs or conda envs.) I’ve sometimes wondered what Python would be like if our dependency constraints were specified directly in the code as part of the import, rather than “up front” as part of an environment, so you would write something like import somepackage>3. (Possibly more like JavaScript, which is not a pleasant thought. :-))


  1. It’s still true that only one version will be used at any split-second “moment” during execution of the program, but I think that’s true for any language if we don’t get down to the level of CPU core scheduling and stuff. ↩︎

This is vendoring :slight_smile:

Of course, it also requires modifying the library to use relative imports, and care (by the original developer) not to use type checks based on types from vendored libraries. And we don’t have a culture of designing libraries like this – even less so now that people specify concrete types on everything (duck typing makes it okay to mix versions of types, provided you don’t compare type objects). So the viability of this approach is rapidly receding, except among the devs who care enough to support it deliberately.
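To illustrate the relative-import requirement (hypothetical names again): inside a vendored copy at mypkg/_vendor/somelib/client.py, an absolute import escapes the vendored tree and resolves through sys.path, possibly to a different installed copy of the library:

    # Before vendoring -- resolves via sys.path, maybe to the "wrong" somelib:
    from somelib import util

    # After vendoring -- stays inside mypkg._vendor.somelib:
    from . import util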

5 Likes

Right. With tricks you can import multiple different instances (same or different versions) of a module, and with other tricks you can even “install”[1] multiple different instances of the same package. The problems come when these two different instances – and all the objects inside those modules – interact: what happens if you get two instances of seemingly the same class, but they have different APIs? I honestly don’t know how languages like JavaScript, which I think has this multiple-install capability, handle such cases. Is it just buyer beware?
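To make that failure mode concrete, here’s a sketch reusing the hypothetical vendored layout from earlier in the thread – the “same” class loaded twice is two distinct type objects, so identity-based checks fail across the boundary even when the source code is identical:

    import somelib
    from mypkg._vendor import somelib as vendored

    thing = vendored.Thing()                # Thing is an invented example class

    isinstance(thing, somelib.Thing)        # False: two distinct type objects
    somelib.Thing is vendored.Thing         # False
    thing.do_something()                    # duck typing still works fine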


  1. in the sense of crafting an environment where both packages are importable ↩︎

Yes.[1]


  1. Based on my not-so-limited experience, although it’s often encouraged to have “opaque” objects (i.e. private internal state) in that ecosystem. ↩︎

This can only work if the multiple-versioned dependencies are private to their respective dependents. For example, if both packages A and B must depend on different versions of library C, they should not expose C’s objects in their own APIs.

This is probably easy to achieve for most utility dependencies such as crypto or HTTP libraries. If you’re depending on requests to provide some higher-level functionality, chances are you won’t expose the requests classes in your own APIs.
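A sketch of what that looks like in practice (mylib, User, and fetch_user are invented names): requests is used internally, but everything is converted to mylib’s own types at the API boundary, so callers never see a requests object:

    from dataclasses import dataclass

    import requests

    @dataclass
    class User:
        name: str
        email: str

    def fetch_user(user_id: int) -> User:
        # The requests.Response never crosses the API boundary; callers
        # only ever see mylib's own User type.
        resp = requests.get(f"https://api.example.com/users/{user_id}")
        resp.raise_for_status()
        data = resp.json()
        return User(name=data["name"], email=data["email"])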

I think we should note that library authors can publish new, distinct, “v2” packages, to allow users to install their mutually incompatible versions side by side. It’s relatively rare though.

To stick with the pydantic example from above, in which v1 and v2 are incompatible, imagine the following (fanciful) future:

  • pydantic reserves all pydantic* package names
  • pydantic declares itself “rename safe”, meaning all internal imports are relative, no features rely on explicit module names, etc.
  • by default, users installing get a package named pydantic
  • a user can install pydantic<2;as_name=pydantic1, which installs the package under the name pydantic1

This would establish a future in which it is possible to install the same package multiple times under different names. It’s interesting to think about and play with as an idea.
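Under that imagined scheme (to be clear, as_name is an invented installer feature – none of this works today), user code could hold both major versions at once:

    import pydantic    # v2, installed normally under its default name
    import pydantic1   # v1, installed via the hypothetical pydantic<2;as_name=pydantic1

    class NewModel(pydantic.BaseModel):    # v2 API
        x: int

    class OldModel(pydantic1.BaseModel):   # v1 API
        x: int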

Is it a good idea? Does it solve the same problems as vendoring? To both, my answer is no. Probably it’s not a good idea at all. It would work wonders for applications trying to use direct dependencies which their own dependencies also use, but it does little for library developers who want to be mutually compatible with one another, unless they are lucky or agree on conventions for how to use it.

Renaming done downstream has very different properties from renaming done upstream as a deliberate maintainer strategy.

Maybe there’s some useful kernel of an idea here. Renaming your package at a major version boundary has benefits for downstream consumers, but it’s seldom done, even by the most mainstream Python packages with the biggest impact. Why is that? Names are sticky, but renaming a package also requires maintainers to revisit all sorts of infrastructure (e.g. publishing pipelines). Should we work to better support, and more strongly encourage, packages publishing under different names for different versions?

As mentioned, the problems here are a mix of our technical constraints and the culture of Python developers.

In my own libraries, I avoid dependencies as much as I can justify. To a degree that’s healthy – avoiding unnecessary externalities and liabilities – but I think it’s currently necessary to a harmful degree. For example, imagine the ecosystem impact if one popular package, e.g. flask, internally required another popular package on a specific major version range, e.g. pydantic>1. In practice, this means that a library developer has to be very cautious about pulling in dependencies, even in cases where an application developer would very definitely choose to include them.


All in all, vendoring is a nice fix for the cases which really demand it, but it’s not the same as the upstream package making a decision to try to tackle these problems. I’d like people to keep thinking about how to make the diamond application dependency cases and library dependency cases better, perhaps centered around ways that packages can better support this for their consumers.

1 Like

Yeah, a few minutes after posting I edited my post to add a note clarifying that :-). Because the thing is, since this is vendoring, and can be done, what does it mean to say that “the Python interpreter can only use one version of a package at any moment”?

Right, but it requires a lot of care to make sure nothing slips through the cracks. It’s not just the actual objects that need to be masked; the behavior of the underlying library can seep through in other ways. If library A vendors B, calls B.somefunc(), and does something depending on the result, and the user also imports a “real” version of B directly, calls B.somefunc(), and does something depending on the result, and the two versions of B have slightly different behavior for somefunc, then even if A never directly returns the result from B, the difference in underlying behavior can cause differences in subsequent processing to bubble up to the user. This can be very confusing.
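Here’s a toy reproduction of that seep-through, with the two versions’ behavior stubbed out inline (every name here is invented):

    # Pretend these are B 1.x's and B 2.x's somefunc:
    def somefunc_v1(s: str) -> str:
        return s.strip()                  # 1.x: strips whitespace

    def somefunc_v2(s: str) -> str:
        return s.strip().lower()          # 2.x: also lowercases

    # Library A vendored 1.x, so its results are shaped by v1 semantics:
    def a_process(s: str) -> bool:
        return somefunc_v1(s) == "ok"     # stands in for A calling its vendored B

    # The user calls the "real" 2.x directly, sees case-insensitive behavior,
    # and reasonably expects it everywhere -- then A quietly disagrees:
    somefunc_v2("  OK  ") == "ok"         # True
    a_process("  OK  ")                   # False, and nothing hints at why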

It gets even worse if the library has any sort of global configuration state, since a user who imports the “real” version may expect to be able to configure it and will be baffled to discover their configuration is having no effect on the vendored version. Likewise if the user ever feels the need to use a debugger, it will be doubly confusing to see it step through different versions of the same code at different times.
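The same trap, sketched for global state (somelib and its timeout API are invented):

    import somelib                                   # the user's "real" copy
    from mypkg._vendor import somelib as vendored    # the copy the library uses

    somelib.set_timeout(60)     # the user configures the "real" somelib...
    vendored.get_timeout()      # ...but the vendored copy still holds its
                                # default, so the library's behavior is unchanged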

My point with this is mostly just to say that I think the main problems with vendoring, and the reasons why we should try to avoid it whenever possible, are due to the difficulties it creates for human minds, not technical difficulties it creates for tools. It is confusing for a human user to have multiple separate parts of their code that are internally using slightly different versions of a single underlying library. It makes it hard to reason about the code’s behavior, and it makes it extra confusing to debug. Making it easier to vendor dependencies only makes it easier to create situations where these problems will be surfaced to users.

1 Like

It means that from the interpreter’s point of view, the package under a different name is a different package. Where name is “the fully qualified module name that is used in sys.modules for caching purposes.” So if you manage to change that name, you now have a totally distinct package. Merely changing the search path used to find a package won’t affect the name, and so you still won’t be able to trivially import a second version.
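You can watch this in action with importlib. This snippet actually runs; argparse is used only because it’s a convenient single-file stdlib module:

    import importlib.util
    import sys

    # Build a second, independent instance of the module under a new name.
    spec = importlib.util.find_spec("argparse")
    copy = importlib.util.module_from_spec(spec)
    sys.modules["argparse_copy"] = copy      # the sys.modules key is the identity
    spec.loader.exec_module(copy)

    import argparse
    import argparse_copy                     # served straight from the cache

    print(argparse is argparse_copy)                                # False
    print(argparse.ArgumentParser is argparse_copy.ArgumentParser)  # False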

Pipx is a good way to solve this for CLIs. We also have shiv, pex, making a venv by hand, etc.

To me, none of it is really convenient, at least for homelab usage. I have a standard Python and a special package on PYTHONPATH with my stuff in it. I call things via python -m ...

I’m reminded of: xkcd: Python Environment and xkcd: Standards

The standards and environment stuff has honestly just gotten worse and more confusing over time for folks who can’t or won’t invest as deeply. I think this directly leads to desires to vendor packages, copy-paste modules, etc.

Additional flexibility unfortunately leads to more fragmentation and divergence here.

2 Likes

That xkcd may not suggest what you think it does IMO: Deconstructing xkcd.com/1987/

I’m going to pull a maintainer move here and ask if you have a concrete proposal to improve things and/or if you’re actively doing anything to help, or are you just venting?

4 Likes

It’s a bit of all of the above. In my eyes, having one mainstream way to do things is better.

Say we have pip: it can install packages.
Then we have venv: it makes a virtual environment.

Maybe pip could grow functionality to directly install a dependency (like a CLI tool) into a fresh venv and then link it onto PATH?

Then again, we sort of have pipx, shiv, and pex, which already do parts of that.

Then we have setup.py/setup.cfg/pyproject.toml. Assume we pick one: then it would be nice to have a single command to make a venv for it (for dev or usage).

Then there’s the lack of dependency hash files, or something similar, to guarantee that the same transitive dependencies get used by everyone. Adding a standard version of that would be nice.
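(For what it’s worth, pip has a pip-specific version of this in its hash-checking mode, though it’s not a cross-tool standard. A requirements.txt can pin exact versions plus expected digests – the hash below is a placeholder, not a real value:

    somepkg==1.2.3 \
        --hash=sha256:<digest-of-the-expected-archive>

and gets installed with pip install --require-hashes -r requirements.txt.)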

In my eyes, pip and our dependency management should be a more central, unified experience.

Adding a dependency to something, generating pseudo-executables via venvs, etc. should all be standardized.

There are so many options that it’s overwhelming. Pick a recommendation and have the community get it to be the best.

It’s funny, because setup.py seemed fine most of the time. It was nice that it was standard-ish. Now we have all these other files, and builders can have other files too.

I guess it’s a vent, with a few ideas scattered around. The current world is confusing. I think it leads to things like people wanting vendoring, people hitting issues with different environments, etc.

In terms of the xkcd, I choose to use my original interpretation: the python environment is complicated.

1 Like

The profusion of tools, the lack of standardized workflows, and the resulting confusion have been discussed to death in numerous threads, some with multiple hundreds of posts. In my humble opinion, it does not help to restate the problems. They are already well known.

If a packaging council gets created (Draft PEP: Python Packaging Governance), it may define some way of changing this. Meanwhile, the PyPA only has a process for approving standards, not one for blessing tools, meaning that the main constructive action you can take, if you want a more unified experience, is to contribute to one of the tools that provide such an experience (e.g., Hatch, PDM, Poetry) in order to help it gain popularity by better serving its users’ needs. You can also help people find their way around the landscape by contributing to the packaging.python.org site.

1 Like

I don’t think that the perceived complexity around packaging workflows has much to do with vendoring.

Vendoring is a very specific and niche thing to do. There are a few reasons why projects do it, but I think the main one is allowing for two distinct versions of a package to be installed side by side.

I don’t see what pipx, pipsi, etc have to do with this. Libraries vendoring other libraries are not applications. It’s just not the same case at all.


IMO this thread is at the end of its lifecycle. I don’t think we’re likely to squeeze many more useful insights out of this one. If you want to talk about stuff that isn’t vendoring, consider starting a separate thread?

6 Likes