Proposal: overrides for installers

This is a proposal for a potential solution to some common issues with packaging. The main idea is to let users define overrides for the package installers for the cases where the normal behavior is either broken because of poor package metadata or results in something that is not what the user actually wants.

Currently this proposal touches only on the UX aspect. It is born from questions that I keep seeing (on StackOverflow, here, and other places). And I guess this proposal represents the kind of answers I wish I could give.

I wanted to wait until this is a bit more mature. But there are other discussion threads going on right now, that I feel are closely related, so I thought I might as well publish this now and take advantage of this momentum. This still lacks structure, and looks like a brain dump rather than a fully formed proposal. This is a naive take that does not concern itself much with actual feasibility. And finally this is a work in progress, changes are to be expected.

What do I expect out of this? Personally I am not impacted by the pain points that this aims to alleviate, so in the end my interest is relatively limited. I guess, it was important to me to write down these ideas, whether something comes out of it or not. With all that said, feedback is welcome.

The full (live) document is here:

And the short of it is right below…


This is a proposal for a solution to help alleviate some frequent pain points with Python packaging. Mainly those things:

  • Fix bad dependency requirements in metadata
  • Fix bad build system in dependencies
  • Provide more installation options per-dependency (--pre, --config-settings, --index-url, --find-links, etc.)

For example, with a command such as:

python -m pip install Application --overrides overrides.toml

where overrides.toml looks like this:

[Library]
requirements = [
    "LibraryNightly[cpu]",
]

[LibraryNightly]
index-url = "https://index.internal/simple/"
pre = true

In this scenario, the user wants to install Application. But if it has a dependency on Library, the user wants pip to install LibraryNightly with the cpu extra instead. Additionally, the user wants pip to look for LibraryNightly exclusively on an internal index, and to consider pre-releases of LibraryNightly. Other dependencies, including dependencies of LibraryNightly, should be handled as usual.


I like the idea of an override mechanism for dependencies, especially for the case of overly strict upper bounds or bad dependency metadata. Given the very long GitHub issue on this topic, this is definitely a pain point many have experienced.

The other fields (pre, config settings, etc.) though feel unnecessary for a lot of common use cases and add a fair amount of complexity to the proposal. I’m neutral on them as I’m unlikely to use them, but I lean towards allowing dependency overrides as the first improvement: restricting the initial proposal to just overriding requirements would be helpful. It also lowers the chance of unexpected interactions if we can view it as solely replacing one dependency’s constraints with user-provided ones.

Yes, in my mind, this whole thing is meant to be split in smaller pieces (although I guess none of them would be small to implement in the installers).

The focus of this proposal is really only an idea of a UX which itself is simple to comprehend from the user’s point of view but would cover a lot of ground. If a new file format were to be added (which seems relatively likely), then I think it should be somewhat multi-purpose while still staying consistent. And I think this proposition achieves that (or at least that is the intention).

A first step could be the notation for dependency overrides, for example, and then other features could be added later. This is similar to pyproject.toml: initially it had [build-system] (and [tool.*]), then [project] was added later on.

IMO, this makes sense. My only worry is how common this use-case is.

The config design for your overrides.toml might be a bit tricky, so I wonder if a Python file where you can provide a “transform” method wouldn’t just be a bit better?
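For the sake of discussion, such a hook could look something like the sketch below. Everything here is hypothetical: no installer defines a transform hook, and the signature (project name plus its declared requirements in, requirements to use out) is just one possible shape.

```python
def transform(name: str, requirements: list[str]) -> list[str]:
    """Hypothetical hook: receive the requirements declared by project
    `name` and return the requirements the installer should use instead."""
    if name == "Application":
        # Swap the dependency on Library for LibraryNightly with the cpu extra.
        return [
            "LibraryNightly[cpu]" if requirement == "Library" else requirement
            for requirement in requirements
        ]
    # Every other project keeps its declared requirements.
    return requirements


print(transform("Application", ["Library", "SomethingElse"]))
print(transform("AnotherProject", ["Library"]))
```

The trade-off is that a script can express anything, but it is harder to audit and to port between installers than a static file.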


This would be useful for implementing PEP 708 - Extending the Repository API to Mitigate Dependency Confusion Attacks, which also needs some kind of override file. @dstufft will have contributions.

Yes. Since I first read the Google document that is now PEP 708, I have wanted to investigate if and how this proposal can be rewritten to accommodate it, and maybe take over some ideas from the file format suggested in the Google document. But I have not gotten to it yet.

Also this came back up, and I want to re-investigate it as well.

The proposal contains many links to use cases taken from questions I have seen on StackOverflow. But I do not know if it is a good enough way of measuring how often this proposal would be used if it were to be implemented.

For sure there are plenty of rough corners in this first version, and I would expect some compromises to be made to keep things approachable.

I see what you mean. My first reaction is that, for this proposal to be useful, it should target inexperienced Python users. More experienced users are able to set up other solutions for the targeted use cases (examples: set up a simpleindex, rebuild wheels, fork and fix a poorly packaged library). That is why I feel like a static configuration file is a better fit than a script. But I might be looking at this from the wrong angle.

I’ve had this use case. It looks like several users of Poetry have as well: Ability to override/ignore sub-dependencies · Issue #697 · python-poetry/poetry · GitHub


Additionally, other languages have also had a common need for a dependency override mechanism. In Java, Maven has dependency exclusions. JavaScript’s yarn has a similar dependency override mechanism. Rust’s cargo has an override mechanism too.

I think that a diverse set of other languages commonly finding value in this mechanism, plus that lengthy pip issue, is good evidence that there are enough pain points where some override mechanism is useful. My personal experience is that most of the time I want this, it’s because a dependency I use has overly strict upper bounds. Occasionally I’ll even see a library use equality pins for many of its dependencies, treating them sort of like a lock file instead of declaring all the versions it may support. Even major libraries have sometimes done this. For a long time tensorflow had many pins and very strict requirements in its setup.py.

At the moment my workaround for this problem is to download the wheel with the dependencies I want to override, modify the dependency metadata in the wheel, re-upload the wheel to a private company index, and intentionally shadow the public version. This works, but it is a heavy-handed approach that at minimum requires using your own index server. As internal index servers are commonly shared across a company (or at least a department), this also means my team’s override affects all teams in the company. Intentionally shadowing a public package like this is also awkward with regard to dependency confusion attacks, and I know security isn’t fond of this approach.
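The "modify the dependency metadata" step can be sketched roughly as follows. This only rewrites the text of a METADATA file. A real wheel also needs to be unpacked and repacked, and its RECORD file updated with the new hash of METADATA. The function name and the exact-string matching of requirements are simplifications for illustration.

```python
from email.parser import Parser

# Stand-in for the METADATA file found in a wheel's .dist-info directory.
METADATA = """\
Metadata-Version: 2.1
Name: Library
Version: 1.0
Requires-Dist: Dependency<2
Requires-Dist: Other>=1.0
"""


def rewrite_requires_dist(metadata_text: str, replacements: dict[str, str]) -> str:
    """Return the metadata with selected Requires-Dist values replaced.

    `replacements` maps a declared requirement string to the one to use.
    """
    message = Parser().parsestr(metadata_text)
    declared = message.get_all("Requires-Dist", [])
    # Deleting by header name removes every occurrence of that header.
    del message["Requires-Dist"]
    for requirement in declared:
        # Item assignment appends a new header, it does not overwrite.
        message["Requires-Dist"] = replacements.get(requirement, requirement)
    return message.as_string()


print(rewrite_requires_dist(METADATA, {"Dependency<2": "Dependency<3"}))
```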


To clarify - I assume that the intention here is for this to be a standard file, supported by all installers, and not simply a pip feature request (because there’s already a pip issue for this capability)? I believe that at least pip, poetry and PDM have their own resolvers, so the intention would be for all of these to support this file? And I think pipenv and pip-tools might need support as well, although this might be easier to add, as I believe they re-use pip’s resolver. What about conda? Would you expect them to support this?

Reading that thread, the poetry maintainers seem to have significant reservations about this functionality, and I think that mirrors the pip maintainers’ feelings (it certainly reflects mine). The relevant pip issue has already been linked. And that also links to a further pipenv issue. So the capability is clearly wanted, but tool maintainers have reservations, and no-one has yet managed to implement anything in a workable manner. So while discussing the issue centrally here is useful, people should understand that the lack of an existing capability isn’t just because no-one has proposed it yet…

I will also note that it is possible (not easy, by any means, but possible) to override dependency metadata right now:

  1. Download the offending wheel, modify the metadata and save the modified wheel as a local version, and install using that.
  2. Use pip install --dry-run --report, or something like pip-tools or pip freeze to do the resolution and report a fully pinned set of projects, then modify the pin for the project you want to override, and install using --no-deps.
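As a rough illustration of the second option, the sketch below takes an already-resolved, fully pinned list (written by hand here, in practice it would come from one of the tools above), changes one pin, and builds the matching --no-deps command. The project names and versions are placeholders and override_pin is a made-up helper.

```python
def override_pin(pinned: list[str], name: str, new_version: str) -> list[str]:
    """Replace the pin for `name` in a list of `name==version` lines."""
    result = []
    for line in pinned:
        project, _, _old_version = line.partition("==")
        if project.strip().lower() == name.lower():
            result.append(f"{project.strip()}=={new_version}")
        else:
            result.append(line)
    return result


# Placeholder output of a previous resolution step.
pinned = ["Application==1.0", "Library==1.4", "Dependency==1.9"]

# Force a version of Dependency that Library's metadata would not allow.
patched = override_pin(pinned, "Dependency", "2.5")

# --no-deps tells pip to install exactly these projects without resolving.
command = ["python", "-m", "pip", "install", "--no-deps", *patched]
print(" ".join(command))
```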

Neither option is easy, but I’m not sure how easy we want to make a potentially-breaking operation like this. But I do think it’s worth noting that the proposal makes doing this easier, rather than allowing something that’s not possible right now.

Other practical points that I think need to be considered here:

  1. It’s likely to be a lot of work to implement this, in multiple projects. Assuming that the project maintainers will implement this is unrealistic - given that the poetry and pip issues have been around for years, I think something would have been implemented by now if it was easy for the maintainers to do so.
  2. It’s not obvious how users will determine what to override - a recurring theme on the pip tracker is that the reporting of conflicts in complex cases is pretty unhelpful. We haven’t been able to work out how to give better diagnostics yet, so this is likely to be an ongoing problem. As a result, I can see us getting into a situation where people are using overrides “because someone on the internet said it might help”, without context or checking. That, IMO, is going to be far more damaging in the long run than the current situation.
  3. By overriding, we make it less likely that people will report dependency issues to project maintainers. While we have seen reports that maintainers aren’t always willing to change dependencies, and there’s always a need for a “quick fix” while waiting for any upstream change, nevertheless it’s important IMO that we encourage discussion of the issues around dependency conflicts, and encourage projects to explain why they choose restrictive dependency specifications. Allowing users to just override locally means that problems are much more likely to remain unreported.

As I noted above, you could use a local version suffix (which would be seen as later than the PyPI version, and hence preferred). But yes, this is the current workaround here - as you say it’s a bit heavy handed, but it’s good to have confirmation that it works.


Paul sums it up in detail, but for those who value my opinion, this seems like per-installer configuration. I don’t want different installers having the same settings - that’s the point of them being different! Just like I don’t want different virtual environments (necessarily) having the same settings.

This file would be a contract between the installer and the user. It doesn’t need to be specified any more broadly than that - it’s purely UX.


In my mind, this is per-invocation: <installer.exe> --overrides overrides.toml, if that makes any difference. And, if we ever get this far, then yes the intention is to have the file format specified in a standard, so that such a file is portable from one installer to the other.

Yes. Installers would choose to support this or not. I believe the abstractions chosen in this proposal match those in use by current installers, at least in the PyPA neighborhood (including Poetry, pip-tools, and so on). I have not considered conda, I do not know enough about it (if I am not mistaken there is a concept of channels, I do not know if it is relevant and if yes then I do not know how it would fit in this). But in principle, yes, if it is good for other installers, then why not try to make it work for conda as well.

Yes. I am afraid it will not be easy.

Yes, true. It would be interesting to investigate how users behave around similar features in other ecosystems.

Indeed, that is a risk I have not thought of. Same as for the previous point, it could end up being an attractive nuisance and some education would be needed.

That multiple installers (in Python and other ecosystems) all encounter the same kind of issues reinforces my thinking that collaboration over a standard file format is worthy.

I want to point out that this proposal is not only about overriding dependencies. Another goal is to offer the possibility to provide installation options on a per-dependency basis instead of globally, such as: --pre, --ignore-requires-python, --config-settings, --find-links, --index-url, and so on. So this could be the “repository file” of PEP 708, and cover a lot more use cases.

I also want to state again that I am clear on the fact that for some use cases such a feature should be used as a last resort. The right ways would still be to have well-behaved packages with clean metadata (including build system), or to set up something like simpleindex. But for some use cases, I do not recall having seen any better alternative, for example for things like Core Metadata’s Provides-Dist and Obsoletes-Dist.


Collaboration over the format is worthwhile if the functionality is available in multiple tools. But if this proposal doesn’t get buy-in from tools to implement it (and commitment from someone willing to do the work) then IMO it’s premature to define a file format.

From pip’s POV, I’m still at best +0 on having the override feature at all, and I wouldn’t commit to anything being implemented by the pip maintainers, so community resource would need to step up for this, not just to implement it but also to argue the case for the feature, and resolve questions about how it would interact with the rest of pip (not all of which are going to be addressed by a generic standard).

From the sound of what you’re saying here, I think it sounds more like we should be saying “here’s a bunch of functionality that’s already implemented in multiple installers, let’s define a unified file format to control it” rather than trying to define a format for something and then hope it gets implemented (especially as there’s a risk that when we do the implementations, we find that there are problems with the way the format is defined and need to revisit the spec anyway).

Absolutely. Until there is at least one proof of concept implementation in at least one or two installers, there is nothing to standardize. I guess writing down some kind of somewhat realistic file format makes it more tangible, and easier to reason about the features (at least for me).

It does - one thing this has done is make me think about the use cases.

It seems to me that the only valid use case for this sort of override is when building an application. You cannot expect to supply an override when installing a library, as that essentially means that your library is by design not installable based on standard metadata, and I don’t see how that makes any sense. Once a library requires an override, anything using that library also requires that override, all the way up the chain - so the override should be metadata, not a command-line choice. And that’s a whole different discussion.

So we’re talking here about building an application, which should have all of its dependencies pinned, ideally via a lockfile, but we don’t have those standardised yet, so typically with a fully pinned requirements file. The lockfile should be built from the environment that the application was tested in, by something like pip freeze, and installed using pip install --no-deps. So the only point where you need to override dependency metadata should be when building that initial environment. And you can do that however you want - it’s tricky, because you’re fighting pip’s by-design reluctance to let you create a broken environment, but it’s possible. And honestly, I feel that making it hard is a feature, not a bug, precisely because the environment you’re creating is in violation of the installed projects’ declared requirements.

So having thought about this, I’m moving towards being -1 on the whole idea of an override option.

Note: I’m aware that many applications are provided as installable wheels with entry points, and as such do not pin all their requirements. But that’s a compromise, using tools designed for libraries to install applications, and I don’t think we should design around this model any more than necessary. I’d rather see something like pipx adding the ability to install an app from a fully-pinned requirements file, as that allows the entry point model while retaining the idea that an app should have fully pinned requirements, but keeping within the existing design.


One thing of note is that, technically, you need a lockfile with hashes to be assured that pip install --no-deps is going to do the right thing. Otherwise you can end up in a situation where a new wheel is added that has different dependencies, and it would get installed silently.
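For illustration, a hash-pinned requirements line of the kind pip’s hash-checking mode accepts can be produced like this. The helper name is made up, and the bytes stand in for the contents of a downloaded wheel file.

```python
import hashlib


def pinned_line_with_hash(name: str, version: str, artifact: bytes) -> str:
    """Build a `name==version --hash=sha256:...` requirements line."""
    digest = hashlib.sha256(artifact).hexdigest()
    return f"{name}=={version} --hash=sha256:{digest}"


# In practice `artifact` would be the bytes of the wheel that was tested.
line = pinned_line_with_hash("Library", "1.0", b"stand-in for wheel contents")
print(line)
```

With such lines in the requirements file, a wheel uploaded later for the same version has a different hash and is rejected instead of being picked up silently.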

Hopefully that’s a pretty esoteric situation though!

Personally I’m neutral on this.

I think it’s likely useful to have the ability for end users to “fix” broken metadata or to force otherwise invalid dependency sets to install for a variety of reasons, and I think that having some direct overrides mechanism makes that easier and more streamlined.

But I also agree with @pf_moore that there are already tools available to accomplish this, albeit in a less user-friendly way, so the functionality already exists. So the question then shifts from “should we allow this functionality” to “is this functionality important enough to deserve its own feature” as well as “is this functionality too advanced for a streamlined feature that might be a footgun to end users”?

That is where I start waffling on it. Every feature that gets added has an ongoing cost both to the maintainers of pip and to the users of pip, and this starts feeling like something where the cost isn’t worth the reward.

Just wanted to share some of the experiences we have for Home Assistant.

As an application, we pin all first-order dependencies, about 900 at the moment. With that many it’s (almost) impossible to create an environment without conflicts without resorting to --no-deps (or similar). For that reason alone, we haven’t yet been able to update to the new resolver, for example, and still rely on --use-deprecated=legacy-resolver. True, it means that some requirements are technically invalid, but the overwhelming majority of conflicts are just because of too restrictive library requirements.

As part of each dependency upgrade we run pip check and report, or even fix, new issues upstream. However, that doesn’t work if the project maintainers don’t want to change the requirements or if the project hasn’t been updated in years. All the while we know that a simple metadata update would be enough.

From that POV it’s sometimes quite frustrating tbh, and I at least have wished multiple times that an override option existed.

Pip already has an option to pass additional constraints to the install command. I would envision that overrides could work in a similar way.


There are other valid use cases than just “building an application”. As you say it should not be possible for one library to change the dependencies of another library so it does not make sense for libraries.

So we have established that library authors cannot use overrides but otherwise they are potentially applicable to any situation where someone is installing things. That could be an application or it could be:

  • Building an application
  • Setting up the development dependencies for a library author.
  • Installing some software for scientific computing.
  • Setting up dependencies for a “folder full of scripts”.
  • etc.

Basically any time you are installing stuff into an environment but not publishing a library this is potentially applicable. In some of those situations versions might all be frozen but in others it might just be someone installing things piece by piece with pip or whatever.

Looking at the GitHub issues it seems that the people who have the most problems with this are people building applications and I suspect that’s because they like to make use of lots of dependencies and have large dependency chains (library authors are more conservative about adding dependencies). Another group that shows up having problems with this are integrators like WinPython and Nixpkgs who want to try to make a single consistent environment containing the largest collection of packages possible.

The other group that appears are people who are trying to develop libraries and install development versions of different things into a single environment. For example I am working on A and need B as a development dependency but B depends (with some version constraint) on A as a runtime dependency. Now I cannot install B properly in my development environment in order to test the changes I am making to A.

You can imagine more complicated versions of this, where you are trying to make changes to more than one library at a time, like “I want library B from the gh-1234 PR and here I have library A checked out in git, but B depends on A transitively through C, D, etc.” I guess in some situations manually overriding all dependencies, as with pip’s --no-deps, is probably easy enough, but in others you really don’t want to have to manually resolve all the other dependencies.

As you say one way to resolve this is to separate resolving and installing into two phases and then a tool like pip can be used for the install step but currently does not help with the resolve step if ultimately no versions exist satisfying all constraints.

The suggestion is for there to be some tool that can help with this situation and in principle it could be something different from pip or poetry that generates the requirements file to be used. Then again pip and poetry are dependency management tools for Python and this is a dependency management problem so it seems that those asking for this don’t see why it should be a separate tool that does this.

I personally have never needed overrides but I can see why some people do. I imagine that if I was in their situation I would want something like an overrides file which is as tightly constrained as possible like:

[overrides-A]
A == 1.0, B < 2, B < 3
A == 2.0, B < 2.1, B < 3

What this means is “I expect the resolver to find version 1.0 or 2.0 of package A which have constraints B < 2 or B < 2.1 and I want that particular constraint for A to be replaced by B < 3 when resolving”. Then I would want a resolver or installer to print warnings about inconsistent environments. Probably the resolver should also warn if you are getting some other version of package A: a new release might make it possible to remove the override or you might need to add a new override for the new version. You would have to maintain the override with each new release of A unless it fixes the constraints to make your desired collection of dependencies consistent.
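The behaviour described above could be sketched like this. The table layout, the apply_override name and the warning texts are all invented for illustration, and constraints are compared as plain strings rather than parsed.

```python
import warnings

# (package, version) -> (constraint expected in the metadata, replacement)
OVERRIDES = {
    ("A", "1.0"): ("B < 2", "B < 3"),
    ("A", "2.0"): ("B < 2.1", "B < 3"),
}


def apply_override(package: str, version: str, declared: list[str]) -> list[str]:
    """Return the constraints to resolve with for `package` at `version`."""
    key = (package, version)
    if key not in OVERRIDES:
        # An override exists for this package, but not for this version:
        # a new release may have made the override unnecessary, or a new
        # override entry may be needed.
        if any(name == package for name, _ in OVERRIDES):
            warnings.warn(f"no override recorded for {package} {version}")
        return list(declared)
    expected, replacement = OVERRIDES[key]
    if expected not in declared:
        # The metadata is not what the override was written against.
        warnings.warn(f"{package} {version} does not declare {expected!r}")
        return list(declared)
    return [replacement if item == expected else item for item in declared]


print(apply_override("A", "1.0", ["B < 2", "C"]))
print(apply_override("A", "2.0", ["B < 2.1"]))
print(apply_override("A", "3.0", ["B < 4"]))  # also emits a warning
```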


One related issue here: making it easier to use simpleindex would help with the current workaround of rewriting dependency metadata in wheels. That, plus a small tool or script that downloads a wheel and rewrites its dependency metadata, would make the process easier and would not need to live in pip at all.

With those pieces, the only thing it would be nice to have in pip/packaging would be a tutorial page explaining the approach, since in the current situation most people probably would not stumble on this idea. It probably makes more sense in the packaging documentation, with pip just having a link to it somewhere in a section on common dependency resolution issues.


This seems like it would be a good fit for this idea/proposal.

In general, I think it would solve a bunch of issues if more installation options were selectable per dependency (instead of for the whole installation session).


A couple use cases that haven’t been mentioned:

  • Specifying local (possibly editable) installs: If you wanted to replace the dependency numpy>1.21 with a local install on your machine. That should clearly not be committed in pyproject.toml. It belongs in some kind of local overrides.toml that is added to .gitignore.
  • Passing machine-specific options to tools. For example, to tell linters how many CPUs they should use.