Adding a default extra_require environment

Given that extras are deep inside the metadata, I can see the appeal of the latter approach. It means that C remains a single node in the dependency graph, regardless of the extras that are specified. However, it does lead directly to all the issues we’re seeing here (removing extras, conflicting extras (as opposed to merely conflicting requirements)).

Maybe I’m not understanding, so can you explain what “conflicting extra” means? I can understand conflicting version specifiers, but not conflicting extra. From my POV, having C[e1] and C[e2] as separate dependencies automatically means installing C[e1,e2]. Where is the conflict in that?

2 Likes

having C[e1] and C[e2] as separate dependencies automatically means installing C[e1,e2].

Indeed, and (as per my memory) that’s how both the resolvers in pip will treat them as well.

Indeed, and (as per my memory) that’s how both the resolvers in pip will treat them as well.

My experience is that right now pip picks up the first extras set and disregards anything found later along the way. Very annoying.

To slightly update @uranusjr’s example from another thread:

Name: a
Version: 1.0
Provides-Extra: e1
Provides-Extra: e2
Requires-Dist: b >= 2; extra == 'e1'
Requires-Dist: b < 2; extra == 'e2' 
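
To make the conflict concrete, here is a rough sketch of what happens when both extras of a are selected. It is a toy model, not pip's resolver: the EXTRA_REQUIREMENTS table, the integer versions of b, and the helper names are all made up for illustration, and only the two operators from the metadata above are handled.

```python
# Hypothetical, simplified model of the metadata above: each extra maps to the
# single version bound it adds on "b". Real resolvers evaluate full markers and
# version specifiers; this only models '>=' and '<' on integer versions.
EXTRA_REQUIREMENTS = {
    "e1": ("b", ">=", 2),
    "e2": ("b", "<", 2),
}


def satisfies(version, op, bound):
    """Check one simplified version bound."""
    if op == ">=":
        return version >= bound
    if op == "<":
        return version < bound
    raise ValueError(f"unsupported operator: {op}")


def candidates_for(extras, available_versions):
    """Return the versions of b that satisfy every bound activated by extras."""
    bounds = [EXTRA_REQUIREMENTS[extra][1:] for extra in sorted(extras)]
    return [
        version
        for version in available_versions
        if all(satisfies(version, op, bound) for op, bound in bounds)
    ]


available = [1, 2, 3]  # made-up releases of b
print(candidates_for({"e1"}, available))        # [2, 3]
print(candidates_for({"e2"}, available))        # [1]
print(candidates_for({"e1", "e2"}, available))  # [] -> a[e1,e2] cannot be satisfied
```

Each extra is fine on its own; it is only the combination a[e1,e2] that leaves no acceptable version of b.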

But also, the issue with a default extra and needing to replace/remove it would mean that it’s trivial to cause a conflict when one requirement wants the default extra and another doesn’t. So there are more future conflicts coming as we work down the path of this thread.

Whoops! Yes!

The old resolver picks whichever it sees first (and hence fails that test) but the new resolver works as described by @agronholm here.

There is conflict here, yes, but again I don’t see a problem. It’s the same case as project A requiring C < 1.0 and project B requiring C >= 1.0. You print out an error message explaining the problem and exit.

1 Like

Let me turn this around a minute.

What do people see as the advantage of automatically deselecting all of the default extras whenever any extra is specified? I’m struggling to come up with a benefit of that behavior other than that we’re more likely to get a minimal install footprint in more cases. Is there something else? I don’t find the automatic minimal install footprint case particularly compelling. It only applies in a very narrow circumstance, and I don’t think most people care that greatly about the small amount of disk space they might save.

To gain that minor benefit, we would be taking on the implicit removal of default extras, which I see as weird behavior: the kind of thing that is going to confuse people whenever they first come across it, and that makes it more difficult to add more extras to a project in the future.

3 Likes

Sure, it’s a version conflict that we’re good at handling, but the concerns earlier in this thread are more along the lines of how to resolve pip install D[a] D[-a] into a single pip install D[???]. Does the actual install include a or not?

I’m trying to figure out why people want to combine them like that, and not expand the extras earlier. Say D[a] implies pip install D A and D[-a] implies pip install D (i.e. no ‘A’), then it seems (to me) to follow that pip install D A D is easy to figure out :) But not everyone agrees or the discussion would have been over already, so I’m trying to understand the other POV.

Because if they aren’t deselected then they may as well just be regular requirements? And if that’s the case, we don’t need to change anything :)

Personally, I’d just create more packages, make the “simple” name depend on all the parts of the app/tool/etc, and use a different package for the minimal one.

Concrete example: the azure-cli package (which you shouldn’t install with pip, but you can) has a whole lot of CLI interface stuff that you don’t need if you just want to make a few calls into Azure. And so you can get any of the individual packages that provide the API and call them from Python code.

Theoretically, this could have been just an azure package that defaults to including the CLI, and has specific extras for the individual pieces you might want. I think that’s a poor design (especially in this case where we’d have hundreds of default extras), but then other people think that publishing a set of independent packages simultaneously is a bad design too.

There are definitely other ways to approach this kind of problem, but I think there are other cases that matter here too. And part of it is definitely “everyone already installs my package by name X and I don’t want to force them to change/break existing dependencies”.

Certainly it should, because if a dependency says it requires the extra, then it probably won’t work without it. As for the other, the dependent should not be bothered by the mere presence of that extra. Thus the answer is that it should do pip install D[a].

3 Likes

So then the follow up question, assuming default extras (e.g. to choose between a CPU or a GPU implementation): given pip install D implies the cpu extra should be included, and pip install D[gpu] implies the cpu extra should not be included but gpu should, how do you combine pip install D D[gpu] into a single requirement?

What default extras would D provide in this case?

given pip install D implies the cpu extra should be included, and pip install D[gpu] implies the cpu extra should not be included but gpu should, how do you combine pip install D D[gpu] into a single requirement?

Given D == D[cpu], I’d say D D[gpu] == D[cpu] D[gpu] == D[cpu,gpu].
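
A small sketch of that reading, for what it's worth. It assumes D declares cpu as a default extra; the DEFAULT_EXTRAS table and both function names are invented here and are not part of pip or any metadata spec.

```python
# Hypothetical: D declares "cpu" as its default extra.
DEFAULT_EXTRAS = {"D": {"cpu"}}


def expand(name, extras):
    """A bare requirement stands for the package with its default extras."""
    if extras:
        return set(extras)
    return set(DEFAULT_EXTRAS.get(name, set()))


def merge_union(requirements):
    """Expand each requirement on its own, then union the extras per package."""
    merged = {}
    for name, extras in requirements:
        merged.setdefault(name, set()).update(expand(name, extras))
    return merged


# "D D[gpu]" becomes "D[cpu] D[gpu]", which merges into "D[cpu,gpu]".
print(merge_union([("D", set()), ("D", {"gpu"})]))
```

The point of expanding before merging is that the order of the requirements stops mattering: the result is the same whichever one the resolver sees first.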

2 Likes

Yes, that one makes the most sense. But it isn’t what people want :)

(Remembering that these are most likely requirements of other packages and not directly typed by a user), what D means here is “I want D and I don’t care which backend”, while D[cpu] or D[gpu] means they do care. So for many, the useful result is D D[gpu] ==> D[gpu].

Similar requests from users look like “I need them to have X or Y but I don’t care which, and if they don’t have either I want to install X for them”.

Right now we don’t really have the ability to create package dependencies that work like this, and the default extra is one possible way to get there.
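
For comparison, the "I don't care which backend" reading could be sketched like this. Again this is only an illustration under assumptions: the default_extras mapping and the function name are hypothetical, and nothing like this exists in pip today.

```python
def merge_prefer_explicit(requirements, default_extras):
    """Treat a bare requirement as "no preference".

    Defaults are only used for a package when none of its requirements
    names an extra explicitly.
    """
    explicit = {}
    for name, extras in requirements:
        explicit.setdefault(name, set()).update(extras)
    return {
        name: extras or set(default_extras.get(name, set()))
        for name, extras in explicit.items()
    }


defaults = {"D": {"cpu"}}  # hypothetical default extra for D

# Someone cares about the backend, so the default is not pulled in.
print(merge_prefer_explicit([("D", set()), ("D", {"gpu"})], defaults))  # {'D': {'gpu'}}

# Nobody cares, so the default backend is installed.
print(merge_prefer_explicit([("D", set())], defaults))  # {'D': {'cpu'}}
```

This is the "X or Y, and X if neither is requested" behavior, but note what it costs: a dependent that wrote plain D and quietly relies on the cpu extra gets gpu only.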

Then maybe they should define the dependency as D[-cpu,gpu]? Or remove cpu from D’s default extras? I see this just as a documentation issue.

2 Likes

Assuming we implement some kind of extra-negation (that adds a constraint, rather than just omitting the default extra), this is probably the most viable approach here.

They don’t control D, and the people who do control it want pip install D to provide something usable.

When either of the above two ideas is implemented, there will need to be documentation, yes :)

1 Like

The consequence of pip install D D[gpu] is that an extra not needed by one dependency would be installed. The consequence of D[gpu] resetting the set of extras, on the other hand, would be disastrous as a package unknowingly relying on the default extras would no longer work. This would completely destroy the use case I was talking about – moving a hard dependency to extras.

They can be deselected though and regular requirements cannot be? Like, neither proposal prevents the extra from being deselected, the question is whether including a single extra should automatically deselect all default extras, or whether default extras should be explicitly disabled.

To use your azure example, I can see how if you’re targeting one or two services, it’s nice that someone could do azure-cli[service1,service2] and get only those things… but where that breaks down is, let’s say Azure had an optional Rust-based speedup that it could also optionally depend on. Perfect use case for an extra, so I go and change my azure-cli to azure-cli[speedups]… and suddenly my CLI does basically nothing. Now I have to either spend time auditing which extras are actually important for me and manually tweak that list, OR duplicate the entire list of default extras into my azure-cli[speedups, ... literally every default extra] just to restore my “state” back to what it was. This breaks the expectation people have of extras, which is that selecting one will add functionality to my install, not suddenly remove a bunch of functionality.

The flip side is that if you have to explicitly remove them, that entire footgun goes away, at the cost that the person who needs a single one of azure-cli’s dependencies AND doesn’t want the dependencies for all services only has to type, like, a couple of extra characters. So it would be something like azure-cli[-*, service1].

So basically the four scenarios are:

  1. I’m a person who doesn’t care what I get, just give me something working.
    • Implicit: azure-cli
    • Explicit: azure-cli
  2. I’m a person who only cares about getting a very minimal subset of dependencies, with explicit extras installed.
    • Implicit: azure-cli[service1]
    • Explicit: azure-cli[-*, service1]
  3. I’m a person who generally wants the defaults, but I want to select one specific extra (whether that’s a truly optional speedup extra, or picking one specific backend, or whatever).
    • Implicit: azure-cli[speedups, service1, service2, service3, service4, service5, service6, service7, service8, service9, service10, ..., service73] (a quick look at azure-cli looks like it depends on 73 sub service packages, maybe some of those would be grouped though into top level extras, idk).
    • Explicit: azure-cli[speedups].
  4. Sort of a riff on #3, but: I’m a person who generally wants the defaults, but this time I want to exclude a couple of the default extras, but still want the rest.
    • Implicit: azure-cli[service1, service2, service6, service7, service8, service9, service10, ..., service73]
    • Explicit: azure-cli[-service3, -service4, -service5]

The cpu vs gpu case is basically either 2, 3, or 4 depending on if there are other extras or if that’s the only extra, and whether the person doing the installing cares if the cpu backend is installed alongside the gpu backend.
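
The four scenarios above can be sketched side by side. Everything here is hypothetical: the -* and -name spellings are just the syntax floated in this thread, the three-service DEFAULTS set stands in for the real list, and neither function reflects anything pip implements.

```python
# Hypothetical default extras (a stand-in for the ~73 services mentioned above).
DEFAULTS = frozenset({"service1", "service2", "service3"})


def resolve_implicit(requested):
    """Implicit rule: naming any extra deselects every default extra."""
    return set(requested) if requested else set(DEFAULTS)


def resolve_explicit(requested):
    """Explicit rule: defaults stay unless negated with '-name' or '-*'."""
    selected = set() if "-*" in requested else set(DEFAULTS)
    removed = {item[1:] for item in requested if item.startswith("-") and item != "-*"}
    added = {item for item in requested if not item.startswith("-")}
    # An extra that is both added and negated stays installed in this sketch.
    return (selected - removed) | added


# 1. Just give me something working.
print(sorted(resolve_implicit([])), sorted(resolve_explicit([])))
# 2. Minimal install with one service.
print(sorted(resolve_implicit(["service1"])), sorted(resolve_explicit(["-*", "service1"])))
# 3. Defaults plus one more extra.
print(sorted(resolve_implicit(["speedups", "service1", "service2", "service3"])),
      sorted(resolve_explicit(["speedups"])))
# 4. Defaults minus one extra.
print(sorted(resolve_implicit(["service1", "service2"])),
      sorted(resolve_explicit(["-service3"])))
```

Both rules reach the same installs in every scenario. The difference is only in who has to type the long list, and in what happens to scenario 3 under the implicit rule if the project later adds another default extra.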

7 Likes

That would break current functionality, where right now the requirement package[] is equivalent to package. I would suggest a noisy deprecation if you want that.

How do you mean “relying”? If you mean directly importing and calling the methods of packages in the cpu extra, then I would argue that the user should be explicitly putting that in their package’s requirements. If you mean something else, can you give an example?

Remember that multiple distributions can provide the same package. In the past, TensorFlow had two distributions: tensorflow and tensorflow-gpu, which both provided the tensorflow package. They both provided the same interface, but the CPU version would lead to a surprising slowdown when installed accidentally for one reason or another.

The dependent package may not be aware that they’re relying on a default extra. For example, consider the recent situation with the packaging package: it has a hard dependency on pyparsing, but wheel wanted to just use its tags module which does not require pyparsing. I would’ve wanted the ability to negate that dependency by adding packaging[-parsing] to wheel’s dependencies because this library is not that small in size and wheel is installed in quite a few places. But if that dependency were to be moved to default extras and someone added a dependency on packaging specifying a different set of extras that didn’t explicitly include parsing, what would happen? Suddenly all dependent packages that expected the parsing module to work without an extra (as it used to) will break when the extra is not installed. That is why it’s important to not subtract from the extras set unless all dependents say that they don’t want this particular default extra.
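
A minimal sketch of that rule, assuming the hypothetical -name negation syntax from this thread and a made-up parsing default extra for packaging. The function name and the input shape (one list of extras per dependent) are illustrative only.

```python
def combine_extras(dependents, defaults):
    """Combine the extras requested by several dependents of one package.

    A default extra is dropped only when every dependent negates it with the
    hypothetical '-name' syntax. Positively requested extras are always kept.
    """
    if not dependents:
        return set(defaults)
    added = set()
    negated_by_all = set(defaults)
    for extras in dependents:
        negated = {item[1:] for item in extras if item.startswith("-")}
        added |= {item for item in extras if not item.startswith("-")}
        negated_by_all &= negated
    return (set(defaults) - negated_by_all) | added


defaults = {"parsing"}  # hypothetical default extra on packaging

# wheel opts out, but another dependent uses plain "packaging": parsing stays.
print(combine_extras([["-parsing"], []], defaults))  # {'parsing'}

# wheel is the only dependent: parsing can be dropped.
print(combine_extras([["-parsing"]], defaults))  # set()
```

Under the same rule the earlier D[a] D[-a] question resolves to installing a, since one dependent asks for it and a negation from another does not override that.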

1 Like