Adding a default extra_require environment

That was effectively my suggestion in the next sentence yes :wink: I could possibly make an argument that a user would expect -thing to mean never install thing, but I think the UX around enabling that is significantly worse, and it’s rarely what anyone actually wants.

I don’t see how the former idea leads to the latter? If the default extras that are included are indeed just extras then couldn’t you just rebuild the default set sans whatever you want left out and skip the syntax? IOW why is doing foo[spam] to leave out c-accelerated from the default extras set so troublesome as to require special syntax support? I can see people making the argument of “but what if I add a new extra to that default set?”, but then I can turn that around and say, “yes, what are you going to do about it, since you now have a new implicit dependency to add/remove?”

I think if we are going to try and push packages to make small, targeted extras to allow for a more composable way to build up indirect dependencies, then I don’t think pushing a subtraction mechanism is going to be important, at least initially.

I am strongly in favor of the subtraction mechanism. If specifying any extras in a dependency would clear the set of extras to be installed, it would also negate any future default extras, which means no package author can ever reliably benefit from adding to the default set. Imagine a situation where an author moves a dependency from hard requirements to default extras. Now any dependent package which specifies an extras set for that dependency could break, because it no longer gets the necessary sub-dependency installed unless it explicitly names it in the extras set.

In other words, if all dependents say “install this dependency but I don’t need this particular extra from it”, then and only then should that extra be removed from the set at install time.
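A minimal sketch of that rule, assuming a hypothetical `-name` spelling for opting out of a default extra and modelling each dependent's request as a plain set of strings. The function name and the data shapes are illustrative only and do not come from any existing tool.

```python
def effective_extras(default_extras, requests):
    """Combine the extras requested by every dependent of one package.

    default_extras: extras the package enables by default.
    requests: one set per dependent; "name" opts in and "-name" opts out
              (the "-name" spelling is a hypothetical syntax).
    """
    # Any extra that any dependent asks for is installed.
    selected = set()
    for request in requests:
        selected |= {e for e in request if not e.startswith("-")}

    for extra in default_extras:
        # A default extra is dropped only when every dependent opts out.
        opted_out = bool(requests) and all(
            f"-{extra}" in request for request in requests
        )
        if not opted_out:
            selected.add(extra)
    return selected


# One dependent opts out, the other says nothing: the default stays.
print(effective_extras({"c-accelerated"}, [{"-c-accelerated"}, set()]))
```

Under this rule a single dependent that still relies on the default extra is enough to keep it installed.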

There are 3 basic possible mechanisms we could select here, with varying degrees of usefulness:

  1. The default set of extras just always get included.
    • Not very useful, we’ve basically just added a second install_requires.
  2. The default set of extras get cleared as soon as someone selects ANY extra.
    • IMO this also ends up becoming just a second set of install_requires, because any library that depends on the project in question is faced with a choice. They can either only depend on the extras they care about (and possibly break things for people who are doing a dependency with no extras) or they can attempt to reproduce the entire set of default extras… which is basically just implicitly turning those default extras into a sort of pseudo install_requires. I know that if I was publishing something that depended on a library with default extras where I wanted to override one, I would probably feel compelled to include the default extras to avoid my choices breaking things for other people.
    • I also think the behavior of implicitly clearing the entire set of default extras is a surprising action at a distance that will confuse new people and experienced people alike. I can easily foresee people having to trawl their dependency tree trying to figure out which project selected an extra and caused the entire set of default extras to no longer be included. Likely the way most people will fix this will be by duplicating transitive dependencies into their own projects with more extras included.
  3. The default extras never get cleared implicitly, but you can optionally choose to remove them.
    • This requires introducing new syntax, but I think it matches the existing semantics of extras better, and is far less surprising to people.
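The three mechanisms can be compared side by side with a small sketch. It assumes a single requirement on one package and treats extras as sets of names; the function names and the `removed` parameter are hypothetical, as is the `tls` extra (`spam` and `c-accelerated` are borrowed from earlier in the thread).

```python
def always_included(defaults, requested, removed=frozenset()):
    # Option 1: defaults behave like a second install_requires.
    return set(defaults) | set(requested)


def cleared_on_any_extra(defaults, requested, removed=frozenset()):
    # Option 2: naming any extra discards the whole default set.
    return set(requested) if requested else set(defaults)


def explicit_removal(defaults, requested, removed=frozenset()):
    # Option 3: defaults stay unless they are removed by name.
    return (set(defaults) - set(removed)) | set(requested)


defaults = {"c-accelerated", "tls"}

# Roughly foo[spam] under each option, with option 3 also
# removing c-accelerated explicitly.
for strategy in (always_included, cleared_on_any_extra, explicit_removal):
    result = strategy(defaults, {"spam"}, removed={"c-accelerated"})
    print(strategy.__name__, sorted(result))
```

Only option 2 changes what a dependent gets from the default set as a side effect of asking for something unrelated, which is the action at a distance described above.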

It is important, because when you include default extras, you have to pick which of the above 3 strategies you’re going to use. If you do nothing, then you’ve just implicitly selected #1, and added a new field for little to no purpose. So you need some mechanism that enables not installing those default extras, and if you pick #2, you can’t really move to #3 without silently changing behavior (which will break people) and likewise you can’t really move to #2 from #3 without also breaking people. The only way to do a transition like that would be to introduce yet another piece of metadata that controlled what kind of default extras it is… but that sounds like the worst possible outcome to me.

So yea, I think we need to pick what mechanism we’re going to use for causing the default extras to not be installed (because otherwise they’re not extras, they’re just dependencies), and I don’t think it’s a decision we can put off till later, or easily change once it’s been made.

For option 2:

I think “The default set of extras get cleared as soon as someone selects ANY extra.” should work at the individual requirement level, not globally. If, say:

  • package[someextra] == 1.0 works as before
  • package == 1.0 becomes syntax sugar for package[default] == 1.0
  • package[] == 1.0 explicitly selects no extras.

then if one library needs package[] and another needs package, the default extra does get installed.
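As a rough sketch of this per-requirement reading: each requirement string is interpreted on its own, and the results are then unioned. The tiny regular-expression parser below is a stand-in written for this example, not a real requirement parser, and `extras_for` and `combined` are hypothetical names.

```python
import re

# Name, then an optional bracketed extras list (which may be empty).
_REQ = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(\[([^\]]*)\])?")


def extras_for(requirement, defaults):
    """Extras selected by ONE requirement string."""
    match = _REQ.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse {requirement!r}")
    if match.group(2) is None:
        # No brackets at all: sugar for package[default].
        return set(defaults)
    # Brackets present, possibly empty: exactly the extras listed.
    return {e.strip() for e in match.group(3).split(",") if e.strip()}


def combined(requirements, defaults):
    """Union of the extras selected by each requirement."""
    result = set()
    for requirement in requirements:
        result |= extras_for(requirement, defaults)
    return result


# One library needs package[], another needs plain package.
print(combined(["package[] == 1.0", "package == 1.0"], {"default"}))
```

Because the clearing happens per requirement, `package[]` from one library cannot strip the default extra that another library's plain `package` still asks for.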


Update: I personally believe option 3 is better than this, for reasons Donald explains later. But I want the best version of each option to be considered.

A possible fourth option is to let extras remove install requirements (from the package dependencies, not the whole resolution context) when they are specified (by fixing what is discussed in “The ‘extra’ environment marker and its operators”).

This way you could include all your default dependencies, and use an extra to remove one and add another. Personally I think doing it through the environment marker system is fine, as we’re talking about a fairly complex case here.

I’ve been assuming this, because it seems like the only viable option (in essence, an extra is treated like a separate empty package with a requirement of its base package plus the extras).

I’m not sure how you would take extras specified on individual requirements and somehow calculate their effect globally, but since this feels like an argument I guess some people think there’s a way to do this? I’d love to hear what that approach is, because I can’t imagine it myself.

(Not you, Petr. I’m agreeing with you :wink:)

package[] == 1.0 explicitly selects no extras.

And what about the use case I mentioned – moving a hard dependency into optional dependencies? If the dependent package is using a part of the dependency that relies on a subdependency covered by an extra, that subdependency would be automatically excluded and the application would stop working if the subdependency is moved to default extras from install_requires.

I’m not sure how you would take extras specified on individual requirements and somehow calculate their effect globally, but since this feels like an argument I guess some people think there’s a way to do this? I’d love to hear what that approach is, because I can’t imagine it myself.

I’m not understanding what’s difficult about this. Any extra specified by any dependent should be included.

Maybe I’m misunderstanding, but isn’t that precisely what an installer has to do when resolving a set of requirements?

I can’t tell you exactly how it works, because pip’s handling of extras makes my head hurt, but feel free to go and look at the code :slight_smile: And as an added bonus, if you want to see more than one interpretation of the process, pip currently has two resolvers so you can look at both :slight_smile::slight_smile:

More seriously, I’m starting to find it hard to follow what people are expecting again (in this case, I don’t know what “individual requirement level” and “globally” are intended to mean). If someone could clarify a bit, that might help avoid any miscommunication or misunderstandings.

The context seems to be two conflicting extra specifications on the same package in the global context (e.g. A->C[e1], B->C[e2], pip install A B).

I think some of us see this as “A requires C and A also requires the extras under e1”, so essentially extending the requirements of A.

The alternative view is (I think?) “A requires C and C requires [e1]”, which implies that B has a conflicting requirement because C[e1] is not the same as C[e2] (because there is only one “C”, which means the “requires e1” and “requires e2” have to be combined programmatically into a specification that was never written down by a user/author).

Given that extras are deep inside the metadata, I can see the appeal of the latter approach. It means that C remains a single node in the dependency graph, regardless of the extras that are specified. However, it does lead directly to all the issues we’re seeing here (removing extras, conflicting extras (as opposed to merely conflicting requirements)).

I haven’t looked at the implementation, because implementations should be following designs, not the other way around :wink: But I suspect the resolver implementations ought to treat “C[e1]” as a separate node from “C[e2]” so that both can be installed together. Then conflicting extras can only exist within the context of a single specification (e.g. X[d, not_d]) and the conflicting requirements implied by two separate specifications (e.g. X[d], X[not_d]) are resolved as for normal requirement conflicts.
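To make the "separate node" idea concrete, here is a small sketch. It is not how pip's resolvers are implemented; `expand`, the `extra_deps` mapping, and the packages `x` and `y` are all hypothetical. Each `name[extra]` becomes its own graph node that depends on the base package plus whatever that extra adds.

```python
def expand(requirements, extra_deps):
    """Turn (name, extras) pairs into a dependency graph.

    requirements: iterable of (package name, list of extras).
    extra_deps: maps (package name, extra) to the packages it adds.
    Returns a dict mapping each node to the set of nodes it depends on.
    """
    graph = {}
    for name, extras in requirements:
        graph.setdefault(name, set())
        for extra in extras:
            node = f"{name}[{extra}]"
            deps = graph.setdefault(node, set())
            deps.add(name)  # the extra node always needs its base package
            deps |= set(extra_deps.get((name, extra), ()))
    return graph


# A -> C[e1], B -> C[e2], installed together.
graph = expand(
    [("C", ["e1"]), ("C", ["e2"])],
    {("C", "e1"): ["x"], ("C", "e2"): ["y"]},
)
for node, deps in sorted(graph.items()):
    print(node, "->", sorted(deps))
```

In this model C[e1] and C[e2] never have to be merged into one specification; any clash shows up as an ordinary conflict between the requirements they pull in.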

Given that extras are deep inside the metadata, I can see the appeal of the latter approach. It means that C remains a single node in the dependency graph, regardless of the extras that are specified. However, it does lead directly to all the issues we’re seeing here (removing extras, conflicting extras (as opposed to merely conflicting requirements)).

Maybe I’m not understanding, so can you explain what “conflicting extra” means? I can understand conflicting version specifiers, but not conflicting extra. From my POV, having C[e1] and C[e2] as separate dependencies automatically means installing C[e1,e2]. Where is the conflict in that?

having C[e1] and C[e2] as separate dependencies automatically means installing C[e1,e2].

Indeed, and (as per my memory) that’s how both the resolvers in pip will treat them as well.

My experience is that right now pip picks up the first extras set and disregards anything found later along the way. Very annoying.

To slightly update @uranusjr’s example from another thread:

Name: a
Version: 1.0
Provides-Extra: e1
Provides-Extra: e2
Requires-Dist: b >= 2; extra == 'e1'
Requires-Dist: b < 2; extra == 'e2' 
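A simplified sketch of why this metadata conflicts once both extras are selected. It hand-codes the two Requires-Dist lines, compares only major version numbers, and assumes a made-up list of available releases of b; real specifier and marker handling is considerably richer than this.

```python
# The two Requires-Dist lines above as (name, version check, extra).
REQUIRES_DIST = [
    ("b", lambda major: major >= 2, "e1"),
    ("b", lambda major: major < 2, "e2"),
]


def candidates(name, extras, available):
    """Versions of `name` that satisfy every requirement the extras enable."""
    checks = [
        ok for dep, ok, extra in REQUIRES_DIST
        if dep == name and extra in extras
    ]
    return [version for version in available if all(ok(version) for ok in checks)]


available_b = [1, 2, 3]  # hypothetical major versions of b

print(candidates("b", {"e1"}, available_b))
print(candidates("b", {"e2"}, available_b))
# With both extras selected no version of b satisfies both constraints,
# so the result is empty and an installer would have to report a conflict.
print(candidates("b", {"e1", "e2"}, available_b))
```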

But also, the issue with a default extra and needing to replace/remove it would mean that it’s trivial to cause a conflict when one requirement wants the default extra and another doesn’t. So there are more future conflicts coming as we work down the path of this thread.

Whoops! Yes!

The old resolver picks whichever it sees first (and hence fails that test) but the new resolver works as described by @agronholm here.

There is conflict here, yes, but again I don’t see a problem. It’s the same case as project A requiring C < 1.0 and project B requiring C >= 1.0. You print out an error message explaining the problem and exit.

Let me turn this around a minute.

What do people see as the advantage to automatically deselecting all of the default extras whenever any extra is specified? I’m struggling to come up with a benefit of that behavior other than that we’re more likely to get a minimal install footprint in more cases. Is there something else? I don’t find the automatic minimal install footprint case particularly compelling. It only applies in a very narrow circumstance, and I don’t think most people generally care that greatly about the small amount of disk space they might save.

To gain that minor benefit, I see the implicit removal of default extras as this weird implicit behavior, the kind of thing that is going to confuse people whenever they first come across it, and make it more difficult to add more extras to a project in the future.

Sure, it’s a version conflict that we’re good at handling, but the concerns earlier in this thread are more along the lines of how to resolve pip install D[a] D[-a] into a single pip install D[???]. Does the actual install include a or not?

I’m trying to figure out why people want to combine them like that, and not expand the extras earlier. Say D[a] implies pip install D A and D[-a] implies pip install D (i.e. no ‘A’), then it seems (to me) to follow that pip install D A D is easy to figure out :slight_smile: But not everyone agrees or the discussion would have been over already, so I’m trying to understand the other POV.

Because if they aren’t deselected then they may as well just be regular requirements? And if that’s the case, we don’t need to change anything :slight_smile:

Personally, I’d just create more packages, make the “simple” name depend on all the parts of the app/tool/etc, and use a different package for the minimal one.

Concrete example: the azure-cli package (which you shouldn’t install with pip, but you can) has a whole lot of CLI interface stuff that you don’t need if you just want to make a few calls into Azure. And so you can get any of the individual packages that provide the API and call them from Python code.

Theoretically, this could have been just an azure package that defaults to including the CLI, and has specific extras for the individual pieces you might want. I think that’s a poor design (especially in this case where we’d have hundreds of default extras), but then other people think that publishing a set of independent packages simultaneously is a bad design too.

There are definitely other ways to approach this kind of problem, but I think there are other cases that matter here too. And part of it is definitely “everyone already installs my package by name X and I don’t want to force them to change/break existing dependencies”.

Certainly it should, because if a dependency says it requires the extra, then it probably won’t work without it. As for the other, the dependent should not be bothered by the mere presence of that extra. Thus the answer is that it should do pip install D[a].
