Adding a default extra_require environment

But a --no-default-extra pip option would mean that the average user trying to install projects would have to understand what a “default extra” is, which I think is definitely too much to ask.

Personally I feel any user wanting to disable defaults should automatically count as above average. They should accept the default instead if the concept is beyond them :slightly_smiling_face:

With that said, I don’t think it’s a good solution to do this as a pip option either. Once we do that, people will start wanting to include it in their requirements.txt, lock file, Requires-Dist, etc. It would be better if we come up with a way to encode this information into the PEP 508 format.

Right, I think that’s a bad pattern. For a few reasons:

  1. I think it scales poorly. If I have 5 different “axes” that my extras could install (e.g. maybe I can select what asyncio reactor I want to use based on an extra, and select optional C speed ups from multiple different acceleration packages, etc.), then my “single list of default extra things to install” ends up being this weird mishmash of dependencies that aren’t related to each other at all, other than that I want them optional but installed by default.
  2. Similar to the above, the granularity is wrong. If I have 5 different axes, and I want to select a new extra for any of them (or even for something completely unrelated), I now have to specify the default extras manually for every single one of those axes.
  3. It makes backwards compatibility harder OR it’s not really allowing the minimal install in a number of cases. In my foo/bar example above, either bar depending on an extra unselects the entire set of defaults (in which case my dependencies starting to use a new extra, when before they didn’t, suddenly becomes a backwards incompatible change, when before it never was) OR we continue to install the default dependencies anytime anyone depends without explicit extras… which basically means that anytime a thing with default extras is depended on in a library (since libraries are not likely going to want to make decisions about backends or so), the default extra just degrades to being effectively the same as install requires in all but simple cases.
  4. Likewise, backwards compatibility for the project becomes harder. Adding a new axis to the default extras means just adding it to the flat list of things, so any downstream that is manually specifying extras has to go and update their dependency specifier to add this new extra, since it won’t be selected by default (because I’m explicitly listing extras), even if I have no opinion about which way to go and I’m just going to copy the default.
  5. It forces me to keep two lists of dependencies in sync, with no real mechanisms to automate keeping those in sync other than inside a dynamic setup.py, which we’ve been trying to de-emphasize.
  6. It makes discovering what the default options are harder. If I want to specify only one out of N axes, I have to figure out what extras I need to select to get the other N-1 axes back to their default state, but since there’s just a single flat list of dependencies in this implicit list of extra dependencies, I have to either hope the project documented it on their own, or reverse engineer it by comparing the actual contents of the different lists of requirements.
  7. It allows projects to get their metadata into weird states where downstream users can end up having to manually specify dependencies, and not only does it allow it, but this is the path of least resistance for projects. If a project only lists their default extras in the default list (to avoid the problems in #5), and I want to customize my selection for 1 axis but use the default for the other axes, and the project hasn’t duplicated that metadata, my only option is to manually copy that dependency data into my dependencies.

Instead my suggestion is literally everything gets defined as an extra like we have today, and this new metadata field literally just lists the name of extras that should be selected by default. Essentially this means that a package foo, with a metadata field like Default-Extras: spam, c-accelerated is equivalent to doing pip install foo[spam, c-accelerated]. Since extras are always additive, if two things depend on foo, one with the default, one with some explicit set of extras, we’d just take the union of all of those extras.
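As a rough sketch of those additive semantics (the extras table and dependency names below are made up for illustration, and Default-Extras is only the proposed field, not something installers read today):

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical metadata for a package "foo". Only the idea of a
# Default-Extras field comes from the proposal; the data is invented.
FOO_EXTRAS: Dict[str, Set[str]] = {
    "spam": {"spam-lib"},
    "c-accelerated": {"foo-speedups"},
    "docs": {"sphinx"},
}
FOO_DEFAULT_EXTRAS: FrozenSet[str] = frozenset({"spam", "c-accelerated"})


def selected_extras(requested_by_dependents: Iterable[Set[str]]) -> Set[str]:
    """Union the default extras with whatever each dependent asked for."""
    selected = set(FOO_DEFAULT_EXTRAS)
    for extras in requested_by_dependents:
        selected |= set(extras)
    return selected


def dependencies_for(extras: Iterable[str]) -> Set[str]:
    """Collect the dependencies contributed by the selected extras."""
    deps: Set[str] = set()
    for extra in extras:
        deps |= FOO_EXTRAS[extra]
    return deps


# One dependent needs plain "foo", another needs "foo[docs]".
extras = selected_extras([set(), {"docs"}])
print(sorted(extras))
print(sorted(dependencies_for(extras)))
```

Selecting docs here only adds to what gets installed; the defaults stay in place, which is the same behaviour extras have today.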

That then raises the question of how do we unselect extras, which hasn’t been a problem before, because there was no such thing as this implicitly selected extra. My rough idea for that is to extend PEP 508 with some explicit syntax to negate a selected extra, say for instance we say you can negate an extra by prefixing it with a - symbol. So if I wanted to install foo, but without the c-accelerator, I would do pip install foo[-c-accelerated] (we could use ! or whatever, I don’t care that much about specific syntax).

The ability to unselect an extra raises the question of what do we do if foo and bar both depend on spam, and one does spam[-thing] and the other does spam[thing], how do we resolve the situation? My suggestion would be to make an explicit request for an extra always take precedence, while an explicit request to exclude an extra only takes precedence over the list of default extra names to install.
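One possible reading of that precedence rule, as a minimal sketch (the function name and the idea of pooling all requests and exclusions into two sets are my assumptions, not anything pip implements):

```python
from typing import Iterable, Set


def resolve_extras(
    defaults: Iterable[str],
    requested: Iterable[str],
    excluded: Iterable[str],
) -> Set[str]:
    """Explicit requests always win; exclusions only trim the defaults."""
    return set(requested) | (set(defaults) - set(excluded))


# foo depends on spam[-thing], bar depends on spam[thing],
# and "thing" is one of spam's default extras.
print(sorted(resolve_extras(defaults={"thing"}, requested={"thing"}, excluded={"thing"})))

# Nobody asks for "thing" explicitly, so the exclusion removes the default.
print(sorted(resolve_extras(defaults={"thing"}, requested=set(), excluded={"thing"})))
```

Under this reading -thing means "don't give me this by default", not "never install thing".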

Looking at my list of problems with the nameless default extra, this solution solves those problems. In more detail:

  1. All extra definitions remain defined as they are today, separately keyed by the extra name, so they’re each grouped by the “feature” they’re related to. It doesn’t matter how many different extras I want, they’re all cleanly separated.
  2. If I want to specify an extra, it doesn’t affect the list of what’s being installed at all other than to be additive, which is exactly the same semantics extras have today. So, say, opting into a C accelerator (or a GPU specific package or whatever) doesn’t suddenly opt me into managing my list of extras for every axis that a project might have extras for. This is particularly nice for extras that are wholly unrelated to the things that are being installed by default.
  3. Backwards compatibility is basically the same as it is today. I can depend on things with extras without worrying that doing so will suddenly unselect a bunch of default options for other packages, since depending on things with extras is by default just adding new dependencies.
  4. For the project providing the extras, backwards compatibility is much nicer for the most part. I can add new named extras to my list of default extras, and my downstream consumers will just get my implicit default without having to do any work; it’ll just work for them. The one situation where the other proposal is nicer is if I want to make absolutely sure that no default extras are selected (the minimal example above): a new version might add a new extra to the defaults, and I would then have to update my dependency spec to add another negation. I don’t think this is a big deal AND I think it’s going to be the least common use case. However, if we’re worried about supporting that use case, then we can add some sigil that says to deselect all default extras (foo[-:default:]? foo[-*]? I dunno).
  5. Since the dependency lists are only ever specified in the actual extra definitions, there’s no need to keep anything in sync.
  6. I can simply look at the metadata for a project (using pip show or something) and see exactly which extras are selected by default. So when I’m deciding what to add or remove, I have all that information available to me. This is super useful if we add the syntax in 4: if, say, I want to remove all the defaults except one, I can see what those defaults are and then do foo[-:default:, something-that-was-a-default-to-reselect-it-explicitly].
  7. Every state that can be expressed with default extras must be capable of being expressed with the extra syntax. There’s no possibility where some set of dependencies only would ever get installed with no extras selected. Which is better both for consistency and for phasing this feature in, since users with older installers will need to continue to manually select which extras they want installed.

Honestly, the “default implicit unnamed extra” proposal feels like it really only works in very simple cases where there is only a single axis someone might want to select different extras for. It also brings in this weird, action at a distance behavior that I think will be very surprising to end users both new and old, whereas my proposal doesn’t change how extras function, unless you’re explicitly using the new syntax.

Oh, the other thing I dislike about the single implicitly named default extra option, is it doesn’t lend itself well to future expansion. Now I’m not saying we’re ever going to expand extras to be more comprehensive.

However, one could envision a situation where what extras have been selected are made available at build or runtime, to allow them to act as a more fully featured (heh) feature flag system (they’re already sort of feature flags, just feature flags that are limited to only adding new dependencies). The single implicitly named thing is a lot harder to work with in that situation, because there’s no extras name to key off of for whether a feature has been toggled on or off, and because of the fact it doesn’t handle the “multiple axis” problem well, it would make code that consumes those feature flags harder, because it would have to check for both the explicitly named feature flag, and whatever marker we use for no feature flags selected at all.

Another possible enhancement is the ability to make mutually exclusive extras, or to make mandatory extras. Basically things like “Ok well you can use any backend you want, but you must pick at least one, but by default we’ll pick one for you” or whatever, would most likely be easiest to implement by using extra names in some constraint language of some kind. Say that you have to pick at least one backend; one could imagine a constraint that says something like backend1 or backend2 or backend3.

These aren’t really fully fleshed out ideas, and I’m not really even saying any of them are a good idea, but the named list of implicit default extras proposal makes implementing those kinds of additional enhancements much easier I think, and makes the implicit defaults much less of a special case, whereas the single implicitly named default extra option sort of just makes all of those kinds of features harder.

Reminder that there are currently no restrictions on extra names (unlike package names), so we’ll need to specify that first to make any syntax possible. IIRC setuptools has some restrictions on what you can use as extras_require keys, but they only apply to setuptools (IIRC even pip’s behaviour is different!)

That’s not true, at least in PEP 508, extras are explicitly defined:

identifier    = < letterOrDigit identifier_end* >
extras_list   = identifier:i (wsp* ',' wsp* identifier)*:ids -> [i] + ids
extras        = '[' wsp* extras_list?:e wsp* ']' -> e

I’ve also just checked packaging, and it also implements this correctly:

PUNCTUATION = Word("-_.")
IDENTIFIER_END = ALPHANUM | (ZeroOrMore(PUNCTUATION) + ALPHANUM)
IDENTIFIER = Combine(ALPHANUM + ZeroOrMore(IDENTIFIER_END))

EXTRA = IDENTIFIER

EXTRAS_LIST = EXTRA + ZeroOrMore(COMMA + EXTRA)
EXTRAS = (LBRACKET + Optional(EXTRAS_LIST) + RBRACKET)("extras")

And just to verify, attempting to use an invalid name fails currently:

$ pip install 'requests[-asds]'                                        
ERROR: Exception:
Traceback (most recent call last):
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_vendor/packaging/requirements.py", line 98, in __init__
    req = REQUIREMENT.parseString(requirement_string)
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_vendor/pyparsing.py", line 1955, in parseString
    raise exc
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_vendor/pyparsing.py", line 3814, in parseImpl
    raise ParseException(instring, loc, self.errmsg, self)
pip._vendor.pyparsing.ParseException: Expected stringEnd, found '['  (at char 11), (line:1, col:12)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 188, in _main
    status = self.run(options, args)
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 185, in wrapper
    return func(self, options, args)
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/commands/install.py", line 300, in run
    reqs = self.get_requirements(
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 321, in get_requirements
    req_to_add = install_req_from_line(
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/req/constructors.py", line 396, in install_req_from_line
    parts = parse_req_from_line(name, line_source)
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/req/constructors.py", line 348, in parse_req_from_line
    extras = convert_extras(extras_as_string)
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/req/constructors.py", line 77, in convert_extras
    return Requirement("placeholder" + extras.lower()).extras
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_vendor/packaging/requirements.py", line 100, in __init__
    raise InvalidRequirement(
pip._vendor.packaging.requirements.InvalidRequirement: Parse error at "'[-asds]'": Expected stringEnd

Additionally, Metadata 2.1 explicitly declares in the PEP that extras must be valid Python identifiers (emphasis mine):

A string containing the name of an optional feature. Must be a valid Python identifier. May be used to make a dependency conditional on whether the optional feature has been requested.

setuptools does allow you to specify invalid extra names currently (which should probably be treated as a bug). However those names are basically useless, so I don’t think that worrying about widespread use of them is something we need to do.

Thanks for the correction, that’s good to know. It seems we’re well covered in this area.


The valid Python identifier part brings out one more problem since it means that Provides-Extra: foo-bar is technically invalid metadata. It is however not only recognised by pip but also not uncommon in the wild. But that’s an issue for another day, I guess.

Yeah, the PEP 508 spec allows it, the 2.1 metadata does not. Probably the 2.1 spec just needs to be updated to match reality.
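To make the mismatch concrete, here is a small sketch that checks a name against both rules. The regex is my own transcription of the PEP 508 identifier rule quoted earlier (ASCII letters and digits assumed), and str.isidentifier() stands in for the "valid Python identifier" wording in Metadata 2.1; neither is the code pip or packaging actually runs.

```python
import re

# identifier = letterOrDigit, then any number of
# (optional run of "-", "_", "." followed by a letterOrDigit).
PEP508_EXTRA = re.compile(r"[A-Za-z0-9](?:[-_.]*[A-Za-z0-9])*")


def valid_pep508_extra(name: str) -> bool:
    """True if the whole name matches the PEP 508 identifier rule."""
    return PEP508_EXTRA.fullmatch(name) is not None


def valid_metadata21_extra(name: str) -> bool:
    """True if the name is a valid Python identifier."""
    return name.isidentifier()


for name in ("foo_bar", "foo-bar", "-asds"):
    print(name, valid_pep508_extra(name), valid_metadata21_extra(name))
```

foo-bar passes the first check and fails the second, which is the gap described above, while -asds fails both, so a leading - is still free to be given a new meaning.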

Because excluding an extra means “install this as if the extra were not specified”, surely this would be resolved exactly as it would be today, by including the packages from thing.

We discussed this a bit in The ‘extra’ environment marker and its operators, and while there was disagreement, it seems pretty clear that contradictory extras can exist and should be caught at resolution time (or later when things don’t work), and that these are a bug in the package, but it shouldn’t matter here.

As an aside, I regularly use/recommend pip-compile to get all the dependencies into a requirements file, then modify it manually and install with --no-deps to exclude dependencies. Would not be opposed to using an extra for more targeted handling, but it works fine.

That was effectively my suggestion in the next sentence yes :wink: I could possibly make an argument that a user would expect -thing to mean never install thing, but I think the UX around enabling that is significantly worse, and it’s rarely what anyone actually wants.

I don’t see how the former idea leads to the latter? If the default extras that are included are indeed just extras then couldn’t you just rebuild the default set sans whatever you want left out and skip the syntax? IOW why is doing foo[spam] to leave out c-accelerated from the default extras set so troublesome as to require special syntax support? I can see people making the argument of “but what if I add a new extra to that default set?”, but then I can turn that around and say, “yes, what are you going to do about it, since you now have a new implicit dependency to add/remove?”

I think if we are going to try and push packages to make small, targeted extras to allow for a more composable way to build up indirect dependencies, then I don’t think pushing a subtraction mechanism is going to be important, at least initially.

I am strongly in favor of the subtraction mechanism. If specifying any extras in a dependency would clear the set of extras to be installed, it would also negate any future default extras, which means no package author can ever reliably benefit from adding to the default set. Imagine a situation where an author moves a dependency from hard requirements to default extras. Now any dependent package which specifies an extras set for that dependency could break, because it is no longer getting the necessary sub-dependency installed, since it didn’t explicitly specify it in the extras set.

In other words, if all dependents say “install this dependency but I don’t need this particular extra from it”, then and only then should that extra be removed from the set at install time.
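A minimal sketch of that "only if every dependent opts out" rule, assuming each dependent's requirement is reduced to a pair of (requested extras, excluded extras); the representation and names are hypothetical:

```python
from typing import Iterable, Set, Tuple

Requirement = Tuple[Set[str], Set[str]]  # (requested extras, excluded extras)


def extras_to_install(defaults: Set[str], dependents: Iterable[Requirement]) -> Set[str]:
    """Work out each dependent's effective extras, then union them.

    A default extra is dropped only when no dependent keeps it, i.e. every
    dependent excluded it and nobody requested it explicitly.
    """
    selected: Set[str] = set()
    for requested, excluded in dependents:
        selected |= set(requested) | (set(defaults) - set(excluded))
    return selected


defaults = {"tls"}

# One dependent opts out of "tls", another depends on the bare package.
print(sorted(extras_to_install(defaults, [(set(), {"tls"}), (set(), set())])))

# Every dependent opts out, so "tls" is finally removed.
print(sorted(extras_to_install(defaults, [(set(), {"tls"}), ({"docs"}, {"tls"})])))
```

With this rule, a dependent that writes pkg[docs] keeps getting tls after the author moves it from hard requirements into the default extras.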

There are 3 basic possible mechanisms we could select here, with varying degrees of usefulness:

  1. The default set of extras just always get included.
    • Not very useful, we’ve basically just added a second install_requires.
  2. The default set of extras get cleared as soon as someone selects ANY extra.
    • IMO this also ends up becoming just a second set of install_requires, because any library that depends on the project in question is faced with a choice. They can either only depend on the extras they care about (and possibly break things for people who are doing a dependency with no extras) or they can attempt to reproduce the entire set of default extras… which is basically just implicitly turning those default extras into a sort of pseudo install_requires. I know that if I was publishing something that depended on a library with default extras where I wanted to override one, I would probably feel compelled to include the default extras to avoid my choices breaking things for other people.
    • I also think the behavior of implicitly clearing the entire set of default extras is a surprising action at a distance that will confuse new people and experienced people alike. I can easily foresee people having to trawl their dependency tree trying to figure out which project selected an extra and caused the entire set of default extras to no longer be included. Likely the way most people will fix this will be by duplicating transitive dependencies into their own projects with more extras included.
  3. The default extras never get cleared implicitly, but you can optionally choose to remove them.
    • This requires introducing new syntax, but I think it matches the existing semantics of extras better, and is far less surprising to people.
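To compare the three mechanisms side by side, here is a small sketch. The Strategy names and the resolve function are made up for this post, and "requested" is assumed to be the pooled set of extras that anyone in the dependency tree selected.

```python
from enum import Enum
from typing import Iterable, Set


class Strategy(Enum):
    ALWAYS_INCLUDE = 1      # option 1: defaults are always installed
    CLEAR_ON_ANY_EXTRA = 2  # option 2: any selected extra clears the defaults
    SUBTRACT = 3            # option 3: defaults stay unless explicitly removed


def resolve(
    strategy: Strategy,
    defaults: Iterable[str],
    requested: Iterable[str],
    excluded: Iterable[str] = (),
) -> Set[str]:
    defaults, requested, excluded = set(defaults), set(requested), set(excluded)
    if strategy is Strategy.ALWAYS_INCLUDE:
        return defaults | requested
    if strategy is Strategy.CLEAR_ON_ANY_EXTRA:
        return requested if requested else defaults
    return requested | (defaults - excluded)


defaults = {"tls", "speedups"}
for strategy in Strategy:
    # Somewhere in the tree, one project depends on pkg[docs].
    print(strategy.name, sorted(resolve(strategy, defaults, {"docs"})))
```

Only the second strategy loses tls and speedups when a single project asks for docs, which is the action at a distance described in the bullets above. The third keeps them unless someone passes them in excluded.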

It is important, because when you include default extras, you have to pick which of the above 3 strategies you’re going to use. If you do nothing, then you’ve just implicitly selected #1, and added a new field for little to no purpose. So you need some mechanism that enables not installing those default extras, and if you pick #2, you can’t really move to #3 without silently changing behavior (which will break people) and likewise you can’t really move to #2 from #3 without also breaking people. The only way to do a transition like that would be to introduce yet another piece of metadata that controlled what kind of default extras it is… but that sounds like the worst possible outcome to me.

So yea, I think we need to pick what mechanism we’re going to use for causing the default extras to not be installed (because otherwise they’re not extras, they’re just dependencies), and I don’t think it’s a decision we can put off till later, or easily change once it’s been made.

For option 2:

I think “The default set of extras get cleared as soon as someone selects ANY extra.” should work at the individual requirement level, not globally. If, say:

  • package[someextra] == 1.0 works as before
  • package == 1.0 becomes syntax sugar for package[default] == 1.0
  • package[] == 1.0 explicitly selects no extras.

then if one library needs package[] and another needs package, the default extra does get installed.
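Roughly, in code (a simplified sketch: the regex only handles the name and the bracketed extras, not full PEP 508, and "default" is the hypothetical name of the implicit extra from the bullets above):

```python
import re
from typing import Iterable, Set

# Matches a project name followed by an optional [extras] group.
_REQ = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[([^\]]*)\])?")


def extras_for(requirement: str) -> Set[str]:
    """Extras selected by one requirement string under this variant."""
    match = _REQ.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    if match.group(2) is None:
        # No brackets at all: sugar for package[default].
        return {"default"}
    # Brackets present: exactly what was listed, possibly nothing.
    return {part.strip() for part in match.group(3).split(",") if part.strip()}


def combined_extras(requirements: Iterable[str]) -> Set[str]:
    """Union the per-requirement selections, as extras are merged today."""
    selected: Set[str] = set()
    for requirement in requirements:
        selected |= extras_for(requirement)
    return selected


print(extras_for("package[someextra] == 1.0"))
print(extras_for("package == 1.0"))
print(extras_for("package[] == 1.0"))
print(combined_extras(["package[] == 1.0", "package == 1.0"]))
```

The clearing happens per requirement, so one library writing package[] cannot take the default away from another library that wrote package.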


Update: I personally believe option 3 is better than this, for reasons Donald explains later. But I want the best version of each option to be considered.

A possible fourth option is to let extras remove install requirements (from the package dependencies, not the whole resolution context) when they are specified (by fixing The ‘extra’ environment marker and its operators).

This way you could include all your default dependencies, and use an extra to remove one and add another. Personally I think doing it through the environment marker system is fine, as we’re talking about a fairly complex case here.
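Something like the following, as a sketch only. It models each dependency with a plain Python predicate over the selected extras rather than real marker syntax, since the extra marker cannot express this today, and the package and extra names are invented.

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

Condition = Callable[[FrozenSet[str]], bool]

# Hypothetical dependency list for one package. The second entry is a
# default dependency that selecting the "fast" extra removes again.
DEPENDENCIES: List[Tuple[str, Condition]] = [
    ("core-lib", lambda extras: True),
    ("slow-backend", lambda extras: "fast" not in extras),
    ("fast-backend", lambda extras: "fast" in extras),
]


def resolve_dependencies(extras: Iterable[str]) -> Set[str]:
    """Keep every dependency whose condition holds for the selected extras."""
    selected = frozenset(extras)
    return {name for name, condition in DEPENDENCIES if condition(selected)}


print(sorted(resolve_dependencies([])))
print(sorted(resolve_dependencies(["fast"])))
```

The swap is expressed entirely inside the package's own metadata, so nothing else in the resolution context is affected.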

I’ve been assuming this, because it seems like the only viable option (in essence, an extra is treated like a separate empty package with a requirement of its base package plus the extras).

I’m not sure how you would take extras specified on individual requirements and somehow calculate their effect globally, but since this feels like an argument I guess some people think there’s a way to do this? I’d love to hear what that approach is, because I can’t imagine it myself.

(Not you, Petr. I’m agreeing with you :wink:)

package[] == 1.0 explicitly selects no extras.

And what about the use case I mentioned – moving a hard dependency into optional dependencies? If the dependent package is using a part of the dependency that relies on a subdependency covered by an extra, that subdependency would be automatically excluded and the application would stop working if the subdependency is moved to default extras from install_requires.

I’m not sure how you would take extras specified on individual requirements and somehow calculate their effect globally, but since this feels like an argument I guess some people think there’s a way to do this? I’d love to hear what that approach is, because I can’t imagine it myself.

I’m not understanding what’s difficult about this. Any extra specified by any dependent should be included.

Maybe I’m misunderstanding, but isn’t that precisely what an installer has to do when resolving a set of requirements?

I can’t tell you exactly how it works, because pip’s handling of extras makes my head hurt, but feel free to go and look at the code :slight_smile: And as an added bonus, if you want to see more than one interpretation of the process, pip currently has two resolvers so you can look at both :slight_smile::slight_smile:

More seriously, I’m starting to find it hard to follow what people are expecting again (in this case, I don’t know what “individual requirement level” and “globally” are intended to mean). If someone could clarify a bit, that might help avoid any miscommunication or misunderstandings.

The context seems to be two conflicting extra specifications on the same package in the global context (e.g. A->C[e1], B->C[e2], pip install A B).

I think some of us see this as “A requires C and A also requires the extras under e1”, so essentially extending the requirements of A.

The alternative view is (I think?) “A requires C and C requires [e1]”, which implies that B has a conflicting requirement because C[e1] is not the same as C[e2] (because there is only one “C”, which means the “requires e1” and “requires e2” have to be combined programmatically into a specification that was never written down by a user/author).

Given that extras are deep inside the metadata, I can see the appeal of the latter approach. It means that C remains a single node in the dependency graph, regardless of the extras that are specified. However, it does lead directly to all the issues we’re seeing here (removing extras, conflicting extras (as opposed to merely conflicting requirements)).

I haven’t looked at the implementation, because implementations should be following designs, not the other way around :wink: But I suspect the resolver implementations ought to treat “C[e1]” as a separate node from “C[e2]” so that both can be installed together. Then conflicting extras can only exist within the context of a single specification (e.g. X[d, not_d]) and the conflicting requirements implied by two separate specifications (e.g. X[d], X[not_d]) are resolved as for normal requirement conflicts.
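As a toy model of the "separate node" view (not how either pip resolver is written; the extras table and dependency names are invented), each C[extra] expands to the base package plus that extra's requirements:

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical extras metadata, keyed by package and then by extra name.
EXTRAS: Dict[str, Dict[str, Set[str]]] = {
    "C": {"e1": {"dep-one"}, "e2": {"dep-two"}},
}


def expand(name: str, extra: str) -> Set[str]:
    """Treat name[extra] as a virtual package: the base plus the extra's deps."""
    return {name} | EXTRAS[name][extra]


def install_set(requirements: Iterable[Tuple[str, str]]) -> Set[str]:
    """Union the expansions of every (package, extra) requirement."""
    result: Set[str] = set()
    for name, extra in requirements:
        result |= expand(name, extra)
    return result


# A -> C[e1], B -> C[e2], pip install A B
print(sorted(install_set([("C", "e1"), ("C", "e2")])))
```

In this model the two nodes never conflict as extras. Any conflict would have to come from the requirements they pull in, and that is handled like any other requirement conflict.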

Given that extras are deep inside the metadata, I can see the appeal of the latter approach. It means that C remains a single node in the dependency graph, regardless of the extras that are specified. However, it does lead directly to all the issues we’re seeing here (removing extras, conflicting extras (as opposed to merely conflicting requirements)).

Maybe I’m not understanding, so can you explain what “conflicting extra” means? I can understand conflicting version specifiers, but not conflicting extra. From my POV, having C[e1] and C[e2] as separate dependencies automatically means installing C[e1,e2]. Where is the conflict in that?
