Adding a default extra_require environment

Oh, the other thing I dislike about the single implicitly named default extra option is that it doesn’t lend itself well to future expansion. Now, I’m not saying we’re ever going to expand extras to be more comprehensive.

However, one could envision a situation where the selected extras are made available at build time or runtime, so that they can act as a more fully featured (heh) feature flag system (they’re already sort of feature flags, just feature flags that are limited to only adding new dependencies). The single implicitly named thing is a lot harder to work with in that situation. There’s no extra name to key off of to tell whether a feature has been toggled on or off, and it doesn’t handle the “multiple axis” problem well. Code that consumes those feature flags would be harder to write, because it would have to check for both the explicitly named feature flag and whatever marker we use for no feature flags selected at all.

Another possible enhancement is the ability to make mutually exclusive extras, or to make mandatory extras. Basically things like “Ok well you can use any backend you want, but you must pick at least one, but by default we’ll pick one for you” or whatever, would most likely be easiest to implement by using extra names in some constraint language of some kind. Say that you have to pick at least one backend, one could imagine a constraint that says something like backend1 or backend2 or backend3.
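To make the “pick at least one backend, otherwise we pick one for you” idea a bit more concrete, here is a minimal sketch. Everything in it is hypothetical: no such constraint language exists today, and the names `BACKENDS`, `DEFAULT_BACKEND` and `resolve_backends` are made up purely for illustration.

```python
# Hypothetical sketch of an "at least one of" constraint over extra names.
# Nothing here reflects an existing packaging feature.

BACKENDS = {"backend1", "backend2", "backend3"}  # assumed extra names
DEFAULT_BACKEND = "backend1"                     # assumed default choice


def resolve_backends(selected_extras):
    """Return the selected extras, adding the default backend if none was picked."""
    selected = set(selected_extras)
    if selected & BACKENDS:
        # The constraint "backend1 or backend2 or backend3" is already satisfied.
        return selected
    # No backend was chosen, so fall back to the default one.
    return selected | {DEFAULT_BACKEND}
```

The point is only that a check like this needs extra names to key off of, which is what the named list of default extras would provide.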

These aren’t really fully fleshed out ideas, and I’m not really even saying any of them are a good idea, but the named list of implicit default extras proposal makes implementing those kinds of additional enhancements much easier I think, and makes the implicit defaults much less of a special case, whereas the single implicitly named default extra option sort of just makes all of those kinds of features harder.


Reminder that there are currently no restrictions on extra names (unlike package names), so we’ll need to specify that first to make any syntax possible. IIRC setuptools has some restrictions on what you can use as extras_require keys, but they only apply to setuptools (IIRC even pip’s behaviour is different!)

That’s not true, at least in PEP 508, extras are explicitly defined:

identifier    = < letterOrDigit identifier_end* >
extras_list   = identifier:i (wsp* ',' wsp* identifier)*:ids -> [i] + ids
extras        = '[' wsp* extras_list?:e wsp* ']' -> e

I’ve also just checked packaging, and it also implements this correctly:

PUNCTUATION = Word("-_.")
IDENTIFIER_END = ALPHANUM | (ZeroOrMore(PUNCTUATION) + ALPHANUM)
IDENTIFIER = Combine(ALPHANUM + ZeroOrMore(IDENTIFIER_END))

EXTRA = IDENTIFIER

EXTRAS_LIST = EXTRA + ZeroOrMore(COMMA + EXTRA)
EXTRAS = (LBRACKET + Optional(EXTRAS_LIST) + RBRACKET)("extras")
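For anyone who wants to check a name by hand, the identifier rule above can be approximated with a plain regular expression. This is a sketch, not what pip or packaging actually run; it assumes letterOrDigit means ASCII letters and digits, and `EXTRA_NAME` and `is_valid_extra` are names made up for this example.

```python
import re

# Regex rendering of the identifier rule quoted above: a letter or digit,
# then any run of "-", "_" or "." that is always followed by another
# letter or digit. ASCII-only is an assumption of this sketch.
EXTRA_NAME = re.compile(r"[A-Za-z0-9](?:[-_.]*[A-Za-z0-9])*")


def is_valid_extra(name):
    """Return True if the whole string matches the identifier rule."""
    return EXTRA_NAME.fullmatch(name) is not None
```

With this, `is_valid_extra("foo-bar")` is accepted while a name with a leading or trailing hyphen is rejected.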

And just to verify, attempting to use an invalid name fails currently:

$ pip install 'requests[-asds]'                                        
ERROR: Exception:
Traceback (most recent call last):
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_vendor/packaging/requirements.py", line 98, in __init__
    req = REQUIREMENT.parseString(requirement_string)
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_vendor/pyparsing.py", line 1955, in parseString
    raise exc
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_vendor/pyparsing.py", line 3814, in parseImpl
    raise ParseException(instring, loc, self.errmsg, self)
pip._vendor.pyparsing.ParseException: Expected stringEnd, found '['  (at char 11), (line:1, col:12)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 188, in _main
    status = self.run(options, args)
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 185, in wrapper
    return func(self, options, args)
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/commands/install.py", line 300, in run
    reqs = self.get_requirements(
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 321, in get_requirements
    req_to_add = install_req_from_line(
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/req/constructors.py", line 396, in install_req_from_line
    parts = parse_req_from_line(name, line_source)
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/req/constructors.py", line 348, in parse_req_from_line
    extras = convert_extras(extras_as_string)
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/req/constructors.py", line 77, in convert_extras
    return Requirement("placeholder" + extras.lower()).extras
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_vendor/packaging/requirements.py", line 100, in __init__
    raise InvalidRequirement(
pip._vendor.packaging.requirements.InvalidRequirement: Parse error at "'[-asds]'": Expected stringEnd

Additionally, Metadata 2.1 explicitly declares in the PEP that extras must be valid Python identifiers (emphasis mine):

A string containing the name of an optional feature. Must be a valid Python identifier. May be used to make a dependency conditional on whether the optional feature has been requested.

setuptools does allow you to specify invalid extra names currently (which should probably be treated as a bug). However those names are basically useless, so I don’t think that worrying about widespread use of them is something we need to do.

Thanks for the correction, that’s good to know. It seems we’re well covered in this area.


The valid Python identifier part brings out one more problem since it means that Provides-Extra: foo-bar is technically invalid metadata. It is however not only recognised by pip but also not uncommon in the wild. But that’s an issue for another day, I guess.

Yea, the PEP 508 spec allows it, the 2.1 metadata does not. Probably the 2.1 spec just needs to be updated to match reality.
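The mismatch between the two rules is easy to see side by side. The sketch below compares an approximation of the PEP 508 identifier rule (ASCII letters and digits assumed) with Python’s own `str.isidentifier()`; the `classify` helper is hypothetical and only exists for this illustration.

```python
import re

# Approximation of the PEP 508 identifier rule (ASCII-only is an assumption).
PEP508_NAME = re.compile(r"[A-Za-z0-9](?:[-_.]*[A-Za-z0-9])*")


def classify(name):
    """Report whether a name passes each of the two competing rules."""
    return {
        "pep508": PEP508_NAME.fullmatch(name) is not None,
        "python_identifier": name.isidentifier(),
    }
```

Under these assumptions `classify("foo-bar")` passes the PEP 508 rule but is not a Python identifier, and a name like `_x` goes the other way.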

Because excluding an extra means “install this as if the extra were not specified”, surely this would be resolved exactly as it would be today, by including the packages from thing.

We discussed this a bit in The ‘extra’ environment marker and its operators, and while there was disagreement, it seems pretty clear that contradictory extras can exist and should be caught at resolution time (or later when things don’t work), and that these are a bug in the package, but it shouldn’t matter here.

As an aside, I regularly use/recommend pip-compile to get all the dependencies into a requirements file, then modify it manually and install with --no-deps to exclude dependencies. Would not be opposed to using an extra for more targeted handling, but it works fine.

That was effectively my suggestion in the next sentence yes :wink: I could possibly make an argument that a user would expect -thing to mean never install thing, but I think the UX around enabling that is significantly worse, and it’s rarely what anyone actually wants.


I don’t see how the former idea leads to the latter? If the default extras that are included are indeed just extras, then couldn’t you just rebuild the default set sans whatever you want left out and skip the syntax? IOW why is doing foo[spam] to leave out c-accelerated from the default extras set so troublesome as to require special syntax support? I can see people making the argument of “but what if I add a new extra to that default set?”, but then I can turn that around and say, “yes, what are you going to do about it, since you now have a new implicit dependency to add/remove?”

I think if we are going to try and push packages to make small, targeted extras to allow for a more composable way to build up indirect dependencies, then I don’t think pushing a subtraction mechanism is going to be important, at least initially.

I am strongly in favor of the subtraction mechanism. If specifying any extras in a dependency would clear the set of extras to be installed, it would also negate any future default extras, which means no package author can ever reliably benefit from adding to the default set. Imagine a situation where an author moves a dependency from hard requirements to default extras. Now any dependent package which specifies an extras set for that dependency could break, because it is no longer getting the necessary sub-dependency installed: it didn’t explicitly specify it in the extras set.

In other words, if all dependents say “install this dependency but I don’t need this particular extra from it”, then and only then should that extra be removed from the set at install time.
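A rough sketch of that rule, under assumptions: each dependent is modelled as a pair of sets (extras it requests, default extras it opts out of), and a default extra is dropped only when every dependent opts out of it. The function name `effective_extras` and the data shape are hypothetical; no installer works this way today.

```python
def effective_extras(default_extras, dependents):
    """Compute the extras to install for one dependency.

    dependents is a list of (requested, removed) pairs of sets, one per
    dependent package. A default extra is removed only if ALL dependents
    removed it; anything explicitly requested is always kept.
    """
    if dependents:
        removed_by_all = set.intersection(*[set(rem) for _, rem in dependents])
    else:
        removed_by_all = set()
    requested = set().union(*[set(req) for req, _ in dependents])
    return (set(default_extras) - removed_by_all) | requested
```

So if one dependent opts out of `accel` and another says nothing, `accel` stays installed.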


There are 3 basic possible mechanisms we could select here, with varying degrees of usefulness:

  1. The default set of extras just always get included.
    • Not very useful, we’ve basically just added a second install_requires.
  2. The default set of extras get cleared as soon as someone selects ANY extra.
    • IMO this also ends up becoming just a second set of install_requires, because any library that depends on the project in question is faced with a choice. They can either only depend on the extras they care about (and possibly break things for people who are doing a dependency with no extras) or they can attempt to reproduce the entire set of default extras… which is basically just implicitly turning those default extras into a sort of pseudo install_requires. I know that if I was publishing something that depended on a library with default extras where I wanted to override one, I would probably feel compelled to include the default extras to avoid my choices breaking things for other people.
    • I also think the behavior of implicitly clearing the entire set of default extras is a surprising action at a distance that will confuse new people and experienced people alike. I can easily foresee people having to trawl their dependency tree trying to figure out which project selected an extra and caused the entire set of default extras to no longer be included. Likely the way most people will fix this will be by duplicating transitive dependencies into their own projects with more extras included.
  3. The default extras never get cleared implicitly, but you can optionally choose to remove them.
    • This requires introducing new syntax, but I think it matches the existing semantics of extras better, and is far less surprising to people.

It is important, because when you include default extras, you have to pick which of the above 3 strategies you’re going to use. If you do nothing, then you’ve just implicitly selected #1, and added a new field for little to no purpose. So you need some mechanism that enables not installing those default extras, and if you pick #2, you can’t really move to #3 without silently changing behavior (which will break people) and likewise you can’t really move to #2 from #3 without also breaking people. The only way to do a transition like that would be to introduce yet another piece of metadata that controlled what kind of default extras it is… but that sounds like the worst possible outcome to me.

So yea, I think we need to pick what mechanism we’re going to use for causing the default extras to not be installed (because otherwise they’re not extras, they’re just dependencies), and I don’t think it’s a decision we can put off till later, or easily change once it’s been made.
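To make the difference between the three mechanisms listed above concrete, here is a small sketch. It is an illustration only: the function `select_extras`, its parameters, and the notion of an explicit `removed` set are assumptions, not an existing API or agreed syntax.

```python
def select_extras(strategy, defaults, requested=(), removed=()):
    """Return the extras that end up installed under each strategy.

    1: defaults are always included.
    2: defaults are cleared as soon as any extra is requested.
    3: defaults stay unless explicitly removed.
    """
    defaults, requested, removed = set(defaults), set(requested), set(removed)
    if strategy == 1:
        return defaults | requested
    if strategy == 2:
        return requested if requested else defaults
    if strategy == 3:
        return (defaults - removed) | requested
    raise ValueError(f"unknown strategy: {strategy}")
```

With defaults `{"tls", "accel"}` and a request for `docs`, strategy 2 silently drops both defaults, while strategy 3 keeps them unless one is named in `removed`.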

For option 2:

I think “The default set of extras get cleared as soon as someone selects ANY extra.” should work at the individual requirement level, not globally. If, say:

  • package[someextra] == 1.0 works as before
  • package == 1.0 becomes syntax sugar for package[default] == 1.0
  • package[] == 1.0 explicitly selects no extras.

then if one library needs package[] and another needs package, the default extra does get installed.
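A minimal sketch of that per-requirement reading, assuming a reserved extra name `default` and a deliberately simplified parser (the regex only handles a name and an optional bracket, and ignores version specifiers). The helpers `extras_for` and `combined_extras` are hypothetical.

```python
import re

# Simplified requirement pattern: a name followed by an optional [extras] part.
_REQ = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(\[([^\]]*)\])?")


def extras_for(requirement):
    """Interpret one requirement string under the proposed rule."""
    match = _REQ.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    name, bracket, inside = match.group(1), match.group(2), match.group(3)
    if bracket is None:
        # Bare "package" is sugar for "package[default]".
        return name, {"default"}
    # "package[]" yields an empty set; otherwise split the listed extras.
    return name, {e.strip() for e in inside.split(",") if e.strip()}


def combined_extras(requirements):
    """Union the extras of every requirement, keyed by package name."""
    result = {}
    for requirement in requirements:
        name, extras = extras_for(requirement)
        result.setdefault(name, set()).update(extras)
    return result
```

Combining `package[] == 1.0` from one library with `package == 1.0` from another yields `{"package": {"default"}}`, so the default extra does get installed.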


Update: I personally believe option 3 is better than this, for reasons Donald explains later. But I want the best version of each option to be considered.


A possible fourth option is to let extras remove install requirements (from the package dependencies, not the whole resolution context) when they are specified (by fixing The ‘extra’ environment marker and its operators).

This way you could include all your default dependencies, and use an extra to remove one and add another. Personally I think doing it through the environment marker system is fine, as we’re talking about a fairly complex case here.
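As a sketch of what that fourth option could look like, the example below models each requirement with a predicate over the active extras, standing in for an environment marker such as extra != 'other-backend'. The package names and the `dependencies` helper are invented for illustration, and real marker evaluation does not currently behave this way (hence the “by fixing” above).

```python
# Each entry pairs a requirement with a predicate over the active extras.
# The predicates stand in for environment markers; all names are made up.
REQUIREMENTS = [
    ("core-lib", lambda extras: True),
    # Installed by default, dropped when the "other-backend" extra is chosen.
    ("default-backend", lambda extras: "other-backend" not in extras),
    # Added only when the "other-backend" extra is chosen.
    ("other-backend-lib", lambda extras: "other-backend" in extras),
]


def dependencies(active_extras):
    """Return the requirement names that apply for the given extras."""
    extras = set(active_extras)
    return [name for name, applies in REQUIREMENTS if applies(extras)]
```

Here a single extra both removes one default dependency and adds its replacement.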

I’ve been assuming this, because it seems like the only viable option (in essence, an extra is treated like a separate empty package with a requirement of its base package plus the extras).

I’m not sure how you would take extras specified on individual requirements and somehow calculate their effect globally, but since this feels like an argument I guess some people think there’s a way to do this? I’d love to hear what that approach is, because I can’t imagine it myself.

(Not you, Petr. I’m agreeing with you :wink:)

package[] == 1.0 explicitly selects no extras.

And what about the use case I mentioned – moving a hard dependency into optional dependencies? If the dependent package is using a part of the dependency that relies on a subdependency covered by an extra, that subdependency would be automatically excluded and the application would stop working if the subdependency is moved to default extras from install_requires.

I’m not sure how you would take extras specified on individual requirements and somehow calculate their effect globally, but since this feels like an argument I guess some people think there’s a way to do this? I’d love to hear what that approach is, because I can’t imagine it myself.

I’m not understanding what’s difficult about this. Any extra specified by any dependent should be included.

Maybe I’m misunderstanding, but isn’t that precisely what an installer has to do when resolving a set of requirements?

I can’t tell you exactly how it works, because pip’s handling of extras makes my head hurt, but feel free to go and look at the code :slight_smile: And as an added bonus, if you want to see more than one interpretation of the process, pip currently has two resolvers so you can look at both :slight_smile::slight_smile:

More seriously, I’m starting to find it hard to follow what people are expecting again (in this case, I don’t know what “individual requirement level” and “globally” are intended to mean). If someone could clarify a bit, that might help avoid any miscommunication or misunderstandings.


The context seems to be two conflicting extra specifications on the same package in the global context (e.g. A->C[e1], B->C[e2], pip install A B).

I think some of us see this as “A requires C and A also requires the extras under e1”, so essentially extending the requirements of A.

The alternative view is (I think?) “A requires C and C requires [e1]”, which implies that B has a conflicting requirement because C[e1] is not the same as C[e2] (because there is only one “C”, which means the “requires e1” and “requires e2” have to be combined programmatically into a specification that was never written down by a user/author).

Given that extras are deep inside the metadata, I can see the appeal of the latter approach. It means that C remains a single node in the dependency graph, regardless of the extras that are specified. However, it does lead directly to all the issues we’re seeing here (removing extras, conflicting extras (as opposed to merely conflicting requirements)).

I haven’t looked at the implementation, because implementations should be following designs, not the other way around :wink: But I suspect the resolver implementations ought to treat “C[e1]” as a separate node from “C[e2]” so that both can be installed together. Then conflicting extras can only exist within the context of a single specification (e.g. X[d, not_d]) and the conflicting requirements implied by two separate specifications (e.g. X[d], X[not_d]) are resolved as for normal requirement conflicts.
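A small sketch of the “separate node” view, with no claim about how pip’s resolvers actually model this: each requirement expands into the base package plus one node per extra, and the install set is simply the union of the nodes. `expand` and `install_set` are hypothetical helpers.

```python
def expand(name, extras):
    """Expand name[extras] into graph nodes: the base plus one node per extra."""
    nodes = {name}
    nodes.update(f"{name}[{extra}]" for extra in extras)
    return nodes


def install_set(requirements):
    """Union the nodes of every (name, extras) requirement."""
    nodes = set()
    for name, extras in requirements:
        nodes |= expand(name, extras)
    return nodes
```

In this model A->C[e1] and B->C[e2] never conflict as extras; they just contribute the nodes C, C[e1] and C[e2], and any real conflict shows up later as an ordinary requirement conflict.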

Given that extras are deep inside the metadata, I can see the appeal of the latter approach. It means that C remains a single node in the dependency graph, regardless of the extras that are specified. However, it does lead directly to all the issues we’re seeing here (removing extras, conflicting extras (as opposed to merely conflicting requirements)).

Maybe I’m not understanding, so can you explain what “conflicting extra” means? I can understand conflicting version specifiers, but not conflicting extra. From my POV, having C[e1] and C[e2] as separate dependencies automatically means installing C[e1,e2]. Where is the conflict in that?


having C[e1] and C[e2] as separate dependencies automatically means installing C[e1,e2].

Indeed, and (as per my memory) that’s how both the resolvers in pip will treat them as well.

Indeed, and (as per my memory) that’s how both the resolvers in pip will treat them as well.

My experience is that right now pip picks up the first extras set and disregards anything found later along the way. Very annoying.

To slightly update @uranusjr’s example from another thread:

Name: a
Version: 1.0
Provides-Extra: e1
Provides-Extra: e2
Requires-Dist: b >= 2; extra == 'e1'
Requires-Dist: b < 2; extra == 'e2' 

But also, the issue with a default extra and needing to replace/remove it would mean that it’s trivial to cause a conflict when one requirement wants the default extra and another doesn’t. So there are more future conflicts coming as we work down the path of this thread.
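The conflict in the metadata example can be checked mechanically. The sketch below is a toy model, not a resolver: versions are plain integers, only the two operators from the example are supported, and the candidate versions of b are assumed to be 1, 2 and 3.

```python
import operator

OPS = {">=": operator.ge, "<": operator.lt}

# (dependency, operator, version, extra), mirroring the Requires-Dist lines.
REQUIRES_DIST = [
    ("b", ">=", 2, "e1"),
    ("b", "<", 2, "e2"),
]


def constraints_for(extras):
    """Collect the version constraints on b activated by the given extras."""
    return [(op, version) for _, op, version, extra in REQUIRES_DIST if extra in extras]


def satisfiable(extras, candidates=(1, 2, 3)):
    """Return True if some candidate version of b meets every active constraint."""
    active = constraints_for(extras)
    return any(all(OPS[op](c, version) for op, version in active) for c in candidates)
```

Each extra is satisfiable on its own, but selecting e1 and e2 together asks for b >= 2 and b < 2 at once, which no version can meet.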