FWIW, here is a use case that I have. I’ll be concrete, but hopefully not so detailed as to be distracting.
The xarray package has two dependencies: numpy and pandas. My package uses numpy, and could use some features of xarray that don’t require pandas, but I don’t want to pull pandas into my dependency tree. (I’m willing to accept ImportErrors if my code accesses pandas-dependent features.)
As it stands, I would either need to convince them to break out the pandas-independent concepts into a separate package (plausible, but a nontrivial effort), or move the pandas dependency into an extra and make all packages depending on that functionality install xarray[pandas] (non-starter).
If I could submit a PR to make xarray[minimal] or xarray[!pandas] a way to just get xarray and numpy, the maintenance burden on xarray would be substantially lower. It would also only (okay, mostly) be used by downstream packages like mine, that understand the risk they’re taking to not pull in all dependencies and to watch upstream to ensure that functionality is maintained with the reduced dependency set.
Based on the example outlined here, Dustin’s proposal is that explicitly specifying any extras disables the default extra. This further emphasises the point that someone really needs to write something down before this topic can be meaningfully discussed.
I don’t think that works in practice TBH, otherwise you get into weird situations like: what does it mean if someone depends on foo and bar, and foo depends on spam, and bar depends on spam[thing]? It also sounds like the kind of weird, implicit, action-at-a-distance thing that trips up a lot of folks with packaging.
Woah now! I lay no claim to this idea. Just trying to help figure it out.
How is this different than what we already have now?
The challenge of getting everyone here to grok the problem is definitely making me feel like it’s going to be really challenging for the average user to understand it. That said, the number of people coming out of the woodwork asking for this (and the lack of suitable alternatives) makes it still feel worthwhile.
Re-reading this, I think this is where the disconnect happens: I’m talking about a single default extra, which is a list of one or more dependencies that are effectively appended to install_requires if no extra is requested. There isn’t ever more than one “default extra” available (hence why I was confused about trying to negate it).
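To make the single-default-extra semantics concrete, here is a minimal sketch of how I read it. The function name, the package names and the argument layout are all made up for illustration; nothing like this exists in pip or setuptools.

```python
# Hypothetical model of the single "default extra" idea described above.
# None of these names are real pip or setuptools APIs.
def resolve_single_default(install_requires, default_extra, extras_require, requested):
    """Return the dependency list for one requirement under the single-default model."""
    deps = list(install_requires)
    if not requested:
        # No extra was requested, so the default list is appended,
        # as if it had been part of install_requires.
        deps += default_extra
    for name in requested:
        # Any explicit extra replaces the default list entirely.
        deps += extras_require[name]
    return deps


# Made-up package data, only to exercise the function.
plain = resolve_single_default(["core-lib"], ["backend-a"], {"nothing": []}, [])
minimal = resolve_single_default(["core-lib"], ["backend-a"], {"nothing": []}, ["nothing"])
```

Under this model, asking for any extra at all (even an empty one) is what switches the defaults off.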
Then, if you just do pip install kivy it’s the same as doing pip install kivy[base] and kivy works out of the box. If you’re an advanced user and don’t want the default extras you could do pip install kivy[nothing].
Although it does feel a little bit hackier than having an explicit --no-default-extra pip option.
Personally I feel any user wanting to disable defaults should automatically count as above average. They should accept the default instead if the concept is beyond them.
With that said, I don’t think it’s a good solution to do this as a pip option either. Once we do that, people will start wanting to include it in their requirements.txt, lock file, Requires-Dist, etc. It would be better if we came up with a way to encode this information into the PEP 508 format.
Right, I think that’s a bad pattern. For a few reasons:
1. I think it scales poorly. If I have 5 different “axes” that my extras could install (e.g. maybe I can select what asyncio reactor I want to use based on an extra, and select optional C speed ups on multiple different acceleration packages, etc) then my “single list of default extra things to install” ends up being this weird mishmash of dependencies that aren’t related to each other at all, other than that I want them optional but installed by default.
2. Similar to the above, the granularity is wrong. If I have 5 different axes, and I want to select a new extra for any of them (or even for something completely unrelated), I now have to specify the default extras manually for every single one of those axes.
3. It makes backwards compatibility harder OR it’s not really allowing the minimal install in a number of cases. In my foo/bar example above, either bar depending on an extra unselects the entire set of defaults (in which case, my dependencies starting to use a new extra when before they didn’t suddenly becomes a backwards incompatible change, when before it never was) OR we continue to install the default dependencies anytime anyone depends without explicit extras… which basically means that anytime a thing with default extras is depended on in a library (since libraries are not likely going to want to make decisions about backends or so), the default extra just degrades to being effectively the same as install requires in all but simple cases.
4. Likewise to the above, backwards compatibility for the project becomes harder. Adding a new axis to the default extras just means adding it to the flat list of things, so any downstream that is manually specifying extras has to go and update their dependency specifier to add this new extra, since it won’t be selected by default (because I’m explicitly listing extras), even if I have no opinion about which way to go, and I’m just going to copy the default.
5. It forces me to keep two lists of dependencies in sync, with no real mechanisms to automate keeping those in sync other than inside a dynamic setup.py, which we’ve been trying to de-emphasize.
6. It makes discovering what the default options are harder. If I want to specify only one out of N axes, I have to figure out what extras I need to select to get the other N-1 axes back to their default state, but since there’s just a single flat list of dependencies in this implicit list of extra dependencies, I have to either hope the project documented it on their own, or reverse engineer it by comparing the actual contents of the different lists of requirements.
7. It allows projects to get their metadata in weird states where downstream users can end up having to manually specify dependencies, and not only does it allow it, but this is the path of least resistance for projects. If a project only lists their default extras in the default list (to avoid the problems in #5), and I want to customize my selection for 1 axis but use the default for the other axes, and the project hasn’t duplicated that metadata, my only option is to manually copy that dependency data into my dependencies.
Instead my suggestion is literally everything gets defined as an extra like we have today, and this new metadata field literally just lists the name of extras that should be selected by default. Essentially this means that a package foo, with a metadata field like Default-Extras: spam, c-accelerated is equivalent to doing pip install foo[spam, c-accelerated]. Since extras are always additive, if two things depend on foo, one with the default, one with some explicit set of extras, we’d just take the union of all of those extras.
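A rough sketch of the additive behaviour, under my reading that the names in Default-Extras are always selected unless something negates them. The function and the data are hypothetical, not anything an installer implements today.

```python
# Hypothetical sketch: Default-Extras is just a list of extra names,
# and every dependent's explicit extras are unioned on top of it.
def selected_extras(default_extras, requests):
    """default_extras: names from the proposed Default-Extras field.
    requests: one collection of extra names per dependent (may be empty)."""
    selected = set(default_extras)
    for extras in requests:
        # Extras stay purely additive, exactly as they are today.
        selected |= set(extras)
    return selected


# foo declares "Default-Extras: spam, c-accelerated" (made-up metadata).
# One dependent asks for plain foo, another for foo[tls].
result = selected_extras(["spam", "c-accelerated"], [[], ["tls"]])
```

Here the dependent that asked for foo[tls] does not lose spam or c-accelerated, which is the main difference from the single unnamed default list.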
That then raises the question of how do we unselect extras, which hasn’t been a problem before, because there was no such thing as this implicitly selected extra. My rough idea for that is to extend PEP 508 with some explicit syntax to negate a selected extra, say for instance we say you can negate an extra by prefixing it with a - symbol. So if I wanted to install foo, but without the c-accelerator, I would do pip install foo[-c-accelerated] (we could use ! or whatever, I don’t care that much about specific syntax).
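For illustration only, splitting such a specifier into requested and negated extras could look like the sketch below. It assumes the - prefix floated above and handles just the name[extras] part, not versions, markers or anything else from PEP 508; parse_extras is a made-up helper.

```python
# Hypothetical parser for the proposed "-extra" negation syntax.
# Only handles "name[extra, -extra, ...]"; this is not a PEP 508 parser.
def parse_extras(spec):
    """Return (name, wanted, unwanted) for a specifier like 'foo[spam, -thing]'."""
    name, bracket, rest = spec.partition("[")
    wanted, unwanted = set(), set()
    if bracket:
        if not rest.endswith("]"):
            raise ValueError(f"unterminated extras list in {spec!r}")
        for item in rest[:-1].split(","):
            item = item.strip()
            if not item:
                continue
            if item.startswith("-"):
                # Leading "-" marks an extra to unselect.
                unwanted.add(item[1:])
            else:
                wanted.add(item)
    return name.strip(), wanted, unwanted
```

A leading - is unambiguous here only because extra names that start with - are rejected today, which is worth confirming before settling on a sigil.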
The ability to unselect an extra raises the question of what do we do if foo and bar both depend on spam, and one does spam[-thing] and the other does spam[thing], how do we resolve the situation? My suggestion would be to make an explicit request for an extra always take precedence, while an explicit request to exclude an extra only takes precedence over the list of default extra names to install.
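Sketching that precedence rule, assuming each dependent's request has already been split into a wanted set and an unwanted set (the function name and data shapes are my own invention):

```python
# Hypothetical resolution of conflicting requests for the same package.
# Explicit requests always win; negations only trim the default list.
def resolve_extras(default_extras, requests):
    """requests: iterable of (wanted, unwanted) pairs, one per dependent."""
    wanted, unwanted = set(), set()
    for want, skip in requests:
        wanted |= set(want)
        unwanted |= set(skip)
    # Negation is applied to the defaults only, then explicit requests are added back.
    return (set(default_extras) - unwanted) | wanted


defaults = {"thing", "other"}
# foo depends on spam[-thing], bar depends on spam[thing].
both = resolve_extras(defaults, [(set(), {"thing"}), ({"thing"}, set())])
# Only foo depends on spam[-thing].
only_foo = resolve_extras(defaults, [(set(), {"thing"})])
```

With both dependents present, thing is installed because bar asked for it explicitly; with only foo, it is dropped from the defaults.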
Looking at my list of problems with the nameless default extra, this solution solves those problems. In more detail:
1. All extra definitions remain defined as they are today, separately keyed by the extra name, so they’re each grouped by the “feature” they’re related to. It doesn’t matter how many different extras I want, they’re all cleanly separated.
2. If I want to specify an extra, it doesn’t affect the list of what’s being installed at all other than to be additive, which is exactly the same semantics extras have today. So, say, opting into a C accelerator (or a GPU specific package or whatever) doesn’t suddenly opt me into managing my list of extras for every axis that a project might have extras for. This is particularly nice for extras that are wholly unrelated to the things that are being installed by default.
3. Backwards compatibility is basically the same as it is today. I can depend on things with extras without worrying that doing so will suddenly unselect a bunch of default options for other packages, since depending on things with extras is by default just adding new dependencies.
4. For the project providing the extras, backwards compatibility is much nicer for the most part. I can add new named extras to my list of default extras, and my downstream consumers will just get my implicit default without having to do any work, it’ll just work for them. The one situation where the other proposal is nicer is if I want to make absolutely sure that no default extras are selected (the minimal example above): a new version might add a new extra to the defaults, and I then have to update my dependency spec to add another negation. I don’t think this is a big deal AND I think it’s going to be the least common use case, however, if we’re worried about supporting that use case, then we can add some sigil that says to deselect all default extras (foo[-:default:]? foo[-*]? I dunno).
5. Since the dependency lists are only ever specified in the actual extra definitions, there’s no need to keep anything in sync.
6. I can simply look at the metadata for a project (using pip show or something) and see exactly which extras are selected by default. So when I’m deciding what to add or remove, I have all that information available to me (which is super useful if we add the syntax in 4, because you could do something like say I want to remove all the defaults… except one, then you can see what those defaults are, and then do foo[-:default:, something-that-was-a-default-to-reselect-it-explicitly]).
7. Every state that can be expressed with default extras must be capable of being expressed with the extra syntax. There’s no situation where some set of dependencies would only ever get installed with no extras selected. Which is better both for consistency and for phasing this feature in, since users with older installers will need to continue to manually select which extras they want installed.
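As a sketch of the "deselect all defaults, then reselect one" case, here is how a single request might be applied if a drop-everything sigil existed. I use -* purely as a placeholder, since the spelling is explicitly undecided above, and apply_request is a made-up name.

```python
# Hypothetical handling of a "drop all defaults" sigil within one request.
DROP_ALL_DEFAULTS = "-*"  # placeholder spelling; "-:default:" was also floated


def apply_request(default_extras, items):
    """items: raw entries from one extras list, e.g. ['-*', 'spam']."""
    defaults = set(default_extras)
    wanted = set()
    for item in items:
        if item == DROP_ALL_DEFAULTS:
            # Unselect every default in one go.
            defaults.clear()
        elif item.startswith("-"):
            # Unselect a single default by name.
            defaults.discard(item[1:])
        else:
            wanted.add(item)
    # Explicit selections are kept apart, so the order of entries does not matter.
    return defaults | wanted


defaults = ["spam", "c-accelerated"]
minimal = apply_request(defaults, ["-*"])
just_spam = apply_request(defaults, ["-*", "spam"])
```

Because the explicitly requested names are tracked separately from the defaults, foo[-*, spam] and foo[spam, -*] come out the same.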
Honestly, the “default implicit unnamed extra” proposal feels like it really only works in very simple cases where there is only a single axis someone might want to select different extras for. It also brings in this weird, action at a distance behavior that I think will be very surprising to end users both new and old, whereas my proposal doesn’t change how extras function, unless you’re explicitly using the new syntax.
Oh, the other thing I dislike about the single implicitly named default extra option, is it doesn’t lend itself well to future expansion. Now I’m not saying we’re ever going to expand extras to be more comprehensive.
However, one could envision a situation where what extras have been selected are made available at build or runtime, to allow them to act as a more fully featured (heh) feature flag system (they’re already sort of feature flags, just feature flags that are limited to only adding new dependencies). The single implicitly named thing is a lot harder to work with in that situation, because there’s no extras name to key off of for whether a feature has been toggled on or off, and because of the fact it doesn’t handle the “multiple axis” problem well, it would make code that consumes those feature flags harder, because it would have to check for both the explicitly named feature flag, and whatever marker we use for no feature flags selected at all.
Another possible enhancement is the ability to make mutually exclusive extras, or to make mandatory extras. Basically things like “Ok well you can use any backend you want, but you must pick at least one, but by default we’ll pick one for you” or whatever, would most likely be easiest to implement by using extra names in some constraint language of some kind. Say that you have to pick at least one backend, one could imagine a constraint that says something like backend1 or backend2 or backend3.
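Purely as an illustration of why named extras make this easier, such constraints reduce to simple set checks over the selected extra names. No constraint language like this exists; both helpers below are hypothetical.

```python
# Hypothetical constraint checks over the set of selected extra names.
def at_least_one(selected, group):
    """True if at least one extra from the group is selected."""
    return bool(set(selected) & set(group))


def mutually_exclusive(selected, group):
    """True if at most one extra from the group is selected."""
    return len(set(selected) & set(group)) <= 1


backends = ["backend1", "backend2", "backend3"]
ok = at_least_one({"backend2", "tls"}, backends)
clash = not mutually_exclusive({"backend1", "backend3"}, backends)
```

With a single unnamed default list there is no name for a check like this to key off.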
These aren’t really fully fleshed out ideas, and I’m not really even saying any of them are a good idea, but the named list of implicit default extras proposal makes implementing those kinds of additional enhancements much easier I think, and makes the implicit defaults much less of a special case, whereas the single implicitly named default extra option sort of just makes all of those kinds of features harder.
Reminder that there are currently no restrictions on extra names (unlike package names), so we’ll need to specify that first to make any syntax possible. IIRC setuptools has some restrictions on what you can use as extras_require keys, but they only apply to setuptools (IIRC even pip’s behaviour is different!)
And just to verify, attempting to use an invalid name fails currently:
$ pip install 'requests[-asds]'
Traceback (most recent call last):
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_vendor/packaging/requirements.py", line 98, in __init__
    req = REQUIREMENT.parseString(requirement_string)
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_vendor/pyparsing.py", line 1955, in parseString
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_vendor/pyparsing.py", line 3814, in parseImpl
    raise ParseException(instring, loc, self.errmsg, self)
pip._vendor.pyparsing.ParseException: Expected stringEnd, found '[' (at char 11), (line:1, col:12)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 188, in _main
    status = self.run(options, args)
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 185, in wrapper
    return func(self, options, args)
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/commands/install.py", line 300, in run
    reqs = self.get_requirements(
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 321, in get_requirements
    req_to_add = install_req_from_line(
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/req/constructors.py", line 396, in install_req_from_line
    parts = parse_req_from_line(name, line_source)
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/req/constructors.py", line 348, in parse_req_from_line
    extras = convert_extras(extras_as_string)
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_internal/req/constructors.py", line 77, in convert_extras
    return Requirement("placeholder" + extras.lower()).extras
  File "/Users/dstufft/.virtualenvs/tmp-94e32e344e5efa5/lib/python3.8/site-packages/pip/_vendor/packaging/requirements.py", line 100, in __init__
pip._vendor.packaging.requirements.InvalidRequirement: Parse error at "'[-asds]'": Expected stringEnd
Additionally, Metadata 2.1 explicitly declares in the PEP that extras must be valid Python identifiers (emphasis mine):
A string containing the name of an optional feature. Must be a valid Python identifier. May be used to make a dependency conditional on whether the optional feature has been requested.
setuptools does allow you to specify invalid extra names currently (which should probably be treated as a bug). However those names are basically useless, so I don’t think that worrying about widespread use of them is something we need to do.
Thanks for the correction, that’s good to know. It seems we’re well covered in this area.
The valid Python identifier part brings out one more problem since it means that Provides-Extra: foo-bar is technically invalid metadata. It is however not only recognised by pip but also not uncommon in the wild. But that’s an issue for another day, I guess.
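To see the mismatch quickly, Python’s own str.isidentifier() can stand in for the “valid Python identifier” rule quoted above. Treating it as the exact check the metadata spec intends is my assumption, and is_valid_extra_name is a made-up helper.

```python
# Checks extra names against the "valid Python identifier" wording,
# using str.isidentifier() as an approximation of that rule.
def is_valid_extra_name(name):
    return name.isidentifier()


for name in ("thing", "c_accelerated", "foo-bar", "-asds"):
    # Names containing "-" fail the check, including the common foo-bar style.
    print(f"{name!r}: {is_valid_extra_name(name)}")
```

Note that isidentifier() also accepts Python keywords such as class, so a real rule would need to say whether those are allowed.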
Because excluding an extra means “install this as if the extra were not specified”, surely this would be resolved exactly as it would be today, by including the packages from thing.
We discussed this a bit in The ‘extra’ environment marker and its operators, and while there was disagreement, it seems pretty clear that contradictory extras can exist and should be caught at resolution time (or later when things don’t work), and that these are a bug in the package, but it shouldn’t matter here.
As an aside, I regularly use/recommend pip-compile to get all the dependencies into a requirements file, then modify it manually and install with --no-deps to exclude dependencies. Would not be opposed to using an extra for more targeted handling, but it works fine.
That was effectively my suggestion in the next sentence, yes. I could possibly make an argument that a user would expect -thing to mean never install thing, but I think the UX around enabling that is significantly worse, and it’s rarely what anyone actually wants.