(pre-publish) PEP 771: Default Extras for Python Software Packages

It is really not fair to say this proposal is fully backward-compatible.

  • It’s a complete blocker for all repackagers of Python libraries. What are Conda, brew, MSYS2 and every Linux distribution ever [1] supposed to do? Specify the minimal set of dependencies and pepper users with missing module errors? Make the recommended group mandatory and bloat potentially size-sensitive containers? Just because it’s not our problem doesn’t mean we can ignore it. I don’t see how we can even consider this proposal without an answer to that question.
  • The aforementioned pip freeze problem is still there.
  • It is a massive pain for anyone who doesn’t want to spend the rest of their life reading the small print or source code to find which [minimal] will avoid installing unnecessary bloat.
  • Anything that uses importlib.metadata.requires() will probably gain a logic error and possibly the same impossible conundrum as the distro maintainers.

Being able to unambiguously resolve a dependency tree is not a limitation – it’s a feature.
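To make the importlib.metadata.requires() point concrete: a common (hypothetical) pattern is to treat requirements without an `extra ==` marker as the set a bare install brings in. Under this proposal that assumption breaks, because a default extra’s requirements are still marker-gated yet installed by default. A minimal sketch, with illustrative requirement strings not taken from any real package:

```python
# Sketch of code that assumes "no 'extra ==' marker" means "installed by
# a bare `pip install pkg`". Under PEP 771 the 'recommended' extra below
# would be installed by default, so this filter silently under-reports.
def base_requirements(requires):
    """Keep only requirements not gated behind an extra marker."""
    return [r for r in requires if "extra ==" not in r]

# Example strings in the shape importlib.metadata.requires("pkg") returns:
requires = [
    "numpy>=1.23",
    'pyyaml; extra == "recommended"',  # a default extra under PEP 771
    'pytest; extra == "test"',
]
print(base_requirements(requires))  # → ['numpy>=1.23'], pyyaml is missed
```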


One area that’s not really covered here is, for the interchangeable dependencies case, what installers should do if a non-default choice is already installed. Should they insist on (possibly redundantly) installing the default group, or save space by using the non-default one? I ask because both answers have nasty consequences. The first approach is an obvious waste, but the second breaks the assumption that pip install foo; pip install bar leads to the same environment as pip install bar; pip install foo.


  1. even the small handful of distributions that do have a concept of recommended or interchangeable dependencies don’t align with this particular model ↩︎

2 Likes

At an absolute minimum the PEP should describe this issue and say “we know this is a problem but we believe the benefits of the proposal justify the problems this issue will cause” (assuming you do believe that to be the case).

Of course, if the community consensus is that the problem is big enough that such a statement is insufficient, then the PEP will be rejected. But that’s just how the PEP process works - if you can’t convince the community, the proposal will probably fail.

I don’t think this is just a packaging issue. It applies everywhere. How does Python 3.8 warn that the match statement isn’t supported? Answer - it doesn’t.

As a pip maintainer, I’ll point out that we don’t support anything other than the latest version of pip, so our response would be simply “upgrade pip”. In particular, we won’t backport warnings to older pip releases. It’s only as an ecosystem that we have to consider how (or if) we support users who continue to use older, unsupported tools.

If you want to start a topic on that, then that would be fine. It certainly affects a number of areas (metadata versions, versioning of the index API, and the wheel format are three that come to mind). But I doubt it would be very productive without a concrete use case to discuss. We’ve already had some debate on this under the “new wheel format” discussions, and it was fairly inconclusive. If you don’t want to just burn people out, you’d probably have to come up with a concrete proposal for a better solution…

I’m a strong -1 on making this a packaging summit topic (except in the sense that “some interested people will be able to talk face to face there”). This is something that affects the community as a whole, and should be discussed in a forum that’s accessible to all community members. In other words, here. If individuals want to have offline discussions to come up with a proposal to present to the community, that’s fine, but all of the relevant arguments need to be available in public.

In my experience (as someone who can’t attend the packaging summits for personal reasons) I find that topics discussed at the summits are often difficult to present to the wider community, or as a PEP - there’s a strong sense of “you had to be there” which is hard to get past.

I think there are two possible choices here:

  1. Accept that right now, the process is to require a new metadata version for a proposal like this. Work out a way to ensure a smooth transition under the constraints imposed by this, or accept that the proposal cannot realistically be implemented without changes to the metadata versioning system.
  2. Put the proposal on hold pending a rewrite of the metadata versioning system.

I’ll be honest, I think we’ve had plenty of previous PEPs that have managed with the current versioning system, and with a bit of effort, I don’t see why it’s not possible to do so for this proposal.

Worst case scenario, this is a useful feature for new projects, but can’t be used for existing projects. Or maybe can only be used by existing projects when enough time has passed that the likelihood of an installer not supporting this PEP existing “in the wild” is vanishingly small. That’s not ideal, but it’s not a disaster either.

You should probably take some actual examples of projects affected by the motivating problems in the PEP. Describe the details of how they currently handle the problem, what a solution would look like using default extras, and how they would transition from the current approach to the new one. And work through some possible user scenarios - a user upgrading their installation, a user installing a project that depends on the project you’re discussing (and how their experience would change when the version using default extras gets released), a user who hasn’t upgraded their copy of pip, etc.

If you can’t tell project owners how to support their users when they adopt this new feature, that’s a problem with the PEP. It should be covered in the “How to teach this” section (it’s currently glossed over in the extremely superficial “teaching package authors” section), and may also warrant a separate “Transition” section if it gets lengthy (which I suspect it will).

I think this issue may be insoluble without adding some way to record, in an installed distribution, what extras were requested when the distribution was installed. And I’m afraid I agree this is a significant issue with the proposal.

Speaking for pip, at least, the installer doesn’t know if a non-default choice is installed. To determine that, it would have to go through every declared extra, and check whether all of the dependencies of that extra are installed. That’s a lot of work that would almost never be needed, and I don’t think the cost is justifiable.
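A rough sketch of the check being described, to show why it is non-trivial: for each declared extra, pick out the dependency names gated behind it and test them against the installed set. This is naive string parsing (not a full PEP 508 marker evaluator), and the requirement strings are made up for illustration:

```python
import re

def dep_names(extra, requires):
    """Names of dependencies gated behind `extra` (rough string
    parsing, not a full PEP 508 parser)."""
    marker = f'extra == "{extra}"'
    return [re.split(r"[\s;<>=!~\[]", r)[0] for r in requires if marker in r]

def extra_seems_installed(extra, requires, installed):
    """Heuristic: all of the extra's dependencies are present."""
    names = dep_names(extra, requires)
    return bool(names) and all(n in installed for n in names)

requires = [
    "numpy>=1.23",
    'pyyaml; extra == "recommended"',
    'h5py>=3; extra == "recommended"',
]
# h5py is absent, so the "recommended" extra can't be assumed present:
print(extra_seems_installed("recommended", requires, {"numpy", "pyyaml"}))  # → False
```

And even this per-package heuristic has to be repeated for every declared extra of every installed distribution, which is the cost being objected to above.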

So the answer is that the first approach is what would be taken. And it’s worse than that. Because there’s no record of what extras got installed, it means that pip install foo[bar]; pip install foo would install both bar and the default extra. And given that the two could easily be mutually incompatible (if they are alternative backends, for example) this could result in a broken installation.

1 Like

On reflection, this isn’t quite as bad, because typically you would use --no-deps to restore an environment from a freeze file. But it’s still a change in behaviour that can cause issues for the user, so it does need to be addressed.

1 Like

It is indeed workaround-able but it’s also a broken assumption that even pip’s docs didn’t anticipate.

And it’s more about the concept anyway. It’s pretty presumptuous to assume that there are no other tools whose “write down a list of packages and/or install them” features would run afoul of needing to specify an extra, or of discarding dependencies, to reproduce the same environment.

This discussion is very long, so I apologize if these points I’m raising have already been addressed. I have two concerns right now:

  1. I would like the PEP to address whether a default extra that doesn’t exist throws an error or not.

Currently, most install tools have made the design decision that for package[extra], where extra doesn’t exist for the candidate they are selecting, they will simply ignore extra. There are pros and cons to this approach when an install tool is faced with multiple candidates, and my understanding is that this currently sits under the remit of tool design; the specifications aren’t explicit about it.

In this case the Default-Extra metadata is only describing itself and not other versions of the package, so there is no reason to expect that a package would describe a Default-Extra that doesn’t exist. I would like to see language that says that tools SHOULD throw an error for packages that have a Default-Extra that does not exist.

  2. If the package has a default extra and the user specifies a non-existing extra, e.g. package[this-extra-does-not-exist-ecdf6822b], should a tool install the Default-Extra or not?

This comes up naturally during package resolution. Let’s say package adds a default extra in version 3, and users complain about this, so in version 5 a new extra no-default is added which has no additional dependencies. Users start specifying package[no-default], but some of them get backtracked to package version 3 or 4. Because most install tools will install package without the extra no-default for those versions, would those users get the default extra or not?

If the answer is no, they will not get the default extra, then users will likely start using some convention to avoid default extras, e.g. package[nothing]. This is problematic because it introduces a non-standard convention which will restrict tools from changing their behavior around non-existent extras in the future.

If the answer is yes, they should get the default extra, then a user who specified package[no-default] in this example gets no default extras for package versions 5 and higher, but does get default extras for versions 3 and 4 even though they specified package[no-default], which they might find surprising.

@konstin @dustin @pf_moore @oscarbenjamin @bwoodsend @notatallshaw - thank you all for taking the time to look at the PEP and provide comments!

I am going to have a think over the next week about how to address the various concerns, and will make some edits/additions to the PEP. As @pf_moore (and others have) said, even if we can’t actually solve all the issues raised, we should make all these issues explicit in the PEP so that people can then decide whether the benefits are worth the possible downsides.

One quick note to make sure we don’t discuss this one further: I have already made a change to the PEP to mention Metadata-Version:

Since this introduces a new field in the core package metadata, this will require Metadata-Version to be bumped to the next minor version (2.5 at the time of writing).

2 Likes

I’d strongly suggest posting your thoughts here for feedback, before updating the PEP. The problems that have been raised don’t (IMO) have simple solutions, so discussing the options before picking a position for the PEP to take seems like it would be more productive.

1 Like

Sounds good, I’ll post my thoughts here and will hold off on making any updates to the PEP.

2 Likes

I’d also say… let’s start a new thread for the PEP once https://github.com/python/peps/pull/4198 lands (and wait until then before continuing the discussion). :slight_smile:

1 Like

I think it is worth attempting to address the concerns raised here before going ahead with the PEP publication and subsequent discussion since if we can’t get close to consensus here, what hope do we have with the wider community :sweat_smile:. I doubt we’ll all agree on everything here and I don’t think it’s realistic to wait for that to happen, but let’s see what we can converge on :pray:

I’ll try and reply to the various open questions and concerns in no specific order. Please let me know if I have missed anything! I numbered the different questions/concerns to make it easier for people to refer to them.

Q1: Errors for non-existent extras in Default-Extra

@notatallshaw mentioned:

I would like to see language that says that tools SHOULD throw an error for packages that have a Default-Extra that does not exist

This seems reasonable to me - in fact the PEP currently says that any Default-Extra entry must match an existing Provides-Extra entry, but I could add a sentence to say explicitly that tools should raise an error if that isn’t the case?
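For illustration, the check being proposed is tiny. Default-Extra and Provides-Extra are the PEP’s field names, but the function below is only a sketch, not anyone’s actual implementation:

```python
def check_default_extras(provides_extra, default_extra):
    """Raise if any Default-Extra name is not declared in Provides-Extra
    (the 'tools SHOULD throw an error' behaviour suggested above)."""
    unknown = sorted(set(default_extra) - set(provides_extra))
    if unknown:
        raise ValueError(f"Default-Extra not in Provides-Extra: {unknown}")

check_default_extras(["recommended", "test"], ["recommended"])  # valid: no error
```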

Q2: What happens when users ask for non-existent extras?

@notatallshaw also said:

If the package has a default extra and the user specifies a non-existing extra, e.g. package[this-extra-does-not-exist-ecdf6822b], should a tool install the Default-Extra or not?

This is a trickier one, but I believe that yes, the default extra should get installed (I can add this explicitly to the PEP). In the example you give it is true that package[no-default] will still install defaults for versions 3 and 4, but I don’t think that should act any differently from, say, package[non-existent-extra]. Essentially, if you give a non-existent extra, I think that e.g. pip should emit a warning and then act as if it hadn’t been passed, since it is being ignored. One important reason we should install the defaults if the extra is not recognized is that otherwise people could systematically add [no-default] to literally all packages to try and always disable defaults, which would be bad.
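A sketch of that proposed behaviour (the function shape and warning text are my own, not from the PEP): unknown requested extras are warned about and dropped, and the defaults apply whenever no declared extra survives, which is exactly why a made-up [no-default] can’t act as a universal opt-out:

```python
import warnings

def effective_extras(requested, declared, defaults):
    """Which extras an installer would act on, under the behaviour
    proposed above (a sketch, not pip's actual logic)."""
    for extra in requested:
        if extra not in declared:
            warnings.warn(f"ignoring unknown extra: {extra}")
    known = {e for e in requested if e in declared}
    # If no declared extra was requested, fall back to the defaults,
    # so pkg[no-such-extra] behaves like a bare pkg.
    return known if known else set(defaults)

declared = {"recommended", "test"}
print(effective_extras(["no-default"], declared, ["recommended"]))  # → {'recommended'}
print(effective_extras(["test"], declared, ["recommended"]))        # → {'test'}
```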

Q3: Repackagers of Python libraries

@bwoodsend asked:

What do we do about repackagers of Python libraries?

I agree that this is an important aspect to mention in the PEP.

Some packaging systems such as conda don’t even have the concept of extras, and I’ve seen cases where e.g. a conda recipe actually includes several of the extras by default, so that some conda packages are already bloated compared to the strict minimum they need to function. Some Linux package managers such as apt have the concept of recommended vs minimal dependencies, so in some (but not all) cases where default extras are used for this, they might actually line up well with that model.

However, at the end of the day it’s difficult to have a single answer to this question, and given that the PEP could be applied in different ways (e.g. minimal vs recommended dependencies, default backends/frontends/etc.), I think that repackagers will need to judge on a case-by-case basis what to do. The key point is that they already need to make judgment calls as described above, but it is true that this PEP may add a little extra cognitive work for repackagers of packages that make use of it. I can edit the PEP to acknowledge this.

In any case I am going to reach out to some repackagers that have not been involved in the discussion so far to ask them how they feel about the PEP.

Q4: pip freeze and pip install -r

@bwoodsend mentioned again the issue that restoring an environment from the output of pip freeze may not round-trip if there are packages with defaults, and @pf_moore mentioned that normally one should use --no-deps to restore an environment from a freeze file. Note that this is true even prior to this PEP. Here is a concrete example: astropy has a required dependency on pyyaml, but in fact a lot of astropy could function without it. So if I wanted to make a lightweight install, I could in principle do:

pip install astropy
pip uninstall pyyaml
pip freeze > requirements.txt
pip install -r requirements.txt

However in this case, pyyaml (which is listed as required by astropy but not strictly required for a lot of the package) would get installed again, so the final result wouldn’t match the content of the requirements.txt file. So even right now there are use cases which highlight that one should use --no-deps. I think we should do three things:

  • Update the PEP to mention that correctly using --no-deps when installing from a freeze file (and equivalent for other packaging tools) is going to be especially important if the PEP is accepted.
  • Acknowledge in the PEP more clearly that this kind of issue could happen with other tools too.
  • Update the pip docs as soon as possible to recommend the use of --no-deps when restoring from a freeze file, regardless of what happens with this PEP.

Q5: Burden to check for minimal installs

@bwoodsend said:

It is a massive pain for anyone who doesn’t want to spend the rest of their life reading the small print or source code to find which [minimal] will avoid installing unnecessary bloat.

This might have a technical solution: in principle it should be simple to write a tool that can scan the core metadata for a given package and determine if a package is using default extras and if so whether it defines an empty extra that effectively disables the defaults. Interestingly, as a user I find it a pain sometimes to have to figure out which extras I can enable, so in some ways this is not a new problem.

Perhaps a longer term solution orthogonal to this PEP would be to actually have a way to document extras in project metadata, to provide a one-liner about what each one does, and tools could then do e.g.

pip list-extras astropy

to show what extras are available and this could then show the description too. Perhaps a separate PEP? :slight_smile:
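Most of the raw material for such a tool already exists in the stdlib. A hypothetical sketch of what a list-extras helper could build on today (Default-Extra is the field this PEP would add; per-extra descriptions don’t exist anywhere yet):

```python
from importlib.metadata import PackageNotFoundError, metadata

def list_extras(name):
    """Extras (and, post-PEP, default extras) declared by an installed
    package, or None if the package isn't installed. A sketch of what
    a hypothetical `pip list-extras` command could be built on."""
    try:
        md = metadata(name)
    except PackageNotFoundError:
        return None
    return {
        "extras": md.get_all("Provides-Extra") or [],
        "defaults": md.get_all("Default-Extra") or [],  # proposed PEP 771 field
    }
```

Scanning for an empty extra among the declared ones (the “effectively disables the defaults” case) would be a one-line extension of this, though it needs requires() output rather than just the field names.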

Q6: Interchangeable dependencies

@bwoodsend said:

One area that’s not really covered here is, for the interchangeable dependencies case, what installers should do if a non-default choice is already installed

As written, the current PEP would lead to installers being agnostic to what is already installed. So yes, in principle it would be a waste to install a dependency that isn’t needed, but there’s no other sane option, because how would pip (or other installers) know that an already-installed package fulfills the same needs as the default extras? There’s no mechanism for specifying that two dependencies or two extras are equivalent and interchangeable.

Q7: Conflicting dependencies

@pf_moore replied to the above comment and said:

And it’s worse than that. Because there’s no record of what extras got installed, it means that pip install foo[bar]; pip install foo would install both bar and the default extra. And given that the two could easily be mutually incompatible (if they are alternative backends, for example) this could result in a broken installation.

I’d be interested in whether such a case of actually mutually incompatible alternative dependencies exists. It is mentioned sometimes, but it’s hard for me to address without actually seeing it in the wild - it seems like it would be poor design for a package to be unable to work if two alternative backends are present (at the least it should just pick one and ignore the other).

Even without the present PEP, there is no way to guarantee that two other packages won’t each pull in one of the mutually incompatible dependencies. Let’s say package A needs package B or C but will crash if B and C are both present. What if a user then installs packages D and E which depend on B and C respectively? In this sense this PEP is not really introducing a completely new problem.

The PEP as written states:

Note that this PEP does not aim to address the issue of disallowing conflicting
or incompatible extras - for example if a package requires exactly one frontend
or backend package. There is currently no mechanism in Python packaging
infrastructure to disallow conflicting or incompatible extras to be installed,
and this PEP does not change that.

Should anything be added beyond this to address this point?

Q8: Concrete examples

@konstin said:

Some real-world examples with popular packages, such as fastapi-cli[standard], would make the PEP more tangible and ensure we’re solving those cases.

I’m reluctant to add overly specific examples, because it’s hard to find specific cases that will actually resonate with people (for example, I have no idea what fastapi is!). I’m also worried that specific examples aren’t timeless and could even be out of date before a decision is made on the PEP. I would prefer to try and describe different general hypothetical cases that people can map onto what they are familiar with, as I’ve tried to do in the examples of usage in the PEP (but I’m happy to be convinced otherwise).

Q9: Compatibility with older versions of tooling

@konstin and @dustin also raised the following point:

Wouldn’t this apply with the current approach too, because installing a package relying on default extras for e.g. a good getting started experience would have a broken installation with older tooling, except it’s not shown as an error? [and subsequent comment about authors moving required to optional default dependencies]

OK, so I think this is a very interesting and important question and we should add something about this to the PEP. I think the key here is that package authors should take the same care with default extras as they do with other aspects of dependencies at the moment, in the sense that they should think about what will happen for users who don’t have the latest tooling. Not everyone will use default extras to reduce their required dependencies; some may simply make default dependencies out of what were regular extras before.

At the end of the day, authors are the ones responsible for ensuring that they know their audience and support their users, and this is true even before this PEP. As a package maintainer myself, for packages where I know I have users who may be using old versions of pip, I’ve had to be careful in the past not to adopt pyproject.toml too soon, or not to get rid of setup.cfg/setup.py completely too soon. So while I agree that it means it might be hard for some maintainers to adopt this in the short term, isn’t it pretty common for established packages to have to wait a bit before adopting the latest shiny feature?

From my perspective, the bottom line is that I think a lot of the responsibility for this specific issue lies with the authors, but I think the PEP should also include a description of these potential gotchas in ‘How to teach this’.

Q10: PEP organization

@pf_moore: I have noted your comments regarding splitting out some of the sections from ‘How to teach this’ into an ‘Examples’ section and expanding ‘How to teach this’, I will try and address this on the next edit once we converge on other issues.

Other issues

Anything that uses importlib.metadata.requires() will probably gain a logic error and possibly the same impossible conundrum as the distro maintainers.

@bwoodsend I’m not sure I understand this one above, can you give a specific example?

This is “the wider community”, for what it’s worth :slightly_smiling_face:

Deliberately removing a required dependency is not a good supporting example, IMO. The behaviour of the sequence of commands you give is correct, IMO, because it fixes the broken environment you created. However, the case with a default extra is different, because it doesn’t fix anything - instead, it fails to reproduce the correct result that the user quite reasonably would expect.

And while --no-deps is often used, it is not a requirement. There are many uses of requirements files (even in conjunction with pip freeze) which are perfectly correct without --no-deps, and breaking those use cases isn’t acceptable.

To give a concrete example, pip freeze > reqs.txt, then edit reqs.txt to remove the version constraints, then pip install -r reqs.txt, is a perfectly acceptable way of upgrading everything in an environment to the latest versions. How would you do that if one of the packages in the environment had been installed with a [minimal] extra that did nothing except suppress the default extras?

I consider that unacceptable, for the reasons stated above.

Again, no. There are valid use cases for omitting --no-deps, and we’re not going to prohibit them (or even recommend against them). In actual fact, it’s the “freeze then install --no-deps” approach that I would imagine being recommended against in the future, once lockfiles get standardised, as that use case is much better served with a proper lockfile.

My general rule, born of bitter experience, is that any example of problematic behaviour you could imagine almost certainly is used somewhere. And if you’re particularly unlucky, it’s used in a closed-source environment so you don’t even have a chance to find it by scanning PyPI (or similar). Personally, I can easily imagine a tool that says “install the backend you want”, which won’t work with both installed.

Agreed. The new aspect is that you can now do this without needing to install anything other than foo, and there’s nothing in the installed metadata for foo that says that the default extra was installed. So just looking at what’s installed by pip install foo[bar], you have no way of knowing that the default extra was not installed short of reading the definitions of the bar extra and the default extra, and matching them against what’s installed.

This is simply another manifestation of how the PEP interacts badly with the existing problem that there’s no record in the “installed package metadata” of what extras were requested at install time.

I don’t know, to be honest. The fundamental problem is that you cannot recover information about what extras were installed. That’s an existing problem. What this PEP brings to make it worse is the fact that “assume that no extras were requested” will no longer install the minimum safe set of packages. Breaking that assumption has various consequences, which are what we’re bringing up now, and it’s far from obvious (to me, at least) that those consequences are acceptable.

The fix might be to fix extras at a deeper level, making them a real thing that gets recorded in the installation data. At the moment they have a weird sort of “partial” existence, that we’ve managed to live with but the more we try to “improve” them, the more the cracks start showing. Having said that, I’m on record as having a deep dislike of extras, precisely because of the problems they cause - so I won’t even try to pretend my view here is unbiased…
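To make “recording them in the installation data” concrete: installed distributions already carry a REQUESTED marker file (PEP 376), and one could imagine, purely hypothetically, an adjacent file listing the extras asked for at install time. Nothing like this is standardised; the file name below is invented for illustration:

```python
import json
import pathlib

def record_extras(dist_info, extras):
    """Hypothetical: persist the extras requested at install time
    next to the existing PEP 376 REQUESTED marker."""
    path = pathlib.Path(dist_info) / "REQUESTED-EXTRAS"  # invented name
    path.write_text(json.dumps(sorted(set(extras))))

def requested_extras(dist_info):
    """Read the record back; None means 'unknown' (e.g. installs
    that predate any such convention)."""
    path = pathlib.Path(dist_info) / "REQUESTED-EXTRAS"
    return json.loads(path.read_text()) if path.exists() else None
```

With such a record, `pip install foo` after `pip install foo[bar]` could at least know that bar, rather than the default extra, was in play.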

Examples don’t need to be timeless, they only need to support the proposal in the PEP for as long as it takes to get it through to pronouncement. And honestly, if you find that a project has worked around the need for a default extra in the time it takes to get this PEP to the approval stage, that in itself isn’t a good sign for the usefulness of the idea in any case - we’re supposed to be fixing something that projects need, and are struggling to work without, after all.

I wouldn’t worry too much about the reader’s background yet. Write up some (open source) examples that you, personally, know of and find compelling[1]. If they aren’t projects people know about, you’ll find out - and you can ask when you do for better examples.

In which case, “How do we get that message across” is a crucial question for the “How do we teach this?” section of the PEP. Because I would expect that the gut reaction of most package authors would be “how can I use this?”, and not “is this OK for me to use?”


  1. You do have a number of such examples, don’t you? ↩︎

2 Likes

I understand the apprehension about the scenario where a new default version causes unexpected problems for users who aren’t expecting it. That said, I think it’s worth considering: are we being over-protective? That scenario is a migration puzzle for the package maintainers, who introduced it–if they introduce a default extra with no warning and it breaks things, they’re going to receive the majority of the user feedback about it.

Surely it will be the responsibility of the package maintainers to figure out their own transition plan? A package with a large user base should be appropriately cautious, while a small package with a short support window can be more aggressive about it.

The discussion about how this will work is definitely valuable (and some of it could fall under “How to teach this” in the end), I just think there’s a balance here. The modification is totally backwards compatible in the sense that no one is forced to define a default extra–the concern here is that new package releases will require some troubleshooting, but that’s hardly a new problem.

Edit: realized I am mostly just paraphrasing the most recent posts! :upside_down_face:

2 Likes

While this scenario might happen, could it be solved by providing a simple escape hatch by the frontend? Similar to how PEP-517 can still be escaped by the minority. :thinking:

In general, I agree with you. But my experience on pip is that we receive a lot of the issues around this. People characterise any install issues as “pip didn’t do what I wanted”. So I’m making a point of this largely out of self-interest.

I agree that ultimately this is for package authors to handle. But I think the PEP needs to explain the risks and potential problems, so that they can make an informed decision.

2 Likes

Likewise, as a maintainer of a packaging tool, I also receive a lot of issues from users who falsely blame me for other projects’ unwise decisions. Unfortunately, the people who get hit worst by these issues are the people who don’t understand packaging enough to even triangulate their issues to the right component.

I didn’t want to get into this before but since you bring it up: This to me is an excellent example of why this feature shouldn’t exist. pyyaml is a large and vulnerability-ridden [1] dependency and it’s being sucked in probably for no reason. I don’t care whether the opt out is pip install astropy[nopyyaml] or pip uninstall pyyaml – it should’ve been opt-in to begin with [2]. This feature actively encourages package maintainers to do the wrong thing. (In fact I’d consider all the motivations for this whole proposal to be people wanting to do the wrong thing. I’d fast track a package out of my dependency tree if it were to do any of them.)

That sounds like quite a nasty blow to distributions like Fedora that try to automate away as much of the dependency-list-copying drudgery as possible. It also puts distro maintainers in the crosshairs for the “my package manager made the wrong arbitrary guess” complaints.


  1. even its so called safe mode can be trivially made to DoS or segfault using nested anchors ↩︎

  2. or better yet, decoupled from its config parsing so users can use whatever formats and libraries they choose ↩︎

1 Like

Q1: Errors for non-existent extras in Default-Extra

This seems reasonable to me - in fact the PEP currently says that any Default-Extra entry must match an existing Provides-Extra entry, but I could add a sentence to say explicitly that tools should raise an error if that isn’t the case?

Thanks for drawing my attention to this, it seems sufficient to me, tools and services can enforce it how they see fit (early exit error in most cases hopefully).

Q2: What happens when users ask for non-existent extras?

This is a trickier one, but I believe that yes, the default extra should get installed (I can add this explicitly to the PEP). In the example you give it is true that package[no-default] will still install defaults for versions 3 and 4, but I don’t think that should act any differently from, say, package[non-existent-extra]. Essentially, if you give a non-existent extra, I think that e.g. pip should emit a warning and then act as if it hadn’t been passed, since it is being ignored. One important reason we should install the defaults if the extra is not recognized is that otherwise people could systematically add [no-default] to literally all packages to try and always disable defaults, which would be bad.

After thinking about this myself, I also come down on this side for much the same reasoning. A non-existent extra should at best be treated as no extra, and if there is no extra then a tool should select the default extra(s). I would like to see explicit language, as I think it would be painful if tools were split on how to handle this.

1 Like

FYI I just tested, and indeed I was able to remove the following entirely (at least for the build phase):

  • validate-pyproject
  • importlib_metadata

So that’s nice. I removed both from the demo.

I’m working on fixing a few bugs I found (and a few people reported), mostly in pip.

1 Like

Even if they don’t end up in the PEP, I think engaging with a couple real-world projects that have a need for this would be beneficial. Here are a few popular packages that I know of that instruct users to install an extra by default:

3 Likes

It’s not necessarily so easy to be exhaustive …

Many packages (at least in the compute/AI world) have taken the position of forcing you to install dependencies that you may not need, just because “it’s easier to install too much than too little”.

Or they split “the default” across different indexes: depending on which index you ask, you get a different dependency set.

One good example of that is pytorch, which is a behemoth of the AI ecosystem.

Examples don’t need to be exhaustive, they are just examples.

6 Likes