PEP 771: Default Extras for Python Software Packages (Round 2)

I think the post you’re referring to is (mostly) talking about what’s proposed in PEP 771, but suggesting a further enhancement to it?

I could be wrong :slight_smile: But it reads to me like @flying-sheep was saying that PEP 771 currently is pretty close in behavior to how cargo treats features and default features, and folks have suggested that there are complaints about how that feature works in cargo, so we should learn from that. So @flying-sheep was saying they tried to find what those complaints wrt cargo actually were and when they looked, they didn’t find much, other than the fact that you can’t selectively request to omit default features, you can only omit all of them or none of them.

Then the rest of the post appears to be a proposal to extend PEP 771 to try and solve that limitation of how cargo has implemented default features/extras which PEP 771 currently also has… but I think it’s still about PEP 771 in general (other than some minor syntactical difference in spelling “omit all default extras” as package[-] instead of package[]).

I think the implication in that post, that the point of contention is the granularity of omitting default extras, isn’t accurate; it’s really more foundational: whether it makes sense to have some sort of optional, but installed by default, dependencies at all.

What I’m specifically trying to speak to are the concerns you summarized (and I agree that the concerns themselves aren’t vague, but I personally think a lot of them assume authors are going to act with disregard towards their users):

(1) and (2) both feel very much like saying that project authors aren’t capable of handling the implications of default extras without just adding tons of stuff willy-nilly. I would argue that if a project author was going to just add stuff willy-nilly, they’re just as likely to do so with required dependencies as they are with default extras (and default extras are strictly better for people wanting minimal dependencies than adding optional stuff to the required dependencies just to get it installed by default).

(3) is true… but it is also inherent in any solution to this problem, and really already exists: dependent packages currently have to choose whether they’re going to install only the “required” dependencies (which may or may not be the actual minimal set) or whether they’re going to add any additional packages.

Even if projects are being wholly honest in their required vs optional dependencies today, that’s still true-- those dependent packages have to decide if they’re going to preserve the “full” vs “minimal” split. It’s just the “default” today is the supposed minimal split (but not really, because lots of packages are putting more than their minimal in there).

(4) is obviously not true, given that other ecosystems exist that have this modeled in a similar way, and those other packaging systems have managed to package those ecosystems.

Taken literally, yes, it’s not true. “Impossible” is overstatement. IMO this is a real problem though, and totally tractable to address in the PEP. There can be a recommendation for repackaging scenarios, or the PEP can say that this has to be case by case.

I don’t even think it’s that hard to take a stance on this. The text just doesn’t address it (or at least, didn’t the last time I gave it a full reread).

I consider repackagers to be part of the audience for this spec, so I’d like their needs to be addressed directly.

There is a section on Packaging repository maintainers which acknowledges this aspect but doesn’t take a stance at the moment – I’m open to concrete suggestions on how we could improve this section.

I think there are three sub-threads going on:

One sub-thread should be settled by now, I think: about the PEP in general, the fact that the semantics are opt-out, how it can be “misused”, … I think that at this point, we shouldn’t have to defend it like @dstufft, @takluyver and I felt compelled to do.

The 2nd sub-thread is about my proposal of 1. making the proposed semantics less confusing (specifying extras doesn’t deselect default extras) and 2. taking the lessons learned by cargo into account by allowing individual default extras to be deselected. I think it’s super important to take cargo’s lessons seriously; they have collected a lot of experience. I wouldn’t hate it if the PEP went into effect as-is, but I think the forward-thinking thing to do would be to give package maintainers the ability to evolve extras without every change being breaking, whereas the current proposal would force them to paint themselves into a corner.

This is the 3rd sub-thread. As said, I maintain 40 repackagings of Python packages in the AUR and was offered the chance to become an Arch trusted user (which I turned down because of time constraints). As said, I think judgment calls about how to map subtly incompatible concepts are the core of what repackagers do, and we do it all the time. Some (Debian) err on the side of maximum granularity, others (Arch) on the side of pragmatism and less time spent resolving dependencies, at the cost of some kilobytes of inert Python code.

As it currently stands (with the reinterpretation of the bare package name), PEP 771 doesn’t give them the option of not using the GPL parts. I don’t think anyone objected to the metadata existing, it was all about how it was interpreted, and opt-out isn’t included (and the PEP is very definite on what pip install foo must do). I’m not sure why the option of suggesting that higher level interactive tools/interfaces (e.g. poetry add or uv add, IDEs) include the default extras or ask the user what they want to do wasn’t taken up (rather than mandating a certain UX, as the PEP does currently), but I think it resolves most (if not all) of the issues people raised?

They’d do scanpy[] (current proposed syntax) to exclude the default optional dependencies. We’d make sure all GPL dependencies would be optional. They’d have to take care that nothing else pulls in scanpy, just like they currently have to take care that nothing pulls in scanpy[leiden] (with the leiden extra pulling in GPL’d igraph).
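
To make the selection rule in this exchange concrete, here is a rough Python sketch of how a tool might pick extras under the currently proposed behaviour as described in this thread: a bare name selects the default extras, while any explicit bracket list, including the empty scanpy[], replaces them. The helper names and the simplified pattern are assumptions made for illustration (versions and markers are ignored); this is not code from pip or from the PEP.

```python
import re

# Simplified requirement pattern: a name plus an optional [extras] list.
# Real dependency specifiers also carry versions and markers; ignored here.
_SPEC = re.compile(r"\s*([A-Za-z0-9._-]+)\s*(\[([^\]]*)\])?\s*")


def parse_spec(spec):
    """Return (name, extras); extras is None when no brackets were written."""
    match = _SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"unsupported specifier: {spec!r}")
    name, bracket, inner = match.groups()
    if bracket is None:
        return name, None
    return name, {part.strip() for part in inner.split(",") if part.strip()}


def selected_extras(spec, default_extras):
    """Extras a tool would install for `spec` under the rule sketched above."""
    name, extras = parse_spec(spec)
    if extras is None:
        # Bare name: fall back to the project's declared default extras.
        return name, set(default_extras)
    # Explicit brackets (even empty ones) replace the defaults entirely.
    return name, extras
```

Assuming leiden were declared as a default extra, selected_extras("scanpy", ["leiden"]) would select leiden, while selected_extras("scanpy[]", ["leiden"]) would select nothing.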

That’s what SBOMs and other tools to enumerate and flag incompatible licenses are for.

I really don’t think it needs more than it has. As said, mapping different dependency concepts is the bread and butter of everyday repackager routine. E.g. in conda, we map extras to constraints – optional-dependencies.foo = ['scikit-misc>=1.4', 'scikit-image'] becomes constraints: ['scikit-misc>=1.4'] and people have to manually figure out the set of packages enabling a feature.

But every dependent package must also change scanpy to scanpy[], and because PyPI does not allow metadata to be replaced, no existing packages can be fixed. That is why leaving the meaning of the bare package name in dependency specifiers unchanged, and instead allowing tools to guide users to the default extras, is a better path forward.

poetry add already adds additional syntax (see Commands | Documentation | Poetry - Python dependency management and packaging made easy), and doing poetry add numpy adds numpy (>=2.4.4,<3.0.0) to the pyproject.toml file (rather than just numpy), so I see no reason why poetry add scanpy can’t add scanpy[leiden] (assuming leiden is in the default extra list)? poetry init already asks a bunch of questions, there’s no reason Add default extras? can’t be an option (or when it’s looking up packages, asking if the default extras should be included if they exist).
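
As a minimal sketch of that tool-side alternative, assuming the tool can read a project's list of default extras from its metadata, the expansion could happen once, at add time, so that the recorded requirement spells the extras out explicitly. The function below is hypothetical and is not part of Poetry or uv.

```python
def requirement_to_write(name, default_extras, include_defaults=True):
    """Build the requirement string an `add`-style command could record.

    Hypothetical helper: the defaults are written out explicitly, so the
    meaning of a bare name in dependency specifiers never has to change.
    """
    extras = sorted(default_extras) if include_defaults else []
    if not extras:
        return name
    return f"{name}[{','.join(extras)}]"
```

With leiden as an assumed default extra, requirement_to_write("scanpy", ["leiden"]) gives scanpy[leiden], and passing include_defaults=False (say, after the user answers "no" to an "Add default extras?" prompt) gives plain scanpy.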

I don’t think anyone thinks package authors (or the PEP authors for that matter) aren’t trying to do what’s best for their users, but I do think that they will likely be unaware of the network effects that will result from adding default extras (as the PEP currently stands). Using astropy as an example, there are going to be a load of unmaintained or undermaintained packages which depend on it. If astropy included all the packages in its recommended extra as a default extra, using one of those unmaintained or undermaintained packages would probably double the size of the packages installed, with no way to fix this without forking all of those packages (or astropy removing the recommended extra as a default extra).
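
The network effect described here can be sketched with a toy resolver. Everything below is an assumption for illustration: the index is a hand-written dictionary, oldtool and leantool are made-up dependents, and the rule that a bare name pulls in default extras follows the behaviour discussed in this thread rather than any real installer.

```python
def pkg(requires=(), extras=None, default_extras=()):
    """Build one toy metadata record."""
    return {"requires": list(requires), "extras": extras or {},
            "default_extras": list(default_extras)}


# Hypothetical index. "oldtool" stands for an unmaintained dependent that
# uses the bare name; "leantool" is one that opted out with astropy[].
INDEX = {
    "numpy": pkg(),
    "scipy": pkg(requires=["numpy"]),
    "matplotlib": pkg(requires=["numpy"]),
    "astropy": pkg(requires=["numpy"],
                   extras={"recommended": ["scipy", "matplotlib"]},
                   default_extras=["recommended"]),
    "oldtool": pkg(requires=["astropy"]),
    "leantool": pkg(requires=["astropy[]"]),
}


def split_spec(spec):
    """Split 'name[a,b]' into (name, extras); extras is None for a bare name."""
    name, bracket, rest = spec.partition("[")
    if not bracket:
        return name.strip(), None
    return name.strip(), {e.strip() for e in rest.rstrip("]").split(",") if e.strip()}


def resolve(spec, index, seen=None):
    """Return the set of project names installed for `spec` (no versions)."""
    seen = set() if seen is None else seen
    name, extras = split_spec(spec)
    meta = index[name]
    if extras is None:
        extras = set(meta["default_extras"])  # bare name implies the defaults
    key = (name, frozenset(extras))
    if key not in seen:
        seen.add(key)
        deps = list(meta["requires"])
        for extra in sorted(extras):
            deps.extend(meta["extras"][extra])
        for dep in deps:
            resolve(dep, index, seen)
    return {project for project, _ in seen}
```

In this toy index, resolve("oldtool", INDEX) includes scipy and matplotlib although oldtool never asked for them, while resolve("leantool", INDEX) stays at leantool, astropy and numpy. Only a new release of oldtool could change its result.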

Why do you think that though? It feels pretty self obvious to me that a “here’s a list of extras that are installed by default” means that… those dependencies are going to be installed by default. That doesn’t feel like some sort of subtle edge case that’s tricky to understand.

To use your astropy example, if astropy thought those dependencies were important enough to be installed by default for all of those unmaintained or under maintained packages today, they could just add them to the required dependencies even though they aren’t strictly required. If astropy did that, then not only would your worry about default extras also apply, but it would be even worse because you’d have to fork all of those packages and astropy to fix it.

Which isn’t a hypothetical, since we have documented examples of people over specifying their required dependencies to do just that!

Also, looking at astropy, their “recommended” extra only adds 3 dependencies?

# Recommended run-time dependencies to enable a lot of functionality within Astropy.
recommended = [
    "scipy>=1.13",
    "matplotlib>=3.8.4",
    "narwhals>=1.42.0",  # keep in sync with dependency-groups.dataframe
]

That hardly seems like it’s going to double the size of packages installed, and it looks like the extra that would be problematic to install by default is actually “all”, which includes a ton of stuff:

# Recommended run-time dependencies to enable a lot of functionality within Astropy.
recommended = [
    "scipy>=1.13",
    "matplotlib>=3.8.4",
    "narwhals>=1.42.0",  # keep in sync with dependency-groups.dataframe
]
# Optional IPython-related behavior is in many places in Astropy. IPython is a complex
# dependency that occasionally requires pinning one of it's upstream dependencies. If
# you are using Astropy from an IPython-dependent IDE, like Jupyter, this should enforce
# the minimum supported version of IPython.
ipython = [
    "ipython>=8.0.0",
]
jupyter = [  # these are optional dependencies for utils.console and table
    "astropy[ipython]",
    "ipywidgets>=7.7.3",
    "ipykernel>=6.16.0",
    "ipydatagrid>=1.1.13",
    # jupyter-core is a transitive dependency via ipykernel, we declare it as
    # a direct dependency in order to set a lower bound for oldest-deps testing
    "jupyter-core>=4.11.2",
    "pandas>=2.2.2",
]
# This is ALL the run-time optional dependencies.
all = [
    # Install grouped optional dependencies
    "astropy[recommended]",
    "astropy[ipython]",
    "astropy[jupyter]",
    # Install all remaining optional dependencies
    "certifi>=2022.6.15.1",
    "dask[dataframe]>=2024.8.0", # keep in sync with dependency-groups.dataframe
    "h5py>=3.11.0",
    "pyarrow>=16.0",
    "beautifulsoup4>=4.11.2", # imposed by pandas==2.2.2
    "html5lib>=1.1",
    "bleach>=3.2.1",
    "sortedcontainers>=2.1.0", # imposed by testing with hypothesis
    "pytz>=2016.10", # (older versions may work)
    "jplephem>=2.17.0",
    "mpmath>=1.2.1",
    "asdf-astropy>=0.7.0",
    "bottleneck>=1.4.0",
    "fsspec[http,s3]>=2023.4.0",  # keep in sync with dependency-groups.dataframe
    "s3fs>=2023.4.0",
    "uncompresspy>=0.4.0"
]

So TBH astropy feels to me like an example that package authors can be trusted to do reasonable things here, because even with “recommended” being fully opt in, astropy seems to have kept it pretty minimal and only included the optional stuff that gives you the most bang for your buck.
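
For anyone who wants to check the comparison between “recommended” and “all”, here is a small sketch that flattens the self-referential extras quoted above (entries such as astropy[jupyter]) and counts the distinct requirements behind each one. The data is copied from that snippet; the expand helper is an assumption for illustration and only understands references of the form project[extra].

```python
EXTRAS = {
    "recommended": ["scipy>=1.13", "matplotlib>=3.8.4", "narwhals>=1.42.0"],
    "ipython": ["ipython>=8.0.0"],
    "jupyter": ["astropy[ipython]", "ipywidgets>=7.7.3", "ipykernel>=6.16.0",
                "ipydatagrid>=1.1.13", "jupyter-core>=4.11.2", "pandas>=2.2.2"],
    "all": ["astropy[recommended]", "astropy[ipython]", "astropy[jupyter]",
            "certifi>=2022.6.15.1", "dask[dataframe]>=2024.8.0", "h5py>=3.11.0",
            "pyarrow>=16.0", "beautifulsoup4>=4.11.2", "html5lib>=1.1",
            "bleach>=3.2.1", "sortedcontainers>=2.1.0", "pytz>=2016.10",
            "jplephem>=2.17.0", "mpmath>=1.2.1", "asdf-astropy>=0.7.0",
            "bottleneck>=1.4.0", "fsspec[http,s3]>=2023.4.0", "s3fs>=2023.4.0",
            "uncompresspy>=0.4.0"],
}


def expand(extra, extras=EXTRAS, project="astropy"):
    """Flatten one extra into its direct requirements, following self-references."""
    result = []
    prefix = project + "["
    for req in extras[extra]:
        if req.startswith(prefix) and req.endswith("]"):
            # Self-reference such as "astropy[jupyter]": expand the named extras.
            for nested in req[len(prefix):-1].split(","):
                for item in expand(nested.strip(), extras, project):
                    if item not in result:
                        result.append(item)
        elif req not in result:
            result.append(req)
    return result


if __name__ == "__main__":
    for name in ("recommended", "all"):
        print(name, len(expand(name)))
```

This counts direct requirements only (3 for recommended, 25 for all); the transitive dependencies of scipy and matplotlib are not included.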

I just wanted to call this out, that I do think it’s totally reasonable to say the PEP should talk about the fact that there will be this bit of impedance mismatch between capabilities, and ideally provide some non-normative recommendations to those ecosystems. If we think the section that is already in the PEP is insufficient for that purpose, then we should figure out how to improve it :slight_smile: .

I was just calling it out because I agreed with Paul’s summary that there at least appeared to me to be a concern up thread that adding this feature would somehow make it impossible or really hard for downstream ecosystems to package these projects, and I don’t think that specific concern holds up in reality.

Totally acceptable. Those companies who don’t want GPL stuff can then add feature requests or patch things if they want it faster. We just want to make it possible without compromising the average experience.

scipy + matplotlib pull in (legitimately) a bunch of additional packages, and doing a test showed that a minimal venv (with just astropy) doubled in size once matplotlib and scipy are installed (based on du -h before and after). While most interactive use will probably include one or both of them, there are many use cases where they are not needed (e.g. parsing and validating FITS files). If you build/use Dockerfiles you do start to notice the accumulating size of the images.
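
A before-and-after measurement like that can be approximated in Python. This is only a sketch: it sums apparent file sizes, whereas du reports disk usage, so the numbers will not match du exactly, and the function names are made up for illustration.

```python
import os


def tree_size_bytes(root):
    """Sum the sizes of all regular files below `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def growth_factor(before_bytes, after_bytes):
    """How many times larger the environment became."""
    return after_bytes / before_bytes
```

Calling tree_size_bytes on a venv's site-packages directory before and after installing the extra packages, then passing both numbers to growth_factor, gives the ratio being discussed.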

I can’t see how, as the PEP is currently written, it’s going to be possible to avoid dependency trees with unnecessary packages, because it’s not going to be possible to effectively opt out of default extras. Dropping the dependency specifier change and instead allowing tools to use the default extras as they see fit (as poetry already does with bare names and version ranges) avoids that issue, while still enabling discovery.

…could be misused AND is superficially an extremely attractive option with large ramifications that aren’t obvious until it’s way too late (and sometimes never to the people who get to make that decision). That’s the key. Otherwise, yeah I’d agree anything can be misused.

We’re in an unusually privileged position with this proposal in that we don’t need to speculate how it’ll pan out. We’ve already got extras which bring a lot of the same issues. Rust, Debian, Fedora already demonstrate what recommended dependencies are like. The concerns we’re raising here aren’t guesses – they’re what’s already happened.

Adding “you may also want” dependencies willy-nilly? Try dnf install fedpkg in Fedora[1]. Try setting up any environment in an Ubuntu Docker container with and without --no-install-recommends, observe the differences in size[2] and see if you can figure out why even one of those unused recommendations is recommended. Or just look at the “yes I can’t wait to do this!” comments in these threads despite the surrounding conversation being about precisely not doing that[3].

Giving packages an easy way out of good scoping? Take the astropy example from the PEP. If you’re looking for a minimal installation and are only using the bit of astropy that doesn’t use their recommended dependencies, are you really going to be content to carry the entirety of astropy? I know I wouldn’t.

I would contest any claims that “just pick one” is a good answer for repackagers. I have to make these calls fairly regularly. It’s horrible. It takes a lot of research and usually results in “hmm, bloated or broken? Well both options suck!”. The bloat that Conda has to accept is real. Yes, they accept it[4]. That doesn’t stop people from complaining about how big and slow Conda gets. And it’ll be worse for Linux distros that are more likely to be used in a lambda server and have a much stronger focus on avoiding dead weight.

How would you know which packages chose not to scope creep because of lack of this feature? I know I tried to make that mistake several times in my early packages before I’d done enough deployment work to appreciate just how valuable an unambiguous dependency tree is. I’m so glad I was forced to split things up.

This one’s more about interchangeable dependencies than optional ones. If A depends on requests or httpx and then B depends on A and also needs an HTTP client then B has to make an awkward decision about preserving the requests/httpx optionality. It’s maybe not the end of the world if two HTTP packages end up being used redundantly provided the output of one never becomes an input to the other. It is the end of the world however if say multiple Qt variants are imported[5].


  1. A chain of dubious recommendations pulls in half a gigabyte of qemu binaries ↩︎

  2. usually around double – the network effect is strong ↩︎

  3. I’d say this rules out any chance that misuse can be prevented by documentation ↩︎

  4. not sure what alternative everyone thinks they have ↩︎

  5. it can crash Python ↩︎

Why do you think the ramifications aren’t obvious? “These dependencies are going to be installed by default, but some people could opt out of them” seems like it’s pretty obvious what the ramifications are-- most people are going to have them installed by default unless they go out of their way to not install them.

There are basically no subtle issues there that I can see.

Well, looking at Rust, I almost never feel a need to turn off default extras except in particularly odd environments. Most of the real world Rust projects I’ve seen also do not typically feel a need to add a default-features = false to every dependency, nor do I see most crates inappropriately adding default features.

Certainly there are a non-zero number of crates that have done that, but in my experience they are the exception not the rule.

You’re also ignoring the comments in those threads from authors saying if this were made available, they’d want to move some dependencies out of the list of required dependencies and into a default extra.

Why are you assuming that astropy[recommended] is going to carry the “entirety of astropy” when it does not do that today?

Look at my post above, astropy[recommended] adds 3 dependencies, the extra that pulls in “the entirety of astropy” is astropy[all]. This is feeling like a bit of a straw man at this point? Unless I’ve missed it, there’s literally no evidence to suggest that astropy is likely to use default extras to pull in “all of astropy”. They don’t even do that today when the “recommended” extra is fully opt in, why would we assume they’re going to do that with PEP 771?

Surely, at a minimum, those Linux distros could do the foo and foo-core split if they did not have the recommended functionality and are also focusing on avoiding dead weight-- and hey, with PEP 771 we now have some new metadata that makes their job easier for projects that are already over specifying their “required” dependencies to work around a lack of default extras.

We should not be making users’ lives worse to try and force people to “structure their projects better”, which isn’t even an objective quality. Small, tightly scoped projects are not inherently better or worse. It’s an engineering and architectural decision that has trade offs, and whether or not those trade offs make sense is going to depend greatly on the context someone is operating in.

Honestly this line of rationale feels insulting to our package authors? It’s suggesting that if not for us refusing to add this feature, they’re going to be incapable of “correctly” refusing to engage in scope creep (and I contest that this is the correct answer at all).

Also, projects that are operating in a context where keeping their default installation footprint minimal matters are perfectly capable of just not using this feature. Nobody is being forced to use it.

We should be trying to enable project authors to work in the architectural style that best suits their projects wherever reasonable, rather than trying to arbitrarily force one particular style that some subset of us think is best.

I’m mostly keeping out of this discussion (my original post simply restated Brénainn’s comments, I don’t have strong views on them personally), but I do have a comment on this.

The ramifications of a project adding a default extra, or of an end user directly installing a project with a default extra, are pretty simple. I don’t think anyone really feels that’s debatable. But where things do get complex is when projects with default extras get used in dependencies:

  • I honestly don’t remember what the PEP says about that case. I assume that a dependency of foo will include both the required and the default dependencies of foo, but I’d have to check the PEP to be sure - neither option feels self-evidently correct to me.
  • If I want the option that isn’t what you get by just depending on foo, I’d have to check the PEP to see how to get it. Do I depend on foo[], or foo[-], or something else? I understand that this is because I’ve read all the various iterations of the proposal, and once we have a standard, the syntax will be agreed. But again, nothing feels self-evidently obvious or intuitive.
  • Going beyond mere spelling, if I depend on the “minimal” version of foo, but expect that my users might want to use the default version, how do I warn my users that while installing my package looks like it means foo is installed, they should still do pip install foo if they want to use the “full” version of foo? Would pip install foo even work? Surely it would see that foo was installed and just say “nothing to do”? So what should I tell my users? Is there an explicit spelling for “install the default extras of foo if they aren’t present, even if foo is already installed”?

I’m not particularly invested in any of these questions. I doubt I’ll ever be in a position personally where I’ll need to think about default extras at this level[1]. But I don’t agree that the ramifications are always obvious.


  1. Especially as I will now probably be able to leave the decision on this PEP to the new packaging council :slightly_smiling_face: ↩︎

I would expect that the package itself adds a default dependency group depending upon foo and a required dependency on foo[-] (or whatever the spelling for “minimal” is standardised as).

Yes this then makes a viral requirement that projects need to make use of the default extra feature for their own project, but is that a bad thing? I don’t think so.

Either your project needs the recommended dependencies of foo (in which case you put foo in your required dependencies). Or your project only needs the minimal set of foo’s dependencies, but you also want to offer foo’s recommended defaults as your own recommended default (in which case you use the suggestion I gave in my first paragraph).
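
The second option can be sketched as plain data. This is a hedged illustration only: mypkg is a made-up project, the dictionary keys are invented for the sketch rather than taken from the PEP's metadata fields, and foo[] is used as the “minimal” spelling because that is what the current proposal is said to use.

```python
# Hypothetical project "mypkg" following the pattern suggested above.
MYPKG = {
    "requires": ["foo[]"],               # minimal foo is all mypkg itself needs
    "extras": {"recommended": ["foo"]},  # bare foo brings in foo's own defaults
    "default_extras": ["recommended"],
}


def direct_requirements(meta, extras=None):
    """Requirement strings pulled in by depending on this project.

    extras=None models a bare name such as "mypkg"; a set (possibly empty)
    models explicit brackets such as "mypkg[]".
    """
    chosen = meta["default_extras"] if extras is None else extras
    reqs = list(meta["requires"])
    for extra in sorted(chosen):
        reqs.extend(meta["extras"][extra])
    return reqs
```

Here direct_requirements(MYPKG) yields both foo[] and foo, so an installer would end up with foo plus foo's defaults, while direct_requirements(MYPKG, set()) yields only foo[]. That lets a downstream user opt out at the top of the chain, at the cost of every intermediate project having to adopt the same pattern.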

Does this seem like an acceptable solution?

If you’re asking me, all I’ll say is that it seems like a “non-obvious ramification” :slightly_smiling_face:

I’ll leave it to people who actually have to deal with this situation to work out what the best approach is.

The PEP currently says that foo includes the required and the default dependencies in all contexts.

I don’t think there is an inherently correct answer, but I personally tend to agree with the PEP that foo should include the required and the default dependencies in all contexts.

  • I think it’s a much simpler thing to explain: what foo installs is always the same, namely the set of things that foo thinks you should install by default, rather than trying to categorize “types” of install based on context and educate users on which type of context applies when.
  • I think given that’s how “optional but installed by default” works in all of the other ecosystems/tooling I’m aware of, that would end up being a regular point of confusion for people coming from/going to other ecosystems/tools.
  • I think that projects that are currently over specifying their dependencies are unlikely to migrate their “required but actually optional” dependencies to use default extras if it means that all of the existing projects that already depend on them and get those extras by default are going to be broken and require foo[something] to fix themselves if they do.
    • I’ll concede that it likely means that some projects that currently are not over specifying their dependencies, and that might use default-extras if it only applied in certain contexts but not others, won’t adopt it. I think it’s somewhat unavoidable that some projects that could use this feature will decide not to because of how we answer this question, whichever way we answer it. I think the projects that want default-extras only in certain contexts are likely a smaller subset, and even defining what exactly those contexts are would exclude some projects.

      But overall, I think it’s better to get the projects that are currently over specifying their dependencies to be more accurate, than to exclude those projects to try and make it more usable to projects that have already decided that “minimal by default” is an acceptable answer for their users.

  • Unless we add some syntax or shortcut for foo but with defaults, having foo be minimal would require projects manually copying over the list of default extras when they depend on foo if they want foo but with the default extras. That’s problematic because any syntax we add is either going to cause errors for older tools (if we pick a syntax option) or has potential for collision with existing extras (if we pick a “blessed” default name)-- that doesn’t mean we can’t do that, but the bar is higher for breakages like that IMO.

The PEP currently says foo[], which is currently valid and has some nice logic in the sense of “I’m specifying an empty set of extras”, and everything else just flows from the fact that extras are additive only.

The foo[-] example was @flying-sheep asking for a change so that you could ask for “default extras, minus some”. I’m not a fan of that since it feels more special-case to me, and I suspect it will be confusing because it suggests that extras are not just additive. They still are only additive: you can subtract a specific default extra from your own request, but you can’t force it not to be installed.
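
A minimal sketch of why a subtraction would not be a guarantee, assuming a hypothetical rule where each requester starts from the defaults, removes what it deselected and adds what it named. The extra names plot and io are made up, and the exact spelling for deselecting a single default extra is not settled in this thread.

```python
def requested_extras(extras, removed, default_extras):
    """Extras one requester asks for under the hypothetical subtractive rule."""
    return (set(default_extras) - set(removed)) | set(extras)


def environment_extras(requests, default_extras):
    """Union over all requesters: across the tree, extras only ever add."""
    result = set()
    for extras, removed in requests:
        result |= requested_extras(extras, removed, default_extras)
    return result
```

If package A deselects plot but package B depends on bare foo, environment_extras([((), {"plot"}), ((), ())], {"plot", "io"}) still contains plot: A's subtraction only shapes A's own request.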

pip install foo I think would work for this? It’d be good to call this out explicitly one way or the other though!

I don’t think the documentation part is unique to this PEP, it exists for extras in general I think? Even without default extras, if I depend on the “minimal” version of foo, I have to communicate to my users they might want to get foo with some extras to be useful.

I don’t think it would. If I do pip install foo and foo is installed, nothing happens. And pip install --upgrade foo would upgrade to a new version, which could break the dependency. If the default dependency had a name, you could say pip install foo[default] and that might work[1] (except for the fact that it doesn’t have a name…). You could try pip install --reinstall foo. But that’s starting to get more advanced, and I wouldn’t want people to start cargo-culting the idea that “you should use --reinstall just in case” :slightly_frowning_face:

Yes, but it’s the fact that you can’t name the default extra that’s the issue here. For normal extras pip install foo[extra] works fine (I believe).

Someone should test this. But I think my point has already been adequately made - the fact that we’re even having this discussion demonstrates that there are “non-obvious ramifications” :slightly_smiling_face:


  1. I’d have to test - I honestly don’t know what pip does in that situation, as we don’t keep a record of what extras were installed ↩︎

That wouldn’t be true anymore. If you do pip install foo and foo has default extras, that means “make sure the environment contains foo and its default extras”
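
Until that behaviour is confirmed, one way to check what an environment actually contains is to look at the installed metadata directly. The sketch below relies on importlib.metadata from the standard library; the regular expressions are a deliberate simplification of marker parsing, and treating an extra called recommended as the default one is an assumption for the example.

```python
import re
from importlib import metadata

# Simplified patterns: enough for strings like 'scipy>=1.13; extra == "recommended"'.
_EXTRA = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def deps_for_extra(requires, extra):
    """Names of the requirements that are gated on `extra`."""
    names = []
    for req in requires or []:
        spec, _, marker = req.partition(";")
        if extra in _EXTRA.findall(marker):
            names.append(_NAME.match(spec).group(1))
    return names


def missing_distributions(names):
    """Subset of `names` with no installed distribution in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

With astropy installed, missing_distributions(deps_for_extra(metadata.requires("astropy"), "recommended")) would list whichever of those dependencies are absent. It only checks presence, not version constraints.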
