This puts more burden on the tools, since they have to implement this rule, but I’d rather put the complication on them than on the vast number of package consumers.
Today, package consumers face a huge burden. They need to decide which tool to use, which dependencies to choose, how to install on different operating systems, and whether to pin or float dependencies. These are real problems for sophisticated users as well as novices.
The burden becomes even heavier when using Qt or other libraries with multiple packagers and versions. This is a burden on maintainers, both in creating their packages and in responding to support questions from users.
I would love to see this conversation move away from accept-or-reject and toward how we can improve the proposed PEP. Right now, this is the reasonable option on the table. Let’s improve this option.
I want to stress that this goes beyond installing “nice to have” dependencies. It affects the required dependencies, like Qt, that come in different flavors.
Maybe. That’s my point, in some sense - there are no easy “one size fits all” answers. I don’t know anything about the astropy ecosystem, so I don’t know what the default extras would be or what the impacts are. If the users with the “higher level of expertise” can solve the problems caused for them by the change, then maybe that’s OK. But maybe they won’t be able to (for example, retrofitting [] onto dependencies in old versions of packages). The astropy maintainers could quite reasonably choose not to care about those cases, but the maintainers of those packages might have to.
Making a better experience for new users is a key incentive, but it shouldn’t be the only incentive.
Does it, though? Doesn’t that depend on what the package developers decided to make default? It’s not inconceivable that a project might design their defaults for a small but crucial minority of their users.
To be clear, I’m not saying this is likely. Just that there are a lot of hard trade-offs, and the proposed default extras feature doesn’t make any of the decisions easier - it just makes one existing option simpler (without there being any clear indication that it’s the best option).
Will this be a disaster? No. Will it help some projects? Yes. Is it the right solution in the long term? I don’t know. Do we need to solve this problem right now? I haven’t seen any real evidence that it’s that urgent.
(As something of a side issue, there’s a bunch of work going on “behind the scenes” at the moment - the wheel-next discussions, selector packages, etc. - none of which I know anything about, as it’s all happening in other forums and I have enough trouble just keeping up with what’s here. But will they offer alternatives here, or affect the use cases this PEP is targeting?)
That’s a possibility, yes. But it also puts a lot more burden on the PEP, to define what this would mean precisely. The concept of a “direct install” isn’t part of any standard at the moment, so we’d need a proper definition of that. But it does avoid a lot of unnecessary complications, as there’s actually no clear use case for default extras in dependency lists…
Lock files should be installed using the equivalent of --no-deps, so extras and dependencies are irrelevant in that situation, luckily.
Strong +1. Even if it does get rejected, I’d like us to learn something from the process, and that only happens if we try to improve this proposal or explore the faults in detail.
Here’s an idea for spec rules which match @barry’s suggestion for direct-install specialization, without running afoul of the fact that there is no strict definition of a direct install:
the default extra is defined in pyproject and core metadata
installers/resolvers MAY include the default extra when installing a package
tools MUST NOT include the default extra when [] is used
everything else is tool choice
Plus recommend that tools include the default extra when a package is explicitly requested, but not otherwise.
This lets us work with the idea that this only applies for “direct installs” without even trying to define them.
For example, should the default extra be included when poetry add $pkg is used? We’d leave that decision up to poetry.
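For concreteness, here’s a minimal sketch of what the declaration side of those rules might look like. Both the key name (borrowing the default-optional-dependencies spelling discussed later in this thread) and its shape as a list of extra names are assumptions, not settled syntax:

[project]
name = "mypkg"
version = "1.0"
# hypothetical spelling and shape: a list of existing extra names to treat as default
default-optional-dependencies = ["recommended"]

[project.optional-dependencies]
recommended = ["scipy", "matplotlib"]

Under rules 2-4, a plain pip install mypkg MAY (and, per the recommendation, would) pull in scipy and matplotlib, while pip install "mypkg[]" MUST NOT.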
Please, no. Allowing such a divergence between tools would be terrible for users, especially for those who aren’t packaging experts. I can’t imagine the confusion users would face if simply changing their installer became a breaking change, forcing me to put separate instructions in my README for multiple popular installers.
I’m currently a strong +1 on the PEP as is but I would be against the PEP if it allowed installers to significantly diverge on what $tool (add|install) pkg does.
To be clear, poetry add isn’t an installation command. poetry add adds a dependency to your current project.
So that’s not installation at all. The current PEP doesn’t say what it should do, which is good. It should not make decisions unnecessarily for tools. This kind of decision making is Poetry’s prerogative.
In fact, this is part of why I’m preeeeeettty uncomfortable with how much we’re talking about pip and pip freeze. pip maintainers get to decide what pip freeze does – the PEP’s commentary should be limited to “we’ve thought about this and confirmed that there is at least one sensible option here” and at most “here’s a recommendation for pip and similar tools”.
I agree that what pip freeze does should be for pip to decide. But that’s not the same as leaving tools to decide what “install foo” means (whether that’s spelled pip install foo, or uv pip install foo, or something else).
I don’t actually think that the concept of “directly installing” a package is hard to define. I just think that it’s something the PEP needs to do, if we want to take this approach. What’s likely to be the hard part is going through all of the possible ways in which a direct install could be requested, and deciding if we really do want the same behaviour for all of them. In particular, requirement files (which pip would view as a “direct install”) are a very general mechanism, used in many different ways - is starting to install default extras the right choice for all of those use cases? I certainly don’t know.
How would my example command, poetry add, fall into this? It’s not doing installation at all. Would such behaviors be specified as well?
Installation is only a (highly significant!) part of how users interact with package names.
If we specify what happens when you install, we’re still letting workflow tools make all kinds of decisions – and they can align their behaviors more or less closely with installation as they prefer.
So, would we only be describing behavior for a “pure installer” like pip and uv-pip?
I agree 100%. For me, what’s important is to accept that the PEP needs work. I don’t think it’s about accept/reject, so much as “the PEP isn’t ready yet, we need to continue working on it and improving it”. I do want to remain open to the possibility that some other option might end up being the right solution, though. We mustn’t fall into the mindset of “we need to make default extras work because they are the only option on the table”. What matters is getting the best solution, no matter what form it takes.
I’m not sure what you mean by this - could you explain? The best guess I can make is that you mean a package requiring one of a set of dependencies, but with the constraint that “selecting none” isn’t an option (that’s what I infer from the term “required dependencies”)? The PEP as it stands doesn’t allow that, precisely because the pkg[] notation is always available. In fact, I thought pkg[] was added precisely because the consensus was that we shouldn’t allow projects to insist on some extras being installed (that goes against the fact that they are “extra”).
If we want to support options for required dependencies, IMO that’s something we should look at explicitly, and not try to force it into the “extras” model. Which may mean refocusing the PEP away from being designed around extras…
I want to elaborate a little more on my earlier statement. I’m a strong +1 on this PEP as is. This PEP truly solves a real issue I face. I have a few packages that effectively have two “interfaces”:
The library for other developers: This is for programmatic use.
The CLI for users: This is the command-line interface, and it needs additional packages that are completely unnecessary for developers who are just using my package as a library.
Currently, I’m stuck with a few less-than-ideal options:
Force developers to pull in junk: I could force developers using my library to pull in completely unrelated dependencies that they’ll never use. This makes them grumpy, it makes me grumpy, and it makes my package less appealing to developers who rightly don’t want unused dependencies (and I can’t blame them – I wouldn’t want any either!).
Push the problem to users: I could put the CLI dependencies in an “extra” and document it for my users, but frankly, users do not want to deal with any of this. Then I’d have to write code so a ModuleNotFoundError tells them to install it again, but this time with the CLI extra. I’d also have to explain to them that just copying the big “pip install pkg” from pypi.org and pasting it into their terminal will result in a broken install.
Double my workload and risk: I could take the burden upon myself and publish two packages, build two pipelines, and version two packages independently while also trying to keep them in sync. This is just a ton of duplicate work with twice the room for error.
Since I deem options 1 and 3 to be completely unreasonable, the second option is what I currently do. This means I’m pushing the complexities of Python packaging to the demographic that’s the least equipped to deal with it.
I expect the developers using my package will either:
Be mindful of their dependencies: They’ll think about their dependencies and easily add pkg[] to turn default extras off.
Not care about extras: They simply won’t care, so they’ll just do $tool add mypkg and still get a working install, living with the unused dependencies.
While this PEP cannot remove all the complexities, I truly think it moves them to the developers who are far better equipped to handle them.
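To make those two paths concrete, here’s roughly what they look like in a downstream project’s metadata (somelib is a hypothetical package name; the mypkg[] opt-out syntax is the one from the PEP):

[project]
name = "somelib"              # hypothetical downstream library
dependencies = ["mypkg[]"]    # mindful path: the empty extras list turns the default extra off

# the "don't care" path just writes the bare name and still gets a working install:
# dependencies = ["mypkg"]    # default extra applies, CLI dependencies included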
I don’t know if you’re asking me specifically, but I’m not a Poetry user, so I don’t know how poetry add should work.
Maybe that means I’m wrong in thinking that “directly installing” isn’t hard to define. But I stand by my position that the PEP should define what it means if it wants to describe how default extras work in terms of direct installs.
Can you elaborate on why (3) is so strongly objectionable to you?
I have projects which are split into the Library + CLI division and it works quite well, so I don’t think your presentation of how much extra work it is can be accurate – either that, or I’m not doing a boatload of work that I should be doing.
It’s not right to call it a “ton of duplicate work” or “double” or “twice” anything. It is more work, but let’s be precise.
The added burdens:
You have to set up a second build pipeline - the more packages you maintain, the more marginal the cost of “yet another one”. We’re talking about a minimum of going from 1 to 2.
When doing releases, you have to release a separate package for your library, then update the lower bound in your CLI to the newly released version, re-test, and release that as well.[1]
Handling of external dependencies (primarily Python version) needs coordination. The library always precedes the CLI for support changes, etc.
You need to maintain two separate changelogs and sets of docs, potentially two RTD sites or similar. Primarily this is a setup cost.
Occasionally you have to move tickets from the CLI to the library. Almost never the reverse.
It’s additional labor, but your phrasing is so strong that you make it sound like it’s 2x the work.
The bigger challenge here than the maintenance cost of “two packages rather than one” is the switching cost. Once you have a package and people are using it, changing to two packages is going to inflict pain on someone – a redistributor or an end user or a downstream package maintainer. And splitting a package is a messy process that takes a lot of time and effort for a maintainer.
Looking at it this way, default extras would be a way of giving packages which have some adoption a new tool for changing their interface while still being only one package. So we do away with the switching cost almost entirely – it’s reduced down to putting out a migration doc for your consumers.
I was curious about your opinion in particular, but also other folks should feel free to answer.
The main thing about poetry add that is relevant is that it does the following two steps:
add the requested package to project.dependencies
update the lockfile (poetry.lock)
Neither of those is installation. We can still get to a pretty straightforward definition of “direct installing”, but poetry add establishes that there are usages which don’t match it yet still need to think about the default extra.
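To make the ambiguity concrete: after poetry add astropy, the project file could plausibly end up in either of these states (a hypothetical illustration; the PEP currently mandates neither):

[project]
dependencies = [
    "astropy",      # bare: the default-extra question is deferred to whoever later installs this project
]

# ...or the tool could record an explicit opt-out at add time:
# dependencies = ["astropy[]"]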
Here’s another fun puzzle for “should it include the default extra”: script metadata. If I write
# /// script
# dependencies = ["astropy"]
# ///
and invoke the script with pip-run, pipx run, and uv run, it would be good to ensure that all three get the same packages installed. Presumably we’d define that as a “direct install” situation for astropy?
But if I move that same data into a pyproject.toml and write
[project]
dependencies = ["astropy"]
and pip install ., pipx install ., or uv tool install ., then the source tree in . is presumably the “direct install” target? So astropy would no longer be a direct install?
This might be fine, but if we go down this path it’s going to be confusing for users in some situations. That’s why I’m not sure we gain very much by specifying exactly when an installer should or should not install the default extras for a package. Even with the same tool, similar usages can produce differing results.
We actually pin the exact library version in one case at $WORK, where we want to do our best to guarantee that old CLI versions can install and work. ↩︎
The primary aim of this PEP has been to reduce burden on novice/typical users and I think that the idea of having installers sometimes install recommended dependencies and sometimes not (without any explicit opting in or out) is not really going to achieve this. Imagine the following scenario (similar to what @sirosen is mentioning):
User writes a script, which requires say astropy (since we’ve been using this as an example package here). The script relies on some of the functionality that needs some of the recommended optional dependencies. The user installs astropy into their environment with:
pip install astropy
and runs their script. So far so good. Now they decide to try and run their script on an online cloud platform. The documentation for the cloud platform mentions the existence of requirements.txt files. Great! The user creates a file, puts astropy in it, and pushes to the cloud. If pip does not consider requirements.txt to be a direct install, then the script will not run. There will be no way for the user to easily know what is missing, and even if they did realize that they might need to add an extra (which they might not even know is a thing at first), they would then need to poke around the astropy documentation to find out the name of the missing extra. In a real situation it could be a lot worse, because the user might be using 10 different packages, some of which use default extras and some of which don’t.
OK, so let’s consider requirements.txt a direct install to avoid this kind of issue. But what if instead someone tells the user: “Wow, your script is great, could you make a package with it?” The user will then look up some information about how to make packages, write up a simple pyproject.toml file, and put astropy in the dependencies. But now, when the package is installed into a new environment, it doesn’t work. Again, lots of searching around for solutions and reading of documentation to find out about extras.
Even if we could define direct installation as a thing (which I don’t think we really can), I don’t think it’s helpful, because it’s not as if writing requirements.txt files or simple packages is something that is reserved for advanced users, and having context-dependent behavior is going to have a big (negative) impact on users who are new to packaging. I think the behavior should be consistent (by default) in any context in which PEP 508 syntax is used.
Pip has a pretty clear definition of a “top-level requirement”, which is what I’d expect a direct install to mean. It’s currently pip-specific, and would need extending for things like poetry add, as @sirosen pointed out, but I don’t think it’s impossible to do.
It is extra work, though, and as the PEP author, I can understand if you don’t want to do that work. You do need to work out an alternative way of solving the problem that the direct install idea was introduced to solve, though.
To be clear, I don’t mind extra work as a PEP author.
However, I guess what I am trying to get to is that even if we can come up with a general definition that matches pip’s concept of a ‘direct install’, I’m not sure that it is necessarily desirable to have default extras behave differently for direct and indirect installs. I think that distinction will likely be lost on typical users (for example in the case I mentioned of a user wanting to make a simple package) and will lead to a lot of confusion.
Understood. But in that case, how do you intend to handle the issue where packages currently depend on astropy (no version pin, no extras) and astropy wants to add a default extra? Doing so would be a breaking change for all those already-published versions of that package. Possibly just “extra clutter” with the new default extras getting installed, but possibly actually breaking, if some other dependency is incompatible with one of the new default extras.
How do you intend to advise projects like astropy, who may be considering adding default extras, in terms of how they should evaluate the potential risks and benefits of adding a default extra given that cases like this might exist?
Just to make sure I understand correctly, how would this be different to astropy just adding a new required dependency which is incompatible with another dependency of the downstream package that depends on astropy?
My advice would be that projects should consider them with the same rigor as if they were adding a required dependency, since the default immediate result will be effectively the same until people start opting out.
Technically, no different. In practice, though, the difference is that users (and the astropy developers) may view the default extra (which presumably already exists, but has to be specified manually) as a “normal dependency”, and may not think of switching it to be “on by default” as a big deal. Conversely, projects that depended on just astropy might not have realised that they were requesting a “stripped down” version.
For a brand new project, using a default extra might be (probably is!) a perfectly reasonable option. But for existing projects, it’s just as much a potentially breaking change as adding a new required dependency. And that’s not immediately apparent - indeed, just by presenting default extras as a solution for projects that currently use a named default extra, you’re downplaying that risk[1]. Also, the “backward compatibility” section of the PEP doesn’t mention this risk - again implying that it’s easy to not notice the issue until it’s too late.
Maybe this isn’t as important as I’m suggesting - my experience with pip and Python core development has given me an extremely conservative view of backward compatibility - but I don’t feel like I’m the only one with concerns like these, so I think we do need to see some consensus here if it’s the approach we want to take.
I’ll try not to continue making an issue of this, as I’ve said all I wanted to say at this point. But that doesn’t mean I’ve accepted that the PEP’s approach is sufficient; it simply means I’m waiting to see what the community consensus is.
Not deliberately - I’m simply saying that the PEP is falling into the same trap that users will: thinking this solves the problem easily ↩︎
I know this is a joke but I want to emphasise that this (both the workload and the blame games) affects more than PyPA. Try [project.dont_blame_pypa_or_brew_or_linux_distros_or_any_other_packagers_or_repackagers_or_end_user_packaging_also_since_this_doesnt_translate_to_repackagers_it_is_unusable_for_anything_likely_to_be_installed_via_conda_or_system_package_managers_so_dont_use_for_webservers_or_any_domain_with_a_significant_proportion_of_conda_users.default-optional-dependencies].
Does it matter if it’s only the major ones? What proportion of your dependency trees would you consider so major as to be above packaging mishaps[1]? How many packages with extras do you know that test both with and without the extras? (I know of one, and I’m the one who insisted on putting the test there!)
I don’t understand why the answer to interchangeable dependencies isn’t just a runtime check.
No install-time guesswork is required, and it’s compatible with every package management system I know of. Better yet, if you can make the backend selection explicit in the way the library is used (e.g. make the user type from qt_agnostic_library.pyqt5 import ThingViewer), then you can not only make the error more precise but also greatly reduce the reproducibility, mutual-exclusivity, and testing pains I wrote about before.[2]
Is this really such a burden on users? They have to run a command that they probably ran less than 30 seconds ago, just with [cli] appended.
Well, let’s propose making this message customisable, or listing all the options, or possibly even removing it if the decision is sufficiently nuanced to need a proper explanation.
This is getting heavily into perspective-driven territory, but if keeping the two in sync is anything more than a minimum version constraint and a test run on that minimum version, then I’d say the library has stability/usability issues that prevent it from really being a suitable library. I actually find that this splitting process improves the library, since it forces you to really see the usability of the library’s public API from a consumer’s perspective.
Specifically for this go-to astropy example[3]: I’m not an astropy user, but I have had to work out packaging-themed issues within astropy-based projects on behalf of real astropy users. What really strikes me is that the issues all stem from the unfortunate choice to stuff an entire domain of science into one single PyPI package. You can get a sense of this by looking at the huge range of functionality in their API table of contents: core data structures, specific (independent) types of calculations or analysis, visualisations, as well as IO for umpteen different file formats. This results in the awkward dependency situation, but it also means that the installation footprint is 40MB before even including the dependencies, which the user has to pay for even if they only want to do one thing.
This monopackage pattern is something I desperately want to see less of. It leads to these dependency issues. It encourages people to do crazy things like rm -rf-ing bits of site-packages[4]. Such packages almost certainly grow with each release, so I even see people deliberately lock themselves to as-out-of-date-as-possible versions just to get their deployment sizes down. It makes the contributor side miserable, since you have to read a book, learn about some build system, then run an insanely long build+test[5] just to submit a patch that was quite likely in a pure-Python part of the code base. Splitting astropy into a tree of single-function packages would solve every single one of these issues. I know modularisation comes with extra baggage (version management[6], cross-project documentation and navigation, occasionally duplicating helpers, the time it takes to do the split itself), but it brings so many benefits, whereas this PEP can only solve (or rather hide) the dependency issue for beginners (at the expense of making it worse for everyone else).
I was wary of touching this because I was worried that it would read as a dig at astropy. I promise that it isn’t intended that way ↩︎
Yes, I really saw this happen. The issues it caused got quite a long way from the offending developer due to heavy usage of lazy importing in libraries (also an unsavoury side effect of monopackages) ↩︎
I think it was about 48 hours last time I ran scipy’s tests - not something I’d be keen to come back to ↩︎
lower version bounds testing solves this much easier than you’d expect ↩︎
default-optional-dependencies definitely reads better to me, too (and I think the benefit of matching the table name is significant enough for that to be preferable to the shorter default-extras)
I’m in the same boat as @Monarch. For apps that natively have a CLI, I don’t want to make rich a required dependency (since you don’t need it for JSON-centric programmatic use), but I do want to recommend installing it for interactive use. For libraries with an optional CLI, I’d like to make it easy to exclude the CLI-only dependencies entirely, but have it work by default when people are setting up a local development environment.
With PEP 771, that’s straightforward to do (define a rich-cli extra and include it as a default extra, splitting it into cli/rich-cli in the “also usable as a plain library” case).
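A minimal sketch of that split, with package and dependency names assumed for illustration and the default-extra key spelling still the hypothetical one from earlier in the thread:

[project]
name = "mytool"
# hypothetical key name
default-optional-dependencies = ["rich-cli"]

[project.optional-dependencies]
cli = ["click"]                     # assumed CLI-only dependency; plain-library users skip it via mytool[]
rich-cli = ["mytool[cli]", "rich"]  # layers rich on top of the bare CLI via a self-referencing extra

With that layout, pip install mytool would get the nicer interactive experience by default, depending on mytool[] keeps the plain library, and mytool[cli] gives the bare CLI without rich.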
Right now, I have to settle for defining the extras and hoping that people look at the documentation long enough to learn how to opt-in to the nicer experience.
As far as the backwards compatibility concern goes I’m with @trobitaille on this one: yes, old applications may start getting additional transitive dependencies installed if projects they depend on mark some of their existing extras as default extras (and the deployed dependencies aren’t locked or constrained by an installer feature like UV_EXCLUDE_NEWER). This can already happen without PEP 771, since new versions of existing dependencies can add new required dependencies (and sometimes those will be dependencies that were previously only brought in as optional dependencies). If PEP 771 means that more optional dependencies get added as default extras instead of as required dependencies, that’s still strictly better than the status quo (since, unlike required dependencies, affected projects that don’t want the added transitive dependencies will have a way to opt out while still updating to the newer version of the library itself).
As others have noted, having the meaning of an unqualified package dependency differ based on whether it was “direct” or “indirect” would be adding significant complexity and opportunities for confusion as people wonder why pip install projectA projectB works, but making projectA depend on projectB and then doing pip install projectA fails (due to projectA relying on a transitive dependency brought in via projectB’s default extras). Those kinds of “missing dependency declaration” issues can already be hard enough to debug without adding that potential complication to the mix.