Adding a default extra_require environment

@steve.dower Would you be more in favor of extras + default_extra, or the idea of environments and default_environment I proposed just above?

The main advantage of adding default_extra or default_environment would be that it’s 100% backward compatible. We don’t change anything in the existing public APIs. If you don’t use the feature, absolutely nothing changes.

And hopefully the change should be very minor. When we read whether the user asked for any extras, we can say:

  • if None: read the default field (set to None by default)
  • else use what the user specified

@pf_moore @dustin @uranusjr any opinion?

Well, I do like to keep things simple, and I don’t like overkill :slight_smile:

Basically, my view is that project developers (and I include myself) have a strong tendency to assume that “their” complexity is essential and unavoidable, and tools must clearly support their use case. However, Python packaging is struggling under the weight of the complexity we’ve accumulated over many years, and I strongly believe that what we need is simplification, not further complexity. We should be spending time looking at our core concepts and models, and considering how to make them simpler, better defined, and more manageable and composable, and not just loading more and more features on an ill-defined and adhoc base.

This is not to say that we can’t support additional use cases, or that we can’t grow, but we should do so by looking at our underlying model and questioning whether it really supports the needs of today’s community. That’s why I think we should be considering whether we “need something that’s conceptually slightly different than extras”, as I mentioned above. And it’s why I think that @steve.dower’s question, why can’t you ship two projects, is important here. Is it not possible that what you really need is for the tools and ecosystem to make handling two projects more manageable, rather than trying to “fix” extras?

By pushing the complexity back onto the tools, you’re adding overhead to the packaging infrastructure, making it more fragile, harder to maintain, and more prone to failure. When projects like pip are maintained by a team of less than 10 people, all of whom are working on a totally volunteer basis in their spare time, that’s adding a huge risk to your projects. So looking at it that way, maintaining a single package also has an impact, and that impact isn’t just on your projects, but also on other projects which are competing with you for limited tool developer resource.

This is going more in the direction I’m thinking of - but I’m not sure if you’re intending it as a full replacement to extras, or just as a way of restating your proposal using different terminology. It will need a lot more thinking through if it’s to replace “extras” as a concept, so I reserve judgement on it until I see a full proposal (which would need to include a discussion of how we migrate existing projects off extras - just adding another approach to the mix won’t simplify anything). @steve.dower’s selector packages proposal is another idea that may be worth looking at here.

Why should we agree to that? Why not have project, project-lib, project-gui, project-cli, project-cpu, and let users install whichever parts of the functionality they want? From a user perspective, that seems far more straightforward to me. Of course, it’s more work for the project to structure itself to make the various parts intuitive for the user. And can we all agree that making things simpler for users is the ultimate aim here? (That’s not rhetorical; this discussion is about making development easier, but it seems to me that we’re just changing which developers have to invest effort, rather than looking at what works best for the users.)

Sorry - this got a bit beyond the scope of a single change to how extras work. But I do think the proposal exposes a lot of the issues.

2 Likes

It was intended as a pun to be honest (notice the emoji) :wink: To be fair, I was expecting a remark on that one… In any case, I wasn’t implying the packaging mechanism would be more complex, but rather that someone may have a complex package with convoluted dependencies.

Agreed.

Honestly, it sounds unrealistic to me. What if tomorrow I want to add some other default feature? Are we going to recommend: “Oh yeah sure, please create a combinatorial number of pip packages, which you now need to properly maintain and document. This is the way to go and definitely how you should advertise to your users…”? I gave this somewhat realistic (not overly complexified) example to show that this approach doesn’t look realistic to me.
One of the reasons extra_dependencies was created in the first place (which I might have misunderstood) was to avoid having to create N packages for every scenario, and to allow users to specify additional use cases without installing a different package. So I agree that having a “default extra” sounds kind of weird and maybe a rebranding is needed. However, the “please create N packages” approach sounds even more against the original idea, IMHO. And it wouldn’t solve at all the issue of the dependency chain:


Totally agree. I’m absolutely not trying to “impose” any requirement; I’m trying to highlight a number of limitations and to find, with you all, the best way we can address them.


If the decision were only up to me, I would do a complete replacement of extras_require by environments, keep extras_require in the API for a while (backward compat.) but make it effectively create a set of environments, and then add the default_environment argument. IMO this approach has a few advantages:

  • Can easily be made 100% backward compatible: can be understood as a rename for the most part.

  • Quite simple to implement; as I just said, in principle it works as an argument rename.

  • We can issue a nice deprecation warning for a very long time if extras_require is not None, and the fix is dead simple: replace 'extras_require' by 'environments' in your setup() call.

  • default_environment makes a lot more sense “intuitively” compared to default_extra, hopefully nicely addressing the “ergonomic” issue raised by @uranusjr and @dustin.

  • Very simply and naively, plug default_environment in at the point where we read whether the user asked for any extra (now called an environment). If no extra/environment was requested (i.e. extra is None), check whether default_environment has been defined in the METADATA file. If so, use that value; if not, keep None.

I don’t see how we could make it simpler to implement and maintain while keeping the number of changes as small as possible (thus minimizing the risk of introducing a bug).

The other solution would be to create default_extra and keep only the last point (which might cause some ergonomic issues, but we can’t have it all, and it is still a much better trade-off in my opinion than not being able to do this at all):

  • Very simply and naively, plug default_extra in at the point where we read whether the user asked for any extra. If no extra was requested (i.e. extra is None), check whether default_extra has been defined in the METADATA file. If so, use that value; if not, keep None.
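The lookup described in both variants above can be sketched in a few lines. This is a hypothetical illustration, not pip’s actual code; the metadata field name "default_extra" and the dict shapes are assumptions:

```python
def resolve_extras(requested_extras, metadata):
    """Return the set of extras to install for a requirement.

    requested_extras: None if the user wrote `pip install pkg`,
    otherwise the set the user asked for (possibly empty, `pkg[]`).
    metadata: parsed package metadata as a plain dict (hypothetical shape).
    """
    if requested_extras is None:
        # No extras requested: fall back to the declared default, if any.
        default = metadata.get("default_extra")
        return {default} if default is not None else set()
    # The user was explicit (even with an empty set): respect that.
    return set(requested_extras)


# Example: a package that declares "base" as its default extra.
meta = {"default_extra": "base"}
print(resolve_extras(None, meta))      # -> {'base'}
print(resolve_extras(set(), meta))     # -> set()
print(resolve_extras({"pdf"}, meta))   # -> {'pdf'}
```

The key design point is the three-way distinction: “no extras mentioned” (None) triggers the default, while an explicitly empty set opts out of it.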

That sounds like you’re seeing this as simply a renaming, with the new feature “sneaked in” as default_environment. But I’m more asking about semantics, and just renaming doesn’t clarify semantics at all.

I’ve read your post and you make some good points. But I think the crux of my concern here is that we’ve ended up talking about implementation and maintenance (which is what I’ve been pushing for details on, so that’s on me) where in reality the big problem is that we need to start with a proposed change to the design spec, so we can see the implications in context.

And that’s where the real issue lies - there isn’t a design spec for extras. There are some places where we document how to reference them (in metadata, for example) but there’s no documentation of the concept, or how they should work¹ - at least not in https://packaging.python.org/specifications/ or as a PEP.

As a community, we’re trying to move away from implementation defined behaviour and towards standards. Extras are a particularly bad case of this, as they were originally implementation-defined by setuptools, and then pip added support for them, resulting in a second level of implementation defined behaviour. My point here is that we need to stop and write a spec before going further down this path. How should build tools other than setuptools implement extras? How about front ends other than pip? How would the new feature impact their design choices? These aren’t rhetorical questions.

(Sorry, I know you have a specific issue here, and weren’t looking for a big debate on principles², but this is the biggest problem with the current state of Python packaging, IMO, we need to get to a situation where we can look at focused proposals without having to consider the whole of the ecosystem every time…)

¹ At least, not that I’m aware of. If I missed something, please let me know!
² You want room 12A down the corridor, this is “abuse” :slight_smile:

3 Likes

Fair enough. In all honesty, I was trying to understand which direction would have the best chance of converging. This kind of work is a first for me in the Python “core”, and I’m really eager to learn and hopefully to contribute more broadly in the future.

So let me go ahead and try to formalize things in a PEP. Let’s see how I manage that :slight_smile:
Give me a few days and I’ll come back to you.

If someone is open to a quick “mentorship” on contributing to the Python core, I’d be glad to have someone to ask a few questions :slight_smile:

Thanks everyone

1 Like

I just wanted to add another example showing why I was looking for this kind of feature.

Something similar came up here: our GUI package Kivy supports various backends for text, image, video, etc., but at least one backend must be installed for Kivy to work.

Currently, pip install kivy does not install any backends so kivy won’t work without additional steps. Previously we just told users to manually install the dependencies along with listing all our backend options (e.g. pip install kivy_deps.sdl2==x.y.z).

This led to many issues being opened over time for failed installations due to user confusion. And this seemingly got worse recently because many users don’t read the install docs and simply install Kivy graphically in PyCharm by searching for Kivy and clicking install. Naturally, this doesn’t install any of the backend dependencies.

My improvement was to add base and full keys in extras_require. When specified, base will install a set of per-platform dependencies that I judged an average user would want. So now we just tell users to install it with pip install kivy[base]. But this doesn’t solve e.g. the PyCharm problem.

So I was looking for a way to make pip install kivy “default” to installing base as well, but also a way to say pip install kivy --no-extras or pip install kivy[] so advanced users who want the install_requires dependencies but not the base dependencies can do that. If this doesn’t work out, perhaps we’ll just add our base dependencies to install_requires and have advanced users install kivy with --no-deps and then manually install the “real” install_requires. But ideally that would not be required.
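As a sketch, the base/full split described above looks something like this (the dependency names here are illustrative, not Kivy’s actual lists):

```python
# Illustrative extras layout: "base" is the curated default set,
# "full" is a superset with every optional backend.
BASE_DEPS = ["kivy_deps.sdl2", "kivy_deps.glew"]
EXTRA_DEPS = ["kivy_deps.gstreamer", "ffpyplayer"]

extras_require = {
    "base": BASE_DEPS,               # pip install kivy[base]
    "full": BASE_DEPS + EXTRA_DEPS,  # pip install kivy[full]
}

# A plain `pip install kivy` installs neither set, which is exactly
# the gap a "default extra" would close.
print(sorted(extras_require["full"]))
```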

3 Likes

I wonder if it would be significantly easier to have an extra == None / extra == '<none>' / extra == "" convention, so that we can have the following in a setup.py:

setup(
    ...
    extras_require={
        "<none>": ["pymarkdown"],
        "PDF":  ["ReportLab>=1.2", "RXP"],
        "reST": ["docutils>=0.3"],
    },
    ...
)

I think this may be a much smaller PEP, and a much simpler change for build systems (setuptools, poetry et al).

1 Like

I wonder if anything has to be done at all. If people are not able to follow the simple instructions to install a library with the appropriate extras, how would they manage to write the Python code needed to use the library to begin with? I am always dumbfounded that such trivial tasks are an issue at all. (Which makes me think it might be a documentation issue, more than a packaging/tooling issue.)

It seems harsh, but I feel like at least some of the burden has to be delegated to the user at some point. I’m all for providing a nice user experience, but if the user is a Python developer (even a beginner) then I believe it is safe to put the bar for entry a bit higher than one would expect for an app store for example.

In the case of PyCharm, I have never used it, so a question: does PyCharm present in its UI a list of the available extras and make it part of its search-and-install wizard (or whatever is being talked about here)? Because that would be essential, and I would consider it a serious UX issue if it weren’t there. Maybe the burden should be pushed onto PyCharm here.

Also I couldn’t find the issue with pip freeze being addressed in the suggestions.

I think there’s a lot of people who use Python as a tool, but who don’t necessarily understand programming or things like package management. Data science is a good example here in my experience. There’s certainly people who know business intelligence, data analysis, and the like and who don’t really care that much about Python except as the tool that lets them do that.

Having said that, you do make a good point here. At some point we have to decide how much it’s down to us to make Python easy to use for people with less interest/experience, and how much we should expect them to meet us half way and try to learn the basics for themselves.

There’s definitely a good argument for expecting tools that make it easier for non-experts to use Python to do some work on simplifying the harder aspects of the process, and not just put a gloss on the easy bits.

Having said all of this, I do think there’s a case here that extras are complex and fiddly to use, and don’t model the problem that needs to be solved particularly well. So I do support the idea of revisiting them with the idea of improving things. I just don’t think “making it easier for users to get started without reading the instructions” is the correct goal - we should be looking at “providing a good model for how (modern!) projects want to deploy their functionality” and “having a consistent and understandable design” instead. Advanced features can be understandable and usable without dumbing down.

So I think “kivy is a core plus backends, and we want to allow users to select what backends to install but also provide a default set if the user doesn’t have a particular set they want” is a reasonable use case to explore. I don’t think “users don’t read install docs and expect to click and magically get exactly what they want” is (even if addressing the former incidentally improves the story for the latter).

But there are “advanced” questions that should be answered, relevant to the “deployment model” scenario, that don’t matter to the “click to install” scenario. For example, how does someone install just the core project, without any backends (maybe it’s a backend developer)? Or how do developers install sets of backends that don’t match up with a predefined “extra”? Can users define their own bundles to install (maybe for common use throughout an organisation), or are they restricted to just what the project defines?

All of this goes way beyond the original idea of “add a default extra”. And that’s intentional, in a way - as I said before, there’s not really a well-defined underlying model for extras, so it’s not clear how to answer any of these questions. Answering just one, the “we need a default” situation, leaves implementations having to figure out interactions like this on their own.

2 Likes

I work with some of the best engineers in the world, and packaging (particularly Python) trips everyone up regularly. Doesn’t stop them from writing code. There’s a very real complexity imbalance here.

Most often it seems to be due to referring to online search results and hearsay (blogs, tweets) rather than project docs. (As one concrete but unrelated example, I often surprise people by showing them the official “how to run Python” page on docs.python.org, because everyone “out there” talks about how to do it but nobody refers people to the real docs.) I often get to see Frankenstein combinations of three different “best practices” from different blog posts/StackOverflow. Sometimes people find the real docs and are very successful.

So people can follow simple instructions, but we’re in a culture where people don’t even look for them. The best we can do in that case is fail well and include links to the real docs in error messages (see numpy’s ImportError for a very good example).

As a tool developer myself, I don’t think we (as in Python/PyPA) get to push the burden there, though we can encourage them that it would be in the best interests of their users.

Unfortunately, you have to download the package to see the extras. Downloading every package to fill in that UI is not in anyone’s best interest, so it seems unlikely. We could help by improving index metadata, though that comes with vast complexity and cannot be assumed by tools, but projects could help themselves sooner by handling the “no backends” case with a good error, and having a clear installation command at the top of their package long description (a.k.a. Readme file).

My overall impression (and others seem to somewhat concur) is that a large portion of the use cases for a default extra can be solved with better documentation, and better error messages when dependencies are missing. I don’t see a way to add such a default extra while still having the pip freeze > req.txt; pip install -r req.txt workflow behave as expected.

In order to provide the nice out-of-the-box experience for first-time users, projects might want to have a top-level project MyThing that is basically just an empty thing (no code) that depends on MyThingCore[backend_default] and nothing more (not even a version range). And returning users, who are willing to put in the effort and actually read installation instructions, would naturally move to installing MyThingCore[backend_gpu_accelerated,bells,whistles]>=1.2.3 once they feel the need to go beyond what the default installation provides.
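A minimal sketch of such a wrapper package’s setup.py, assuming the hypothetical names MyThing/MyThingCore and a backend_default extra from above:

```python
# setup.py for the hypothetical, intentionally empty wrapper "MyThing".
# It ships no code; it only pulls in the curated default install.
from setuptools import setup

setup(
    name="MyThing",
    version="1.0",
    description="Batteries-included installer for MyThingCore",
    # No version pin, per the suggestion above: always track the core.
    install_requires=["MyThingCore[backend_default]"],
    # Deliberately no packages/py_modules: the project is empty.
)
```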

For more advanced needs (which I agree might be legitimate), it seems like it has to be something other than extras. For a possible solution (or at least inspiration) I would like to draw attention again to the ongoing work being done in poetry (they seem to be seriously going for it, since it’s on their roadmap):

Aside:

True. That could be partly covered by this proposal (which could also reduce, at least slightly, the urgency of specifying source distribution file names, and maybe more):

True. And the history of Python packaging means that the internet is full of outdated and inaccurate information, which makes the issue worse (for both us and the end users). I’m absolutely 100% in agreement that we should fail in an informative and helpful way when we hit an issue. And I can’t speak for other tools, but pip is bad at doing that at the moment.

But that’s not what we’re talking about here - we’re talking about people who want to install X and when they read the documentation for X and it says “do A, B, C” then they don’t do that. Again, to be very clear, I’m fine with people taking that approach (I’m prone to doing it myself - “I don’t want to install pipenv globally, so I’ll do my own hack to do what I want”). But it comes with a responsibility - if you break it, you have to fix it yourself (or accept your approach is wrong and use the instructions as written). Personally, I’m arguing that we don’t optimise for end users who do their own thing and aren’t willing to put effort into working with us, not that we set any sort of barrier to entry based on knowledge/understanding alone.

True, and an important point. But unless the PyCharm (or VS Code, or whoever else) developers engage with us, and explain the constraints they are working under, we can’t know that. When I said “expecting tools that make it easier for non-experts to use Python to do some work on simplifying the harder aspects of the process”, I was trying to say that we get them to give us design input, not that we do nothing and make them do all the work.

2 Likes

FYI the Python extension for VS Code is very engaged here. :wink:

2 Likes

I know. Sorry, re-reading my post, I realise it gave the impression I thought otherwise. That wasn’t my intention. (The way VS Code is represented is ideal, IMO - you’re engaged with the processes, without pushing a specific “VS Code needs this” agenda. We need more communities giving input on that sort of basis - or maybe we have them, it’s by definition hard to tell :slightly_smiling_face:)

1 Like

@pf_moore would you be available to help me start that PEP (if not, could you point me to someone who could help)? I’m completely lost in the process…

I found this: https://www.pypa.io/en/latest/specifications/#proposing-new-specifications but honestly it’s quite confusing and not really helpful.

I’m trying to understand:

  • How do I obtain a PEP number?
  • Where should I go to open the PEP?
  • Is the PR to the “Python Packaging User Guide repository” the PEP itself? Or is it a completely different process that should be done once we have reached consensus on the PEP?
  • Is there a template I can follow?

If you or someone else could help, I would truly appreciate it: contact@jonathandekhtiar.eu

I’m available over emails / phone / slack / webex / google hangout / IRC :wink:

https://www.python.org/dev/peps/pep-0001/ outlines the process for a PEP.

Once you have posted a draft here that people are happy with you submit a PR to https://github.com/python/peps and we will assign you a number.

See above.

I have never understood the point of this part (the “PEP + packaging.python.org” step, not the question of why the specs are kept where they are), so I can’t help answering the question.

https://www.python.org/dev/peps/pep-0012/ for PEPs.

Sorry, I’m snowed under with other responsibilities at the moment, so I can’t offer to help.

One of the requirements for submitting a PEP is to find a “sponsor”, specifically to assist with guiding the author through the process. So I think you should definitely look for someone to sponsor you here (or maybe even a co-author who’s more experienced with Python packaging who can help).

@jonathandekhtiar I’m happy to help, stick some time on my calendar here and we can have a chat.

For whatever it’s worth, here are my suggestions:

  • Add a new field to the metadata, Default-Extras or something, which is basically just a list of extras that will be selected by default.
  • Extend PEP 508 to support negating an extra, which would allow people to unselect one or more of the default extras.
    • This will require defining what happens if you have explicit opt in and opt out of some named dependency.

That’s all I would really do, I dislike all of the attempts to shuffle some implicit extra in by not having a name for it or whatever.
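To make the first suggestion concrete, a core-metadata fragment might look something like this; both the Default-Extras field name and the ! negation syntax are hypothetical, nothing here is standardized:

```
Metadata-Version: 2.1
Name: kivy
Provides-Extra: base
Provides-Extra: full
Default-Extras: base
Requires-Dist: kivy-deps-sdl2; extra == "base"
Requires-Dist: ffpyplayer; extra == "full"
```

Under that scheme, pip install kivy would behave like pip install kivy[base], and pip install kivy[!base] would opt back out, with the opt-in/opt-out precedence rule from the second bullet deciding cases like kivy[full,!base].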

2 Likes

FWIW, here is a use case that I have. I’ll be concrete, but hopefully not so detailed as to be distracting.

The xarray package has two dependencies: numpy and pandas. My package uses numpy, and could use some features of xarray that don’t require pandas, but I don’t want to pull pandas into my dependency tree. (I’m willing to accept ImportErrors if my code accesses pandas-dependent features.)

As it stands, I would either need to convince them to break out the pandas-independent concepts into a separate package (plausible, but a nontrivial effort), or move the pandas dependency into an extra and make all packages depending on that functionality install xarray[pandas] (non-starter).

If I could submit a PR to make xarray[minimal] or xarray[!pandas] a way to get just xarray and numpy, the maintenance burden on xarray would be substantially lower. It would also only (okay, mostly) be used by downstream packages like mine, which understand the risk they’re taking in not pulling in all dependencies, and which will watch upstream to ensure that functionality is maintained with the reduced dependency set.