Adding a default extras_require environment

I wonder if anything has to be done at all. If people are not able to follow the simple instructions to install a library with the appropriate extras, how would they manage to write the Python code needed to use the library to begin with? I am always dumbfounded that such trivial tasks are an issue at all. (Which makes me think it might be a documentation issue, more than a packaging/tooling issue.)

It seems harsh, but I feel like at least some of the burden has to be delegated to the user at some point. I’m all for providing a nice user experience, but if the user is a Python developer (even a beginner) then I believe it is safe to put the bar for entry a bit higher than one would expect for an app store for example.

In the case of PyCharm, I have never used it, so a question: does PyCharm propose, in its UI, a list of the available extras and make it part of its search-and-install wizard or whatever is being talked about here? Because that would be essential, and I would consider it a serious UX issue if it weren’t there. Maybe the burden should be pushed on PyCharm here.

Also, I couldn’t find the issue with pip freeze being addressed in the suggestions.

I think there’s a lot of people who use Python as a tool, but who don’t necessarily understand programming or things like package management. Data science is a good example here in my experience. There’s certainly people who know business intelligence, data analysis, and the like and who don’t really care that much about Python except as the tool that lets them do that.

Having said that, you do make a good point here. At some point we have to decide how much it’s down to us to make Python easy to use for people with less interest/experience, and how much we should expect them to meet us half way and try to learn the basics for themselves.

There’s definitely a good argument for expecting tools that make it easier for non-experts to use Python to do some work on simplifying the harder aspects of the process, and not just put a gloss on the easy bits.

Having said all of this, I do think there’s a case here that extras are complex and fiddly to use, and don’t model the problem that needs to be solved particularly well. So I do support the idea of revisiting them with the idea of improving things. I just don’t think “making it easier for users to get started without reading the instructions” is the correct goal - we should be looking at “providing a good model for how (modern!) projects want to deploy their functionality” and “having a consistent and understandable design” instead. Advanced features can be understandable and usable without dumbing down.

So I think “kivy is a core plus backends, and we want to allow users to select what backends to install but also provide a default set if the user doesn’t have a particular set they want” is a reasonable use case to explore. I don’t think “users don’t read install docs and expect to click and magically get exactly what they want” is (even if addressing the former incidentally improves the story for the latter).

But there are “advanced” questions that should be answered, relevant to the “deployment model” scenario, that don’t matter to the “click to install” scenario. For example, how does someone install just the core project, without any backends (maybe it’s a backend developer)? Or how do developers install sets of backends that don’t match up with a predefined “extra”? Can users define their own bundles to install (maybe for common use throughout an organisation), or are they restricted to just what the project defines?

All of this goes way beyond the original idea of “add a default extra”. And that’s intentional, in a way - as I said before, there’s not really a well-defined underlying model for extras, so it’s not clear how to answer any of these questions. Answering just one, the “we need a default” situation, leaves implementations having to figure out interactions like this on their own.


I work with some of the best engineers in the world, and packaging (particularly Python) trips everyone up regularly. Doesn’t stop them from writing code. There’s a very real complexity imbalance here.

Most often it seems to be due to referring to online search results and hearsay (blogs, tweets) rather than project docs. (As one concrete but unrelated example, I often surprise people by showing them the official “how to run Python” page on docs.python.org, because everyone “out there” talks about how to do it but nobody refers people to the real docs.) I often get to see Frankenstein combinations of three different “best practices” from different blog posts/StackOverflow. Sometimes people find the real docs and are very successful.

So people can follow simple instructions, but we’re in a culture where people don’t even look for them. The best we can do in that case is fail well and include links to the real docs in error messages (see numpy’s ImportError for a very good example).

As a tool developer myself, I don’t think we (as in Python/PyPA) get to push the burden there, though we can encourage them that it would be in the best interests of their users.

Unfortunately, you have to download the package to see the extras. Downloading every package to fill in that UI is not in anyone’s best interest, so it seems unlikely. We could help by improving index metadata, though that comes with vast complexity and cannot be assumed by tools, but projects could help themselves sooner by handling the “no backends” case with a good error, and having a clear installation command at the top of their package long description (a.k.a. Readme file).
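To make the "you have to download the package to see the extras" point concrete: declared extras do appear as `Provides-Extra` fields in a distribution's metadata, so for packages that are already installed they can be read with the standard library's `importlib.metadata` (Python 3.8+). A minimal sketch:

```python
from importlib import metadata


def declared_extras():
    """Map each installed distribution name to its declared extras.

    Extras show up as repeated "Provides-Extra" fields in the
    distribution's METADATA file.
    """
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            result[name] = dist.metadata.get_all("Provides-Extra") or []
    return result
```

The catch is that this only works once the package is on disk; the index does not expose this metadata up front, which is exactly why an IDE would have to download every candidate package to populate such a UI.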

My overall impression (and others seem to somewhat concur) is that a large portion of the use cases for a default extra can be solved with better documentation, and better error messages when dependencies are missing. I don’t see a way to add such a default extra while still having the pip freeze > req.txt; pip install -r req.txt workflow behave as expected.

In order to provide the nice out-of-the-box experience for first-time users, projects might want to have a top-level project MyThing that is basically just an empty thing (no code) that depends on MyThingCore[backend_default] and nothing more (not even a version range). And returning users, who are willing to put in the effort and actually read installation instructions, would naturally move to installing MyThingCore[backend_gpu_accelerated,bells,whistles]>=1.2.3 once they feel the need to go beyond what the default installation provides.
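The two-package pattern could be sketched as metadata like the following (all names are hypothetical, taken from the post; this is a sketch of the declarations, not real packages):

```python
# "MyThing": an empty shim whose only job is to pull in a sensible default.
mything = {
    "name": "MyThing",
    "packages": [],  # ships no code of its own
    "install_requires": ["MyThingCore[backend_default]"],  # no version pin
}

# "MyThingCore": the real code, with every backend behind an extra.
mything_core = {
    "name": "MyThingCore",
    "install_requires": ["core-dep"],
    "extras_require": {
        "backend_default": ["cpu-backend"],
        "backend_gpu_accelerated": ["gpu-backend"],
        "bells": ["bells-dep"],
        "whistles": ["whistles-dep"],
    },
}
```

A nice property of this arrangement is that it needs no new metadata at all: `pip install MyThing` gives beginners the default backend, while everything advanced happens against MyThingCore, which MyThing never pins.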

For more advanced needs (which I agree might be legitimate), it seems like it has to be something other than extras. For a possible solution (or at least inspiration) I would like to draw attention again to the ongoing work being done in poetry (they seem to be seriously going for it, since it’s on their roadmap):

Aside:

True. That could be partly covered by this proposition (which could also reduce – at least slightly – the urgency of specifying source distribution file names, and maybe more):

True. And the history of Python packaging means that the internet is full of outdated and inaccurate information, which makes the issue worse (for both us and the end users). I’m absolutely 100% in agreement that we should fail in an informative and helpful way when we hit an issue. And I can’t speak for other tools, but pip is bad at doing that at the moment.

But that’s not what we’re talking about here - we’re talking about people who want to install X and when they read the documentation for X and it says “do A, B, C” then they don’t do that. Again, to be very clear, I’m fine with people taking that approach (I’m prone to doing it myself - “I don’t want to install pipenv globally, so I’ll do my own hack to do what I want”). But it comes with a responsibility - if you break it, you have to fix it yourself (or accept your approach is wrong and use the instructions as written). Personally, I’m arguing that we don’t optimise for end users who do their own thing and aren’t willing to put effort into working with us, not that we set any sort of barrier to entry based on knowledge/understanding alone.

True, and an important point. But unless the PyCharm (or VS Code, or whoever else) developers engage with us, and explain the constraints they are working under, we can’t know that. When I said “expecting tools that make it easier for non-experts to use Python to do some work on simplifying the harder aspects of the process”, I was trying to say that we should get them to give us design input, not that we do nothing and make them do all the work.


FYI the Python extension for VS Code is very engaged here. :wink:


I know. Sorry, re-reading my post, I realise it gave the impression I thought otherwise. That wasn’t my intention. (The way VS Code is represented is ideal, IMO - you’re engaged with the processes, without pushing a specific “VS Code needs this” agenda. We need more communities giving input on that sort of basis - or maybe we have them, it’s by definition hard to tell :slightly_smiling_face:)


@pf_moore would you be available to help me start that PEP? (If not, could you point me to someone who could help?) I’m completely lost in the process…

I found this: https://www.pypa.io/en/latest/specifications/#proposing-new-specifications but honestly it’s quite confusing and not really helpful.

I try to understand:

  • How do I obtain a PEP number?
  • Where should I go to open the PEP?
  • Is the PR to the Python Packaging User Guide repository the PEP itself? Or is it a completely different process, to be done once we have reached consensus on the PEP?
  • Is there a template I can follow?

If you or someone could help, I would truly appreciate: contact@jonathandekhtiar.eu

I’m available over emails / phone / slack / webex / google hangout / IRC :wink:

https://www.python.org/dev/peps/pep-0001/ outlines the process for a PEP.

Once you have posted a draft here that people are happy with you submit a PR to https://github.com/python/peps and we will assign you a number.

See above.

This is the part I have never understood the point of (the “PEP + packaging.python.org” split, that is - not why the specs are kept where they are), so I can’t help answering that question.

https://www.python.org/dev/peps/pep-0012/ for PEPs.

Sorry, I’m snowed under with other responsibilities at the moment, so I can’t offer to help.

One of the requirements for submitting a PEP is to find a “sponsor”, specifically to assist with guiding the author through the process. So I think you should definitely look for someone to sponsor you here (or maybe even a co-author who’s more experienced with Python packaging who can help).

@jonathandekhtiar I’m happy to help, stick some time on my calendar here and we can have a chat.

For whatever it’s worth, here’s my suggestions:

  • Add a new field to the metadata, Default-Extras or something, which is basically just a list of extras that will be selected by default.
  • Extend PEP 508 to support negating an extra, which would allow people to unselect one or more of the default extras.
    • This will require defining what happens if you have explicit opt in and opt out of some named dependency.

That’s all I would really do, I dislike all of the attempts to shuffle some implicit extra in by not having a name for it or whatever.
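A minimal sketch of how the opt-in/opt-out resolution in that proposal could work, assuming a hypothetical `Default-Extras` metadata field and a `!extra` negation syntax (neither exists today), with explicit opt-in winning over opt-out:

```python
def resolve_extras(default_extras, requested):
    """Compute the effective set of extras for an install.

    `requested` is a list of extra names, where a leading "!" opts out
    of one of the defaults. Explicitly requesting an extra beats
    negating it (one possible answer to the opt-in/opt-out question).
    """
    opted_in = {e for e in requested if not e.startswith("!")}
    opted_out = {e[1:] for e in requested if e.startswith("!")}
    # Start from the defaults, drop negated ones, add explicit requests.
    return (set(default_extras) - (opted_out - opted_in)) | opted_in


# With "pandas" as a default extra:
assert resolve_extras(["pandas"], []) == {"pandas"}
assert resolve_extras(["pandas"], ["!pandas"]) == set()
assert resolve_extras(["pandas"], ["plot"]) == {"pandas", "plot"}
```

Note that under this model, naming an extra does not disable the defaults; only an explicit `!` does, which keeps `foo[thing]` meaning the same whether or not the requester knows about the defaults.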


FWIW, here is a use case that I have. I’ll be concrete, but hopefully not so detailed as to be distracting.

The xarray package has two dependencies: numpy and pandas. My package uses numpy, and could use some features of xarray that don’t require pandas, but I don’t want to pull pandas into my dependency tree. (I’m willing to accept ImportErrors if my code accesses pandas-dependent features.)

As it stands, I would either need to convince them to break out the pandas-independent concepts into a separate package (plausible, but a nontrivial effort), or move the pandas dependency into an extra and make all packages depending on that functionality install xarray[pandas] (non-starter).

If I could submit a PR to make xarray[minimal] or xarray[!pandas] a way to just get xarray and numpy, the maintenance burden on xarray would be substantially lower. It would also only (okay, mostly) be used by downstream packages like mine, that understand the risk they’re taking to not pull in all dependencies and to watch upstream to ensure that functionality is maintained with the reduced dependency set.

To restate this a bit more simply, you want:

  • pip install xarray to install pandas, numpy
  • pip install xarray[nopandas] to just install numpy

so a setup.py would look something like this:

install_requires=[],
default_extras_requires=["numpy", "pandas"],
extras_require={
    "nopandas": ["numpy"],
}

I think I’m +1 to this. I don’t see this as being easy to solve with the existing metadata.

I’m not convinced this is necessary? I think just the availability of default extra dependencies would cover all the use cases here.

Sure. I’m not wedded to any syntax. I just wanted to explain a use case a bit more concretely.


If you can’t negate a default extra, then what’s the difference between a default extra and install requires?

Based on the example outlined here, Dustin’s proposal is that explicitly specifying any extras disables the default extra. This further emphasises the point that someone really needs to write something down before this topic can be meaningfully discussed :slightly_smiling_face:
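Under that reading, resolution might look like this sketch (the `default_extras_requires` field is the hypothetical one from the setup.py example above; naming any extra suppresses the default list entirely):

```python
def effective_deps(install_requires, default_extras_requires,
                   extras_require, requested):
    """Dependencies to install when `requested` extras are named.

    If no extras are requested, the default list applies; explicitly
    naming any extra replaces the defaults entirely.
    """
    deps = set(install_requires)
    if requested:
        for extra in requested:
            deps.update(extras_require.get(extra, []))
    else:
        deps.update(default_extras_requires)
    return deps


# The xarray example from above:
extras = {"nopandas": ["numpy"]}
# pip install xarray          -> numpy, pandas
assert effective_deps([], ["numpy", "pandas"], extras, []) == {"numpy", "pandas"}
# pip install xarray[nopandas] -> numpy only
assert effective_deps([], ["numpy", "pandas"], extras, ["nopandas"]) == {"numpy"}
```

This makes the difference from the negation proposal concrete: here `xarray[nopandas]` works with today's extras syntax, but any extra - even an unrelated one - silently drops the defaults.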

I don’t think that works in practice TBH; otherwise you get into weird situations, like: what does it mean if someone depends on foo and bar, foo depends on spam, and bar depends on spam[thing]? It also sounds like the kind of weird, implicit, action-at-a-distance thing that trips up a lot of folks with packaging.

Woah now! I lay no claim to this idea. Just trying to help figure it out :slightly_smiling_face:

How is this different than what we already have now?

The challenge of getting everyone here to grok the problem is definitely making me feel like it’s really going to be challenging for the average user to understand it. That said, the number of people coming out of the woodwork asking for this (and the lack of suitable alternatives) makes it still feel worthwhile.

Re-reading this, I think this is where the disconnect happens: I’m talking about a single default extra, which is a list of one or more dependencies that are effectively appended to install_requires if no extra is requested. There isn’t ever more than one “default extra” available (hence why I was confused about trying to negate it).
