Adding a default extras_require environment

I think there may be scope for both cases being needed depending on circumstances. More reason why I think this needs designing properly based on a consistent underlying model.

And roughly analogously, we have an ongoing discussion on pip where the new resolver is going to be more strict and we may need to allow the user to override declared dependencies :slightly_smiling_face: (I won’t go into the details here, as they aren’t relevant, but I think the point remains that sometimes user control is needed).

I can give an example of what I was trying to do.

We are shipping one of our products as a wheel that requires a number of dependencies. To assure users that our product is tested and stable with a specific set of dependencies, we pin all of them.

However, very often one of the dependencies gets updated in between our release cycles, and these updates can bring significant performance improvements and bug fixes.

Therefore we want our users to be able to benefit from the latest performance releases and updates (within the same minor release range), but also allow users with critical jobs to stick to the specific dependencies that we have tested and are confident about.

It would give something like this:

pip install pkg # gives you for instance numpy==1.13.0

# or

pip install pkg[latest] # gives you numpy>=1.13.0,<1.14

The problem is that there is no way I’m aware of to do such a thing.
And of course numpy==1.13.0 complies with the range, so having the two at the same time just returns 1.13.0 all the time.
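For illustration, here is a sketch of the obvious way one might try to express this with today’s extras_require, and why it falls flat (the package name and version numbers are taken from the example above):

from setuptools import setup

setup(
    name='pkg',
    version='1.0.0',
    # The tested, pinned set that a plain `pip install pkg` gets.
    install_requires=["numpy==1.13.0"],
    extras_require={
        # Intended to relax the pin, but extras can only *add*
        # requirements: the resolver must still satisfy ==1.13.0
        # above, so pkg[latest] resolves to 1.13.0 anyway.
        "latest": ["numpy>=1.13.0,<1.14"],
    },
)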


Might be a very corporate requirement, but it sounds to me like we can’t be the only company who would wish to do something like this.


My use case would be for projects that have both an API/library component and an app/CLI/GUI component. I would like the average end user to simply do pip install project, while library consumers would depend on project[lib]. Currently we’re forced to do the opposite: https://datadoghq.dev/integrations-core/setup/#installation

But since your extra is going to just add another dependency, why wouldn’t you just publish project and project-lib and use a normal dependency from A to B?


@steve.dower In this specific case, it might be doable (I can’t speak for datadog). However, please consider the following:

Maintaining two packages (even if identical) has some impact, especially if you have a chain of dependencies:

  • project and project-lib || project[lib]: now let’s say you want to build project-cronjob, which gives you some automation capabilities. Sweet, but how do you enforce that project-cronjob depends on project or on project-lib? I mean sure, your proposal works for a very specific use-case, but it can potentially create a chain reaction of problems.

  • Having two packages means that you need to package two projects (even if identical) and make sure you keep them in sync. And we all know that over time, this tends to create issues. Who hasn’t had two packages that were supposed to be updated together, but one CI job failed (even if just a flaky error) and only one was pushed to PyPI? Or some other issue that made the two packages fall out of sync.

  • Shall I even mention that it creates more mess in the documentation and risks confusing users? (This point is arguable, I agree.)

  • Let’s add some funk to @ofek’s scenario (because I don’t like simple projects, and because overkill is fun :sunglasses:). So in addition to the GUI by default, which needs extra dependencies compared to the CLI (like Qt or Tkinter), why not add some GPU compute (with cupy #CUDA)? After all, Datadog provides data processing tools, so it could be a realistic scenario. But on the other hand, project managers at Datadog may not want to block users who don’t have a CUDA-enabled GPU (as with TensorFlow: tensorflow and tensorflow-gpu), and so provide a CPU backend based on numpy as an option (they want CuPy by default for the sweet performance you would get out of the box). So following your proposal, this is what I would have to do:

    • project-lib # gives a CLI + CUPY
    • project-lib-cpu # gives a CLI + NUMPY
    • project # gives a GUI + CUPY
    • project-cpu # gives a GUI + NUMPY
  • I think the previous point shows that it might be acceptable in some cases to say “okay, please create a 2nd package and you’re fine”. However, this approach is not scalable. Let’s agree that doing the following is just considerably less convoluted:

    • project # gives a GUI + CUPY
    • project[cpu] # gives a GUI + NUMPY (keep GUI dependencies & replace CUPY with Numpy)
    • project[cli] # gives a CLI + CUPY (remove GUI dependencies & keep CUPY)
    • project[cli,cpu] # gives a CLI + NUMPY (remove GUI dependencies & replace CUPY with Numpy)

Maybe the term extras_require is just not the one that should be kept, and we should change it to something more like:

  • environments: CPU vs GPU, CLI vs GUI
  • default_environment: which combination of environments (str or list/tuple of str) is the default if none is specified.

This would be nicely backward compatible since we can build on top of extras_require, issue a sweet and easy-to-understand deprecation warning, retain most of the current mechanics, and finally have a more “easy to understand” and flexible way to manage a default “extra” (which, as some of you pointed out, could be a little weird: if it’s “extra”, why should there be a default?).

from setuptools import setup

environments = {
    'cpu': ["numpy"],
    'gpu': ["cupy"],
    'gui': ["tkinter"],
    'cli': [],
}

setup(
    name='test_pkg',
    version='1.0.0',
    environments=environments,
    default_environment=["gpu", "gui"],  # solution A
    # default_environment="gpu,gui",     # solution B
)

Sure, the expanded options get way more complicated (which is why I proposed Idea: selector packages), but even using extras you are still going to have to create those packages. And if you want to provide project[lib] for your users, then the interface of that matters as much as (more than?) the interface of just project. Or else your project[lib] users are going to have to carry your entire GUI implementation as well, even though they don’t get the dependencies needed to make it work.

Extras are just that - extra dependencies. Everything else is architecture :wink: (and yes, it’s complicated, which is why we avoid doing it for free).

I like the idea of explicitly specifying default extras though. Doesn’t solve the bigger problem of users needing to know what they need before they know that they need to know, but at least it smooths off one of the current issues.

@steve.dower Would you be more in favor of extras + default_extra, or the idea of environments and default_environment I proposed just above?

The main advantage of adding default_extra or default_environment would be that it’s 100% backward compatible. We don’t change anything in the already-present public APIs. If you don’t use the feature, absolutely nothing changes.

And hopefully the change should be very minor. Where we read whether the user asked for any extras, we can say:

  • if None: read the default field (set to None by default)
  • else use what the user specified
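A minimal sketch of that lookup, with hypothetical names (this is not existing pip or setuptools code):

def resolve_extras(requested, default_extra):
    # `requested` is what the user put in brackets (None if nothing),
    # `default_extra` is the hypothetical new metadata field.
    if requested is None:
        # No extras requested: fall back to the declared default,
        # or to no extras at all if none was declared (as today).
        return set(default_extra.split(",")) if default_extra else set()
    # The user asked for specific extras: use exactly those.
    return set(requested)

So resolve_extras(None, "gpu,gui") would give {"gpu", "gui"}, while an explicit empty request like resolve_extras([], "gpu,gui") would give an empty set.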

@pf_moore @dustin @uranusjr any opinions?

Well, I do like to keep things simple, and I don’t like overkill :slight_smile:

Basically, my view is that project developers (and I include myself) have a strong tendency to assume that “their” complexity is essential and unavoidable, and tools must clearly support their use case. However, Python packaging is struggling under the weight of the complexity we’ve accumulated over many years, and I strongly believe that what we need is simplification, not further complexity. We should be spending time looking at our core concepts and models, and considering how to make them simpler, better defined, and more manageable and composable, and not just loading more and more features on an ill-defined and adhoc base.

This is not to say that we can’t support additional use cases, or that we can’t grow, but we should do so by looking at our underlying model and questioning whether it really supports the needs of today’s community. That’s why I think we should be considering whether we “need something that’s conceptually slightly different than extras”, as I mentioned above. And it’s why I think that @steve.dower’s question, why can’t you ship two projects, is important here. Is it not possible that what you really need is for the tools and ecosystem to make handling two projects more manageable, rather than trying to “fix” the single-package approach?

By pushing the complexity back onto the tools, you’re adding overhead to the packaging infrastructure, making it more fragile, harder to maintain, and more prone to failure. When projects like pip are maintained by a team of less than 10 people, all of whom are working on a totally volunteer basis in their spare time, that’s adding a huge risk to your projects. So looking at it that way, maintaining a single package also has an impact, and that impact isn’t just on your projects, but also on other projects which are competing with you for limited tool developer resource.

This is going more in the direction I’m thinking of - but I’m not sure if you’re intending it as a full replacement to extras, or just as a way of restating your proposal using different terminology. It will need a lot more thinking through if it’s to replace “extras” as a concept, so I reserve judgement on it until I see a full proposal (which would need to include a discussion of how we migrate existing projects off extras - just adding another approach to the mix won’t simplify anything). @steve.dower’s selector packages proposal is another idea that may be worth looking at here.

Why should we agree to that? Why not have project, project-lib, project-gui, project-cli, project-cpu, and users can install whichever parts of the functionality they want? From a user perspective, that seems far more straightforward to me. Of course, it’s more work for the project to structure itself to make the various parts intuitive for the user. And can we all agree that making things simpler for users is the ultimate aim here? (That’s not rhetorical; this discussion is about making development easier, but it seems to me that we’re just changing which developers have to invest effort, rather than looking at what works best for the users.)

Sorry - this got a bit beyond the scope of a single change to how extras work. But I do think the proposal exposes a lot of the issues.


It was intended as a pun to be honest (notice the emoji) :wink: To be fair, I was expecting a remark on that one … In any case, I wasn’t implying the packaging mechanism may be more complex but more that someone may have a complex package with convoluted dependencies.

Agreed.

Honestly, it sounds unrealistic to me. What if tomorrow I want to add some other default feature? Are we going to recommend: “Oh yeah sure, please create a combinatorial number of pip packages, which you now need to maintain properly and document. This is the way to go and definitely how you should advertise to your users…”? I gave this somewhat realistic (not overly complexified) example to show that this approach doesn’t look realistic to me.
One of the reasons extras_require was created in the first place (which I might have misunderstood) was to avoid having to create N packages for each scenario and to allow users to specify additional use-cases without installing a different package. So I agree that having a “default extra” sounds kind of weird and maybe a re-branding is needed. However, the “please create N packages” approach sounds even more against the original idea IMHO. And it wouldn’t solve at all the issue of the dependency chain:


Totally agree, I’m absolutely not trying to “impose” any requirement or something. I’m trying to highlight a number of limitations and trying to find with you all the best way we can address them.


If the decision were only up to me, I would completely replace extras_require with environments, keep extras_require in the API for a while (backward compatibility), but have it effectively create a set of environments. Then add the default_environment argument. IMO this approach has a few advantages:

  • Can easily be made 100% backward compatible: can be understood as a rename for the most part.

  • Quite simple to implement; as I just said, in principle it works as an argument rename.

  • We can issue a nice deprecation warning for a very long time if extras_require is not None, and the fix is just dead simple: please replace 'extras_require' by 'environments' in your setup() call.

  • default_environment makes a lot more sense “intuitively” compared to default_extra. Hopefully nicely addressing the “ergonomic” issue raised by @uranusjr and @dustin.

  • Very simply and naively plug default_environment in at the location where we read whether the user asked for any extra (now called an environment). If no extra / environment was requested (a.k.a. extra is None), then check whether default_environment has been defined in the METADATA file. If so, use that value. If not, keep None.

I don’t see how we could make it simpler to implement and maintain while keeping the number of changes as small as possible (thus minimizing the risk of introducing a bug).

The other solution would be to create default_extra and only keep the last point (which might cause some ergonomic issues, but we can’t have it all, and is still a much better trade-off in my opinion than not being able to do this at all):

  • Very simply and naively plug default_extra in at the location where we read whether the user asked for any extra. If no extra was requested (a.k.a. extra is None), then check whether default_extra has been defined in the METADATA file. If so, use that value. If not, keep None.
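Sketched against the METADATA file format (which uses email-style headers), with the Default-Extra field name being purely hypothetical:

from email.parser import Parser

def default_extras(metadata_text):
    # METADATA files are parseable as email headers; "Default-Extra"
    # is a made-up field, shown only to illustrate the lookup.
    msg = Parser().parsestr(metadata_text)
    value = msg.get("Default-Extra")  # e.g. "base", or None if absent
    if value is None:
        return None
    return {v.strip() for v in value.split(",")}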

That sounds like you’re seeing this as simply a renaming, with the new feature “sneaked in” as default_environment. But I’m more asking about semantics, and just renaming doesn’t clarify semantics at all.

I’ve read your post and you make some good points. But I think the crux of my concern here is that we’ve ended up talking about implementation and maintenance (which is what I’ve been pushing for details on, so that’s on me) where in reality the big problem is that we need to start with a proposed change to the design spec, so we can see the implications in context.

And that’s where the real issue lies - there isn’t a design spec for extras. There are some places where we document how to reference them (in metadata, for example) but there’s no documentation of the concept, or how they should work¹ - at least not in https://packaging.python.org/specifications/ or as a PEP.

As a community, we’re trying to move away from implementation defined behaviour and towards standards. Extras are a particularly bad case of this, as they were originally implementation-defined by setuptools, and then pip added support for them, resulting in a second level of implementation defined behaviour. My point here is that we need to stop and write a spec before going further down this path. How should build tools other than setuptools implement extras? How about front ends other than pip? How would the new feature impact their design choices? These aren’t rhetorical questions.

(Sorry, I know you have a specific issue here, and weren’t looking for a big debate on principles², but this is the biggest problem with the current state of Python packaging, IMO, we need to get to a situation where we can look at focused proposals without having to consider the whole of the ecosystem every time…)

¹ At least, not that I’m aware of. If I missed something, please let me know!
² You want room 12A down the corridor, this is “abuse” :slight_smile:


Fair enough. In all honesty, I was trying to understand which direction would have the best chance of converging. This kind of work is a first for me in the Python “core” world, and I’m really eager to learn and hopefully will contribute more globally in the future.

So let me go ahead and try to formalize things in a PEP. Let’s see how I manage that :slight_smile:
Give me a few days and I’ll come back to you.

If someone is open to a quick “mentorship” on contributing to the Python core, I’ll be glad to have someone to ask a few questions :slight_smile:

Thanks everyone


I just wanted to add another example showing why I was looking for this kind of feature.

Something similar came up here: our GUI package Kivy supports various backends for text, image, video, etc., but at least one backend must be installed for Kivy to work.

Currently, pip install kivy does not install any backends, so kivy won’t work without additional steps. Previously we just told users to manually install the dependencies, along with listing all our backend options (e.g. pip install kivy_deps.sdl2==x.y.z).

This led to many opened issues about failed installations over time due to user confusion. And this has seemingly gotten worse in recent times because many users don’t read install docs and simply install Kivy graphically in PyCharm by searching for Kivy and clicking install. Naturally this doesn’t install any of the backend dependencies.

My improvement was to add base and full keys in extras_require. When specified, base installs a set of per-platform dependencies that I judged an average user would want. So now we just tell users to install it with pip install kivy[base]. But this doesn’t solve e.g. the PyCharm problem.

So I was looking for a way to make pip install kivy “default” to installing base as well, but also a way to say pip install kivy --no-extras or pip install kivy[] so advanced users who want the install_requires dependencies but not the base dependencies can do that. If this doesn’t work out, perhaps we’ll just add our base dependencies to install_requires and have advanced users install kivy with --no-deps and then manually install the “real” install_requires. But ideally that would not be required.
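For reference, a simplified sketch of the arrangement described above (the dependency lists are illustrative, not Kivy’s actual setup.py):

from setuptools import setup

# Per-platform set that an average user would want; illustrative only.
base_deps = ["kivy_deps.sdl2", "kivy_deps.glew"]

setup(
    name='kivy',
    version='2.0.0',
    install_requires=["docutils"],            # always-required deps
    extras_require={
        "base": base_deps,                    # sensible default backends
        "full": base_deps + ["ffpyplayer"],   # everything, incl. video
    },
)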


I wonder if it would be significantly easier to have a “no extras” spelling (extra == None, extras == '<none>', or extra == ""), so that we can have the following in a setup.py:

setup(
    ...
    extras_require={
        "<none>": ["pymarkdown"],
        "PDF":  ["ReportLab>=1.2", "RXP"],
        "reST": ["docutils>=0.3"],
    },
    ...
)

I think this may be a much smaller PEP, and a much simpler change for build systems (setuptools, poetry et al).


I wonder if anything has to be done at all. If people are not able to follow the simple instructions to install a library with the appropriate extras, how would they manage to write the Python code needed to use the library to begin with? I am always dumbfounded that such trivial tasks are an issue at all. (Which makes me think it might be a documentation issue, more than a packaging/tooling issue.)

It seems harsh, but I feel like at least some of the burden has to be delegated to the user at some point. I’m all for providing a nice user experience, but if the user is a Python developer (even a beginner) then I believe it is safe to put the bar for entry a bit higher than one would expect for an app store for example.

In the case of PyCharm, I have never used it, so a question: does PyCharm propose in its UI a list of the available extras and make it part of its search-and-install wizard (or whatever is being talked about here)? Because that would be essential, and I would consider it a serious UX issue if it weren’t there. Maybe the burden should be pushed onto PyCharm here.

Also I couldn’t find the issue with pip freeze being addressed in the suggestions.

I think there’s a lot of people who use Python as a tool, but who don’t necessarily understand programming or things like package management. Data science is a good example here in my experience. There’s certainly people who know business intelligence, data analysis, and the like and who don’t really care that much about Python except as the tool that lets them do that.

Having said that, you do make a good point here. At some point we have to decide how much it’s down to us to make Python easy to use for people with less interest/experience, and how much we should expect them to meet us half way and try to learn the basics for themselves.

There’s definitely a good argument for expecting tools that make it easier for non-experts to use Python to do some work on simplifying the harder aspects of the process, and not just put a gloss on the easy bits.

Having said all of this, I do think there’s a case here that extras are complex and fiddly to use, and don’t model the problem that needs to be solved particularly well. So I do support the idea of revisiting them with the idea of improving things. I just don’t think “making it easier for users to get started without reading the instructions” is the correct goal - we should be looking at “providing a good model for how (modern!) projects want to deploy their functionality” and “having a consistent and understandable design” instead. Advanced features can be understandable and usable without dumbing down.

So I think “kivy is a core plus backends, and we want to allow users to select what backends to install but also provide a default set if the user doesn’t have a particular set they want” is a reasonable use case to explore. I don’t think “users don’t read install docs and expect to click and magically get exactly what they want” is (even if addressing the former incidentally improves the story for the latter).

But there are “advanced” questions that should be answered, relevant to the “deployment model” scenario, that don’t matter to the “click to install” scenario. For example, how does someone install just the core project, without any backends (maybe it’s a backend developer)? Or how do developers install sets of backends that don’t match up with a predefined “extra”? Can users define their own bundles to install (maybe for common use throughout an organisation), or are they restricted to just what the project defines?

All of this goes way beyond the original idea of “add a default extra”. And that’s intentional, in a way - as I said before, there’s not really a well-defined underlying model for extras, so it’s not clear how to answer any of these questions. Answering just one, the “we need a default” situation, leaves implementations having to figure out interactions like this on their own.


I work with some of the best engineers in the world, and packaging (particularly Python) trips everyone up regularly. Doesn’t stop them from writing code. There’s a very real complexity imbalance here.

Most often it seems to be due to referring to online search results and hearsay (blogs, tweets) rather than project docs. (As one concrete but unrelated example, I often surprise people by showing them the official “how to run Python” page on docs.python.org, because everyone “out there” talks about how to do it but nobody refers people to the real docs.) I often get to see Frankenstein combinations of three different “best practices” from different blog posts/StackOverflow. Sometimes people find the real docs and are very successful.

So people can follow simple instructions, but we’re in a culture where people don’t even look for them. The best we can do in that case is fail well and include links to the real docs in error messages (see numpy’s ImportError for a very good example).

As a tool developer myself, I don’t think we (as in Python/PyPA) get to push the burden there, though we can encourage them that it would be in the best interests of their users.

Unfortunately, you have to download the package to see the extras. Downloading every package to fill in that UI is not in anyone’s best interest, so it seems unlikely. We could help by improving index metadata, though that comes with vast complexity and cannot be assumed by tools, but projects could help themselves sooner by handling the “no backends” case with a good error, and having a clear installation command at the top of their package long description (a.k.a. Readme file).

My overall impression (and others seem to somewhat concur) is that a large portion of the use cases for a default extra can be solved with better documentation, and better error messages when dependencies are missing. I don’t see a way to add such a default extra while still having the pip freeze > req.txt; pip install -r req.txt workflow behave as expected.

In order to provide the nice out of the box experience for first-time users, projects might want to have a top level project MyThing that is basically just an empty thing (no code) that depends on MyThingCore[backend_default] and nothing more (not even a version range). And returning users, who are willing to put the effort and actually read installation instructions would naturally move to installing MyThingCore[backend_gpu_accelerated,bells,whistles]>=1.2.3, once they feel the need to go beyond what the default installation provides.
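A minimal sketch of such a shim (names taken from the example above):

from setuptools import setup

# Empty "shim" distribution: no code of its own, it only pulls in the
# core package with the default backend, with no version range.
setup(
    name='MyThing',
    version='1.0.0',
    install_requires=["MyThingCore[backend_default]"],
)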

For more advanced needs (which I agree might be legitimate), it seems like it has to be something other than extras, and for a possible solution (or at least inspiration) I would like to draw attention again to the ongoing work being done in poetry (they seem to be seriously going for it since it’s on their roadmap):

Aside:

True. That could be partly covered by this proposition (which could also reduce – at least slightly – the urgency of specifying source distribution file names, and maybe more):

True. And the history of Python packaging means that the internet is full of outdated and inaccurate information, which makes the issue worse (for both us and the end users). I’m absolutely 100% in agreement that we should fail in an informative and helpful way when we hit an issue. And I can’t speak for other tools, but pip is bad at doing that at the moment.

But that’s not what we’re talking about here - we’re talking about people who want to install X and when they read the documentation for X and it says “do A, B, C” then they don’t do that. Again, to be very clear, I’m fine with people taking that approach (I’m prone to doing it myself - “I don’t want to install pipenv globally, so I’ll do my own hack to do what I want”). But it comes with a responsibility - if you break it, you have to fix it yourself (or accept your approach is wrong and use the instructions as written). Personally, I’m arguing that we don’t optimise for end users who do their own thing and aren’t willing to put effort into working with us, not that we set any sort of barrier to entry based on knowledge/understanding alone.

True, and an important point. But unless the PyCharm (or VS Code, or whoever else) developers engage with us, and explain the constraints they are working under, we can’t know that. When I said “expecting tools that make it easier for non-experts to use Python to do some work on simplifying the harder aspects of the process”, I was trying to say that we get them to give us design input, not that we do nothing and make them do all the work.


FYI the Python extension for VS Code is very engaged here. :wink:


I know. Sorry, re-reading my post, I realise it gave the impression I thought otherwise. That wasn’t my intention. (The way VS Code is represented is ideal, IMO - you’re engaged with the processes, without pushing a specific “VS Code needs this” agenda. We need more communities giving input on that sort of basis - or maybe we have them, it’s by definition hard to tell :slightly_smiling_face:)
