Adding a default extra_require environment

As a further thought, there are two distinct use cases here:

  1. Users installing readme_renderer may choose at install time whether they want the large dependency. This is an interactive choice and may vary each time the package is installed.
  2. Projects depending on readme_renderer will decide statically which "version" of readme_renderer they want to install. The end user gets no choice; the decision is hard-coded in the higher-level project's metadata.

This seems like it’s potentially the wrong solution - wouldn’t the end user want the choice in case (2) as well?

Thinking about it more:

That would install null_extra_dependency no matter what, once it goes through pip freeze > req.txt, wouldn't it?

Yes, but I think that’s due to this pip bug: Extras not getting installed when the package that includes the extras has already been installed · Issue #4957 · pypa/pip · GitHub.

I think, no? In this case, we wanted the top-level dependency (twine) to be able to have the final say on what extra gets installed. Imagine if the top-level dependency was only compatible with one extra – shouldn’t it be able to declare what it’s compatible with?

This feels roughly analogous to saying “why do we need version specifiers, shouldn’t the user have the final say on what version of a subdependency gets installed?”

I think the new resolver does this as expected, and have commented under the linked issue to ask for feedback.


Edit: After re-reading the original reply, I believe #4957 is not the cause of the problem. It's because pip does not record extras information when installing a package. If "default extras" become a thing, pip freeze should presumably be modified to freeze mypackage in a way that does not install any extras, instead of installing the "default extra".
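
To illustrate (the package name and versions here are made up): after installing with an extra and running pip freeze, only the resolved pins appear - the extra itself is not recorded anywhere:

pip install mypackage[pdf]   # installs mypackage plus the [pdf] extra's dependencies
pip freeze                   # outputs mypackage==1.0.0 and reportlab==3.5.0 - no mention of [pdf]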

1 Like

Isn’t it slowly drifting into the territory of what Obsoletes-Dist (or similar metadata) should have been?

I don’t see the relation. I must be misunderstanding something.

I think there may be scope for both cases being needed depending on circumstances. More reason why I think this needs designing properly based on a consistent underlying model.

And roughly analogously, we have an ongoing discussion on pip where the new resolver is going to be more strict and we may need to allow the user to override declared dependencies :slightly_smiling_face: (I won’t go into the details here, as they aren’t relevant, but I think the point remains that sometimes user control is needed).

I can give an example for what I was trying to do.

We are shipping one of our products as a wheel that requires a number of dependencies. In order to assure users that our product is tested and stable with a specific set of dependencies, we pin the versions of all of them.

However, very often one of the dependencies gets updated in between our release cycles, and this can bring significant performance improvements and bug fixes.

Therefore we want our users to be able to benefit from the latest performance releases and updates (within the same minor release range), but also let users with critical jobs stick to the specific dependency versions that we have tested and are confident about.

It would give something like this:

pip install pkg # gives you for instance numpy==1.13.0

# or

pip install pkg[latest] # gives you numpy>=1.13.0,<1.14

The problem is that there is no way I'm aware of to do such a thing.
And of course numpy==1.13.0 satisfies the range, so declaring both at the same time just resolves to 1.13.0 every time.
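
A hypothetical sketch of the metadata this would need, assuming a made-up default_extra field (neither the field nor the extra names 'pinned' and 'latest' exist today):

from setuptools import setup

setup(
    name='pkg',
    version='1.0.0',
    extras_require={
        'pinned': ["numpy==1.13.0"],        # the exact set we tested
        'latest': ["numpy>=1.13.0,<1.14"],  # the compatible range
    },
    default_extra='pinned',  # hypothetical: used when no extra is requested
)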


Might be a very corporate requirement, but it sounds to me like we can't be the only company who would wish to do something like this.

1 Like

My use case would be for projects that have both an API/library component and an app/CLI/GUI component. I would like the average end user to simply do pip install project, while dependent projects would declare a dependency on project[lib]. Currently we're forced to do the opposite: https://datadoghq.dev/integrations-core/setup/#installation

But since your extra is going to just add another dependency, why wouldn’t you just publish project and project-lib and use a normal dependency from A to B?

2 Likes

@steve.dower In this specific case, it might be doable (I can’t speak for datadog). However, please consider the following:

Maintaining two packages (even if identical) has some impact, namely if you have a chain of dependencies:

  • project and project-lib || project[lib]: now let's say you want to build project-cronjob, which gives you some automation capabilities. Sweet, but how do you make project-cronjob depend on project or project-lib? I mean sure, your proposal works for a very specific use case, but it can potentially create a chain reaction of problems.

  • having two packages means that you need to package two projects (even if identical) and make sure you keep them in sync. And we all know that over time this tends to create issues. Who hasn't had the situation where two packages were supposed to be updated together, one of the CI jobs failed (even if just from a flaky error), and only one was pushed to PyPI? Or some other issue that made the two packages fall out of sync.

  • Shall I even mention the fact that it creates more mess in the documentation and risks confusing users? (This point is arguable, I agree.)

  • Let's add some funk to @ofek's scenario (because I don't like simple projects, and because overkill is fun :sunglasses: ). In addition to the GUI by default, which needs extra dependencies compared to the CLI (like Qt or Tkinter), why not add some GPU compute (like with cupy #CUDA)? After all, Datadog provides data processing tools, so it could be a realistic scenario. But on the other hand, project managers at Datadog may not want to block users who don't have a CUDA-enabled GPU (like with TensorFlow: tensorflow and tensorflow-gpu), so they provide a CPU backend based on numpy as an option (they want to provide cupy by default for the sweet performance you get out of the box). So following your proposal, this is what I would have to do:

    • project-lib # gives a GUI + CUPY
    • project-lib-cpu # gives a GUI + NUMPY
    • project # gives a CLI + CUPY
    • project-cpu # gives a CLI + NUMPY
  • I think the previous point shows that while it might be acceptable in some cases to say "okay, please create a second package and you're fine", this approach is not scalable. Let's agree that doing the following is just considerably less convoluted:

    • project # gives a GUI + CUPY
    • project[cpu] # gives a GUI + NUMPY (keep GUI dependencies & replace CUPY with Numpy)
    • project[cli] # gives a CLI + CUPY (remove GUI dependencies & keep CUPY)
    • project[cli,cpu] # gives a CLI + NUMPY (remove GUI dependencies & replace CUPY with Numpy)

Maybe the term extras_require is just not the one that should be kept, and we should change it to something more like:

  • environments: CPU vs GPU, CLI vs GUI
  • default_environment: which combination of environments (str or list/tuple of str) is default if None is specified.

This would be nicely backward compatible, since we can build on top of extras_require, issue a sweet and easy-to-understand deprecation warning, retain most of the current mechanics, and finally have a more "easy to understand" and flexible way to manage a default "extra" (which, as some of you pointed out, could be a little weird: if it's "extra", why should there be a default?).

from setuptools import setup

# NOTE: 'environments' and 'default_environment' are the proposed keywords,
# not existing setuptools API.
environments = {
    'cpu': ["numpy"],
    'gpu': ["cupy"],
    'gui': ["tkinter"],
    'cli': [],
}

setup(
    name='test_pkg',
    version='1.0.0',
    environments=environments,
    default_environment=["gpu", "gui"],  # solution A: list/tuple of str
    # default_environment="gpu,gui",     # solution B: comma-separated str
)
1 Like

Sure, the expanded options get way more complicated (which is why I proposed Idea: selector packages), but even using extras you are still going to have to create those packages. And if you want to provide project[lib] for your users, then the interface of that matters as much as (more than?) the interface of just project. Otherwise your project[lib] users are going to have to carry your entire GUI implementation as well, even though they don't get the dependencies needed to make it work.

Extras are just that - extra dependencies. Everything else is architecture :wink: (and yes, it’s complicated, which is why we avoid doing it for free).

I like the idea of explicitly specifying default extras though. Doesn’t solve the bigger problem of users needing to know what they need before they know that they need to know, but at least it smooths off one of the current issues.

@steve.dower Would you be more in favor of extras + default_extra, or the idea of environments and default_environment I proposed just above?

The main advantage of adding default_extra or default_environment is that it's 100% backward compatible. We don't change anything about the already-present public APIs. If you don't use the feature, absolutely nothing changes.

And hopefully the change would be very minor. When we read whether the user asked for any extras, we can say (a minimal sketch follows the list below):

  • if None: read the default field (set to None by default)
  • else use what the user specified
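
A minimal sketch of that logic, assuming hypothetical names (resolve_extras and the default_extra metadata field are illustrative, not an existing pip or setuptools API):

def resolve_extras(requested_extras, metadata):
    # The user explicitly asked for extras (possibly an empty set): honor it.
    if requested_extras is not None:
        return requested_extras
    # No extras requested: fall back to the declared default, if any.
    return metadata.get('default_extra')  # None when no default is declared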

@pf_moore @dustin @uranusjr any opinion ?

Well, I do like to keep things simple, and I don’t like overkill :slight_smile:

Basically, my view is that project developers (and I include myself) have a strong tendency to assume that “their” complexity is essential and unavoidable, and tools must clearly support their use case. However, Python packaging is struggling under the weight of the complexity we’ve accumulated over many years, and I strongly believe that what we need is simplification, not further complexity. We should be spending time looking at our core concepts and models, and considering how to make them simpler, better defined, and more manageable and composable, and not just loading more and more features on an ill-defined and adhoc base.

This is not to say that we can't support additional use cases, or that we can't grow, but we should do so by looking at our underlying model and questioning whether it really supports the needs of today's community. That's why I think we should be considering whether we "need something that's conceptually slightly different than extras", as I mentioned above. And it's why I think that @steve.dower's question, why can't you ship two projects, is important here. Is it not possible that what you really need is for the tools and ecosystem to make handling two projects more manageable, rather than trying to "fix" extras?

By pushing the complexity back onto the tools, you’re adding overhead to the packaging infrastructure, making it more fragile, harder to maintain, and more prone to failure. When projects like pip are maintained by a team of less than 10 people, all of whom are working on a totally volunteer basis in their spare time, that’s adding a huge risk to your projects. So looking at it that way, maintaining a single package also has an impact, and that impact isn’t just on your projects, but also on other projects which are competing with you for limited tool developer resource.

This is going more in the direction I’m thinking of - but I’m not sure if you’re intending it as a full replacement to extras, or just as a way of restating your proposal using different terminology. It will need a lot more thinking through if it’s to replace “extras” as a concept, so I reserve judgement on it until I see a full proposal (which would need to include a discussion of how we migrate existing projects off extras - just adding another approach to the mix won’t simplify anything). @steve.dower’s selector packages proposal is another idea that may be worth looking at here.

Why should we agree to that? Why not have project, project-lib, project-gui, project-cli, project-cpu, and users can install whichever parts of the functionality they want? From a user perspective, that seems far more straightforward to me. Of course, it's more work for the project to structure itself to make the various parts intuitive for the user. And can we all agree that making things simpler for users is the ultimate aim here? (That's not rhetorical; this discussion is about making development easier, but it seems to me that we're just changing which developers have to invest effort, rather than looking at what works best for the users.)

Sorry - this got a bit beyond the scope of a single change to how extras work. But I do think the proposal exposes a lot of the issues.

3 Likes

It was intended as a pun to be honest (notice the emoji) :wink: To be fair, I was expecting a remark on that one… In any case, I wasn't implying that the packaging mechanism should be more complex, but rather that someone may have a complex package with convoluted dependencies.

Agreed.

Honestly, it sounds unrealistic to me. What if tomorrow I want to add some other default feature? Are we going to recommend: "Oh yeah sure, please create a combinatorial number of pip packages, which you now need to maintain properly and document. This is the way to go and definitely what you should advertise to your users…"? I gave this somewhat realistic (not overly complexified) example to show that this approach doesn't look realistic to me.
One of the reasons extras_require was created in the first place (which I might have misunderstood) is to avoid having to create N packages for every scenario, and to allow users to specify additional use cases without installing a different package. So I agree that having a "default extra" sounds kind of weird, and maybe a re-branding is needed. However, the "please create N packages" approach sounds even more against the original idea, IMHO. And it wouldn't solve at all the dependency chain issue mentioned above.


Totally agree. I'm absolutely not trying to "impose" any requirement; I'm trying to highlight a number of limitations and to find, together with you all, the best way we can address them.


If the decision were only mine, I would do a complete replacement of extras_require by environments, keep extras_require in the API for a while (backward compatibility) but have it effectively create a set of environments, and then add the default_environment argument. IMO this approach has a few advantages:

  • Can easily be made 100% backward compatible: it can be understood as a rename for the most part.

  • Quite simple to implement; as I just said, in principle it works as an argument rename.

  • We can issue a nice deprecation warning for a very long time if extras_require is not None, and the fix is just dead simple: please replace 'extras_require' with 'environments' in your setup() call.

  • default_environment makes a lot more sense "intuitively" compared to default_extra, hopefully nicely addressing the "ergonomic" issue raised by @uranusjr and @dustin.

  • Very simply and naively, plug default_environment in at the location where we read whether the user asked for any extra (now called an environment). If no extra/environment was requested (i.e. extra is None), look at whether default_environment has been defined in the METADATA file; if so, use that value, otherwise keep None. (See the sketch after this list.)
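
A rough sketch of how the backward-compatible shim could look, assuming the proposed keywords (nothing here is existing setuptools API; it is only meant to show how small the change could be):

import warnings

def normalize_environments(extras_require=None, environments=None):
    # Legacy path: treat each old-style extra as an environment of the same name.
    if extras_require is not None:
        warnings.warn(
            "'extras_require' is deprecated, please replace it with "
            "'environments' in your setup() call",
            DeprecationWarning,
        )
        merged = dict(extras_require)
        merged.update(environments or {})
        return merged
    return environments or {}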

I don't see how we could make it simpler to implement and maintain while keeping the number of changes as small as possible (thus minimizing the risk of introducing a bug).

The other solution would be to create default_extra and only keep the last point (which might cause some ergonomic issues, but we can't have it all, and this is still a much better trade-off in my opinion than not being able to do this at all):

  • Very simply and naively, plug default_extra in at the location where we read whether the user asked for any extras. If no extras were requested (i.e. extra is None), look at whether default_extra has been defined in the METADATA file; if so, use that value, otherwise keep None.

That sounds like you’re seeing this as simply a renaming, with the new feature “sneaked in” as default_environment. But I’m more asking about semantics, and just renaming doesn’t clarify semantics at all.

I’ve read your post and you make some good points. But I think the crux of my concern here is that we’ve ended up talking about implementation and maintenance (which is what I’ve been pushing for details on, so that’s on me) where in reality the big problem is that we need to start with a proposed change to the design spec, so we can see the implications in context.

And that’s where the real issue lies - there isn’t a design spec for extras. There are some places where we document how to reference them (in metadata, for example) but there’s no documentation of the concept, or how they should work¹ - at least not in PyPA specifications - Python Packaging User Guide or as a PEP.

As a community, we’re trying to move away from implementation defined behaviour and towards standards. Extras are a particularly bad case of this, as they were originally implementation-defined by setuptools, and then pip added support for them, resulting in a second level of implementation defined behaviour. My point here is that we need to stop and write a spec before going further down this path. How should build tools other than setuptools implement extras? How about front ends other than pip? How would the new feature impact their design choices? These aren’t rhetorical questions.

(Sorry, I know you have a specific issue here, and weren’t looking for a big debate on principles², but this is the biggest problem with the current state of Python packaging, IMO, we need to get to a situation where we can look at focused proposals without having to consider the whole of the ecosystem every time…)

¹ At least, not that I’m aware of. If I missed something, please let me know!
² You want room 12A down the corridor, this is “abuse” :slight_smile:

3 Likes

Fair enough. In all honesty, I was trying to understand which direction could have the best chance of converging. This is a first for me in the Python "core", and I'm really eager to learn and hopefully will contribute more globally in the future.

So let me go ahead and try to formalize things in a PEP. Let's see how I manage that :slight_smile:
Give me a few days and I'll come back to you.

If someone is open to a quick "mentorship" on contributing to the Python core, I'd be glad to have someone I can ask a few questions :slight_smile:

Thanks everyone

1 Like

I just wanted to add another example showing why I was looking for this kind of feature.

Something similar came up here: our GUI package Kivy supports various backends for text, image, video, etc., but at least one backend must be installed for Kivy to work.

Currently, pip install kivy does not install any backends so kivy won’t work without additional steps. Previously we just told users to manually install the dependencies along with listing all our backend options (e.g. pip install kivy_deps.sdl2==x.y.z).

This led to many opened issues about failed installations over time, due to user confusion. And this seemingly got worse recently, because many users don't read the install docs and simply install Kivy graphically in PyCharm by searching for it and clicking install. Naturally this doesn't install any of the backend dependencies.

My improvement was to add base and full keys in extras_require. When specified, base installs a set of per-platform dependencies that I judged an average user would want. So now we just tell users to install it with pip install kivy[base]. But this doesn't solve e.g. the PyCharm problem.

So I was looking for a way to make pip install kivy "default" to installing base as well, but also a way to say pip install kivy --no-extras or pip install kivy[], so that advanced users who want the install_requires dependencies but not the base dependencies can do that. If this doesn't work out, perhaps we'll just add our base dependencies to install_requires and have advanced users install Kivy with --no-deps and then manually install the "real" install_requires. But ideally that would not be required.
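
For context, a simplified sketch of what this looks like in Kivy's metadata today, with the hypothetical default commented out (the dependency lists here are trimmed for illustration):

from setuptools import setup

setup(
    name='kivy',
    install_requires=["docutils", "pygments"],  # trimmed for illustration
    extras_require={
        'base': ["kivy_deps.sdl2", "kivy_deps.glew"],  # per-platform picks
        'full': ["kivy_deps.sdl2", "kivy_deps.glew", "ffpyplayer"],
    },
    # default_extra='base',  # hypothetical: what a bare pip install kivy would imply
)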

1 Like

I wonder if it would be significantly easier to have a special "no extras" name - extra == None / extra == '<none>' / extra == "" - so that we can have the following in a setup.py:

setup(
    ...
    extras_require={
        "<none>": ["pymarkdown"],
        "PDF":  ["ReportLab>=1.2", "RXP"],
        "reST": ["docutils>=0.3"],
    },
    ...
)
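
Under that proposal, installation would presumably behave something like this (one possible reading; whether a requested extra adds to or replaces the "<none>" set would need to be pinned down in the spec):

pip install project        # installs pymarkdown (the "<none>" default)
pip install project[PDF]   # installs ReportLab>=1.2 and RXP
pip install project[]      # hypothetically: opts out of the default entirely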

I think this may be a much smaller PEP, and a much simpler change for build systems (setuptools, poetry, et al.).

1 Like

I wonder if anything has to be done at all. If people are not able to follow the simple instructions to install a library with the appropriate extras, how would they manage to write the Python code needed to use the library in the first place? I am always dumbfounded that such trivial tasks are an issue at all. (Which makes me think it might be a documentation issue more than a packaging/tooling issue.)

It seems harsh, but I feel like at least some of the burden has to be delegated to the user at some point. I'm all for providing a nice user experience, but if the user is a Python developer (even a beginner), then I believe it is safe to set the bar for entry a bit higher than one would expect for, say, an app store.

In the case of PyCharm, I have never used it, so a question: does PyCharm propose in its UI a list of the available extras and make it part of its search-and-install wizard (or whatever is being talked about here)? Because that would be essential, and I would consider it a serious UX issue if it weren't there. Maybe the burden should be pushed onto PyCharm here.

Also, I couldn't see the pip freeze issue being addressed in any of the suggestions.