@steve.dower In this specific case, it might be doable (I can't speak for Datadog). However, please consider the following. Maintaining two packages (even if identical) has real costs, especially if you have a chain of dependencies:
- Suppose you have `project` and `project-lib` (or `project[lib]`), and now let's say you want to build `project-cronjob`, which gives you some automation capabilities. Sweet, but how do you enforce that `project-cronjob` depends on `project` or on `project-lib`? I mean, sure, your proposal works for a very specific use case, but it can potentially create a chain reaction of problems.
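To make the enforcement problem concrete, here is a tiny sketch (the dicts just stand in for package metadata; the package names come from the scenario above):

```python
# With two separate (but identical) packages, a downstream project like
# `project-cronjob` can legitimately pick either dependency, and nothing
# keeps the ecosystem consistent:
option_a = {"install_requires": ["project"]}      # depends on the base package
option_b = {"install_requires": ["project-lib"]}  # depends on the duplicate

# With a single package plus an extra, there is one canonical spelling:
option_extra = {"install_requires": ["project[lib]"]}

# The two-package spellings are different requirements even though the
# packages ship the same code:
assert option_a["install_requires"] != option_b["install_requires"]
```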
- Having two packages means that you need to package two projects (even if identical) and make sure you keep them in sync. And we all know that over time, this tends to create issues. Who hasn't had two packages that were supposed to be updated together, where one CI job failed (even if it was just a flaky error) and only one was pushed to PyPI? Or some other issue that made the two packages fall out of sync.
- Shall I even talk about the fact that it creates more mess in the documentation and risks confusing users? (This point is arguable, I agree.)
- Let's add some funk to @ofek's scenario (because I don't like simple projects, and because overkill is fun). In addition to the GUI by default, which needs extra dependencies compared to the CLI (like Qt or Tkinter), why not add some GPU compute (like with CuPy / CUDA)? After all, Datadog provides data-processing tools, so it could be a realistic scenario. On the other hand, project managers at Datadog may not want to block users who don't have a CUDA-enabled GPU (like TensorFlow does with `tensorflow` and `tensorflow-gpu`), so they also provide an optional CPU backend based on NumPy (CuPy stays the default for the sweet performance you get out of the box). Following your proposal, this is what I should do:
  - `project-lib` # gives a GUI + CuPy
  - `project-lib-cpu` # gives a GUI + NumPy
  - `project` # gives a CLI + CuPy
  - `project-cpu` # gives a CLI + NumPy
- I think the previous point shows that it might be acceptable in some cases to say "okay, please create a second package and you're fine". However, this approach is not scalable. Let's agree that the following is considerably less convoluted:
  - `project` # gives a GUI + CuPy
  - `project[cpu]` # gives a GUI + NumPy (keep GUI dependencies & replace CuPy with NumPy)
  - `project[cli]` # gives a CLI + CuPy (remove GUI dependencies & keep CuPy)
  - `project[cli,cpu]` # gives a CLI + NumPy (remove GUI dependencies & replace CuPy with NumPy)
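The combinations above can be sketched as a small resolver. This is a minimal sketch of the semantics I have in mind, not an existing mechanism: the names (`ENVIRONMENTS`, `DEFAULTS`, `resolve`) are mine, and the "a requested environment replaces the conflicting default" rule is my assumption about how mutually exclusive environments could be handled:

```python
# Hypothetical environment definitions for the scenario above.
ENVIRONMENTS = {
    "gpu": ["cupy"],
    "cpu": ["numpy"],
    "gui": ["PyQt5"],
    "cli": [],
}
DEFAULTS = {"gpu", "gui"}

# Assumed mutually exclusive pairs: you get one backend and one frontend.
EXCLUSIVE = [{"gpu", "cpu"}, {"gui", "cli"}]

def resolve(requested):
    """Return the dependency list for `project[<requested>]`."""
    selected = set(DEFAULTS)
    for name in requested:
        for pair in EXCLUSIVE:
            if name in pair:
                selected -= pair  # drop the conflicting default
        selected.add(name)
    return sorted(dep for env in selected for dep in ENVIRONMENTS[env])

print(resolve([]))              # plain `project`: GUI + CuPy
print(resolve(["cli", "cpu"]))  # `project[cli,cpu]`: CLI + NumPy
```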
Maybe the term `extras_require` is just not the one that should be kept, and we should change it to something more like:
- `environments`: CPU vs GPU, CLI vs GUI
- `default_environment`: which combination of `environments` (str, or list/tuple of str) is the default if none is specified.
This would be nicely backward compatible, since we can build on top of `extras_require`, issue a sweet and easy-to-understand deprecation warning, retain most of the current mechanics, and finally have a more flexible and easier-to-understand way to manage a default "extra" (which, as some of you pointed out, could be a little weird: if it's "extra", why should there be a default?).
```python
from setuptools import setup

# Hypothetical `environments` mapping, analogous to today's `extras_require`
environments = {
    'cpu': ["numpy"],
    'gpu': ["cupy"],
    'gui': ["tkinter"],
    'cli': [],
}

setup(
    name='test_pkg',
    version='1.0.0',
    environments=environments,
    default_environment=["gpu", "gui"],  # solution A
    # default_environment="gpu,gui",     # solution B
)
```
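The deprecation path could look something like the shim below. This is purely illustrative: `normalize_environments` is a hypothetical helper I made up for this sketch, not a real setuptools API.

```python
import warnings

def normalize_environments(environments=None, extras_require=None):
    """Hypothetical shim: fold the legacy `extras_require` mapping into
    the new `environments` mapping, warning when the old name is used."""
    if extras_require is not None:
        warnings.warn(
            "`extras_require` is deprecated, use `environments` instead",
            DeprecationWarning,
            stacklevel=2,
        )
        merged = dict(extras_require)
        merged.update(environments or {})  # new spelling wins on conflicts
        return merged
    return dict(environments or {})
```

Existing projects would keep working unchanged while the warning nudges them toward the new keyword.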