Help packaging optional application features, using extras?

Hey everyone! Excuse the long post. I’ve broken it up into sections to hopefully make it easier to jump between sections. Any thoughts/recommendations would be very welcome.

Intro

I have a long-term minesweeper project, which is a GUI application that I make available both with PyInstaller packages and on PyPI. I try to keep up with the latest best practices, and I’ve been following this Discourse category for a while now! I’m in the process of trying to move away from having a setup.py, but while diving deeper into python packaging there’s a few things I’m trying to understand…

I’ve reached the point where I’d like to add some complexity to the packaging of my app:

  • Make some features optional (e.g. I provide a CLI tool for querying online highscores, which has a few extra dependencies)
  • Provide some performance-critical functionality using Zig/C/Rust and package up into platform-dependent wheels

I’ll keep this post focused on the first point, but may start a separate post about the second point later on.

The question

What’s the recommended way to give the user the ability to control the set of features that gets installed (e.g. with pip)?

More specifically, what I’m interested in is:

  • Making extra features available without making them mandatory, to minimise the default set of dependencies
  • Having the option for a ‘minimal install’ that installs the minimum code/dependencies required for the app to run, i.e. a way to remove dependencies from the default set

I’m also confused that some examples online seem to suggest having ‘test’ as an optional extra (for running the project’s tests) - is this recommended practice? What’s the expected case where this is useful to the user performing a ‘pip install’?

What I’ve tried and thoughts

The relevant part of my setup.py (as it is at the time of writing) is here.
This does the following:

  • packages and package_data include everything needed for all extras - I haven’t found a way to conditionally include based on requested extras
  • install_requires and extras_require include base dependencies and extra dependencies, as you’d expect (this seems to be the only part of packaging that’s actually intended to be supported by extras?)
  • entry_points includes one default entrypoint and one entrypoint that’s only desired when an extra is specified, although the extra marker seems to be ignored such that if the extra isn’t installed then the script is still created but won’t work since the extra dependencies aren’t installed

The latter point is covered at Console_scripts entrypoints hidden behind extras are always installed · Issue #9726 · pypa/pip · GitHub, in particular the last comment which suggests moving code into separate packages such that ‘extra’ entrypoints are managed by ‘extra’ dependencies. The problem with this is that in my case it’s my implementation of the ‘extra’ that depends on the main application code, not the other way around…

In theory I could refactor out into package dependencies such as minegauler -> minegauler-core and minegauler[bot] -> minegauler-bot -> minegauler-core i.e. factor out the core application code that’s needed by extras into a third package such that the main package has a regular dependency on the core and an ‘extras’ dependency on the CLI tool, which also depends on the core… but this feels like a lot more faff than it should be - is proliferation of packages really the answer?

If the recommendation is to use many packages such that extras manage their own code/data/entrypoints and can be declared as simple dependencies then does it make sense to keep them together in a single repo in their own subdirectories? Something like:

|-- minegauler/
|   |-- minegauler/
|   `-- pyproject.toml
|-- minegauler-bot/
|   |-- minegauler_bot/
|   `-- pyproject.toml
`-- minegauler-core/
    |-- minegauler_core/
    `-- pyproject.toml

I don’t like this because project files don’t live at the root of the repo, and I think that breaks being able to pip install using the GitHub URL… So I guess I’d be left managing 3+ dependent repositories at once.

I’m left feeling this use-case isn’t really properly supported in python packaging and I might need to roll my own solution. Does anyone have any better suggestions?

I think it is more of a loose convention, just because it is quite practical and there does not seem to be any better way so far. Some tools, I am thinking of Poetry for example, have something like “development dependencies”, but they are not recognized by any other tool. So it does not combine well with the rest of the ecosystem (tox, etc.), whereas “extras” are easy to combine.

As far as I know, the list of “extras” offered by a project does not appear anywhere (not on PyPI for example), users are usually not aware that they could potentially install minegauler[test], unless they read the packaging configuration. So it is not really an issue. Users will only know about the “extras” that are mentioned explicitly in the install documentation of the project.
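For a distribution that is already installed, the declared extras can at least be read back from its metadata. A minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); the helper name `declared_extras` is made up for illustration:

```python
from importlib.metadata import PackageNotFoundError, metadata


def declared_extras(distribution_name):
    """Return the extras an installed distribution declares ([] if not installed)."""
    try:
        meta = metadata(distribution_name)
    except PackageNotFoundError:
        return []
    # Each declared extra appears as a 'Provides-Extra' field in the core metadata.
    return sorted(meta.get_all("Provides-Extra") or [])


# Hypothetical usage: prints [] unless 'minegauler' is installed in this environment.
print(declared_extras("minegauler"))
```

This only helps after installation, so it does not change the point that users mostly learn about extras from the install documentation.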

Your suggestions seem well-thought and clean. But maybe if I were you, I would keep things simple. If a user installs minegauler without the bot extra, and then calls minegauler-bot, I would output:

Please install 'minegauler[bot]' to use this command.
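A rough sketch of what that entry point could look like. The module names in `BOT_EXTRA_MODULES` are placeholders (I don't know what the bot extra actually depends on), and `main` stands in for whatever function the `minegauler-bot` console script points at:

```python
import importlib.util
import sys

# Hypothetical: top-level modules that only the 'bot' extra pulls in.
BOT_EXTRA_MODULES = ("requests", "tabulate")


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def main(argv=None, required=BOT_EXTRA_MODULES):
    """Entry point for the (hypothetical) 'minegauler-bot' console script."""
    if missing_modules(required):
        print("Please install 'minegauler[bot]' to use this command.", file=sys.stderr)
        return 1
    # The real implementation would be imported and run here, now that its
    # optional dependencies are known to be importable.
    return 0
```

Checking with `find_spec` avoids importing the optional dependencies just to discover they are absent; wrapping the real import in `try`/`except ImportError` would work too.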

Thanks for the detailed and well-written writeup; it’s very helpful for understanding your situation.

I’m not entirely clear if you mean "moving away from having a setup.py" in terms of making your existing config declarative, or switching to a build backend other than Setuptools.

If you want to move away from a dynamic setup.py and to a static, declarative setup.cfg, basically everything in your existing setup.py is directly portable to the same. A package like setup-py-upgrade will do most of the heavy lifting for you.

You could also consider switching to another build backend, like Poetry, which makes much of the rest of this easier.

A few other things I noticed:

  • The pyproject.toml is missing the most important part, the build system config; see the appropriate section of the official packaging tutorial.
  • It looks like you’re using PEP 420 namespace packages? They aren’t that commonly used these days, can make things much more complex, and it isn’t clear to me why they’re needed within the same project.
  • Consider the PyPA-recommended src layout, which combined with the previous should help avoid some of the current complexities in your find_packages() config, as well as other issues.

This indeed sounds like a pretty idiomatic application for Extras.

Yep, that’s exactly what extras are for.

Right now, extras don’t offer the ability to remove deps from the default, as it gets rather complex if you specify multiple extras, some of which add deps and some of which remove them. There have been plenty of proposals and discussion, though, on potentially adding a “default” extra that would be a minimal set of deps for the package, upon which the other extras could build.

Opinions differ as to how idiomatic this is. Some feel it isn’t really what extras are intended for, and better ways of doing this are available when using more advanced build backends like Poetry. On the other hand, if the latter isn’t being used, it can be more convenient, accessible and less duplicative than manually maintaining and installing requirements-dev.txt, or as an abstract equivalent to the frozen deps in that file.

Often you’ll see test, lint and other development-related dependencies (basically the contents of your requirements-dev.txt file) consolidated into a dev extra (which is easier now that, at least with current pip, extras can depend on other extras, reducing duplication). The intention is to allow developers/contributors, and potentially repackagers, who are e.g. installing from a GitHub clone, Git tarball, sdist, etc. to more easily install and run the tests without having to manually install (and maintain a duplicate list of) dependencies in requirements-dev.txt, which isn’t always shipped or easily available as a file.

Yes; if you have hefty package data that you want to separate out along with them, you’ll want to move that to a separate package.

That’s right; extras are really just (more or less) groups of optional dependencies.

Yes. Per the current Entry Points spec:

Using extras for an entry point is no longer recommended. Consumers should support parsing them from existing distributions, but may then ignore them. New publishing tools need not support specifying extras. The functionality of handling extras was tied to setuptools’ model of managing ‘egg’ packages, but newer tools such as pip and virtualenv use a different model.

I’m a little confused by this, sorry. In general, your implementation of the extra is going to depend on the main package, creating a dependency graph as follows:

             cli_extra
             /       \
            /         \
           V           V
main_application    < cli_extra deps >
        |
        |
        V
< main app deps >

So I’m not sure how your case differs.

Yeah, this would be a good way to do it, as I understand. Keeping your package more modular is generally recommended, as it decreases coupling, aids reusability, replaceability and pluggability, and allows downstream users to be more specific about their own dependencies, but it does increase overhead to some degree.

What you’re talking about here is basically a “lite” version of the “monorepo” approach, which has both upsides and downsides. It isn’t very common in Python packaging, but it’s what Google and some others favor, particularly in other languages and large corporate environments. It makes it substantially easier and more atomic to make broader changes that affect multiple different, discrete parts of an application/system. On the other hand, there are some usability tradeoffs in your development workflow, as most tooling isn’t really designed around it and contributors won’t expect it. See Google for some more detailed perspectives on this.

While there are certainly reasons not to use a monorepo, it is perfectly possible to still install from the GitHub URL by using the subdirectory parameter. See the pip docs.
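As a sketch of what that looks like, the helper below builds a direct-reference requirement using pip's `#subdirectory=` URL fragment. The repository URL and directory names are placeholders, and `vcs_requirement` is just an illustrative helper, not part of pip:

```python
def vcs_requirement(name, repo_url, subdirectory, ref=None):
    """Build a 'name @ git+URL#subdirectory=...' requirement string for pip."""
    url = f"git+{repo_url}"
    if ref:
        # Optional branch, tag or commit to pin to.
        url += f"@{ref}"
    return f"{name} @ {url}#subdirectory={subdirectory}"


# Hypothetical repository and layout, matching the tree sketched earlier.
spec = vcs_requirement(
    "minegauler-bot",
    "https://github.com/example/minegauler.git",
    "minegauler-bot",
)
print(spec)
```

The resulting string can be passed (quoted) to `pip install`, or listed in a requirements file.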

As I understand your needs, it is more or less supported, it just requires some restructuring of your project, as many others do. I’d generally advise against rolling your own tooling unless you’ve exhausted other options (of which there appear to be several), the benefits are clear (in this case, they don’t appear all that decisive), and you’re willing to accept the long-term maintenance risks and costs of doing so.

I appreciate the responses, it’s all pretty much as I expected but there’s some useful points that have been made.

My current thoughts can be summarised in response to the following:

I understand your perspective, but I think what I was really hoping for and referring to when I said “not properly supported” is having finer-grained control over what gets packaged/installed from within a single project.

This was actually the reason I originally set things up as namespace packages - my intention was for all code to be installed under the ‘minegauler’ namespace, but for the packages within that namespace to have clear separation such that they could be installed separately (with ‘app’ being the core/default, and other subpackages being ‘extras’).

My hope was that I could control subsets of the code, data and entrypoints that would be installed. I originally came up with two possible approaches: using extras, or using separate package names. Unfortunately it seems this is not supported by extras (which only control which dependencies are installed), and to have separate package names I’d need multiple pyproject.toml or setup.py files (either in some kind of non-standard monorepo setup or across separate repos).

Therefore I still feel the level of control I’m looking for isn’t properly supported within a single project. Does that sound fair?


Point-replies below…

That’s a reasonable suggestion, although the error message would only make sense in a pip environment (i.e. not in a PyInstaller package), but that’s probably not a concern in practice. I guess what I was really hoping for was a clean way to exclude all of the code, data and the entrypoint related to the bot.

I did have another idea: create a skeleton python project named ‘minegauler-bot’ that has a hard dependency on the main ‘minegauler’ project, but the entrypoint is defined in the ‘minegauler-bot’ package. Then minegauler[bot] would simply depend on minegauler-bot as an extras_require.

I mean moving to pyproject.toml instead of setup.py, although I’m expecting to continue using setuptools unless there’s something that seems a better fit.

I appreciate the input on my current packaging setup, I actually did some testing of the new setuptools support for PEP 621 recently (see Help testing experimental features in setuptools and my branch).

I was briefly, but added an __init__.py in when I started hitting problems… What you see in my setup.py is actually an effort to only package some of the subpackages, but maybe I should just package it all and support the other subpackages via other ‘dev extras’. This comes back to my desire for extras to also control the project code that’s packaged as well as managing dependencies…

Yeah, I had read this discussion and realised it would serve as a possible solution to what I was describing, sorry I probably should have dug it out and linked it.

Yeah, I have read that and decided against the src layout - personally I prefer to avoid using editable installs, which I believe become somewhat required (at least unless all imports are relative)?

This resonates with me, since this project does attempt to maintain a requirements-dev.txt and have it be in sync with requirements.txt. I might consider switching to using extras for specifying these deps, although my current dev workflow doesn’t involve pip installing my own project (I’m assuming people who use a ‘test’ or ‘dev’ extras in development would use editable installs?).

I guess this is the main kind of thing I was hoping there was a better answer to :frowning:

My case is the standard case, apologies for the confusing wording :slight_smile: My point was that it wouldn’t be possible to simply move out the code for minegauler[bot] into a new minegauler-bot package while keeping the minegauler[bot] extra, because the two projects would then depend on each other.

Sure, I’m totally on board with modularity. However, using separate projects feels like huge overkill in this case - I’m able to maintain pretty good discipline with modularity within the project, and don’t want to have to worry about interdependencies and versioning between what are really just internal parts of the project!

I certainly don’t plan to go overboard with any kind of roll-my-own solution - I was thinking more along the lines of having multiple pyproject.toml files (perhaps generated using a script) to allow me to push different parts of the codebase as different packages from within what’s otherwise a normal single-project layout. I’d definitely prefer to avoid any kind of hacky solution like this though, hence asking for input!

Taking a step back for a moment, I guess the one question I haven’t really asked is “why”? I.e., could you explain what’s driving your requirements that the different pieces of your application be independently installable—not only in terms of dependency stacks, but code and data as well—but that do not allow for separating it into different modular repos, into subprojects within the same repo, or top-level import packages within the same distribution package? Without this, it feels like this may be a bit of an XY problem, or perhaps an overoptimization, because I’m having trouble seeing a clear rationale for this, at least in your case.

Right now, at least testing in a conda env without symlinks, the basic Python runtime and bootstrapping deps itself is around 25 MB download size, and over 100 MB unpacked including all environment ancillaries. Installing, with pip, just the core deps of your minegauler package (no extras) is just under a 60 MB compressed download size, and results in a 260 MB total env size. Installing the bot extra only adds around another 500 kB in download size, and around 1 MB total to the environment. Your application itself is just under 1 MB total size in either sdist or wheel form, and around 1.5 MB unpacked.

As such, the size of all the code and data in your application itself is only around 1.1% the total download size, or 0.38% the unpacked size, of the runtime and environment dependencies it requires to operate. Therefore, unless you add ≈2 orders of magnitude more code and data files, reducing just those dependencies (which Extras does allow you to do) is going to make much more difference than anything else to the end-user heft—and this is the typical case for all but a few of the heftiest applications in the wild.

Also, to note, your core application requires a 60 MB download of deps (150 MB unpacked), not counting the Python runtime and stock virtual environment deps, whereas your extra only requires 500 kB (1 MB unpacked). Therefore, the practical motivation for not simply requiring such deps by default, given they only add 0.8% to the total download size (and that’s not including the interpreter and core support packages), seems a little unclear.
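For reference, the arithmetic behind that figure, using only the approximate sizes quoted above (in MB); the variable names are just for illustration:

```python
def percent(part_mb, whole_mb):
    """Express part_mb as a percentage of whole_mb."""
    return 100.0 * part_mb / whole_mb


# Approximate figures from the measurements above.
core_download_mb, core_unpacked_mb = 60.0, 150.0
bot_download_mb, bot_unpacked_mb = 0.5, 1.0

print(f"bot extra adds {percent(bot_download_mb, core_download_mb):.1f}% to the download")
print(f"bot extra adds {percent(bot_unpacked_mb, core_unpacked_mb):.1f}% unpacked")
```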

There’s also another aspect—your project appears to be a game, i.e. an end-user GUI application, rather than a library or CLI tool. I would think that at least at the kind of scale where the total additional download heft is really going to outweigh the additional work on your end to minimize it, your end users aren’t en-masse downloading and installing Python, firing up a terminal, creating a venv, using pip to install minegauler, and then launching it with a CLI command every time.

Rather, I’m guessing that if there is a good number of them, most of them are by and large using the PyInstaller standalone version, for which (at least with most such tools, such as pynsist that we use for Spyder) you can control via the spec file, config, build process, etc. what deps and potentially even parts of your application get installed, so you can package “lite” and “full” versions (as we do). As such, pretty much everything we’re discussing is moot, since you can just create (or script the creation of) multiple standalone versions to suit your needs.
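A rough sketch of scripting two such builds with PyInstaller's command-line options. The entry script path, the build names and the excluded module names are all assumptions about your project; `--exclude-module` makes PyInstaller treat a module as if it was not found:

```python
def pyinstaller_command(name, entry_script, excluded_modules=()):
    """Build the argument list for one PyInstaller build variant."""
    cmd = ["pyinstaller", "--noconfirm", "--windowed", "--name", name]
    for module in excluded_modules:
        cmd += ["--exclude-module", module]
    cmd.append(entry_script)
    return cmd


# Hypothetical entry script and module names.
ENTRY = "minegauler/__main__.py"
FULL = pyinstaller_command("minegauler-full", ENTRY)
LITE = pyinstaller_command(
    "minegauler-lite", ENTRY, excluded_modules=["minegauler.bot", "requests"]
)

# To actually build, something like: subprocess.run(LITE, check=True)
print(" ".join(LITE))
```

The same split could instead live in two spec files; the point is only that the "lite" and "full" distinction can be made at the installer level rather than in the wheel.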

You can do this, but it adds a fair degree of extra complexity and potential caveats to look out for, and is not particularly common in the Python world, at least for this purpose. As such, you’ll want to carefully consider the practical benefit to your users of doing so, which, as mentioned above, seems a little unclear to me.

Yup, technically that would work to prevent the entrypoint from getting installed while avoiding a circular dependency. However, I’m not sure how much practical benefit that would be, given the above size considerations, and the fact that a command that prints a friendly message explaining how to make it work (via pip, pyinstaller, etc) seems more useful, I’d think, than a generic command not found (and I’m not sure how users would know to run the command anyway, unless instructed to by documentation). Whereas making minegauler the metapackage that installed the requested subpackage(s) via extras, or just all of them, would allow you to actually separate the import packages alongside the distribution packages, as well as their deps, entry points, data files, etc.

Ah, that’s great. I was going to mention that as something to look out for in the future, I just didn’t want to overwhelm you with options, but it seems you’re well ahead of the curve. It’s great that you tested it; just keep in mind that it might take some time for the support to stabilize and be widely available, and there is an official converter for setup.cfg to pyproject.toml, not setup.py. So it would likely be worth doing the less trivial part of converting the config in your setup.py to setup.cfg now (as well as pasting the PEP 518 config into your pyproject.toml, which you really should do as soon as you can). That gets you most of the way there, and then the transition to PEP 621 and setuptools config in pyproject.toml should be a breeze.

Yep, because not having a top-level __init__.py is how you define a PEP 420 namespace package. If you remove the top-level __init__.py (while keeping the __init__.py files in the separate subpackages), it should more or less work (with various caveats in various contexts).
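To make the mechanism concrete, here is a small self-contained demo. It builds two throwaway directory trees that each contribute one subpackage to the same namespace, with no __init__.py at the namespace level. The names `demo_ns_pkg`, `dist_a` and `dist_b` are invented for the demo and stand in for `minegauler` and two separately installed distributions:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_portion(root, namespace, subpackage):
    """Create <root>/<namespace>/<subpackage>/__init__.py, leaving <namespace>/ without one."""
    pkg_dir = Path(root) / namespace / subpackage
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text(f"NAME = {subpackage!r}\n")


def demo_namespace(namespace="demo_ns_pkg"):
    """Import two subpackages that live in separate directories but share one namespace."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = [str(Path(tmp) / "dist_a"), str(Path(tmp) / "dist_b")]
        make_portion(roots[0], namespace, "app")
        make_portion(roots[1], namespace, "bot")
        sys.path[:0] = roots
        importlib.invalidate_caches()
        try:
            app = importlib.import_module(f"{namespace}.app")
            bot = importlib.import_module(f"{namespace}.bot")
            parent = sys.modules[namespace]
            # A namespace package has no __init__.py, hence no __file__, and its
            # __path__ spans every directory that contributes a portion.
            return (
                app.NAME,
                bot.NAME,
                getattr(parent, "__file__", None),
                len(list(parent.__path__)),
            )
        finally:
            for root in roots:
                sys.path.remove(root)


print(demo_namespace())
```

If either tree had an __init__.py directly inside the namespace directory, that tree would become a regular package and could hide the other portion, which is one of the caveats mentioned above.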

I’m not sure exactly what you mean by “dev extras”?

This gets pretty far from the designed purpose of extras, which are to control optional dependencies, not split one top-level import package into multiple subpackages with separate code, data, entry points, etc… For that there are many other tools, such as namespace packages, making them separate packages in the same repo, making separate repos, packaging them in different standalone installer versions via installer tools, using specialized build tools that enable this, etc.

I’d be curious to hear the reasons for this, and what you do instead—editable installs are the standard workflow for developing Python projects, are simple to do and don’t allow you to accidentally pick up whatever version is in your CWD instead of what is actually installed, which helps reduce the risk of hard to catch packaging and testing issues, as well as user mistakes and confusion, and allow you to use your package just as if it was installed, instead of resorting to hacks.

As an alternative, if you want to keep the requirements file format and use frozen dependency versions (appropriate for an application) while avoiding duplication, pip-tools is designed for that use case. You can also use more powerful and advanced package management systems, like Poetry.

I’m a little curious what you do do, since that’s the standard way of developing Python projects.

Ah, I understand what you’re saying now. Well, you might be able to technically do that, since minegauler-bot only depends upon minegauler; only minegauler[bot] depends upon minegauler-bot. But that would be rather strange; it wouldn’t be that much work to create a small metapackage that just depended upon minegauler (core) and then exposed the various extras.

To be honest, requiring separate distribution packages feels like nearly equally large overkill, given the small size of your project and the fact that it is an end-user application, not a library.

I think that in general, packaging Python applications is not a well-supported scenario. The packaging ecosystem is mostly focused on packaging libraries. Yes, entry points allow your library to expose a command interface, but the packaging capabilities are still designed around libraries.

There are plenty of options for packaging a Python application up, but there’s no really consistent story (and no good guides that I can find).

I would definitely support efforts to improve application packaging - that would likely involve looking at tools like pyInstaller, cx_Freeze, etc, as well as getting a better idea of the constraints/requirements involved in optional application features such as you describe.

But that’s a longer-term thing, and not directly of help for you right now…

Just to clarify, my point was more that for the broader context of the user’s case, I wasn’t clear on the practical motivation for breaking up the application itself into different distribution packages, that also made the various options presented non-viable. But I could potentially see some specific cases where this could be necessary, and more broadly, as one of the maintainers of a large, widely used FOSS Python application myself (the Spyder IDE/scientific environment), I can certainly wholeheartedly agree with all the points @pf_moore made.

Yeah, fair question. A part of my asking is wanting to understand what options are available in python packaging (e.g. for other projects, not just the one I’m using as an example here). Aside from that I think it’s just a desire for a ‘clean’ solution, i.e. where I can have the repo structured in what I consider the most practical way, and then package up only what end-users should need.

From the discussion we’ve had, perhaps you’d describe what I want as some kind of ‘light monorepo’ but without wanting to deviate from python project norms. That is, I want to logically split up packages while also controlling exactly what gets packaged up. I also prefer the idea of telling users to get optional extras using e.g. minegauler[bot] over minegauler-bot, since the former is clearly part of the same project.

I get the point about this case not really being a problem in practice for pip installs (with it being a game, etc.), but again this is partly a desire for a clean solution (only package what’s needed) and also wanting to understand what’s possible for other projects I manage.

There are pros and cons with the src approach, and I’ve considered them carefully multiple times over the years :slight_smile:

The big selling points of using a src directory for me are:

  • Easy to find the code from the root of the repo (I’m used to the package name being used now, but previously I would have found it easier to locate code if it was in src alongside tests etc.)
  • Avoid accidentally using the repo code rather than the packaged code when running tests etc.
  • Avoid including the repo root on PYTHONPATH (where there’s setup.py etc. that could be imported)

However, the big loss for me is simply being able to run the code from the repo - if you clone the repo and install the dependencies you should just be able to run it IMO, without any need to switch it to ‘package form’ or mess with paths. Especially with this project as an application as opposed to a library, the package isn’t the primary concern - it’s just one of many ways for the user to play the game (where another is to clone the repo, install the deps and run it!).

The two recommended src layouts (AFAIUI) are <project-root>/src/<package-name>/... and <project-root>/src/.... I assume the first case is normally preferred, since in the second case it would make it impossible to import the code without packaging/installing it (e.g. with editable installs). In the first case you’d need to put src on PYTHONPATH, which doesn’t seem particularly unreasonable, so I might try it out (with run.bat and run.sh scripts that set things up for a user cloning the repo to easily be able to run the application).

Just use my code as it is in the repo, installing dependencies using the requirements file. The repo provides the code, and the python package is just a way of distributing it. Since this isn’t a library, I prefer to keep it as “repo provides the code” first and foremost.

Yeah, I think you’re right. The only reason I started talking about separate packages was as a workaround for not having all the control I wanted with ‘extras’ (minegauler[feature-X]). I doubt I’ll bother at all though in this case :slight_smile:

I think “cd to the code directory and run the code” is an entirely reasonable model for an application style project. It’s how I write things like admin scripts, personal utilities, etc. I’d really like it if Python had a good model for bundling something like that up to share it, but right now, we really don’t.

@njs posted a good summary of the idealized lifecycle of a Python project to distutils-sig, over 3 years ago. I wish it had got more attention, because for me it really does highlight the pain points in sharing Python code, unless you’re writing a library (which is the focus of the current ecosystem). So we are still stuck in a “make everything look like a library, because that’s the problem we have tools to solve” mindset, and as a result people like you struggle to make sense of the packaging guides, because they are designed for a fundamentally different set of assumptions.

The big problem is that what we need is better advice, how-to guides, tutorials, etc. Not necessarily more tools. And yet, there seems to be very little consensus to back up such guides - not least because there’s such a huge variety of scenarios. I wish I could help, but honestly, I’m more likely to be a consumer of such a guide than a producer…

Yeah, for me this particular case seems to be one of “practicality over purity”, to quote the Zen of Python (import this), but I can certainly see cases where it does matter.

I’m not sure exactly what you mean by the second one, unless you’re referring perhaps to single-module projects (i.e. <project-root>/src/<project-name>.py) for which the src layout isn’t generally recommended (AFAIK). You still need one (or more) top-level import package directories with __init__.py files in them underneath the src directory (unless, of course, you make src your top-level import package name, which is most certainly not what you would want).

In general, rather than trying to muck with PYTHONPATH which is usually a Bad Idea™, and which gets into the territory of “why not just use a ‘proper’ editable install then”, in your scripts, why not have your script just cd into src and run directly from there (as @pf_moore also mentions)?
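Something along these lines could serve as a cross-platform stand-in for separate run.bat and run.sh scripts. It is only a sketch: `run_from_src` is a made-up helper, and it assumes the package under src/ has a `__main__.py` so that `python -m <package>` works:

```python
import subprocess
import sys
from pathlib import Path


def run_from_src(repo_root, module, args=(), **run_kwargs):
    """Run `python -m <module>` with the working directory set to <repo_root>/src.

    With -m, the interpreter puts the current directory first on sys.path, so
    the package under src/ is importable without touching PYTHONPATH.
    """
    src_dir = Path(repo_root) / "src"
    return subprocess.run(
        [sys.executable, "-m", module, *args],
        cwd=src_dir,
        **run_kwargs,
    )


# Hypothetical usage from a run.py at the repo root:
#     run_from_src(Path(__file__).parent, "minegauler", sys.argv[1:])
```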

You could specify only your test requirements in requirements-dev.txt, and have users install the package and that file for testing.

It’s technically possible to set up a PEP 517 back-end which takes in a flat set of Python modules and produces a nested package structure, although I’m not sure any of the popular back-ends have that functionality.
