I am working on a set of related command-line programs, and I don’t understand how best to structure them into Python packages and projects. These commands do individual related steps of a workflow. They share data structures and functionality.
The obvious way to structure their source code is as one project in one repository. I can register a console_scripts entry for each program. Shared modules are in one src/ tree. Each program can easily import any of them. However, then I have only one pyproject.toml. I can have only one project name, one issue date, one version number, which all the programs must share.
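For illustration, with one pyproject.toml the shared metadata and the per-program entry points would look roughly like this (the names and version here are made up):

[project]
name = "myprograms"        # the one project name every command shares
version = "1.2.0"          # the one version number every command shares

[project.scripts]
program1 = "myprograms.program1.cli:main"   # one console_scripts entry per command
program2 = "myprograms.program2.cli:main"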
If I want to allow separate project names, issue dates, and version numbers, it seems I would need to make them multiple projects. Each would have its own repository, its own pyproject.toml, its own src/ tree. I would have to structure the shared modules as an importable package in a project of its own, or maybe as a git submodule which each program’s repository includes. That seems like more complexity and repeating myself.
Are there other models which I am missing? Something which lets each program have its own version number, but which lets them share code?
I am hoping someone can point me to something clearly better which I am failing to imagine. Thank you!
There’s no requirement for the root of a package folder to also be the root of the Git repository, so my thoughts here are to have all programs in one repository containing multiple packages. The repository root will contain a directory for each package, with each of those folders containing all of the project-specific stuff typically found at the root of a repo (such as pyproject.toml). Your CI script or other build process can cd into each folder and build the package, then cd into the next folder and so on.
Shared code can be its own package (and associated directory) on which the programs using it can declare a dependency. The PyPI description should just say that it is an internal package not meant to be installed alone.
So your repository can look something like this:
myprograms
|
+-- common/ <-- shared files go in this project
| |
| +-- src/
| +-- pyproject.toml
|
+-- program1/
| |
| +-- src/
| +-- pyproject.toml
|
+-- program2/
| |
| +-- src/
| +-- pyproject.toml
|
+-- .gitignore
+-- .gitlab-ci.yml (or the CI file for your choice of CI service)
+-- LICENSE.txt
+-- README.md
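For example, program1’s pyproject.toml could declare its dependency on the shared package roughly like this (this assumes the common/ project is published under the name myprograms-common; all names and versions below are placeholders):

[project]
name = "myprograms-program1"
version = "0.3.0"
dependencies = [
    "myprograms-common>=1.0",   # the shared package built from the common/ directory
]

[project.scripts]
program1 = "program1.cli:main"   # console entry point for this program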
If you’re going to version your programs separately, I suggest they go into different repositories, otherwise making releases gets a bit complicated when you look at the Git history. I do have a little bias against monorepos, however.
Like Laurie, I think the clean approach is to have separate repositories for the tools that are released separately, plus one more repository for the shared code.
To minimize the repeated work when creating the individual repositories, you can use templates. One popular tool for this is Cookiecutter.
By Laurie O via Discussions on Python.org at 17Jul2022 01:43:
If you’re going to version your programs separately, I suggest they go into different repositories, otherwise making releases gets a bit complicated when you look at the Git history. I do have a little bias against monorepos, however.
By contrast, I love the monorepo! At least for interrelated things.
Nothing stops you having multiple projects and releases inside a monorepo. That’s how my personal projects are handled.
Do you really want distinct releases per command? If each is complex, that makes sense. If they are smallish and interrelated, maybe you just want to release the lot as one project with one release number. I’m thinking about command A being dependent on the “current” revision of command “B”: if you always release as a single thing that’s always in sync, because A and B come out together; if you release them individually you may want to include versioned dependencies, i.e. command “B” requires at least revision A3 of command “A”.
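In pyproject.toml terms that kind of constraint is just a versioned dependency; a minimal sketch, with hypothetical package names and version numbers:

[project]
name = "command-b"
version = "1.4.0"
dependencies = [
    "command-a>=3.0",   # command “B” needs at least this release of command “A”
]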
Obviously this is up to you, but there’s a complexity tradeoff.
It is helpful to have it pointed out that there can be multiple src/ trees and pyproject.toml files in a single repository. I don’t think I recall seeing that possibility discussed in the Python Packaging User Guide.
The debate between monorepos and multirepos is interesting. I see the advantages to both choices.
It is early days on my project, and I expect it will always be a small project, so I am starting it out as a monorepo with a single pyproject.toml file. However, I now have some conceptual models for directions in which it can grow.
I’m not a big fan of monorepos and don’t really use them (much), but in general the biggest challenge of managing lots of different repos is not creating their infra but maintaining them: updating common portions of the readme, the contributing guide, other meta files, gitignore, gitattributes, GitHub Actions workflows, other CI config, test config, linter config, tooling config, packaging config, etc., etc.
I’ve experimented with several tools for handling this, but they’ve all had caveats for the use cases I run into, and they also require a lot of work to create the cookiecutters, as seemingly all of the ones out there are out of date or have some issue or another. I’ve even had plans to create my own tool wrapping pre-commit and cookiecutter, but that seems an impossible dream given the time involved…
I would be curious to see if anyone can share a link to a working monorepo setup, one that uses setuptools and has survived the test of time.
I am less worried about managing a changelog for each package and more about not ending up wasting precious time debugging bugs in all the surrounding tooling, which might get confused by such a layout.
Still, having a single linter configuration could be seen as a nice feature when using a monorepo.
I have no idea how much has been overridden these days; I suspect a lot. And there are dedicated people whose job is to keep it all running, but that’s inevitable at this scale.
The subrepos are in the sdk directory. I don’t believe they’ve switched to pyproject.toml throughout the source tree yet, but they may be generating them for releases (and if not, that’s likely where they’ll start, and I’d better go poke them to get on it before the legacy behaviour in pip stops working).
Monorepos are great as long as you can cut a single release per repo. Commit history isn’t the only thing that is made more complex by trying to version multiple things independently out of a single repo. You will waste a lot of time in a lot of places: configuring your CI builds to ignore changes in certain paths, dealing with having some builds related to toolA and some related to toolB… and god help you if you ever want to do something like put a badge on the CI job that shows which version was deployed. It’s doable, sure. Maybe none of it is even that difficult in isolation and nothing ever goes wrong. But at the end of the day the aggregate will not be trivial, and that’s time that could be better spent on something more useful, like staring blankly into space.
But anyway, there is likely no good reason NOT to simply cut a new release whenever either changes, and always keep the version numbers in sync. It sidesteps most problems at the cost of maybe having users download a single python dependency unnecessarily because it’s a release with no changes. I’ve done both plenty of times, and can say that I have encountered no memorable problems with “mono repo, everything released every build w/ the same version number”, and many with “mono repo which is essentially multiple repos in a single source tree” (and I’ve watched other teams struggle with it repeatedly because everyone needs to learn their own lessons I guess).
Here’s a test: are you directly importing anything from one module to another, expecting to make changes to both which must line up for the project to work? Are consumers generally or always going to require both of them, e.g., one is a library used by the ‘more public’ module? Then mono repo good, but you should be releasing them as a unit anyway or you’re asking for trouble.
If you expect to build, release, and version them independently, and toolA needs to support multiple versions of toolB w/ backward compatibility guarantees, or consumers of the project could be expected to use one but not the other, those are clearly two separate projects and should not be in a repo together.
Note: Something to keep in mind is my work tends to be on internal tooling where build times for each component are roughly the same, I can parallelize them, compute is essentially free & infinite for these purposes, and bandwidth for my users is also basically free. I wouldn’t recommend this for multiple compiled components, one of which takes 3 hours to build and is 3gb and the other takes 3s and is 3mb, and I’m shipping the artifacts to Abu Dhabi every time. Use judgement. Things don’t magically get less complex as time goes on, so you definitely do not want to start off complex.