Monorepo approach to handle multiple projects

Hello Python Experts,

I have packages x, y, z that each do something different. These packages will likely grow; each has its own version and is published to PyPI.

  • Those packages are used in my job for certain tasks, and they depend on each other.
  • They are constantly evolving and need new releases once a week or more.

I have two problems:

  • Development: I need to push and publish them all, because at any given time I may want to test them together, and they depend on each other. Having separate projects means I have to make sure that all of them are always pushed/published.

  • Use: I need to pull them all and make sure they are up to date before using them. Development and use do not necessarily happen on the same machine.

Ways to do this are:

  • Have a script that loops over the projects and pushes/publishes only what has changed and what is needed. That script would also pull or re-install all the projects when needed. However now I have to maintain more synchronization code.

  • Put all the projects in a single package. This would mean I only have to pull/upgrade once, but then x alone is not reusable; I would have to reuse the whole bundle. I would like modular, reusable code.
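The core of the first option can be kept very small. Here is a minimal sketch of the "publish only what changed" decision, with entirely hypothetical inputs: both mappings are {project_name: version_string}, and how you obtain them (querying PyPI, reading git tags, etc.) is left to the surrounding script:

```python
def projects_to_publish(local_versions, released_versions):
    """Return the projects whose local version differs from the last
    released one; only these need to be built and published.

    Both arguments are {project_name: version_string} mappings.
    """
    return sorted(
        name for name, version in local_versions.items()
        if released_versions.get(name) != version
    )
```

A project that has never been released (missing from released_versions) is treated as changed and gets published on the first run.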

I see that all the packages can be placed in a single git repository, which would make pushing and pulling easier. However, when publishing, I would still have to write something to do the job. The post also suggests doing the publishing in a CI script; I might try that. Do you have any thoughts on this?

Cheers.

A few thoughts to add to my post that you linked to:

  1. The big issue with using a monorepo is that there’s no practical way to maintain different versioning for each project with Git tags. You could accomplish this with a version.py file in each project, but Git tags become meaningless unless you release all projects together at once under the same version number. This isn’t a blocker, but you have to put a bit more thought into how you want to do versioning.
  2. Building and publishing is just a couple of simple commands (python -m build and twine upload ...), which is extremely easy to throw into a CI script (the hardest part is keeping the PyPI API token secret). In fact, many project maintainers never build or publish by hand, doing those tasks only via CI/CD services. I’m one of those maintainers.
  3. You shouldn’t ever need to publish your own packages to develop against them. That’s what editable installs (pip install -e .) are for: to develop against your local working copy. Publishing/releasing is for users.
  4. I haven’t ever had a reason to look this up to be sure, but I think Python builds support producing multiple distinct packages from a single build command and a single pyproject.toml. It’s worth looking into.
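To illustrate point 1, the per-project version.py approach only needs a tiny helper to read each version back out. This is a sketch assuming each subproject keeps a line like __version__ = "1.2.3" in <project_dir>/version.py:

```python
import re
from pathlib import Path


def read_version(project_dir):
    """Read __version__ from a subproject's version.py.

    Assumes the file contains a line like: __version__ = "1.2.3"
    """
    text = Path(project_dir, "version.py").read_text()
    match = re.search(r'__version__\s*=\s*["\']([^"\']+)["\']', text)
    if match is None:
        raise ValueError(f"no __version__ found in {project_dir}/version.py")
    return match.group(1)
```

A release script (or CI job) can then compare this per-project version against the tags or the published releases to decide what to build.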

Another option is to make tags like so:

x-1.2.3
y-1.2.3
z-1.2.3

A script can check if all (sub)modules x, y, and z have the same latest version, and if so, pack them all and release them in one fell swoop.

This allows x, y, and z to actually grow independently of each other.
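A quick sketch of that check, assuming tags follow the <project>-<major>.<minor>.<patch> pattern shown above (the tag list itself would come from something like `git tag`):

```python
def parse_tag(tag):
    """Split a tag like "x-1.2.3" into ("x", (1, 2, 3))."""
    name, _, version = tag.rpartition("-")
    return name, tuple(int(part) for part in version.split("."))


def all_in_sync(tags, projects):
    """True if every project's newest tag carries the same version."""
    latest = {}
    for tag in tags:
        name, version = parse_tag(tag)
        if name not in latest or version > latest[name]:
            latest[name] = version
    versions = {latest.get(p) for p in projects}
    return len(versions) == 1 and None not in versions
```

If all_in_sync returns True, the script can pack and release everything in one go; otherwise some project is lagging behind and the release is skipped.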

Hello @jcgoble3 @pepoluan

Thanks for your replies. Just to put this in context, I have a package p0 which will encompass p1, p2, etc. I.e. p0 is our project and has a bunch of dependencies. However all those dependencies are evolving at the same time.

This monorepo approach seems to entail that:

  • The histories are merged. We cannot track each project separately.
  • The developer needs to clone the whole repository, even if they care only about p1 and p2. E.g. changes in p5, which are completely unrelated to the developer’s job, will have to be merged and seen by the person working on p1 and p2.

I am not sure if that’s what we want. The way we did this in the past was to have some sort of master project that, for each project p1, p2, … and in a given branch, would:

  • Check that all projects are cloned.
  • Pull, thus making sure all the projects are up to date with the remote.
  • If there are local changes not committed, exit. The user will commit them and rerun.
  • If the version differs from every existing tag, tag the project, then push. The CI will then publish the newly tagged project to PyPI.

or in other words:

project_manager -n sync -k project_list.txt

It was meant to make sure we synchronize our packages between machines and users in an easy way.
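For what it’s worth, the per-project decision in that sync loop can be captured as one small pure function. This is only a sketch of the rules listed above; the actual clone/pull/tag operations would be git calls wrapped around it:

```python
def sync_action(cloned, dirty, version, tagged_versions):
    """Decide the next step for one project in the sync loop.

    cloned/dirty describe the local checkout, version is the project's
    current version string, and tagged_versions are the versions
    already tagged in the repository.
    """
    if not cloned:
        return "clone"
    if dirty:
        return "abort"  # user commits local changes and reruns
    if version not in tagged_versions:
        return "tag-and-push"  # CI then publishes the new tag to PyPI
    return "up-to-date"
```

The master project just runs this for each of p1, p2, … and executes the corresponding git commands, which keeps the synchronization logic in one place.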

Cheers.