How do I package submodules separately?

Hi,
I have a package “foo” with subpackages bar and baz. I now want to package foo.baz separately.

So I create subdirectories packaging/foo and packaging/foo-baz and put the relevant pyproject.toml in each … but now what? There doesn’t seem to be a standard TOML key under [project] to tell the build system where the sources are or which files to include/exclude, at least according to the spec … or is there?

If that is build-system specific, which build system would be recommended for this use case, given that I need to exclude the foo.baz submodule from the foo package?

Bonus question: versioning. Ideally I’d like to retrieve foo.baz’s version from the latest reachable foo-baz/#.#.# git tag, but adding an explicit version to the TOML is OK too.

Indeed, there isn’t.

Yes, and I think pretty much any build back-end likely handles this.

Well, I sure hope so. The question remains, though: how do I tell my build system, whichever I end up using, where the sources are and what to include (or not)? A quick look at the docs for Hatch, for instance, turned up no setting for the former, leading me to assume that “. or src/ or nothing” is the commonly assumed answer. However, my appetite for adding a bunch of symlinks and/or selectively copying files into the build directories is … low enough that I’m looking for a better way.

Maybe this answers your question: Creating project files from a package's command line interface - #7 by sinoroc

Separately installable and importable sub-packages are called namespace packages.
Here is a guide: Packaging namespace packages - Python Packaging User Guide

@merwok Thanks, but that’s not the question. The question is how to generate a (namespace) package when it’s in the same directory tree as the package whose namespace it is in, and how to tell the builder which directory to use, given that there are now two (or more) pyproject.toml files that want to package part of the same tree.

The guide you linked to is not helpful here because it has separate subdirectories for each namespaced subpackage. That’s not the case in my (mono)repository.

In other words, given

foo/quux.py
foo/bar/__init__.py
foo/bar/baz.py
packaging/foo/pyproject.toml
packaging/foo-bar/pyproject.toml

what do I write into those project files so that when I run pybuild-or-whatever in directory packaging/foo (a) the builder figures out that the sources are in ../.., (b) I get a nice and shiny wheel for foo that includes foo/quux but not foo/bar, and (c) when I’m doing this in packaging/foo-bar I get a wheel for foo.bar?

I could of course write a script to autogenerate an ugly symlink tree in each and every packaging subdirectory, but that’s a major hassle, tends to get out of sync, and doesn’t play at all well with running pytest.

Yes, the guide is telling you to change that. I don’t think Python packaging tools are designed to create separate distributions from (what they see as) a single package. So if you want to distribute separate packages with a common parent, then you should use namespace packages.


That is a shame. I have been wanting something like this myself in a number of situations. One is that I have tests like:

./proj
./proj/pkg1
./proj/pkg1/tests
./proj/pkg2
./proj/pkg2/tests

I would like to make the tests a separate package, but reorganising the entire layout is very disruptive. If anyone knows of a simple way to separate this, that would be great. In my context I would be fine with pip install . installing the package without the tests, even if a bit of extra fiddling is needed when building the tests package.
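For the first half of that (having pip install . skip the tests), I suspect plain setuptools package discovery with an exclude pattern would already be enough. A rough, untested sketch, assuming the pyproject.toml sits in ./proj, pkg1/pkg2 are the importable packages, and the names are only placeholders:

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "proj"
version = "0.1"

[tool.setuptools.packages.find]
# ship pkg1 and pkg2, but leave out every *.tests subpackage
include = ["pkg1*", "pkg2*"]
exclude = ["*.tests", "*.tests.*"]

That would still leave the separate tests distribution to sort out, of course.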

Packaging Python code that is not in a subdirectory of where pyproject.toml is will be very difficult and annoying to do. It is very unconventional.

Since the setuptools build back-end can be configured via a setup.py script, in which you can write any Python code you like, it should very much be possible with setuptools. The way to do this is to customize setuptools commands. The documentation on this topic is not good, though: Extending or Customizing Setuptools. You will likely need to do a lot of research, and trial and error. There are examples floating around the internet, but most of them are bad, so be careful.

I guess some of the other build back-ends might also allow this kind of unconventional flexibility that you require, maybe scikit-build-core via CMake.

Now, if you had a project source tree directory structure like the following, it would make everything much easier for you:

foo
├── foo
│   ├── pyproject.toml
│   └── src
│       └── foo
│           └── quux.py
└── foo-bar
    ├── pyproject.toml
    └── src
        └── foo
            └── bar
                ├── baz.py
                └── __init__.py
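With a layout like that, each pyproject.toml becomes boring and conventional. As a rough sketch of what the foo-bar one could look like, assuming hatchling as the build back-end (other back-ends have equivalent settings):

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "foo-bar"
version = "1.0.0"

[tool.hatch.build.targets.wheel]
# "foo" is an implicit namespace package (no src/foo/__init__.py),
# so this wheel ships only foo/bar/...
packages = ["src/foo"]

The foo distribution would use the same packages = ["src/foo"] setting on its side, and as long as neither of them ships a foo/__init__.py the two wheels install side by side without conflict.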

OK. Thanks for everybody’s feedback. I will write a script to find and hard-link the requisite source files, and “their” tests, into individual src subdirectories. This seems to be the least work all around.

A structure with several top-level subdirectories is what I had before. The problem is that the packages are too closely related, and testing changes that affect more than one of them becomes far more error-prone than with a monorepo.

If I were you, I would check each of the main build back-ends; maybe one of them does what you need after all, I cannot claim to know all the features of each one. Also check the Development workflow tools (I have not listed uv there yet). As far as I understand, some of those do quite advanced things to handle closely related projects (libraries, applications) and break out of the conventional workflow of one source tree directory, one library, one virtual environment, and so on. I do not really know why “monorepo or not monorepo” would be relevant here (I also never truly understood what is special about monorepos), but I think I have seen some of those tools mentioned as “monorepo-friendly” or something like that (maybe uv’s workspaces or PDM’s solution).
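For what it is worth, my (superficial) understanding is that a uv workspace is declared in the root pyproject.toml of the repository roughly like this; treat it as an untested sketch with placeholder names rather than something I have actually used:

[project]
name = "foo"
version = "0.1.0"
dependencies = ["foo-bar"]

[tool.uv.workspace]
# every directory matching the glob, each with its own pyproject.toml, is a member
members = ["packages/*"]

[tool.uv.sources]
# resolve foo-bar from the workspace instead of an index
foo-bar = { workspace = true }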

Good luck! You can always come back with further questions, maybe once you have made some progress towards a more concrete issue.

I’ll chime in here as well with a related need that I’ve been looking to solve for a long time (at a very low intensity level; there’s plenty else to do, and the current state isn’t exactly “broken”). I have a project that is laid out like this (much abbreviated):

proj/
├── moduleA.py
├── moduleB/
└── Tools/
    ├── toolA.py
    └── toolB/
...

The modules in Tools are sort of plugin-like: when called, they check for possible external dependencies and, if found, initialize themselves and become available for use. For example, the “lex” tool looks for the external utility lex, or flex, or win_flex, or anything else that could serve the purpose.

One of those tool modules causes us some problems: it’s big, it’s very rarely used, and it contains a bunch of support files (mainly xml/xsl) vendored from someone else’s project under a different license, which complicates our general claim that the project is under MIT. Because it has twice as many files as the whole rest of the package, scanners think we’re lying when we say it’s Python/MIT, since they see more files that don’t fit our claim than ones that do. So we would like to package that one path as a separate namespace package, so that only people who need it have to install it and so that its metadata can state the correct license.

Move to a different structure? Because it’s been like this for two decades, I’m not getting buy-in to do that (at the moment). Was it structured the right way from the beginning? Easy to argue no, but times were different.

P.S. (added later, I submitted too early): also, moving this directory path somewhere else to aid packaging would make things more complicated for us: the project itself uses that tool to build its documentation, which we normally do directly from a checkout.

A monorepo complicates things when you want to tag releases. Subpackages frequently have their own release cadence, especially when you use (some semblance of) semantic versioning.