How to best structure a large project into multiple installable packages

I hope this question is not too easy or generic for this forum!

What I do not understand is the recommended way to split a large Python project into several parts so that some parts of the project can be installed separately. In addition, it would be good if those optional parts could register themselves with the main package and could live as subpackages of the main package.

So let's say there is already some package “mainproject” which can be installed using pip install mainproject and which provides mainproject.Class1 and mainproject.sub1.Class2.

Now, at a later time, a user wants to install mainproject_sub2, and this should add mainproject.sub2.Class3.

Is this possible and how to do it best?

I guess my question really is if and how one can distribute a package that adds modules to another existing package upon installation, or how to accomplish something that is very close to this.

You probably need namespace packages. Ignore the Legacy Namespace Packages part if you only care about Python 3.
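To make the suggestion concrete, here is a small sketch of native (PEP 420) namespace packages, assuming Python 3.3+. It creates two throwaway directories that stand in for two separately installed distributions; the package name nsdemo and the sub1/sub2 portions are made up for illustration. Neither directory contains nsdemo/__init__.py, which is exactly what makes nsdemo a namespace package.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_portion(root: Path, sub: str) -> None:
    """Create <root>/nsdemo/<sub>/__init__.py, deliberately WITHOUT nsdemo/__init__.py."""
    pkg = root / "nsdemo" / sub
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text(f"NAME = {sub!r}\n")


# Two separate directories on sys.path, one per (hypothetical) distribution.
dist_a = Path(tempfile.mkdtemp())
dist_b = Path(tempfile.mkdtemp())
make_portion(dist_a, "sub1")
make_portion(dist_b, "sub2")
sys.path[:0] = [str(dist_a), str(dist_b)]
importlib.invalidate_caches()

import nsdemo.sub1
import nsdemo.sub2

print(nsdemo.sub1.NAME, nsdemo.sub2.NAME)  # sub1 sub2
print(list(nsdemo.__path__))               # both nsdemo/ directories
```

Because nsdemo has no __init__.py, there is no file in which root-level code such as a Class1 could live; that is the limitation discussed in the replies that follow.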

Thanks! If I understand this correctly, though, this only works if the “namespace package” is not a “normal” package, i.e. one that has an __init__.py and implementation files in the root like my existing package? Does this mean it can only be done if the existing package gets refactored and thus the API changed? Currently it is possible to do from mainproject import Class1, but if I understand correctly this would not be possible with a namespace package?

The existing package tries to make the main components easy to import, so its __init__.py file does its own imports and the user can simply do from mainproject import a,b,c without having to specify the actual modules from which to import a,b,c. Again, this would not be possible with namespace packages, is that correct?
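For reference, the convenience pattern described here looks roughly like the following sketch of a regular package whose __init__.py re-exports names. The module names core and sub1 and the class names are placeholders, and the files are written to a temporary directory only so the example runs on its own.

```python
import importlib
import sys
import tempfile
from pathlib import Path

site = Path(tempfile.mkdtemp())
pkg = site / "mainproject"
(pkg / "sub1").mkdir(parents=True)
(pkg / "core.py").write_text("class Class1:\n    pass\n")
(pkg / "sub1" / "__init__.py").write_text("class Class2:\n    pass\n")

# The root __init__.py is ordinary code, so it can re-export names and users
# can write `from mainproject import Class1` instead of naming the submodule.
(pkg / "__init__.py").write_text(
    "from .core import Class1\n"
    "from .sub1 import Class2\n"
    "__all__ = ['Class1', 'Class2']\n"
)
sys.path.insert(0, str(site))
importlib.invalidate_caches()

from mainproject import Class1, Class2

print(Class1.__module__, Class2.__module__)  # mainproject.core mainproject.sub1
```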

If all that is correct I think I cannot use namespace packages because I think refactoring the existing mainproject package would break backwards compatibility too much and result in something that is too inconvenient to use.

Put another way: my understanding is that a namespace package allows several subpackages to be children of the same root package, but does NOT allow that the root package also has its own implementation, i.e. everything must then come from some subpackage not the root package? Which would be really unfortunate if the bulk of the implementation should be in the root package but one wants to add additional optional subpackages later.

There’s actually a (mis?)feature in pip that allows a package to install into another package, so if you make mainproject (a regular package) and mainproject.sub1 (a namespace package), pip would happily “combine” them into one directory on installation (and is able to uninstall them correctly). The caveat is it won’t work if you put the regular and namespace package in different site-packages (only the regular one would be picked up).
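A rough way to see both the behaviour and the caveat, assuming a regular mainproject package plus add-ons that only ship mainproject/sub4 (or mainproject/sub5). The temporary directories stand in for site-packages directories and all names are hypothetical; this only models what the files look like after installation and does not involve pip itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write(path: Path, text: str = "") -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)


# Case 1: main package and add-on end up in the SAME directory.
same = Path(tempfile.mkdtemp())
write(same / "mainproject" / "__init__.py")                       # from "mainproject"
write(same / "mainproject" / "sub4" / "__init__.py", "X = 4\n")   # from an add-on

# Case 2: another add-on lands in a DIFFERENT directory on sys.path.
other = Path(tempfile.mkdtemp())
write(other / "mainproject" / "sub5" / "__init__.py", "X = 5\n")

sys.path[:0] = [str(same), str(other)]
importlib.invalidate_caches()

import mainproject.sub4  # works: it sits inside the regular package's directory

print(mainproject.sub4.X)  # 4

try:
    import mainproject.sub5  # the regular package's __path__ only covers `same`
    sub5_found = True
except ModuleNotFoundError:
    sub5_found = False

print(sub5_found)  # False
```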

But your observation is correct: no, there’s no way to support what you want if you want to do things “by the book” and make sure everything always works. The only way to simulate it would be to make mainproject some kind of proxy that tries to import mainproject_sub1 when the user imports mainproject.sub1. This can be done by hooking into the import system (with importlib), and/or utilising the module-level __getattr__ feature available in 3.7+.
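A minimal sketch of the module-level __getattr__ approach (PEP 562, Python 3.7+). It assumes a naming convention that is not part of the original discussion: the add-on for mainproject.<name> is installed as a top-level package mainproject_<name>. The package files are generated in a temporary directory so the sketch is self-contained.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical mainproject/__init__.py. The PEP 562 __getattr__ is only called
# for attributes that are NOT found normally, e.g. `mainproject.sub1`.
INIT_PY = textwrap.dedent('''
    import importlib
    import sys

    def __getattr__(name):
        if name.startswith("_"):
            raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
        target = f"mainproject_{name}"  # assumed naming convention for add-ons
        try:
            module = importlib.import_module(target)
        except ModuleNotFoundError as exc:
            if exc.name != target:
                raise  # the add-on is installed, but one of its own imports failed
            raise AttributeError(f"module {__name__!r} has no attribute {name!r}") from None
        # Cache the result so __getattr__ is not called again for this name and
        # so `import mainproject.<name>` finds it in sys.modules afterwards.
        globals()[name] = module
        sys.modules[f"{__name__}.{name}"] = module
        return module
''')

site = Path(tempfile.mkdtemp())
(site / "mainproject").mkdir()
(site / "mainproject" / "__init__.py").write_text(INIT_PY)
(site / "mainproject_sub1").mkdir()
(site / "mainproject_sub1" / "__init__.py").write_text("class Class3:\n    pass\n")
sys.path.insert(0, str(site))
importlib.invalidate_caches()

import mainproject

print(mainproject.sub1.Class3)   # <class 'mainproject_sub1.Class3'>
from mainproject import sub1     # plain attribute lookup now
import mainproject.sub1          # found in sys.modules thanks to the cache
print(sub1 is mainproject.sub1)  # True
```

Note that a plain import mainproject.sub1 as the very first access still fails here, because the import system looks for a real subpackage before any attribute is touched; covering that case needs a hook in sys.meta_path.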


Thanks for confirming my suspicion. That is a bit disappointing from my point of view.
I had been thinking of something similar to what you describe, so that the additional package would get installed as “mainproject_sub4” into a top-level “mainproject_sub4” package but could be loaded as “mainproject.sub4” through some runtime trickery. But this would require that “mainproject” actually knows about all the new packages that may get added as subpackages in the future, which is also not ideal but maybe doable for important cases.
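One way to do that kind of runtime trickery with an import hook is a meta path finder, sketched here under stated assumptions: the class name AddonAliasFinder is hypothetical, and it relies on a naming convention (mainproject.<name> maps to a top-level mainproject_<name>) rather than on mainproject knowing each add-on in advance. It loads the add-on's source a second time under the mainproject.<name> name, so if other code also imports mainproject_sub4 directly you end up with two separate module objects; treat it as a starting point, not a robust implementation.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import tempfile
from pathlib import Path


class AddonAliasFinder(importlib.abc.MetaPathFinder):
    """Map `<parent>.<name>` onto the code of a top-level `<parent>_<name>`.

    Hypothetical helper; relies on an assumed add-on naming convention.
    """

    def __init__(self, parent="mainproject"):
        self.parent = parent
        self.prefix = parent + "."

    def find_spec(self, fullname, path=None, target=None):
        if not fullname.startswith(self.prefix):
            return None
        child = fullname[len(self.prefix):]
        if "." in child:  # deeper modules are found via the alias's own __path__
            return None
        real = importlib.util.find_spec(f"{self.parent}_{child}")
        if real is None or not real.has_location:
            return None
        locations = real.submodule_search_locations
        # Re-load the add-on's source file under the alias name.
        return importlib.util.spec_from_file_location(
            fullname,
            real.origin,
            submodule_search_locations=list(locations) if locations is not None else None,
        )


# Simulated install: a regular mainproject plus a separately installed add-on.
site = Path(tempfile.mkdtemp())
(site / "mainproject").mkdir()
(site / "mainproject" / "__init__.py").write_text("")
addon = site / "mainproject_sub4"
addon.mkdir()
(addon / "__init__.py").write_text("from .impl import Class4\n")
(addon / "impl.py").write_text("class Class4:\n    pass\n")
sys.path.insert(0, str(site))
importlib.invalidate_caches()

# Appended last, so genuine subpackages of mainproject are always found first.
# In a real project this line would live in mainproject/__init__.py.
sys.meta_path.append(AddonAliasFinder())

import mainproject.sub4

print(mainproject.sub4.Class4.__module__)  # mainproject.sub4.impl
```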

I guess I will just accept the ugliness of having to have all the new stuff in a different root package.

Thank you a lot for your help!

You can include the __init__.py file in your main package; just make sure you don’t also include it in the add-on packages that provide the subpackages.

Basically, namespace packages exist because pip cannot count how many times a file has been installed. So if each of your packages overwrites the __init__.py file, the first one to be uninstalled will delete it and break everything.
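A toy model of that failure mode, under the simplifying assumption that uninstalling a distribution just deletes every file it recorded at install time (roughly what pip's RECORD file is used for). The helpers install, uninstall and scenario are made up for illustration and are not pip APIs.

```python
import tempfile
from pathlib import Path


def install(site: Path, files: dict) -> list:
    """Write the files and return the installed paths (a stand-in for RECORD)."""
    record = []
    for rel, text in files.items():
        path = site / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)  # silently overwrites a file from another package
        record.append(path)
    return record


def uninstall(record: list) -> None:
    """Delete every recorded file, without checking who else installed it."""
    for path in record:
        path.unlink(missing_ok=True)


def scenario(addon_ships_init: bool) -> bool:
    """Install main + add-on, uninstall the add-on, report if the root __init__.py survived."""
    site = Path(tempfile.mkdtemp())
    install(site, {"mainproject/__init__.py": "from .core import Class1\n",
                   "mainproject/core.py": "class Class1:\n    pass\n"})
    addon_files = {"mainproject/sub4/__init__.py": ""}
    if addon_ships_init:
        addon_files["mainproject/__init__.py"] = ""  # the add-on also claims this file
    uninstall(install(site, addon_files))
    return (site / "mainproject" / "__init__.py").exists()


print(scenario(addon_ships_init=True))   # False: the main package is broken
print(scenario(addon_ships_init=False))  # True: the main package is intact
```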

You’re going to be (slightly) faster and (significantly) more secure/reliable if it’s there. So as long as only your main package ships the __init__.py file, you can add and remove subpackages as separately installable packages.


Oh - this sounds good! I have to admit I did not digest all the documentation and PEPs for this and I am a bit scared to rely on something that may eventually turn out to be forbidden or unsupported.

So to make sure I understand you correctly: if “mainproject” has an __init__.py file in the root directory and also provides the subpackage “sub1” (with its own __init__.py file), I can still package and distribute “mainproject_sub4” so that it only provides “mainproject/sub4”, with an __init__.py file and implementation only in the “sub4” directory? It would be perfectly fine if the “mainproject_sub4” distribution did not even work without “mainproject” being installed (on which it would have to depend anyway).

That sounds really good!


Yep, you’ve got it 👍

We do a similar thing (and a few variations) throughout the Azure SDK for Python, which is designed to have a small common core and then let you choose which services you want to use (so you only need dependencies for things you’re using, etc).