How to best structure a large project into multiple installable packages

I hope this question is not too easy or generic for this forum!

What I do not understand is the recommended way to split a large Python project into several parts so that some parts of the project can be installed separately. In addition, it would be good if those optional parts could register themselves with the main package and live as subpackages of the main package.

So let’s say there is already some package “mainproject” which can be installed using pip install mainproject and which provides mainproject.Class1 and mainproject.sub1.Class2.

Now, at a later time, a user wants to install mainproject_sub2, and this should add mainproject.sub2.Class3.

Is this possible, and what is the best way to do it?

I guess my question really is whether and how one can distribute a package that adds modules to another existing package upon installation, or how to accomplish something very close to this.

You probably need namespace packages. Ignore the Legacy Namespace Packages part if you only care about Python 3.
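For reference, a minimal native namespace layout (with purely illustrative names) would be two separate distributions that both ship a mainproject/ directory containing no __init__.py at that level:

mainproject_core/
	setup.py
	mainproject/          # no __init__.py at this level
		core.py

mainproject_sub2/
	setup.py
	mainproject/          # no __init__.py here either
		sub2/
			__init__.py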

Thanks! If I understand this correctly though, this only works if the “namespace package” is not a “normal” package which has an __init__.py and implementation files in the root, like my existing package? Does this mean it can only be done if the existing package gets refactored and thus the API changed (since currently it is possible to do from mainproject import Class1, but if I understand correctly this would not be possible with a namespace package)?

The existing package tries to make the main components easy to use, so the __init__.py file does its own imports and the user can then easily do things like from mainproject import a, b, c without having to specify the actual modules from which a, b, c come. Again, this would not be possible with namespace packages, is that correct?
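For illustration, a rough sketch of such a re-exporting __init__.py (the module name core is a placeholder, not the real layout):

# mainproject/__init__.py (hypothetical) — re-export the public names
# so users can simply write `from mainproject import Class1, Class2`
from mainproject.core import Class1   # "core" is a made-up internal module name
from mainproject.sub1 import Class2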

If all that is correct I think I cannot use namespace packages because I think refactoring the existing mainproject package would break backwards compatibility too much and result in something that is too inconvenient to use.

Put another way: my understanding is that a namespace package allows several subpackages to be children of the same root package, but does NOT allow the root package to have its own implementation, i.e. everything must then come from some subpackage, not from the root package itself? That would be really unfortunate if the bulk of the implementation is meant to live in the root package and one only wants to add optional subpackages later.

There’s actually a (mis?)feature in pip that allows a package to install into another package, so if you make mainproject (a regular package) and mainproject.sub1 (a namespace package), pip would happily “combine” them into one directory on installation (and is able to uninstall them correctly). The caveat is it won’t work if you put the regular and namespace package in different site-packages (only the regular one would be picked up).

But your observation is correct: no, there’s no way to support what you want if you want to do things “by the book” and make sure everything always works. The only way to simulate it would be to make mainproject some kind of proxy that tries to import mainproject_sub1 when the user imports mainproject.sub1. This can be done by hooking into the import system (with importlib), and/or utilising the module-level __getattr__ feature available in 3.7+.
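A rough sketch of that __getattr__ approach, assuming (purely hypothetically) that the optional distribution installs a top-level package named mainproject_sub1:

# mainproject/__init__.py — sketch of the "proxy" idea using PEP 562
# (module-level __getattr__, Python 3.7+)
import importlib
import sys

# exposed attribute name -> top-level package installed by the optional distribution
_OPTIONAL_SUBPACKAGES = {"sub1": "mainproject_sub1"}

def __getattr__(name):
    if name in _OPTIONAL_SUBPACKAGES:
        module = importlib.import_module(_OPTIONAL_SUBPACKAGES[name])
        # cache it under the dotted name so later imports of mainproject.sub1 find it
        sys.modules[f"{__name__}.{name}"] = module
        return module
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

With this, from mainproject import sub1 works (attribute access falls back to __getattr__), but a plain import mainproject.sub1 executed before the attribute has ever been touched would still fail unless you also install an import hook via importlib, and mainproject has to know (or discover, e.g. through entry points) which optional names exist.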

Thanks for confirming my suspicion. That is a bit disappointing from my point of view.
I had been thinking of something similar to what you describe, so that the additional package would get installed as “mainproject_sub4” in a “mainproject_sub4” directory but could be loaded as “mainproject.sub4” through some runtime trickery. But this would require that “mainproject” actually knows about all the new packages that may get added as subpackages in the future, which is also not ideal but maybe doable for important cases.

I guess I will just accept the ugliness of having to have all the new stuff in a different root package.

Thank you a lot for your help!

You can include the __init__.py file in your main package; just make sure you don’t also include it in the subpackages.

Basically, namespace packages exist because pip cannot count how many times a file has been installed. So if each of your packages overwrites the __init__.py file, the first one to be uninstalled will delete it and break everything.

You’re going to be (slightly) faster and (significantly) more secure/reliable if the __init__.py is there. So as long as only your main package adds the __init__.py file, you can add/remove submodules as their own packages.
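As a sketch of what that can look like on the packaging side (setuptools assumed; the distribution name mainproject_sub2 follows the example earlier in the thread), the add-on distribution lists only the subpackage, so only the main distribution ever ships mainproject/__init__.py:

# setup.py of the add-on distribution "mainproject_sub2" (hypothetical)
from setuptools import setup

setup(
    name="mainproject_sub2",
    version="0.1",
    packages=["mainproject.sub2"],     # note: "mainproject" itself is NOT listed here
    install_requires=["mainproject"],  # the core distribution ships mainproject/__init__.py
)

Its source tree then only contains mainproject/sub2/ (with its own __init__.py), and pip merges that into the already-installed mainproject/ directory.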

Oh - this sounds good! I have to admit I did not digest all the documentation and PEPs for this and I am a bit scared to rely on something that may eventually turn out to be forbidden or unsupported.

So to make sure I understand you correctly: if “mainproject” has an init file in the root directory and also provides submodule “sub1” (with its own init file), I can still package and distribute “mainproject_sub4” to only provide “mainproject/sub4” with an init file and implementations only in the “sub4” directory? It would be perfectly fine if the “mainproject_sub4” distribution would not even work without “mainproject” being installed (on which it would have to depend anyway).

That sounds really good!

Yep, you’ve got it :+1:

We do a similar thing (and a few variations) throughout the Azure SDK for Python, which is designed to have a small common core and then let you choose which services you want to use (so you only need dependencies for things you’re using, etc).

Hijacking this thread, as my use case seems pretty similar to John’s.

If I understand correctly, it should be possible to have a mainproject package with its own functionality (and an __init__ file in the main directory), and then have additional namespace packages that can be installed separately.

I have tried this, with the following structure:

mainproject/
	setup.py
	mainproject/
		__init__.py
		
project_a/
	setup.py
	mainproject/
		project_a/
			__init__.py
					
project_b/
	setup.py
	mainproject/
		project_b/
			__init__.py

When only project_a and project_b are installed, everything works as expected - I can import them as mainproject.project_a and mainproject.project_b, respectively.

Also installing mainproject breaks things: now only the core functionality of mainproject is part of its namespace, and mainproject.project_a and mainproject.project_b can no longer be imported (presumably because they indeed cannot be found in “site-packages/mainproject…”).

Have I misunderstood Steve’s answer?

It depends very much on what you mean by “install”. There are several ways to put a package on sys.path, and the approach works only for some of them.

Well in my case, these were very simple test packages, not much different than those in the sample-namespace-packages example, and I installed them into a clean virtual environment using setup.py install.
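For completeness, the usual setuptools declaration for such a native namespace distribution looks something like this (a simplified sketch, names as above):

# setup.py for project_a — sketch using setuptools' native namespace-package support
from setuptools import setup, find_namespace_packages

setup(
    name="project_a",
    version="0.1",
    # finds mainproject.project_a without requiring a mainproject/__init__.py
    packages=find_namespace_packages(include=["mainproject.*"]),
)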

I found that, while the “native” way did not work as I had hoped, adding a pkgutil-style __init__.py file into the namespace directories of the project_a and project_b packages, as well as (and probably most importantly) adding the pkgutil line at the top of mainproject’s own __init__.py file, made the whole thing work as expected. I could install (and uninstall) mainproject, project_a and project_b separately, and I could now import the two subpackages as mainproject.project_a and mainproject.project_b.

This is what my working file structure looks like now:

mainproject/
	setup.py
	mainproject/
		__init__.py # now including `__path__ = __import__('pkgutil').extend_path(__path__, __name__)`
		
project_a/
	setup.py
	mainproject/
		__init__.py # pkgutil-style __init__
		project_a/
			__init__.py
					
project_b/
	setup.py
	mainproject/
		__init__.py # pkgutil-style __init__
		project_b/
			__init__.py

For namespace packages to work, you can’t define an __init__.py anywhere in the namespace you want to be expandable. Delete mainproject/__init__.py and things will function appropriately.

Thank you, that’s how I understood things initially. I thought Steve’s answer above suggested otherwise, but I must have misinterpreted it.

Just as a reminder: the point of this thread is to find a solution for packages which are NOT just namespace packages but come with their own functionality. So the aim is to have some main project mainproject, which has its own functionality, subpackages and modules, and then to install additional subpackages subpackage1 and subpackage2 which ADD functionality to mainproject and are importable as mainproject.subpackage1 or mainproject.subpackage2.
Ideally this should work independently of HOW these packages got installed (though a requirement to install mainproject first would be acceptable).

TBH, after all I have read about this, I have been cautious and avoided this approach, because I do not want to rely on something that may break at some point, may break depending on the way it gets installed, or may turn out not to be officially supported at some point. It would be good if there were some kind of official opinion on this: either support it properly or forbid it.

I have only skimmed this thread. But maybe that is the “official” statement on this topic.

My answer implied “packages installed via pip” which would mean your actual structure would look like:

site-packages/
	mainproject/
		__init__.py
		project_a/
			__init__.py
		project_b/
			__init__.py

If you aren’t doing an actual installation and getting them all into the same directory on sys.path, then my suggestion doesn’t apply.

If you want to reference parts of your package through separate sys.path entries, you need to leave out all the __init__.py files. Otherwise, if they’re all being installed into one place, then provided only one package provides the __init__.py (and that package is always installed), it’ll be fine.