I have a project with a legacy “utils”/“common”-style package called myproject. I am now required to write a few auxiliary libraries that do not belong under the “common” package, but which would still benefit from being namespaced under the myproject top level.
To be more explicit, my setup is currently something like this:
myproject, a top-level package
datagen, a top-level package
And I would like to change it to be this:
myproject, a top-level package
myproject.datagen, where myproject is a namespace package and datagen is a regular package
I tested this in a clean environment with Setuptools (full example below) and (somewhat to my surprise) it worked perfectly.
Is this a bad idea? Are there any risks or downsides?
I don’t see anything in PEP 420 advising against it, nor do I see it mentioned as part of the spec.
Here is a more complete description of the working setup I devised:
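The two source trees look something like this (module names beyond utils, datagen, and thing3 are illustrative):

```
dir1/
├── setup.py
└── src/
    └── myproject/
        ├── __init__.py
        └── utils.py

dir2/
├── setup.py
└── src/
    └── myproject/          <- no __init__.py at this level
        └── datagen/
            ├── __init__.py
            └── thing3.py
```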
Where dir1/setup.py uses packages=find_packages('src') and dir2/setup.py uses packages=find_namespace_packages('src').
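Spelled out, the two setup.py files are roughly as follows (the name and version metadata are assumptions):

```python
# dir1/setup.py -- a regular package, discovered with find_packages
from setuptools import setup, find_packages

setup(
    name="myproject",
    version="1.0",
    package_dir={"": "src"},
    packages=find_packages("src"),  # finds myproject because it has an __init__.py
)
```

```python
# dir2/setup.py -- a namespace portion, requires find_namespace_packages
# (available in Setuptools >= 40.1)
from setuptools import setup, find_namespace_packages

setup(
    name="myproject-datagen",
    version="1.0",
    package_dir={"": "src"},
    packages=find_namespace_packages("src"),  # finds myproject.datagen even
                                              # though myproject has no __init__.py
)
```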
When I ran pip install ./dir1 ./dir2 I found that I was able to import things from myproject.utils just as well as I could import things from myproject.datagen.thing3.
Evidently, the non-namespace package was not “clobbered” by the namespace package (or vice versa). Is this just the lucky outcome of undefined behavior in Pip and/or Setuptools? Or is this combination of a regular package and a namespace package intended to function in this manner?
I wouldn’t say it’s a bad idea, but there are some caveats you must know when you use this pattern. The import system returns a package with an __init__.py as soon as one is found, without searching the rest of sys.path, so your namespace package will not be picked up if it’s installed in a path with lower priority than the non-namespace one (the import system stops without ever seeing the namespace portion). This may or may not be problematic depending on your use case.
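To illustrate with hypothetical paths, suppose the two distributions land in different sys.path entries, with the regular package first:

```python
import sys

# Hypothetical install locations:
#   /site-a/myproject/__init__.py      <- regular package (from dir1)
#   /site-b/myproject/datagen/...      <- namespace portion (from dir2)
sys.path[:0] = ["/site-a", "/site-b"]

import myproject            # found in /site-a; the search stops there, so
                            # myproject.__path__ == ["/site-a/myproject"]
import myproject.datagen    # ModuleNotFoundError: /site-b is never consulted
```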
It seems like as long as they’re both installed to the same place (i.e. both installed with Pip) it should be fine. But maybe it’s a bit too magical for me to feel comfortable using “in production” without some kind of blessing, or even acknowledgement, from a PEP or other spec.
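For what it’s worth, the “same place” case presumably works because both wheels unpack into the same site-packages, so what ends up on disk is one ordinary package tree (a sketch, matching the layout above):

```
site-packages/myproject/
├── __init__.py     <- from dir1
├── utils.py        <- from dir1
└── datagen/        <- from dir2
    ├── __init__.py
    └── thing3.py
```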
I think that, in general, we may need a better specification of how install tools like Setuptools are supposed to handle package name conflicts.
Creating a namespace package
============================
There are currently three different approaches to creating namespace packages:
#. Use `native namespace packages`_. This type of namespace package is defined
   in :pep:`420` and is available in Python 3.3 and later. This is recommended if
   packages in your namespace only ever need to support Python 3 and
   installation via ``pip`` (a minimal sketch follows below).
#. Use `pkgutil-style namespace packages`_. This is recommended for new
   packages that need to support Python 2 and 3 and installation via both
   ``pip`` and ``python setup.py install``.
#. Use `pkg_resources-style namespace packages`_. This method is recommended if
   you need compatibility with packages already using this method or if your
   package needs to be zip-safe.

.. warning:: While native namespace packages and pkgutil-style namespace
   packages are largely compatible, pkg_resources-style namespace packages
   are not compatible with the other methods. It's inadvisable to use
   different methods in different distributions that provide packages to the
   same namespace.
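To make the first option concrete, a native namespace package needs nothing more than the absence of an __init__.py at the namespace level (a minimal sketch; names are illustrative):

```
mynamespace-subpackage-a/
├── setup.py
└── mynamespace/        <- no __init__.py: this directory is the namespace
    └── subpackage_a/
        └── __init__.py
```

A second distribution laid out the same way but providing mynamespace/subpackage_b/ merges into the same mynamespace at import time, as long as no portion ships a mynamespace/__init__.py.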
The issue arises when they are installed to different paths and the dir1 package comes before dir2 on sys.path, as @uranusjr explained.
I would create a namespace package to avoid this. Perhaps myproject.ext, so that you can install your extension as myproject.ext.datagen. Actually, in this case you wouldn’t even need to put a namespace package in dir1, just in dir2.
As long as myproject.ext is a namespace package everywhere, everything should be good.
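In other words, dir2 would ship something like this (a sketch):

```
dir2/
└── src/
    └── myproject/          <- no __init__.py (namespace portion)
        └── ext/            <- no __init__.py (namespace portion)
            └── datagen/
                ├── __init__.py
                └── thing3.py
```

Installed into the same site-packages as dir1’s concrete myproject, this merges into myproject/ext/datagen/ on disk, and other extensions can claim myproject.ext.whatever the same way.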
The drawback is that third-party packages can break things by accidentally including an __init__.py in myproject.ext.
This sounds very much like how we designed the Azure SDK for Python, which has a couple hundred packages that may be installed under the top-level azure namespace. The actual design is not ideal, because it had to support Python 2 as well (while avoiding the performance impact of old-style namespace packages).
Basically, the best layout is to rely on PEP 420 for development (i.e. add all of your src directories to sys.path so that you can import everything; though this won’t work in your case, because dir1 does not contain a namespace package), but then ensure that your wheels create concrete packages when installed.
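For development, this can be a small path hack you run (or import) before anything else; a sketch, assuming a repo where each package keeps its sources under <package>/src/:

```python
# dev_paths.py (hypothetical helper): put every package's src/ directory on
# sys.path so PEP 420 merges all the namespace portions during development.
import sys
from pathlib import Path

REPO_ROOT = Path(__file__).resolve().parent

for src in sorted(REPO_ROOT.glob("*/src")):
    sys.path.insert(0, str(src))
```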
So what you describe in the first post seems fine to me, and we’re definitely using the pattern successfully in production code. All I’d suggest is that you make dir2’s package require dir1’s package, so that you can’t install the “wrong” half of your library. You probably want that to be a tight dependency too, but maybe your code is set up to be flexible here.
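Concretely, that could look like this in the dir2 sketch from the first post (the project name and version pin are assumptions):

```python
# dir2/setup.py: depend on the core distribution so the "wrong" half
# can't be installed on its own.
from setuptools import setup, find_namespace_packages

setup(
    name="myproject-datagen",
    version="1.0",
    package_dir={"": "src"},
    packages=find_namespace_packages("src"),
    install_requires=["myproject==1.*"],  # tight pin on the core package
)
```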