I have a project with a legacy “utils”/“common”-style package called myproject. I am now required to write a few auxiliary libraries that do not belong under the “common” package, but which would still benefit from being namespaced under the myproject top level.
To be more explicit, my setup is currently something like this:
myproject, a top-level package
datagen, a top-level package
And I would like to change it to be this:
myproject, a top-level package
myproject.datagen, where myproject is a namespace package and datagen is a regular package
I tested this in a clean environment with Setuptools (full example below) and (somewhat to my surprise) it worked perfectly.
Is this a bad idea? Are there any risks or downsides?
I don’t see anything in PEP 420 advising against it, nor do I see it mentioned as part of the spec.
Here is a more complete description of the working setup I devised:
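The two source trees look something like this (module names beyond utils, datagen, and thing3 are illustrative):

```
dir1/
├── setup.py
└── src/
    └── myproject/
        ├── __init__.py
        └── utils.py

dir2/
├── setup.py
└── src/
    └── myproject/          <- no __init__.py at this level
        └── datagen/
            ├── __init__.py
            └── thing3.py
```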
Where dir1/setup.py uses packages=find_packages('src') and dir2/setup.py uses packages=find_namespace_packages('src').
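Spelled out, the two setup.py files are roughly as follows (the name and version metadata are assumptions):

```python
# dir1/setup.py -- a regular package, discovered with find_packages
from setuptools import setup, find_packages

setup(
    name="myproject",
    version="1.0",
    package_dir={"": "src"},
    packages=find_packages("src"),  # finds myproject because it has an __init__.py
)
```

```python
# dir2/setup.py -- a namespace portion, requires find_namespace_packages
# (available in Setuptools >= 40.1)
from setuptools import setup, find_namespace_packages

setup(
    name="myproject-datagen",
    version="1.0",
    package_dir={"": "src"},
    packages=find_namespace_packages("src"),  # finds myproject.datagen even
                                              # though myproject has no __init__.py
)
```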
When I ran pip install ./dir1 ./dir2 I found that I was able to import things from myproject.utils just as well as I could import things from myproject.datagen.thing3.
Evidently, the non-namespace package was not “clobbered” by the namespace package (or vice versa). Is this just the lucky outcome of undefined behavior in Pip and/or Setuptools? Or is this combination of a regular package and a namespace package intended to function in this manner?
I wouldn’t say it’s a bad idea, but there are some caveats you must know when you use this pattern. The import system returns a package with an __init__.py as soon as one is found, without searching the rest of sys.path, so your namespace package will not be picked up if it’s installed in a path with lower priority than the non-namespace one (the import system stops without ever seeing the namespace portion). This may or may not be problematic depending on your use case.
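To illustrate with hypothetical paths, suppose the two distributions land in different sys.path entries, with the regular package first:

```python
import sys

# Hypothetical install locations:
#   /site-a/myproject/__init__.py      <- regular package (from dir1)
#   /site-b/myproject/datagen/...      <- namespace portion (from dir2)
sys.path[:0] = ["/site-a", "/site-b"]

import myproject            # found in /site-a; the search stops there, so
                            # myproject.__path__ == ["/site-a/myproject"]
import myproject.datagen    # ModuleNotFoundError: /site-b is never consulted
```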
It seems like as long as they’re both installed to the same place (i.e. both installed with Pip) it should be fine. But maybe it’s a bit too magical for me to feel comfortable using “in production” without some kind of blessing, or even acknowledgement, from a PEP or other spec.
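For what it’s worth, the “same place” case presumably works because both wheels unpack into the same site-packages, so what ends up on disk is one ordinary package tree (a sketch, matching the layout above):

```
site-packages/myproject/
├── __init__.py     <- from dir1
├── utils.py        <- from dir1
└── datagen/        <- from dir2
    ├── __init__.py
    └── thing3.py
```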
I think that, in general, we may need a better specification of how install tools like Setuptools are supposed to handle package name conflicts.
Creating a namespace package
============================
There are currently three different approaches to creating namespace packages:
#. Use `native namespace packages`_. This type of namespace package is defined
   in :pep:`420` and is available in Python 3.3 and later. This is recommended if
   packages in your namespace only ever need to support Python 3 and
   installation via ``pip`` (a minimal sketch follows below).
#. Use `pkgutil-style namespace packages`_. This is recommended for new
   packages that need to support Python 2 and 3 and installation via both
   ``pip`` and ``python setup.py install``.
#. Use `pkg_resources-style namespace packages`_. This method is recommended if
   you need compatibility with packages already using this method or if your
   package needs to be zip-safe.

.. warning:: While native namespace packages and pkgutil-style namespace
   packages are largely compatible, pkg_resources-style namespace packages
   are not compatible with the other methods. It's inadvisable to use
   different methods in different distributions that provide packages to the
   same namespace.
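To make the first option concrete, a native namespace package needs nothing more than the absence of an __init__.py at the namespace level (a minimal sketch; names are illustrative):

```
mynamespace-subpackage-a/
├── setup.py
└── mynamespace/        <- no __init__.py: this directory is the namespace
    └── subpackage_a/
        └── __init__.py
```

A second distribution laid out the same way but providing mynamespace/subpackage_b/ merges into the same mynamespace at import time, as long as no portion ships a mynamespace/__init__.py.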
The issue arises when they are installed to different paths and the dir1 package comes before dir2 on sys.path, as @uranusjr explained.
I would create a namespace package to avoid this. Perhaps myproject.ext, so that you can install your extension as myproject.ext.datagen. Actually, in this case you wouldn’t even need to put a namespace package in dir1, just in dir2.
As long as myproject.ext is a namespace package everywhere, everything should be good.
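In other words, dir2 would ship something like this (a sketch):

```
dir2/
└── src/
    └── myproject/          <- no __init__.py (namespace portion)
        └── ext/            <- no __init__.py (namespace portion)
            └── datagen/
                ├── __init__.py
                └── thing3.py
```

Installed into the same site-packages as dir1’s concrete myproject, this merges into myproject/ext/datagen/ on disk, and other extensions can claim myproject.ext.whatever the same way.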
The drawback is that third-party packages can break things by accidentally including an __init__.py in myproject.ext.
This sounds very much like how we designed the Azure SDK for Python, which has a couple hundred packages that may be installed under the top-level azure namespace. The actual design is not ideal, because it had to support Python 2 as well (while avoiding the performance impact of old-style namespace packages).
Basically, the best layout is to rely on PEP 420 for development (i.e. add all of your src directories to sys.path so that you can import everything; though this won’t work in your case, because dir1 does not contain a namespace package), but then ensure that your wheels create concrete packages when installed.
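For development, this can be a small path hack you run (or import) before anything else; a sketch, assuming a repo where each package keeps its sources under <package>/src/:

```python
# dev_paths.py (hypothetical helper): put every package's src/ directory on
# sys.path so PEP 420 merges all the namespace portions during development.
import sys
from pathlib import Path

REPO_ROOT = Path(__file__).resolve().parent

for src in sorted(REPO_ROOT.glob("*/src")):
    sys.path.insert(0, str(src))
```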
So what you describe in the first post seems fine to me, and we’re definitely using the pattern successfully in production code. All I’d suggest is that you make dir2’s package require dir1’s package, so that you can’t install the “wrong” half of your library. You probably want that to be a tight dependency too, but maybe your code is set up to be flexible here.
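Concretely, that could look like this in the dir2 sketch from the first post (the project name and version pin are assumptions):

```python
# dir2/setup.py: depend on the core distribution so the "wrong" half
# can't be installed on its own.
from setuptools import setup, find_namespace_packages

setup(
    name="myproject-datagen",
    version="1.0",
    package_dir={"": "src"},
    packages=find_namespace_packages("src"),
    install_requires=["myproject==1.*"],  # tight pin on the core package
)
```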