Add `builtins.namespace`

So the Zen says that the most (or the least) important part of philosophy is:

“Namespaces are one honking great idea – let’s do more of those!” - I appreciate that this isn’t exactly what this quote is about, but still… 🙂

Why do I like them? Why not just dict?

  1. Usual design considerations
  2. It has ~3x faster access. As Python currently stands, a namespace (or instance attribute access in general) is a Mapping with access speed similar to that of a Sequence.

Motivation

Although much less used than, say, dict, namespaces are still widely used and are implemented in many different places across CPython:

  1. types.SimpleNamespace
  2. argparse.Namespace
  3. multiprocessing.dummy.Namespace
  4. multiprocessing.managers.Namespace
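
As a rough illustration of how interchangeable these four classes are for the basic case, the sketch below builds each one with the same keyword argument and reads the attributes back. `NAMESPACE_CLASSES` and `roundtrip` are names made up for this example; nothing here is specific to the proposal.

```python
import argparse
import multiprocessing.dummy
import multiprocessing.managers
import types

# The four stdlib namespace classes listed above.
NAMESPACE_CLASSES = (
    types.SimpleNamespace,
    argparse.Namespace,
    multiprocessing.dummy.Namespace,
    multiprocessing.managers.Namespace,
)


def roundtrip(cls):
    """Create an instance, set one more attribute, and return its attributes."""
    ns = cls(a=1)
    ns.b = 2
    return dict(vars(ns))


for cls in NAMESPACE_CLASSES:
    print(f"{cls.__module__}.{cls.__qualname__}", roundtrip(cls))
```

Each class stores its attributes in the instance `__dict__`, so `vars()` works the same way on all of them.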

GitHub search shows quite a lot of repetition of very similar class implementations:

  1. /class Namespace(\((object)?\))?:/ Language:Python - 5.6 k files

Not all of these are true positives, but a significant part seems to be. Even if some of the cases need different functionality, inheriting from a single go-to class would already be better.


I have written one for my own use.
The main reason is that I am not happy with the existing selection:

  1. types.SimpleNamespace is slower than argparse.Namespace. It requires an import (as do the others), and on top of that its verbose name is highly unattractive.
  2. argparse.Namespace lives in a module that does not suggest it is a good place to take a general-purpose namespace from.

Proposal

In [20]: types.SimpleNamespace(a=1)
Out[20]: namespace(a=1) <- this is pleasant

So builtins.namespace:

  1. Convenient placement
  2. Optimal performance implementation
  3. Reasonable length class name

I think this would save a fair amount of repetition if it was available as conveniently as dict.

Furthermore, some convenient features could be added later to it. E.g.

  1. __getitem__ access - there are many libraries that add attribute access to dict (such as dotwiz), but all of the attempts that I have seen had to sacrifice a lot of performance. Also, it does not make much sense to add attribute access to dict, because the keys reachable via __getattr__ are a subset of the keys reachable via __getitem__. Thus, doing it the other way round might make more sense.
  2. __iter__ convenience method to iter(vars(namespace)).
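
A minimal sketch of what these two features could look like, written as a subclass of types.SimpleNamespace. The class name `ItemNamespace` is hypothetical and this is only one possible reading of the idea, not a proposed implementation.

```python
from types import SimpleNamespace


class ItemNamespace(SimpleNamespace):
    """Hypothetical namespace that also supports ns[key] and iteration."""

    def __getitem__(self, key):
        # Item access reads from the same instance dict as attribute access.
        return vars(self)[key]

    def __iter__(self):
        # Iterate over attribute names, like iterating over a dict.
        return iter(vars(self))


ns = ItemNamespace(a=1)
# A name that is not a valid identifier can be stored with setattr(),
# but it can only be read back via getattr() or item access.
setattr(ns, "not an identifier", 2)

print(ns.a, ns["a"], ns["not an identifier"], list(ns))
```

The last line shows the subset relation mentioned above: every name reachable as an attribute is also reachable as an item, but not the other way round.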

But these are just further considerations.

The first question is whether others share my view that this situation could be improved, and whether this deserves further consideration in the first place.

In Short

Would a namespace type fit next to list, dict and other builtins, in terms of both placement and implementation?

Please could you elaborate on this point?

I’m a fan of types.SimpleNamespace and frequently use it, e.g. for simple mocks in a test or when prototyping, but I haven’t had any cause to benchmark it.

A

class ClassNamespace:
    def __init__(self, **kwds):
        for k, v in kwds.items():
            setattr(self, k, v)

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃   macOS-11.7.10-x86_64-i386-64bit | CPython: 3.12.4     ┃
┃        5 repeats, 1,000,000 times                       ┃
┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃                         Units: ns        set        get ┃
┃                                   ┏━━━━━━━━━━━━━━━━━━━━━┫
┃                   SimpleNamespace ┃       49         35 ┃
┃                    argp.Namespace ┃       15         11 ┃
┃                    ClassNamespace ┃        8          7 ┃
┃                        namedtuple ┃        -         26 ┃
┃                              list ┃       10          8 ┃
┃                             tuple ┃        -          9 ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━┛

I’m a -1 since we already have enough similar-looking things. Besides the ones you mentioned we also have dataclasses.dataclass, which is arguably similar. Then we also have collections.namedtuple, which is once again arguably the same thing from the user’s perspective.
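
To make the comparison concrete, here is a small sketch that puts the three side by side. `PointData` and `PointTuple` are hypothetical example types, and `accepts_new_attribute` is a helper made up for this illustration.

```python
from collections import namedtuple
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class PointData:  # hypothetical example class
    x: int
    y: int


PointTuple = namedtuple("PointTuple", ["x", "y"])  # hypothetical example type

candidates = [SimpleNamespace(x=1, y=2), PointData(x=1, y=2), PointTuple(x=1, y=2)]

# All three read the same way from the user's side.
for obj in candidates:
    print(type(obj).__name__, obj.x, obj.y)


def accepts_new_attribute(obj):
    """Return True if an undeclared attribute can be set on obj."""
    try:
        obj.z = 3
    except AttributeError:
        return False
    return True


print([accepts_new_attribute(obj) for obj in candidates])
```

Reading attributes looks identical for all three; they differ in whether fields have to be declared up front and in whether instances can be changed afterwards.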

If we’re bothered about the performance of imports (which is a common trend I see around here), we should continue to look for optimizations in that route instead of trying to make things not need imports.

Interestingly, copy/pasting that search query into GitHub search found me nothing (though I must be doing something wrong to find nothing with it).

Also quoting the zen:

There should be one-- and preferably only one --obvious way to do it.

We already have too many ways for this and I don’t think it’s useful enough (in a way that it is more useful than just using an existing option) to be in builtins.

My bad, I have only written regex, not the full query. Updated in the main post as:

Github search shows quite a lot of repetition of very similar class implementations:

  1. /class Namespace(\((object)?\))?:/ Language:Python - 5.6 k files

Try this

Well, this is kind of my main point, that there is no one obvious way.
But yes, on the flip side, the cost of introducing such is non-negligible.

In the short term this would be costly and it is expected that many would deem this unnecessary.
I am just wondering whether this would pay off in the longer term.

I prefer the name SimpleNamespace for the class, since “namespace” is a very generic term that covers all kinds of things (some of which are backed by dictionaries, others are not). I’m not sure there’s enough need to have this in the builtins, but OTOH, types.SimpleNamespace isn’t all that good either; the one-line summary of the types module doesn’t suggest that you can find a thing here for providing attributes with dot notation.

Maybe the types module needs a bit of a rebrand?

Thanks for the updated search. That makes more sense. I figure there are a lot of red herrings in there. Though the ‘similar to existing solutions’ version of a namespace may be either old code or just folks unaware of similar things.

Too bad things like linters can’t warn people that

Something similar to <line xyz> exists in the stdlib as <thing.thang>. Consider using that instead of rewriting it here.

Then at least they have to acknowledge it and # type ignore the line to get the linter to go away.

dataclasses tend to be pretty slow to access attributes (slower than NamedTuples, which use _collections.tuplegetter to be faster), and are pretty slow just to import too since they use inspect currently.

A lot of the “existing ways” really are worse in some use cases, even when presented as a modern solution.

That said, I don’t know that adding builtins.namespace is really the right solution. We could just look for ways to make the other namespace-like things performant, possibly even using the same backing implementation if warranted, but leave dataclasses and types.SimpleNamespace as the obvious places to get that functionality, based on whether you just want a container or the full trappings that come with dataclasses.

Caveat: Microbenchmarks are not representative, etc:

Using IPython for the %timeit command:

from types import SimpleNamespace
from argparse import Namespace as ArgparseNamespace
class ClassSetattr:
    def __init__(self, **kwds):
        for k, v in kwds.items():
            setattr(self, k, v)

class ClassVars:
    def __init__(self, **kwds):
        vars(self).update(kwds)

for name in 'SimpleNamespace', 'ArgparseNamespace', 'ClassSetattr', 'ClassVars':
    container = globals()[name]
    print('create', name)
    %timeit ns = container(a=1, b=0.0, c=None, d=..., e={}, f=[], h=())
    print('set', name)
    %timeit ns.new_item = 42
    print('get', name)
    %timeit assert ns.new_item == 42
create SimpleNamespace
501 ns ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
set SimpleNamespace
49.9 ns ± 1.39 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
get SimpleNamespace
64.9 ns ± 3.27 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
create ArgparseNamespace
2.11 μs ± 131 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
set ArgparseNamespace
50.7 ns ± 0.989 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
get ArgparseNamespace
64 ns ± 2.24 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
create ClassSetattr
2.2 μs ± 234 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
set ClassSetattr
51.1 ns ± 2.42 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
get ClassSetattr
67.2 ns ± 7.13 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
create ClassVars
2.37 μs ± 152 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
set ClassVars
57.9 ns ± 3.18 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
get ClassVars
70.5 ns ± 1.73 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

All on Python 3.13. Doesn’t seem to be much in it, and creation of SimpleNamespaces is slightly faster. But I doubt nanoseconds matter here!
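
For anyone who wants to repeat the comparison without IPython, the sketch below uses only the stdlib timeit module. It is a variant, not a reproduction of the run above: it creates one instance per container type explicitly and passes it to each timed statement, because, as far as I understand it, a name assigned inside one `%timeit` line is not guaranteed to be the object that the next `%timeit` line sees. `CONTAINERS`, `bench` and the loop count are choices made for this example.

```python
import timeit
from argparse import Namespace as ArgparseNamespace
from types import SimpleNamespace


class ClassSetattr:
    def __init__(self, **kwds):
        for k, v in kwds.items():
            setattr(self, k, v)


class ClassVars:
    def __init__(self, **kwds):
        vars(self).update(kwds)


CONTAINERS = (SimpleNamespace, ArgparseNamespace, ClassSetattr, ClassVars)


def bench(container, number=20_000):
    """Return best-of-5 timings in ns per operation for create/set/get."""
    ns = container(a=1)
    ns.new_item = 42  # make sure the attribute exists before timing "get"
    env = {"container": container, "ns": ns}
    stmts = {
        "create": "container(a=1, b=0.0, c=None)",
        "set": "ns.new_item = 42",
        "get": "ns.new_item",
    }
    return {
        label: min(timeit.repeat(stmt, globals=env, number=number)) / number * 1e9
        for label, stmt in stmts.items()
    }


for container in CONTAINERS:
    timings = bench(container)
    print(container.__name__, {k: round(v, 1) for k, v in timings.items()})
```

The usual microbenchmark caveats apply, and the numbers will differ between machines and Python versions.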

Likewise, I think the ‘obvious’ way here is SimpleNamespace, and if the documentation isn’t good enough I’d be happy to review an improvement for it.

A

Some sort of compromise could be to leave types as the “go-to” place for a namespace, but add one more type to it with a more convenient name and optimal performance. It could be independent (not taken from sys.implementation) and could be improved upon.

The name could be, for example, AttrMap. Anything shorter would do; building nested namespaces in tests is a bit of a pain:

SimpleNamespace(a=SimpleNamespace(a=1), b=SimpleNamespace(a=1))

To compare the same with dict:

dict(a=dict(a=1), b=dict(a=1))
{'a': {'a': 1}, 'b': {'a': 1}}
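
Until something shorter exists, one workaround is to write the nested structure as plain dicts and convert it afterwards. The sketch below is a hypothetical helper (`to_namespace` is not a stdlib function), plus the equivalent trick for JSON text using the `object_hook` parameter of json.loads. It assumes all dict keys are strings.

```python
import json
from types import SimpleNamespace


def to_namespace(obj):
    """Recursively convert dicts (also inside lists) to SimpleNamespace."""
    if isinstance(obj, dict):
        # Keys must be strings, since they are passed as keyword arguments.
        return SimpleNamespace(**{k: to_namespace(v) for k, v in obj.items()})
    if isinstance(obj, list):
        return [to_namespace(item) for item in obj]
    return obj


nested = to_namespace({"a": {"a": 1}, "b": {"a": 1}})
print(nested.a.a, nested.b.a)

# The same conversion for JSON text, using the decoder's object_hook.
parsed = json.loads('{"a": {"a": 1}}', object_hook=lambda d: SimpleNamespace(**d))
print(parsed.a.a)
```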

Performance of argparse.Namespace is unstable across versions and platforms, and there are some other strange things going on there that I haven’t got to the bottom of yet. (One of the reasons is that I don’t know which one I should focus on improving.)

See: 1. AttrDict and 2. argparse.Namespace performance

3.14 main, OSX:

❯ ./python.exe -m timeit -s 'import types; a = types.SimpleNamespace(a=1)' 'a.a'
5000000 loops, best of 5: 41.5 nsec per loop
❯ ./python.exe -m timeit -s 'import argparse; a = argparse.Namespace(a=1)' 'a.a'
20000000 loops, best of 5: 12.2 nsec per loop

I don’t think there’s much question that types.SimpleNamespace is the “right way” for untyped code. With typing, dataclasses or named tuples fit.

I think it’s a hidden gem right now. Once you know about it it’s very useful, but it’s hard to find.

I would not put it in types if we could move it, but it doesn’t seem worth trying to move it now.

Maybe the doc improvement is to make the docs for types separate user facing utilities from values like ModuleType which are much more about code inspection and reflection. I’ll have to look again to see if the docs are even deficient, but might put up a PR if there’s some easy improvement.

So ok, it seems that SimpleNamespace is “one obvious way”.

Naming, placement and documentation aside, there is a performance issue.
Can anyone replicate this?

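import argparse, platform, timeit, types

nsa = argparse.Namespace(a=1)
nst = types.SimpleNamespace(a=1)

# Mean over 5 repeats of 1,000,000 calls; seconds per million calls * 1000 == ns per call.
t_func = lambda x: int(sum(timeit.repeat(x, repeat=5, number=1_000_000)) / 5 * 1000)
oh = t_func(lambda: None)           # overhead of calling an empty lambda
t_nsa = t_func(lambda: nsa.a) - oh  # argparse.Namespace attribute read
t_nst = t_func(lambda: nst.a) - oh  # types.SimpleNamespace attribute read
print(platform.system(), platform.python_version(), '-', t_nsa, '&', t_nst)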

Online REPLs seem to agree with me:

Linux  3. 7.4    - 36 & 39  # https://www.jdoodle.com/python3-programming-online
Linux  3. 9.9    - 25 & 26  # https://www.jdoodle.com/python3-programming-online
Linux  3.10.12   - 13 & 10  # https://www.onlinegdb.com/online_python_compiler
Linux  3.11.5    - 13 & 42  # https://www.jdoodle.com/python3-programming-online
Darwin 3.12.4    -  8 & 44  # My machine
Linux  3.12.8    -  7 & 29  # https://www.programiz.com/python-programming/online-compiler/
Darwin 3.14.0a2+ - 12 & 42  # My machine

So I am not sure about Windows, but it seems that something happened in 3.11 on Unix which made the C-implemented namespace slower than the pure-Python implementation.

I think my initial benchmark table above represents the current Unix situation well.

The variance is high. Repeating the benchmark 100 times per version:

Many thanks! I think this confirms that the situation is prevalent on Windows as well.

If a horizontal line at 0 were drawn, I think it could clearly be seen that the difference starting from 3.11 is non-negligible.

On the bright side, it is not that types.SimpleNamespace got slower, but that speed in general has improved, it just failed to reap full benefits.

Are people using namespaces so heavily in such tight loops that ~30ns differences are not dwarfed by the surrounding computations (that’s not a rhetorical question)?

It is not that I use namespace in various places and it becomes a bottleneck, but rather I sometimes look for what objects to use in performance sensitive places.

Naturally, I most often aim for list or tuple. However, from time to time I could substitute a namespace for readability, but I always refrain from doing so given the current situation.

I think that, generally, if “one obvious implementation” exists in the standard library, it should ideally provide optimal performance. Especially for something which is close to a builtin container and is implemented in C.

But yes, I don’t think many people would care about this as namespaces so far are mostly used as containers for final output (e.g. argparse) or places where micro-performance does not matter much (e.g. testing attributes).

So the speed penalty boils down to not using Py_TPFLAGS_MANAGED_DICT but creating the dict manually, which skips the attribute-access optimizations, so performance is the same as that of dict.