I was surprised to find that the .keys() returned by a subclass of collections.UserDict are not iterable, even though it will return a KeysView object.
For example,
from collections import UserDict
class CustomDict(UserDict):
pass
>>> d = CustomDict()
>>> d['a'] = 'b'
>>> reversed(d.keys())
TypeError: 'KeysView' object is not reversible
>>> reversed(d.data.keys())
<dict_reversekeyiterator object at 0x10a3d6f70>
compared to
>>> d = dict()
>>> d['a'] = 'b'
>>> reversed(d.keys())
<dict_reversekeyiterator object at 0x10a3d7740>
As shown above, the underlying UserDict.datadict is reversible, though.
To clarify: It is iterable, it is not reversible though.
>>> iter(collections.UserDict().keys())
<generator object KeysView.__iter__ at 0x7fc13414be00>
>>> reversed(collections.UserDict().keys())
Traceback (most recent call last):
File "<python-input-3>", line 1, in <module>
reversed(collections.UserDict().keys())
~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'KeysView' object is not reversible
Also notable is that the keys() iterator is a generator object. It comes from here:
class KeysView(MappingView, Set):
...
def __iter__(self):
yield from self._mapping
So my guess is, nobody thought to also add a __reversed__ method, possibly since iteration order wasn’t a defined feature of dictionaries back when that was created. When was it created? According to the git logs, the yield from line came from a 2012 update to replace for... yield loops, and prior to that, I think that code dates all the way back to 2007 when collections.abc was first introduced.
I’m gonna say, the reason it isn’t reversible is that nobody’s ever asked for it to be reversible If you have a good use-case for it, go ahead and make this a proposal. Write a patch for Lib/_collections_abc.py to add this functionality, and see if anyone else agrees that this is worth doing.
I’m not sure. It’s possible that there’s some distinction that I’m not aware of. It seems too obvious - wouldn’t that have worked back in 2007 too? There has to be something I’m overlooking here.
This seems to be how UserDict implements it, but not collections.abc.MutableMapping. It’s also significantly faster in simple perf testing using %timeit
class K:
def __init__(self, m: dict[Any, Any]):
self.m = m
def __iter__(self):
yield from self.m
class J:
def __init__(self, m: dict[Any, Any]):
self.m = m
def __iter__(self):
return iter(self.m)
>>> d = {i: str(i) for i in range(1000000)}
>>> k = K(d)
>>> j = J(d)
>>> list(k) == list(j)
True
>>> %timeit list(k)
39.4 ms ± 362 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit list(j)
14.8 ms ± 118 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Seems like just binding __iter__ to the mapping is even faster, but it doesn’t work unless explicitly called:
class L:
def __init__(self, m: dict[Any, Any]):
self.m = m
self.__iter__ = m.__iter__
>>> l = L(d)
>>> list(l)
TypeError: iter() returned non-iterator of type 'NoneType'
>>> list(l.__iter__())
[1,2,3, ...]
>>> %timeit list(l.__iter__())
7.14 ms ± 79.6 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit list(d)
6.05 ms ± 44.1 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Let me know if my testing here is totally wrong. It’s interesting that there seems to be such a ‘huge‘ difference between all these methods.
I can’t say for sure that it’s wrong, but my gut feeling is that there’s some edge case somewhere that isn’t being handled here. After all, you don’t use UserDict when you exactly want dict, so there has to be some sort of reason.
It’s possible that UserDict doesn’t matter any more though. Back when it was first created, I believe dict couldn’t be subclassed; today, that’s not a limitation, so you can often create a subclass of the vanilla dictionary with all of its functionality.
So maybe the problem isn’t actually one of functionality, but of demonstration. If the purpose of UserDict (and similarly what’s happening in collections.abc.Mapping) is for people to browse the source code, then it doesn’t matter what the performance is like - it’s better to implement things directly. But if that’s the case, ehhh, I still don’t know.
I tend to agree with this. I’ve used UserList and UserString a few times, but almost never UserDict. I do find it interesting that this issue seems to permeate the Mapping abc though. That means that UserDict is actually faster at iterating keys than something that inherits collections.abc.Mapping.
There does seem to be a bit of a disconnect between the base implementations of the abstracts and their real counterparts though. Which has always irritated me a bit. Guess now that you can directly subclass the builtins you really only need to use the abcs for typing?
list(k) being much slower doesn’t surprise me. Sending every element through an extra generator iterator is slow.
But I’m surprised the last three aren’t equally fast, as they all do the exact same work after negligible setup. And I can’t reproduce this, for me they are equally fast.
collections.abc.KeysView probably shouldn’t be reversible, since not all mappings do preserve order. Defining it there would require that ability for the API. Maybe UserDict could have its own subclasses though.