As @thomas said, for correctness.
Yes, for correctness. Because dict
want to speed up also subclasses, that does not change tp_iter
. But @thomas said
speeding up the subclass by slowing down the dict type may not be a win
If you remove that check, dict
will be more fast using dict_merge
with other dict
s, and sublasses will use the slow method.
As I said, IMHO a practical solution is to create a basedict
class. This way you can implement an map as you can do now with abc.Mapping
, but with more performance.
I don’t see this happening. You’d have to prove this is a pretty big win before we tinker with dict
, because it’s so critical to how Python operates. But of course feel free to give it a try!
But of course feel free to give it a try!
Thank you My goal now is to create a frozendict
. I’m doing it very dirty… After it works, I would create a basedict
and make dict
and frozendict
“inherits” from it.
PS: my primary goal anyway it’s not to do premature or infinitesimal optimization. I feel the fact that dict
, that is mutable, is the base for any map in Python now, and if you want to create an immutable map you have to use abc.Mapping
, that shares no common ancestor class with dict
, object
apart, a little problematic.
Honestly, I really like abc
and I use them a lot, if it’s useful now or for the future, and I have time… But I don’t understand the usefulness of collections.abc
. I mean, what’s the advantage of using collections.abc.MutableMapping
over dict
?
Not every MutableMapping
has to be backed by a dictionary. Imagine if you wanted to implement a map object backed by a HAMT (Hash Array Mapped Trie). MutableMapping
and Mapping
is useful in this case to show that the object implements all the required methods to be substituted for a dict
.
yes, i know that the linked HAMT is immutable, so not a good demo for MutableMapping but this is just an example
Okay, I understand. So dict
is not a good substitute for collection.abc.MutableMapping
for every case. But if we implement a basedict
abstract type, that can also be easily recognized in the CPython code itself, it’s not better?
See what set and frozenset does.
There is no baseset. Why basedict is needed?
Well, if you read the topic, the reason is for dict_merge
: it will be fast only for dict
subclasses, not for abc.Mapping
s. A basedict
can be used instead. And I suspect that dict_merge
is not an isolated case.
See Java Map
for example. HashMap
and Hashtable
have the same Map
interface, but they have a completely different implementation. One is the classic map, the other is the old hash table.
This way, you want to check if it’s a Map
, you can. In python, dict
and abc.Mapping
has nothing in common. In Python code this is not a problem, if you use the duck typing. But in the CPython code is a big trouble.
For example, using the @ammaraskar example, you can implement a map that uses HAMT algorithm using basedict
instead of abc.Mapping
and use dict_iter
as tp_iter
. I don’t know if this is practically possible, since probably the internal object will be very different and you can’t use the same iterator function. But it’s just an example.
dict_merge
is fast only with dict
subclasses because the fast path reads from internal structure.
If your frozendict shares internal structure (like OrderedDict), you can use approach same to set and frozenset.
If your frozendict don’t share internal structure, you can not use the fast path of the dict_merge.
As you said, dict_merge can not use tha fast path for HAMT. It is just an example that basedict
does not help anything about it.
Yes, it shares the same internal structure. It’s a copy-paste of dict
Now the code is very ugly, but it works. I have only to add pickle
support, and __add__
and __sub__
, for comparing them with immutables.Map
update
and delete
methods. For now I do not implement fromkeys
.
I already tested it with some benchs that is faster than immutables.Map
. It’s sometime faster than dict
too (but only in not important benchs). I have to implement __add__
and __sub__
, so I can compare also the strong point of HAMT.
There’s space for future improving speed. For example, dict
starts allocating memory for 8 items by default. frozendict
does not need this. I can allocate
ESTIMATE_SIZE(size_iter + size_kwargs)
where size_iter
is the size of the first iterator and size_kwargs
is size of kwargs. And, if needed, I can also remove the unneeded allocated space, if there’s a significant advantage in memory consumption and costs little time, because you can do something like this:
>>> frozendict((("Marco", "Hitler"), ("Marco", "Stalin")), Marco="Sulla")
frozendict({"Marco": "Sulla"})
so I instantiated 3 items but at the end I need only 1, and I can remove the excess, using
dictresize(mp, ESTIMATE_SIZE(mp->ma_used))
I think also I don’t need split tables, even if I did not understood well what they are…
IMHO in this way the creation of the frozendict
will be slower of a dict
, but other operations will be faster, since this will be done only once, at frozendict
instantiation. Furthermore, you have not to check if the table is a split table, if the item is dummy or pending, or if I have enough free item slots.
dict implementation can’t rely on it. So it can not use fast-path for it.
This is example why dict can not use fast path for it.
I’m not talking about your current status and plan. I just asked why basedict would help it. And all you said shows fast-path in the dict object can not be used for your frozendicit.
Not for frozendict
, since probably it will override tp_iter
. But maybe not, I have to do benchmarks and see if changing tp_iter
will boost or slow down frozendict
.
Anyway, a basedict
can help other map implementation.
If I can answer with a question, how Java Map
is helpful?
Why?
Java’s Map is similar to Python’s abc.Mapping.
Every map-like object implements Map
in Java. In Python, dict
is not subclass of abc.Mapping
. It’s not a subclass of any common ancestor, object
apart.
So what?
>>> import collections.abc
>>> issubclass(dict, collections.abc.Mapping)
True
You have not explained what problem you are trying to solve by the basedict.
Please demonstrate it.
I was not talking about Python, but about CPython, and specifically of PyDict_Check
. About the problem, read the thread.
I read this thread several times, but it didn’t clear about you are focusing about PyDict_Check
.
But when I said “See what set and frozenset does.”, PyDict_Check
was in my mind.
There is PyAnySet_Check
. You did not explain what is the difference between the basedict and PyAnySet_Check
. What is the basedict? How it will be implemented? What problem does it solve?
This thread goes far away from the original question. Please summarize your proposal clearly.
Yes, I already found PyAnySet_Check
. Indeed I created a PyAnyDict_Check
for now.
This is IMHO a little problematic. If you create another set-like or dict-like object, you have to know that you have to:
- add in
Include/object.h
aPy_TPFLAGS_MYCLASSNAME_SUBCLASS
- add it to the
tp_flags
of your newPyMyClass_Type
- add it to
PyAnyDict(Set)_Check
- modify
python-gdb.py
(andTools/gdb/libpython.py
)
In python-gdb.py
and Tools/gdb/libpython.py
, you have to add Py_TPFLAGS_MYCLASSNAME_SUBCLASS
and add a case to subclass_from_type()
if a dict-like. If a set like, you have to add another entry in name_map
and a case in proxyval()
None of this is needed with a commondict
(and a commonset
).
Furthermore, I do not understand why
-
python-gdb.py
andTools/gdb/libpython.py
are the same file in two different places - they have to redefine the
Py_TPFLAGS_*_SUBCLASS
. Can’t they be exposed to them from C?
We don’t have to. We can just create set-like, or dict-like object and register them to abc.Mapping.
It’s very rare that we need to create bulitin set-like or dict-like class which shares it’s implementation with set or dict. So don’t worry about it.
We can just create set-like, or dict-like object and register them to abc.Mapping
Yes, it was the approach of immutables.Map
. So you have a C map or set and you must slow it down with collection.abc
… Very strange.
It’s very rare that we need to create bulitin set-like or dict-like class which shares it’s implementation with set or dict.
IMHO it’s rare because not only it’s written in C (that is a necessary evil), but it’s also obscure. I take a lot of time to really understand how to implement frozendict
, searching the dict
API all over the CPython code and reading setobject.c
. I don’t know how much people have my patience
Anyway, don’t worry. I’ll implement it by myself and I’ll do a PR, after making the dirty version of frozendict
at least usable. No one is asking you to do it for me