What this code in dictobject.c does?

As @thomas said, for correctness.

Yes, for correctness. Because dict want to speed up also subclasses, that does not change tp_iter. But @thomas said

speeding up the subclass by slowing down the dict type may not be a win

If you remove that check, dict will be more fast using dict_merge with other dicts, and sublasses will use the slow method.

As I said, IMHO a practical solution is to create a basedict class. This way you can implement an map as you can do now with abc.Mapping, but with more performance.

I don’t see this happening. You’d have to prove this is a pretty big win before we tinker with dict, because it’s so critical to how Python operates. But of course feel free to give it a try!

But of course feel free to give it a try!

Thank you :slight_smile: My goal now is to create a frozendict. I’m doing it very dirty… After it works, I would create a basedict and make dict and frozendict “inherits” from it.

PS: my primary goal anyway it’s not to do premature or infinitesimal optimization. I feel the fact that dict, that is mutable, is the base for any map in Python now, and if you want to create an immutable map you have to use abc.Mapping, that shares no common ancestor class with dict, object apart, a little problematic.
Honestly, I really like abc and I use them a lot, if it’s useful now or for the future, and I have time… :smiley: But I don’t understand the usefulness of collections.abc. I mean, what’s the advantage of using collections.abc.MutableMapping over dict?

Not every MutableMapping has to be backed by a dictionary. Imagine if you wanted to implement a map object backed by a HAMT (Hash Array Mapped Trie). MutableMapping and Mapping is useful in this case to show that the object implements all the required methods to be substituted for a dict.

yes, i know that the linked HAMT is immutable, so not a good demo for MutableMapping but this is just an example

Okay, I understand. So dict is not a good substitute for collection.abc.MutableMapping for every case. But if we implement a basedict abstract type, that can also be easily recognized in the CPython code itself, it’s not better?

See what set and frozenset does.
There is no baseset. Why basedict is needed?

Well, if you read the topic, the reason is for dict_merge: it will be fast only for dict subclasses, not for abc.Mappings. A basedict can be used instead. And I suspect that dict_merge is not an isolated case.

See Java Map for example. HashMap and Hashtable have the same Map interface, but they have a completely different implementation. One is the classic map, the other is the old hash table.

This way, you want to check if it’s a Map, you can. In python, dict and abc.Mapping has nothing in common. In Python code this is not a problem, if you use the duck typing. But in the CPython code is a big trouble.

For example, using the @ammaraskar example, you can implement a map that uses HAMT algorithm using basedict instead of abc.Mapping and use dict_iter as tp_iter. I don’t know if this is practically possible, since probably the internal object will be very different and you can’t use the same iterator function. But it’s just an example.

dict_merge is fast only with dict subclasses because the fast path reads from internal structure.

If your frozendict shares internal structure (like OrderedDict), you can use approach same to set and frozenset.
If your frozendict don’t share internal structure, you can not use the fast path of the dict_merge.

As you said, dict_merge can not use tha fast path for HAMT. It is just an example that basedict does not help anything about it.

Yes, it shares the same internal structure. It’s a copy-paste of dict :smiley:

Now the code is very ugly, but it works. I have only to add pickle support, and __add__ and __sub__, for comparing them with immutables.Map update and delete methods. For now I do not implement fromkeys.

I already tested it with some benchs that is faster than immutables.Map. It’s sometime faster than dict too (but only in not important benchs). I have to implement __add__ and __sub__, so I can compare also the strong point of HAMT.

There’s space for future improving speed. For example, dict starts allocating memory for 8 items by default. frozendict does not need this. I can allocate

ESTIMATE_SIZE(size_iter + size_kwargs)

where size_iter is the size of the first iterator and size_kwargs is size of kwargs. And, if needed, I can also remove the unneeded allocated space, if there’s a significant advantage in memory consumption and costs little time, because you can do something like this:

>>> frozendict((("Marco", "Hitler"), ("Marco", "Stalin")), Marco="Sulla")
frozendict({"Marco": "Sulla"})

so I instantiated 3 items but at the end I need only 1, and I can remove the excess, using

dictresize(mp, ESTIMATE_SIZE(mp->ma_used))

I think also I don’t need split tables, even if I did not understood well what they are… :smiley:

IMHO in this way the creation of the frozendict will be slower of a dict, but other operations will be faster, since this will be done only once, at frozendict instantiation. Furthermore, you have not to check if the table is a split table, if the item is dummy or pending, or if I have enough free item slots.

dict implementation can’t rely on it. So it can not use fast-path for it.

This is example why dict can not use fast path for it.

I’m not talking about your current status and plan. I just asked why basedict would help it. And all you said shows fast-path in the dict object can not be used for your frozendicit.

Not for frozendict, since probably it will override tp_iter. But maybe not, I have to do benchmarks and see if changing tp_iter will boost or slow down frozendict.
Anyway, a basedict can help other map implementation.

If I can answer with a question, how Java Map is helpful?

Why?

Java’s Map is similar to Python’s abc.Mapping.

Every map-like object implements Map in Java. In Python, dict is not subclass of abc.Mapping. It’s not a subclass of any common ancestor, object apart.

So what?

>>> import collections.abc
>>> issubclass(dict, collections.abc.Mapping)
True

You have not explained what problem you are trying to solve by the basedict.
Please demonstrate it.

I was not talking about Python, but about CPython, and specifically of PyDict_Check. About the problem, read the thread.

I read this thread several times, but it didn’t clear about you are focusing about PyDict_Check.

But when I said “See what set and frozenset does.”, PyDict_Check was in my mind.
There is PyAnySet_Check. You did not explain what is the difference between the basedict and PyAnySet_Check. What is the basedict? How it will be implemented? What problem does it solve?

This thread goes far away from the original question. Please summarize your proposal clearly.

Yes, I already found PyAnySet_Check. Indeed I created a PyAnyDict_Check for now.

This is IMHO a little problematic. If you create another set-like or dict-like object, you have to know that you have to:

  1. add in Include/object.h a Py_TPFLAGS_MYCLASSNAME_SUBCLASS
  2. add it to the tp_flags of your new PyMyClass_Type
  3. add it to PyAnyDict(Set)_Check
  4. modify python-gdb.py (and Tools/gdb/libpython.py)

In python-gdb.py and Tools/gdb/libpython.py, you have to add Py_TPFLAGS_MYCLASSNAME_SUBCLASS and add a case to subclass_from_type() if a dict-like. If a set like, you have to add another entry in name_map and a case in proxyval()

None of this is needed with a commondict (and a commonset).

Furthermore, I do not understand why

  • python-gdb.py and Tools/gdb/libpython.py are the same file in two different places
  • they have to redefine the Py_TPFLAGS_*_SUBCLASS. Can’t they be exposed to them from C?

We don’t have to. We can just create set-like, or dict-like object and register them to abc.Mapping.
It’s very rare that we need to create bulitin set-like or dict-like class which shares it’s implementation with set or dict. So don’t worry about it.

We can just create set-like, or dict-like object and register them to abc.Mapping

Yes, it was the approach of immutables.Map. So you have a C map or set and you must slow it down with collection.abc… Very strange.

It’s very rare that we need to create bulitin set-like or dict-like class which shares it’s implementation with set or dict.

IMHO it’s rare because not only it’s written in C (that is a necessary evil), but it’s also obscure. I take a lot of time to really understand how to implement frozendict, searching the dict API all over the CPython code and reading setobject.c. I don’t know how much people have my patience :smiley:

Anyway, don’t worry. I’ll implement it by myself and I’ll do a PR, after making the dirty version of frozendict at least usable. No one is asking you to do it for me :stuck_out_tongue: