Should `**` unpack iterables of iterables?

Is it possible right now to make a user-defined class that is compatible with **d, other than by subclassing dict?

Generally I think it is better if these things are based on well-defined method-based protocols rather than nominal types. In the case of *d it is the iterator protocol. For **d it could be the iterator protocol as well, except that the expectation would be to yield tuples for the pairs.
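Today the asymmetry is easy to demonstrate: an object whose iterator yields pairs is already accepted by dict(), but not by ** (the class name Pairs here is illustrative):

```python
class Pairs:
    """Yields key-value tuples; deliberately not a mapping."""
    def __iter__(self):
        yield ('a', 1)
        yield ('b', 2)

print(dict(Pairs()))   # the iterator protocol is enough for dict()
try:
    {**Pairs()}        # but ** rejects it today
except TypeError:
    print("TypeError: not a mapping")
```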

For an object to be unpacked with **, it needs to support:

  • A .keys() method that returns its keys, and
  • A __getitem__ method to access values by those keys:
class MyMap:
    def __init__(self, data):
        self._data = data

    def keys(self):
        return self._data.keys()

    def __getitem__(self, key):
        return self._data[key]

def f(a, b):
    print(a, b)

f(**MyMap({'a': 1, 'b': 2}))

Result:

1 2

True, but this is already true for dict.__init__, dict.update.
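For reference, dict.update already accepts three different input shapes in the same slot:

```python
d = {'a': 1}
d.update({'b': 2})      # a mapping
d.update([('c', 3)])    # an iterable of key-value pairs
d.update(d=4)           # keyword arguments
print(d)                # insertion order is preserved
```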

And this applies to pretty much everything where there is some sort of protocol, and machinery runs after the input is received to determine whether it is valid.

And some of those cases are similar in that invalid input can become apparent only after a large amount of work has been done. A couple of examples apart from dict.__init__ and dict.update:

  1. operator.attrgetter('a', object()). Internally it will need to iterate through *args.
  2. isinstance(arg, (str, object()))
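A quick illustration of point 2: in current CPython, isinstance() checks the tuple left to right, so the invalid entry only raises if the check actually reaches it:

```python
# The first entry matches, so the invalid second entry is never inspected.
print(isinstance("x", (str, object())))   # True

# Here the check reaches the invalid entry and fails.
try:
    isinstance(5, (str, object()))
except TypeError:
    print("TypeError: arg 2 must be a type or tuple of types")
```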

I can see that this is best avoided if possible, but given that this is only an extension of an already existing protocol, and that this has been done numerous times in other places, I don’t see how it can be a detrimental factor.

While these methods do accept multiple types, I think they do so for historical reasons. I think a more “modern” interface would have had dict.from_items instead of having dict support iterables and mappings in the same spot. Similarly, update wouldn’t need to take a mapping since |= supplants that use.

Interfaces that support multiple types in the same spot are problematic because:

  • they limit what type checkers can verify, so fewer errors are caught, and
  • they result in code that is harder to read and reason about (the minor benefit to code writers is rarely worth making code harder to read).

I don’t think it’s good language design to try to make iterables of pairs work like mappings.


So ok, I guess / hope we agree that the conversion key-value pairs <-> dict is a pretty fundamental one.

And what would be the default argument type to dict.__init__?

So what would dict.update take then?


All in all, I don’t find arguments such as “if this was done from the beginning it would have been done differently” very convincing.

Instead, I am much more prone to taking the situation at hand as it is and seeing which action / inaction is the most sensible.

Of course, if something can be changed to the way that “it should have been done from the beginning”, then it should be done, I suppose.

But if the situation at hand is that “it is how it is and is not changing”, then decisions should ideally be made conditional on this fact.

A single mapping, in my opinion. I realize many people would disagree with me. Just my opinion.

It’s also interesting that the usage dict(a=foo, b=bar, **c, **d) is a bug magnet whenever c or d don’t contain string keys. Some linters will ask you to use the dict display instead.

An iterable of pairs.

That’s fair, but my argument wasn’t just about idealism. I am arguing that limited interfaces are better in principle, whereas you’re arguing for more permissive interfaces. I don’t think the existence of permissive interfaces is evidence that permissive interfaces are better. They’re still bad. And I think we should avoid creating more.


I don’t have a strong opinion on this. My approach is to digest as much information as possible, to collect as many use cases as possible and arrive at an optimal implementation - sometimes more permissive is better, sometimes separating constructors/methods based on input type is better.

But I think I do agree with you that the latter, in practice, is much more often the case, at least when I look back it seems so.

Not really, I am concentrating on this specific case.

I don’t like hard rules like that too much. There are always exceptions. E.g. a case where theoretically (proven by science or whatever) only 2 types of input exist. But yes, I get what you are saying and agree to a large degree.


But again, I am more interested in the case at hand, given the situation as it is.

What I mean is, I don’t see any added value compared to simply creating a dictionary from key-value pairs. It doesn’t even improve performance. Unpacking does not cause rehashing (citation needed).

Also, the current behavior reflects a good separation of concerns. The ** operator does not need to handle creating dictionaries from key-value pairs.

class Key:
    def __init__(self, name):
        self.name = name
    def __hash__(self):
        print(f"Hashing {self.name}")
        return hash(self.name)

a = {Key('a'): 1}
print("Unpacking...")
b = {**a}

Result:

Hashing a
Unpacking...

Overall, this would only slow down the ** operator slightly and make it a more complex, ‘magic’ syntax.

But dicts themselves are iterables and their keys can be iterables too, so it’d become unclear what this should do:

{**{(1, 2): 0, (3, 4): 1}} # {(1, 2): 0, (3, 4): 1} or {1: 2, 3: 4}?

Maybe not in the way that you think I meant, but it does avoid an intermediate dictionary when one has an iterable of key-value pairs.

For {**d1, **pairs, **d2}, this would be as performant as:

d = dict(d1)
d.update(pairs)
d.update(d2)

Although the former might be more convenient syntax for some.

But there would be some performance benefit for func(**pairs), which is not available now.

Yes, I suppose hashing is a much larger factor. Well, maybe not that much larger for some optimized types, such as int / str (not sure about this), but it doesn’t matter, as no rehashing is done and it is not what I was referring to.

Maybe, maybe not. Depends on POV. If I am a user in a situation where I have key-value pairs and want to source them as arguments, then my concern is to do it in the most convenient way possible, and:

foo(**pairs)

could be more convenient than

foo(**dict(pairs))

for many users.

With the additional performance benefit of not creating an intermediary dict object and needing only one iteration instead of two.
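As things stand in current CPython, only the intermediate-dict spelling works (foo and pairs here are illustrative names):

```python
def foo(**kwargs):
    return kwargs

pairs = [('a', 1), ('b', 2)]

print(foo(**dict(pairs)))  # works today, via an intermediate dict
try:
    foo(**pairs)           # the proposed spelling; a TypeError today
except TypeError:
    print("TypeError: argument after ** must be a mapping")
```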


This would do the same thing as it does now. Just as:

d.update(some_dict)

would do the same thing regardless of whether

d.update(pairs)

is allowed or not.

But how exactly does the interpreter determine what it is? Does it prioritize strict dicts, or all objects that implement the collections.abc.Mapping protocol, which really only requires the methods __getitem__, __iter__ and __len__, which are also implemented by sequences?

The idea is that it would work exactly as dict.update does. cpython/Lib/_collections_abc.py at c2428ca9ea0c4eac9c7f2b41aff5f77660f21298 · python/cpython · GitHub
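For context, the resolution order in the linked pure-Python MutableMapping.update is roughly the following (paraphrased here as a standalone helper; update_like is a made-up name):

```python
from collections.abc import Mapping

def update_like(d, other=(), /, **kwds):
    """Roughly the resolution order used by MutableMapping.update."""
    if isinstance(other, Mapping):
        for key in other:
            d[key] = other[key]
    elif hasattr(other, "keys"):          # duck-typed mapping
        for key in other.keys():
            d[key] = other[key]
    else:                                 # fall back to an iterable of pairs
        for key, value in other:
            d[key] = value
    for key, value in kwds.items():
        d[key] = value
```

So mappings (real or duck-typed via .keys()) are prioritized, and the iterable-of-pairs interpretation is the fallback.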


Also, don’t get me wrong, I am not hard-vouching for this, just taking time to respond to arguments that I don’t find convincing enough (regardless of whether they are for or against).


The user may recognize pairs of items in the iterable, but the interpreter only sees individual items, not key-value pairs.

This means updating continues until a failure occurs. The user would need to wrap it in a try/except block, similar to how update is handled, or create a dictionary beforehand.

At first, it seems like a convenient shortcut, but it also inherits all the complications of creating a dictionary from pairs.

try:
    func(**pairs)
except:
    exit()

…is not visually appealing.
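The partial-update hazard described above can be seen with dict.update today: the earlier pairs are already applied by the time the bad element is hit.

```python
d = {}
try:
    d.update([('a', 1), ('b', 2), 'oops'])  # third element is not a pair
except ValueError:
    pass
print(d)  # the first two pairs were already applied before the failure
```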

Ah fair enough. dict.update indeed prioritizes objects that implement the mapping protocol. Since the proposal is simply about making the ** operator consistent with the existing behavior of dict.update, I think I’m in support of the proposal now.

I think this is one of those errors that are captured by a global try/except with the error message “Contact developer”.

Then this should also always be done?

try:
    func(**dict(pairs))
except:
    exit()

as it is prone to exactly the same error.

It does not seem to have any additional technical issues compared to {**dict(pairs)} and func(**dict(pairs)).

At least none have been pointed out so far.


Ok, I will stop defending this and will leave this fun part to OP if he wishes to pursue this.

Personally, I am positive on this, conditional on it being shown that it would be useful enough in practice that the effort and additional complexity are justified.

If there is little usage, then I am +0.1, given there is agreement that the dict.update argument-type protocol is not changing. I just like consistency, I suppose, provided the opportunity costs can be estimated accurately enough and are minimal, e.g. the API is final enough that it has little to no chance of obstructing better things in the future.

Not necessarily. You can leave func(**mapping) outside the try/except block if you don’t expect any exceptions from func. This is more about guarantees that certain current syntaxes will never raise exceptions, as they are designed to always work.


Let’s keep the ** operator simple as it is.

So does the proposal work like this:

d1 = {(0, 1): "foo"}
d2 = {(2, 3): "bar", (4, 5): "baz"}

d1.update(d2)  # obvious, normal. works fine
d1.update(**d2)  # fails, update() doesn't take keywords
d1.update(*d2.items())  # this already works
d1.update(*d2)  # successfully updates, but not what you want
d1.update(**d2.items())  # this thread would make this work

update() does take keywords but it would fail because keywords have to be strings and tuples are not strings.

This proposal would make d1.update(**d2.items()) behave like d1.update(**d2) when d2 is a mapping, so it would fail in the same way as d1.update(**d2) because tuples are not strings and can’t be keywords.
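The failure mode being referred to is the existing “keywords must be strings” check, which fires at the call site before update even runs:

```python
d1 = {}
d1.update(**{'a': 1})            # string keys are fine as keywords
try:
    d1.update(**{(0, 1): 'x'})   # tuple keys cannot become keywords
except TypeError:
    print("TypeError: keywords must be strings")
```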


Oh huh I guess I’ve just never used that.


I can see the benefit of this. Would it also make sense for this idea of “unpacking pairs” to work in the other direction, i.e. not just from a list of tuples to key/value pairs in a dict, but from key/value pairs in a dict to a list of tuples? Currently:

[*{'a': 1, 'b': 2}] == ['a', 'b']

but could/should:

[**{'a': 1, 'b': 2}] == [('a', 1), ('b', 2)]

which currently throws SyntaxError?
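For what it’s worth, the reverse direction already has a direct spelling via .items(), which is presumably what [**d] would mean:

```python
d = {'a': 1, 'b': 2}
print([*d.items()])  # the pairs, in insertion order
```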