Should `**` unpack iterables of iterables?

Thanks for saving some of the fun for me :slight_smile:

I didn’t expect this to inspire as much discussion as it has, and I’m finding the conversation really interesting. Looking through the thread, though, it does seem like many of the arguments come down to personal preference; and perhaps to no one’s surprise, this kind of subjective value judgement is a place where multiple reasonable people can reasonably disagree.

Here’s my attempt to briefly summarize my opinion:

  • It seems like we all value “simplicity,” but from going through the thread, I think we may disagree about what “simplicity” means, or at least about the relative importance of various kinds of simplicity. One primary way I interact with Python is through teaching, and so one kind of simplicity I value super highly is simplicity for someone new learning the language. To this end, internal consistency is very important to me.

  • Given PEP 448, we have (among other things) multiple ways to create/merge dictionaries: {**x} and dict(x) and d.update(x). Once one has a feeling for what kinds of x are appropriate for one of these forms, it would be nice if that feeling generalized to all of them rather than requiring memorization of which types are allowed in which case.

  • The change proposed here is (AFAICT) fully backwards-compatible, and it can be implemented with little-to-no impact on the performance of existing code.

  • I have not personally been convinced by any of the examples in this thread that this change would make anything harder to read/understand.
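To make the inconsistency from the second bullet concrete, here is a minimal sketch of how the three forms behave on a stock CPython today (that is, without the patch discussed in this thread). The variable names are only for illustration.

```python
# An association list: an iterable of (key, value) pairs, not a mapping.
pairs = [("a", 1), ("b", 2)]

# The dict constructor accepts an iterable of pairs.
via_constructor = dict(pairs)

# So does dict.update.
via_update = {}
via_update.update(pairs)

# The dict display with ** currently insists on a mapping and raises.
try:
    via_display = {**pairs}
except TypeError as exc:
    display_error = exc
else:
    display_error = None

print(via_constructor)
print(via_update)
print(type(display_error).__name__)
```

With the proposed change, the third form would presumably produce the same dictionary as the first two instead of raising.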

Happy to continue the conversation!


I agree with this in principle, and I would agree with it here, too, were it not for the feeling that in this case, keeping this interface limited makes it inconsistent with other existing ways of doing the same thing. The way I see it, implementing this idea wouldn’t be adding a whole new permissive interface; it would be making one existing interface more permissive to make it more similar to existing, similar interfaces.

I think about this the same way. I agree in principle, but given that that decision has been made, I favor making things internally consistent over other considerations.


Yes, and intentionally so. The proposal, as I see it, is allowing pairs in all the different ways we can create/update dictionaries, rather than only in some, and maintaining the same semantics in all of those cases, with the intention that once you know the quirks with one of these things, you know the quirks of all of them.

I’m not sure I agree, but even granting that, it would only be as leaky as dict.update is, and in the same way.

I also think it’s not quite right to say that the behavior of **mapping doesn’t already depend on the contents of mapping; in some cases, like keyword argument unpacking, the contents very much matter (keys must be strings and match the signature of the function being called). So the user still has to be careful.
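As a small illustration of that point, the sketch below uses a made-up function, greet, to show that keyword argument unpacking already depends on what the mapping contains and not only on the fact that it is a mapping.

```python
def greet(name, punctuation="!"):
    """Made-up function used only to demonstrate keyword unpacking."""
    return f"Hello, {name}{punctuation}"

# Works: every key is a string that matches a parameter name.
ok = greet(**{"name": "world"})

# Fails: the key is a string, but no parameter is called "nmae".
try:
    greet(**{"nmae": "world"})
except TypeError as exc:
    unknown_key_error = exc

# Fails: keyword names must be strings, even though 1 is a valid dict key.
try:
    greet(**{1: "world"})
except TypeError as exc:
    non_string_key_error = exc

print(ok)
print(unknown_key_error)
print(non_string_key_error)
```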


It does improve performance if what you’re starting with is not a dictionary but rather an iterable of pairs, which is the case I’m thinking about here. Here’s a little test case (of course, I know I actually have a dict to start with in this code, but let’s imagine I didn’t):

Quick Test Code
import timeit
import math

y = list(math.__dict__.items())

N = 1_000_000

t = timeit.timeit(lambda: {**dict(y)}, number=N)
t2 = timeit.timeit(lambda: {**y}, number=N)

print(f'{{**dict(y)}} {t:.04f} seconds')
print(f'{{**y}} {t2:.04f} seconds')
adj = "faster" if t < t2 else "slower"
print(f'wrapping in dict is {adj} ({t / t2:.02f}x)')

t3 = timeit.timeit(lambda: {}.update(y), number=N)
print()
print(f'{{}}.update(y) {t3:.04f} seconds')
print(f'{{**y}} {t2:.04f} seconds')
adj = "faster" if t3 < t2 else "slower"
print(f'calling update is {adj} ({t3 / t2:.02f}x)')

def func(**kwargs):
    return kwargs.keys()

t4 = timeit.timeit(lambda: func(**dict(y)), number=N)
t5 = timeit.timeit(lambda: func(**y), number=N)
print()
print(f'func(**dict(y)) {t4:.04f} seconds')
print(f'func(**y) {t5:.04f} seconds')
adj = "faster" if t4 < t5 else "slower"
print(f'wrapping in dict is {adj} ({t4 / t5:.02f}x)')

Running this test through my quick hacky implementation, a fairly typical output looks like:

{**dict(y)} 1.9673 seconds
{**y} 1.6413 seconds
wrapping in dict is slower (1.20x)

{}.update(y) 1.7020 seconds
{**y} 1.6413 seconds
calling update is slower (1.04x)

func(**dict(y)) 3.8099 seconds
func(**y) 3.4797 seconds
wrapping in dict is slower (1.09x)

This happens with no slowdown of the pure-dict case since we never hit the new branching point in that case, though it would be slightly slower than the current implementation for custom types that support .keys and .__getitem__, due to the extra check for the existence of .keys. We also obviously save on memory by avoiding the creation of an intermediate dict object.
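For readers who want a mental model of the branching, here is a rough pure-Python sketch of the documented behavior of dict.update: use .keys and .__getitem__ if the argument has a keys method, otherwise treat it as an iterable of pairs. This is an illustration only; it is not the C code in dictobject.c, and the actual patch may be structured differently. The names update_like_dict and KeysAndGetItem are hypothetical.

```python
def update_like_dict(target, other):
    """Rough pure-Python model of dict.update(other); not the C code."""
    if hasattr(other, "keys"):
        # Mapping-like branch: look each key up via __getitem__.
        for key in other.keys():
            target[key] = other[key]
    else:
        # Fallback branch: treat `other` as an iterable of pairs.
        for key, value in other:
            target[key] = value
    return target


class KeysAndGetItem:
    """Hypothetical custom type exposing only .keys and .__getitem__."""

    def __init__(self, data):
        self._data = data

    def keys(self):
        return self._data.keys()

    def __getitem__(self, key):
        return self._data[key]


from_pairs = update_like_dict({}, [("a", 1), ("b", 2)])
from_custom = update_like_dict({}, KeysAndGetItem({"c": 3}))

# The real dict.update accepts the same two kinds of argument.
real = {}
real.update(KeysAndGetItem({"c": 3}))
```

The hasattr check in this model corresponds to the "extra check for the existence of .keys" mentioned above: exact dicts can skip it entirely, while custom mapping-like types pay for it once per unpacking.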

Of course, this is going to be way slower than the fast path we have for dict/dict merges, but IMO that’s beside the point.


I also put together an Emscripten-based demo for this idea in case anyone wants to fiddle with it without needing to compile it themselves:


dict.update() is a method defined on the dictionary object, not part of Python’s core syntax. In contrast, ** is a syntax feature of the language itself, used in expressions like exponentiation and argument unpacking. Since one belongs to the language’s grammar and the other to an object’s method, comparing them directly doesn’t make sense because they operate at different layers of the language.

The argument is similar to asking why ** (exponentiation) doesn’t accept strings, just because int() can convert them. "2" ** "2"?

Also, ** does not indicate nested unpacking. It doesn’t follow the visual pattern where one asterisk suggests single-level unpacking, two asterisks imply double-level, and three asterisks would imply triple-level unpacking.

This is the fastest approach (no pun intended):

import math

func(**math.__dict__)

I don’t think that’s equivalent to what I’m suggesting here.

I’m not disputing that it would be weird for "2" ** "2" to work, in a vacuum. But I feel like it would be more weird if math.pow("2", "2") worked but "2" ** "2" raised an exception, given that they’re supposed to represent the same operation. It’s that kind of difference that I feel between dict.update and ** used for dict unpacking.

I’m a little confused by this. I wasn’t trying to suggest anything of the sort.

Naturally. Hence my caveat “(of course, I know I actually have a dict to start with in this code, but let’s imagine I didn’t)”. I don’t think it’s unreasonable to assume that there are situations where you don’t have a dictionary, but you have an association list that you want to treat as a mapping (like the **dict(zip(...)) examples from the GitHub search). It’s the difference in performance in those cases that I was trying to get at.


I’m not sure whether I’ve been unclear in my communication, or if I’m misunderstanding your arguments, but it’s starting to feel like we’re talking past each other, which I don’t think is particularly fruitful.

I’m still happy to continue the conversation (and interested to hear more opinions as well), but I’m going to step away from this thread for a bit.


I understand it now, but dict.update is an implementation detail. I still don’t see the connection between this and the ** syntax.

In what way is dict.update an implementation detail? It’s a fully documented public method of a built-in type.

The two code snippets in the OP already demonstrate the connection. Please elaborate on why that isn’t sufficient for you.


I like this proposal and I think it matches what I would intuitively expect. With *args the result is a tuple but you can pass whatever would be acceptable in the tuple constructor. With **kwargs the result is a dict and it seems natural that it could accept what the dict constructor accepts.
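A short sketch of that asymmetry on a stock CPython, using a throwaway function named collect (my own name, just for the demonstration):

```python
def collect(*args, **kwargs):
    """Throwaway function that returns whatever it was called with."""
    return args, kwargs

# * accepts any iterable, just like tuple() does.
args_from_generator, _ = collect(*(n * n for n in range(3)))
args_from_string, _ = collect(*"ab")

# ** accepts a mapping...
_, kwargs_from_dict = collect(**{"x": 1})

# ...but, on a stock CPython, not everything that dict() accepts.
pairs = [("x", 1)]
assert dict(pairs) == {"x": 1}
try:
    collect(**pairs)
except TypeError as exc:
    kwargs_error = exc
else:
    kwargs_error = None

print(args_from_generator, args_from_string, kwargs_from_dict)
print(kwargs_error)
```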

The part that I find awkward about this is not really the proposal here but just that the dict constructor is already awkwardly overloaded as you can see in typeshed. For tuple the stubs have:

class tuple(Sequence[_T_co]):
    def __new__(cls, iterable: Iterable[_T_co] = ..., /) -> Self: ...

For dict it is:

class dict(MutableMapping[_KT, _VT]):
    # __init__ should be kept roughly in line with `collections.UserDict.__init__`, which has similar semantics
    # Also multiprocessing.managers.SyncManager.dict()
    @overload
    def __init__(self) -> None: ...
    @overload
    def __init__(self: dict[str, _VT], **kwargs: _VT) -> None: ...  # pyright: ignore[reportInvalidTypeVarUse]  #11780
    @overload
    def __init__(self, map: SupportsKeysAndGetItem[_KT, _VT], /) -> None: ...
    @overload
    def __init__(
        self: dict[str, _VT],  # pyright: ignore[reportInvalidTypeVarUse]  #11780
        map: SupportsKeysAndGetItem[str, _VT],
        /,
        **kwargs: _VT,
    ) -> None: ...
    @overload
    def __init__(self, iterable: Iterable[tuple[_KT, _VT]], /) -> None: ...
    @overload
    def __init__(
        self: dict[str, _VT],  # pyright: ignore[reportInvalidTypeVarUse]  #11780
        iterable: Iterable[tuple[str, _VT]],
        /,
        **kwargs: _VT,
    ) -> None: ...
    # Next two overloads are for dict(string.split(sep) for string in iterable)
    # Cannot be Iterable[Sequence[_T]] or otherwise dict(["foo", "bar", "baz"]) is not an error
    @overload
    def __init__(self: dict[str, str], iterable: Iterable[list[str]], /) -> None: ...
    @overload
    def __init__(self: dict[bytes, bytes], iterable: Iterable[list[bytes]], /) -> None: ...
    def __new__(cls, *args: Any, **kwargs: Any) -> Self: ...

The way that “mappings” are special cased in the constructor is awkward. It would be better if dict.__iter__ yielded pair tuples and then a dict[K, V] would also be an Iterable[tuple[K, V]] and requiring that as the input would collapse many overloads. The SupportsKeysAndGetItem case would not be needed at all. The internal code could have a fast path that checks for dict but it would be consistent with the argument being an Iterable[tuple[K, V]] and there would be no other need for the implementation to special case “mappings”.
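A minimal sketch of why the constructor needs that special case today: iterating a dict yields its keys, so a dict is not itself an iterable of pairs.

```python
d = {"a": 1, "b": 2}

# Iterating a dict yields keys, not (key, value) pairs...
keys_only = list(d)

# ...so the pairs have to be requested explicitly.
pairs = list(d.items())

# dict() therefore needs two distinct code paths to accept both inputs.
from_mapping = dict(d)
from_pairs = dict(pairs)

print(keys_only)
print(pairs)
print(from_mapping == from_pairs)
```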


When comparing ** mapping (dictionary) unpacking with dict.update, the former simply unpacks a mapping, while the latter updates an existing dictionary. The way updates are performed does not affect the ** syntax in any way. The ** operator is syntactic sugar that simplifies writing dictionary displays by avoiding the need to manually extract keys from a mapping. It does not update an existing dictionary.

The ** syntax is straightforward; it simply avoids the manual effort of unpacking key-value pairs from a mapping. It does not parse other data structures to extract pairs. A mapping is a data type, whereas pairs, as used in the original post, is not.

There is no connection between creating a dictionary object by parsing other data structures and unpacking a mapping. They operate in opposite directions.

(My grammar checker identifies ** as an operator, but it’s actually just syntax. That’s why it has been referred to as an operator in previous posts.)

I think this thread is about whether it is a good idea to make one.


FYI ** is referred to as an operator in several places in the Python documentation:

  • reference to unpacking generalisations for the 3.5 release (referring to PEP 448)
  • reference here describing dictionary display, where it refers to the operand of **
  • also, PEP 448

I’m not questioning what the right term should be, but clearly it has been described as an operator.

This is an implementation detail, but in a sense, ** does update an existing dictionary:

>>> import dis
>>> dis.disco(compile("{**x}", "<string>", "eval"))
  0           0 RESUME                   0

  1           2 BUILD_MAP                0
              4 LOAD_NAME                0 (x)
              6 DICT_UPDATE              1
              8 RETURN_VALUE

The patch I linked earlier in the thread works by changing DICT_UPDATE (rather than by doing anything about ** specifically), so maybe an alternative view of this proposal is that it’s about making the DICT_UPDATE opcode work more like dict.update.

I’m not particularly convinced by arguments that the semantics of syntactical structures are special in a way that built-in functions and methods aren’t, since it all boils down to function calls in the end; in this case, dict.update(x) and {**x} both work their way down to functions in dictobject.c anyway, and it’s really just a matter of which of those gets called (a variant that accepts an iterable of pairs versus a variant that doesn’t).
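If anyone wants to check which opcodes their own interpreter emits for the dict display and for the call form, something like the following should work. The helper name opnames is my own, and the exact opcode names vary between CPython versions, so I am deliberately not showing expected output.

```python
import dis

def opnames(source):
    """Return the opcode names CPython emits for an expression."""
    code = compile(source, "<string>", "eval")
    return [instr.opname for instr in dis.get_instructions(code)]

# Compiling does not evaluate anything, so x and f need not exist.
display_ops = opnames("{**x}")
call_ops = opnames("f(**x)")

print(display_ops)
print(call_ops)
```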


There’s another important point that I don’t want to get lost in the thread. My focus so far has been almost exclusively on dict unpacking versus dict.update, but @oscarbenjamin points out that this change would also improve an asymmetry related to keyword argument unpacking, which I think further strengthens the argument for it:

I agree that the typing situation is a mess, but at least this change wouldn’t be inventing an entirely-new mess :slight_smile:. And I think it’s unfortunately outside the realm of possibility to consider changing dict.__iter__ and friends since that can’t be done in a backward-compatible way.


At the language level, it would not make sense without a properly defined pairs data type, as in assert isinstance([[1, 2], [3, 4]], pairs). An implementation-level optimization is acceptable for **dict(pairs).

Could you explain why?

This is a good idea. The issue is that there needs to be a way to know that dict == builtins.dict.
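To illustrate why this is an obstacle, here is a small sketch showing that the name dict in {**dict(pairs)} is looked up at run time like any other name, so the same source can end up calling something other than the builtin. The function fake_dict and the namespace names are made up for the example.

```python
source = "result = {**dict(pairs)}"

# Normal case: `dict` resolves to the builtin.
normal_ns = {"pairs": [("a", 1)]}
exec(source, normal_ns)

# Shadowed case: the same source now calls an unrelated function.
def fake_dict(pairs):
    return {"shadowed": True}

shadowed_ns = {"pairs": [("a", 1)], "dict": fake_dict}
exec(source, shadowed_ns)

print(normal_ns["result"])
print(shadowed_ns["result"])
```

An optimization that rewrote **dict(pairs) at compile time would have to either guard against this kind of shadowing or change behavior in the second case.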

I am adding this to the list that I am gathering, where the ability to do so would enable useful optimizations. No idea how achievable this is. Nevertheless, the current contents of the list:

# single item container operations
list.pop
list.append
set.add

# builtin functions
builtins.len

# others
**dict(pairs)   # +

I don’t see any other way to indicate to the interpreter or the user that this is a list of key-value pairs: [[1, 2], [3, 4]]

In [[1, 2], [3, 4]], nothing in the syntax says “these are 2-element associations.”

But why is this an issue? The interpreter is clueless about almost everything most of the time. Hence type checkers.

E.g., currently the parser has no clue that a is a dict either:

import ast

print(ast.dump(ast.parse('{**a}'), indent=4))
Module(
    body=[
        Expr(
            value=Dict(
                keys=[
                    None],
                values=[
                    Name(id='a', ctx=Load())]))])

All good in the hood until you try to evaluate it.

So far I have not seen any hard deal-breakers for this. From a technical POV, the situation is analogous to the mechanics of dict.update. Whether the mechanics of dict.update are optimal is another discussion, but they are what they are and are not changing. This thread is about whether, given the situation at hand, it would be good to extend those mechanics to **arg in the context of {**arg} and func(**arg). I don’t know, but these are the questions that I try to focus on when I think about this:

  1. Is it just too weird / unpythonic in any way? I play with the Emscripten demo from time to time to get a feel for it, and it feels a bit weird. But I’m not sure whether that is because it is actually weird or because I am just not used to it.
  2. Does this cause ambiguities / issues / confusion beyond what dict.update already causes? Not sure yet.
  3. Is this beneficial enough to be worthwhile? An extensive use-case analysis / presentation of beneficial cases hasn’t been done, so no idea. Personally, I would gain fairly little in terms of performance. But:
  4. How beneficial is consistency for its own sake? Personally, I think I value consistency more than the average person, so I am naturally slightly positive by default. Teaching / learning / retaining knowledge is important. E.g. “arg in all of the following follows the same protocol: dict(arg), dict.update(arg), {**arg}, func(**arg)”. This makes my life a bit easier. Plus added flexibility and fewer characters to type.

Why is Python a dynamic language and also a strongly typed language?

For example, “2” + 2 raises:
TypeError: can only concatenate str (not "int") to str

The details of how dict and dict.update are implemented, as well as their “implementation details,” are irrelevant to the ** syntax.

I have a dict subclass that accepts serialized JSON and MessagePack strings. I use it for testing, because performance is not important. I got tired of manually deserializing them.
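A minimal sketch of what such a subclass could look like, assuming it only needs to handle JSON object strings. The class name LooseDict is hypothetical, and MessagePack support is left out because it would need a third-party library.

```python
import json

class LooseDict(dict):
    """Hypothetical dict subclass that also accepts a JSON object string."""

    def __init__(self, data=(), /, **kwargs):
        # Deserialize strings first; everything else goes to dict as-is.
        if isinstance(data, str):
            data = json.loads(data)
        super().__init__(data, **kwargs)

from_json = LooseDict('{"a": 1, "b": [2, 3]}')
from_pairs = LooseDict([("a", 1)])

print(from_json)
print(from_pairs)
```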

I think one thing that’s missing from this conversation is that, at least for me, this has practically never been a need. If I have a mapping, I store it as a mapping. I don’t store it as an iterable of pairs and then have to unpack it.

I think it might help to find some real world use cases of this feature to verify that

  • this actually is a somewhat common need, and
  • the code that would use it isn’t badly designed.

A quick search for /\*\*dict\([^=]+\)/ lang:python in GitHub yields many legitimate use cases (from reputable repos with many stars by the way) such as:

  1. By far the most common use case: unpacking zipped key-value pairs as keyword arguments for a call:
  2. Unpacking a returned sequence of key-value pairs (from a function designed to return one) as keyword arguments for a call:
  3. Merging key-value pairs with other dicts:

The proposal would allow this kind of code to be both more concise and more efficient.
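For anyone who has not run into these patterns, here is a small self-contained sketch of the first and third idioms as they have to be written today. The function connect and all of the data are invented for the example; they are not taken from any of the repositories in the search results.

```python
def connect(host, port, timeout):
    """Hypothetical function that takes keyword arguments."""
    return f"{host}:{port} (timeout={timeout})"

names = ["host", "port", "timeout"]
values = ["localhost", 8080, 30]

# Today's idiom: build a throwaway dict just so ** has a mapping to unpack.
result = connect(**dict(zip(names, values)))

# Merging pairs into a dict display needs the same wrapper.
defaults = {"timeout": 10, "retries": 3}
merged = {**defaults, **dict(zip(names, values))}

print(result)
print(merged)
```

Under the proposal, as I understand it, the dict(...) wrapper could be dropped in both places, which is where the savings in typing and in the intermediate dict would come from.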
