Syntax for dictionary unpacking to variables

I’d say the earlier exception, a ValueError, is actually better for “production code”, since it would be raised immediately and point to the correct line. mypy and friends will help point out that some variables might be unbound in the match case, but in a project without these set up I can imagine a developer forgetting to add the default case _.

>>> d = {"a": 1, "b": 2}
>>> match d:
...     case {"a": a, "c": c}:
...         pass
>>> # Some code later...
>>> a
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    a
NameError: name 'a' is not defined

Working in TypeScript I use destructuring assignments every day, so this feature would be a godsend to have in Python as well! I’ll see if I can scrounge up a few examples of code that could be “improved” with the new syntax.

I do think a proposal should preferably consider destructuring object attributes as well, not only dict values. Otherwise I think we would discourage the use of dataclasses and similar classes, since they would be less ergonomic to use.

One HUGE distinction between dict unpacking and sequence unpacking is futureproofing. Suppose you have this function:

def interpret_color(color):
    """Figure out what a colo[u]r name means"""
    # parse all the different things you might want
    return 102, 51, 153

r, g, b = interpret_color("rebeccapurple")

There’s only one direction to safely extend this, and that’s at the right hand edge. Also, in order to be extensible at all, callers have to use it like this:

r, g, b, *_ = interpret_color("rebeccapurple")

In this particular example, it wouldn’t be a problem to extend to the right (the most obvious extension being an alpha value), but you can’t always know that in advance, and callers still have to plan for the possibility that it returns more values in the future.

With dict unpacking, it would be far easier. The initial version would return {"r": 102, "g": 51, "b": 153}, you unpack it into r/g/b, and all’s well. Yes, you have to unpack into those exact names, but in practice, that isn’t actually that big a limitation (can attest - have used this exact concept very frequently). And if an alpha channel is added? No problem, just have four things in the dictionary. No code needs to be changed. You could even insert it somewhere in the middle without a problem.
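
Without new syntax, something close to this can be approximated today with operator.itemgetter. This is only a rough sketch: the dict-returning interpret_color below is a hypothetical variant of the function above, with an alpha value deliberately inserted in the middle.

```python
from operator import itemgetter


def interpret_color(color):
    """Hypothetical dict-returning variant of the function above."""
    # A later version added "a"; callers that ignore it keep working.
    return {"r": 102, "a": 255, "g": 51, "b": 153}


# Pick out just the keys we care about; extra keys are ignored,
# and the order of keys in the dict does not matter.
r, g, b = itemgetter("r", "g", "b")(interpret_color("rebeccapurple"))
print(r, g, b)
```

The names still have to be written twice (once as strings, once as targets), which is the repetition a dedicated syntax would remove.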

Obviously that’s not to say that dict unpacking is perfect, problem-free, and the ultimate replacement to sequence unpacking. But they have different tradeoffs, and having both in the language would be valuable.

As Paul explained above, I think the problem you’re describing would be better solved with a structured data type. The problem with using dict unpacking here is that you’re essentially naming all the fields you care about in every single place that you’re unpacking the color (and thereby polluting the local namespace). The advantage of using a structured data type is that you even have the option to implement some of the attributes as properties, for example.

You’d be amazed how rarely that’s actually a problem in practice. Obviously this doesn’t replace ALL attribute/item lookups, and in the cases where you need several things of the same type in the same namespace, you just don’t destructure them; and in a pretty huge number of situations, the names are actually fine on both sides.

Structured data types have their own uses. But if you have a blob of JSON that you’re working with, is it easier to parse it into nested dicts and lists, or to first design a full schema in dataclasses (or equivalent) before you can do any parsing at all? Destructuring out of a dictionary lets you ignore everything that isn’t relevant to you - as per the futureproofing that I described above - instead of trying to design something that copes with all the possible values you might get. I suppose there is SOME value in using SimpleNamespace instead of a dictionary, but all you really achieve there is a slightly more compact syntax (spam.ham instead of spam["ham"]). You still won’t have any properties or anything.

Destructuring isn’t a replacement for other ways of doing things. It’s a way to do one particular, very common operation much more conveniently and without lots of name repetition. That is all.

Your JSON example makes sense to me. And I agree that

**{'a': a, 'b': b} = some_dict   # This is the most analogous syntax for dict unpacking IMO

is simpler than

a = some_dict['a']
b = some_dict['b']  # Yes, I agree this has unfortunate repetition of `some_dict`

I guess it hasn’t come up for me that often. Most of the time, if I’m doing something like this, I should be using a dataclass. Just for comparison:

from dataclasses import dataclass

@dataclass
class C:
  a: int
  b: int  # opportunity to add member functions and properties.

c = C(**some_dict)  # only adds one attribute to the local namespace.

isn’t so much longer if you use C more than once.

It would be nice to see some real world code where dict unpacking would be beneficial. Maybe grepping for some pattern that matches the sequence of dict lookups would turn some things up?

Can’t you do this with a match statement? So all you’re saving here is a bit of space?

match interpret_color("rebeccapurple"):
    case {"r": r, "g": g, "b": b}: pass
    case _: raise ValueError("Invalid return value")

This utterly and totally fails if some_dict has any more elements, though. That’s what I mean about being forced to fully define the JSON schema before you work with any of the data, which is a tedious job and blocks all future expansion. It’s not much longer in this example, where you’re destructuring the whole thing - but it becomes way way longer when you have to define ten or twenty attributes just because you care about two of them.
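
To make the failure concrete, here is a small sketch reusing the C dataclass from the post above. The "extra" key and the filtering workaround are illustrative assumptions, not something proposed in the thread:

```python
from dataclasses import dataclass, fields


@dataclass
class C:
    a: int
    b: int


some_dict = {"a": 1, "b": 2, "extra": 3}

try:
    C(**some_dict)  # the generated __init__ rejects unknown keywords
except TypeError as exc:
    print("rejected:", exc)

# One possible workaround: filter down to the declared fields first.
wanted = {f.name for f in fields(C)}
c = C(**{k: v for k, v in some_dict.items() if k in wanted})
print(c)
```

The workaround runs, but it is extra ceremony at every call site unless it is wrapped in a helper.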

Yes, which is why some people have talked about an inline match directive. But that still means you end up repeating the names, so I’d consider it a much heavier-duty tool and not well suited to simple tasks. An inline match assignment does come close to the goal; a full match block is overkill. Also, it feels rather weird to have a case statement with no code in it, just to leave the name bindings to outlast the statement; but maybe that’s just me.

Another option would be to create a function

def f(*other_variables, r: int, g: int, b: int, **_):
  ...  # process here; any extra keys are swallowed by **_

f(*other_variables, **interpret_color("rebeccapurple"))  # does not pollute any local variables, extensible
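
A runnable version of that idea might look like the following. The brightness function, its scale argument, and the dict-returning interpret_color are all hypothetical, chosen only to show the keyword-only parameters and the catch-all **_:

```python
def interpret_color(color):
    """Hypothetical dict-returning variant, now with an alpha channel."""
    return {"r": 102, "g": 51, "b": 153, "a": 255}


def brightness(scale, *, r, g, b, **_):
    # r, g, b are bound as parameters; unknown keys land in **_ and are dropped.
    return scale * (r + g + b) / 3


print(brightness(1.0, **interpret_color("rebeccapurple")))
```

The cost is that the processing has to live in its own function, and a missing key shows up as a TypeError about a missing keyword argument rather than a KeyError.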

In my experience, fully specifying the data structure (that you need) coming out of JSON deserialisation is way faster development due to the usual benefits of typing, for all but the quickest of experiments.

You can ignore unneeded fields by defining a second constructor:

from dataclasses import dataclass

@dataclass
class Colour:
    r: int
    g: int
    b: int

    @classmethod
    def from_dict(cls, d: dict[str, int]) -> "Colour":
        return cls(d["r"], d["g"], d["b"])

Which can then be expanded later on by adding new fields and updating the constructor.

In this case, I’d recommend a serialisation library like attrs (with cattrs), pydantic or marshmallow to reduce duplication.

Fully specifying the part that you need? Maybe. Fully specifying everything? Absolutely not.

But you’re comparing this to typing, which is most definitely optional in Python, so it should be noted that all you’re saying here is that this should ALSO be optional.

Interesting: this works when run in a file but not at the prompt!

On further thought, this works perfectly. a is not defined because the pattern never matched: it requires a "c" key, and "c" is not in d.