Syntax for dictionary unpacking to variables

I hope this post is not a duplicate of another thread somewhere or of some PEP; at least I didn't find any on this forum regarding this matter.

The idea

At the moment, tuples are very useful for holding many values and unpacking them as function arguments, or for assigning many variables at a time:

def myfunction(these, are, ordered, values):  ...

args = ('these', 'are', 'ordered', 'values')
myfunction(*args)   # unpacking to arguments
a, b, c, *_ = args   # unpacking to variables

All this is allowed by the iterator protocol.
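Any iterable works on the right-hand side, not only tuples; for instance (a throwaway generator, just for illustration):

def gen():
    yield 10
    yield 20
    yield 30

a, b, c = gen()   # unpacking works on any iterable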
We can also unpack as arguments using the mapping protocol:

def myfunction(these, values, are, unordered):  ...

kwargs = {'these':1, 'are':2, 'unordered':3, 'values':4}
myfunction(**kwargs)   # unpacking to values named like the target arguments

I think it could be great to have a new syntax for unpacking to variables, using the mapping protocol as for function calls. It would provide a concise way of getting specific values out of dictionaries.

# this could be very cool !
kwargs = {'these':1, 'are':2, 'unordered':3, 'values':4}
these, values = **kwargs  # unpacking the values named like the target variables

assert (these, values) == (1, 4)

Current workarounds

I know mainly two ways of doing similar things, but neither is as satisfying as a dedicated syntax.

  • working with globals (uh…)

    globals().update(kwargs)
    

    It only works at module scope, and it is not good design. It also unpacks all the variables from kwargs, so it is not as flexible as proper unpacking.
    Doing the same in a local scope is not possible since, for performance reasons, function scopes do not store variables in a dictionary nor in any kind of user-accessible mapping.
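    A minimal sketch of the difference (the behavior shown is CPython's; cfg is just an illustrative dict):

    cfg = {'x': 1}

    globals().update(cfg)   # works at module scope
    print(x)                # -> 1

    def f():
        locals().update({'y': 2})   # ignored: function locals use fast storage
        return y                    # NameError: 'y' is not defined

    f()                     # raises NameError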

  • using operator.itemgetter

    from operator import itemgetter
    these, values = itemgetter('these', 'values')(kwargs)
    

    It works, but it repeats the desired names, which is painful when there are many variables to extract. It also creates a temporary object that is immediately called and then destroyed: a bit slow for a mere variable assignment.
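    A repetition-free spelling doesn't exist today, but a plain generator expression at least avoids the temporary itemgetter object (the names are still repeated, though):

    these, values = (kwargs[k] for k in ('these', 'values'))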

Alternatives

These are just ideas, without any consideration of feasibility or interest.

  • working with locals, the same as with globals

    locals().update(kwargs)
    

    This would require creating a new mapping type dedicated to call frames, allowing the scope's variables to be read and assigned through the mapping protocol. It would be a dictionary-like object with no insertion ability.

  • I don’t know what else

Motivations

  • this eases the use of dictionaries as return values instead of tuples

  • this makes extraction of values from dictionaries much easier and more readable (for configs, sets of variables, provided environments, etc.)

  • this makes the use of functions returning tuples and functions returning dictionaries more similar.
    Complex functions returning a bunch of values often hesitate between tuples (for a small set of values) and dictionaries (for a bigger set, or when only a few values are used at a time by the caller, but computed anyway). This is for instance often the case in the machine-learning libraries I've been working with, and it is the case for many scipy functions too.

  • this plays well with the already existing antagonism between mapping and iterator

  • this brings an elegant, efficient, and hygienic answer to the iconic question of updating locals()

2 Likes

JavaScript has this concept, called destructuring assignment, which we used heavily in the JupyterLab code base. For example:

const obj = { a: 1, b: 2 };
const { a, b } = obj;
5 Likes

Structural pattern matching has the effect of destructuring assignment: Python 3.10: Cool New Features for You to Try – Real Python
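For example, a mapping pattern already destructures by key; a minimal sketch:

match {'a': 1, 'b': 2}:
    case {'a': a, 'b': b}:
        print(a, b)   # -> 1 2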

3 Likes

I think this has been raised many times before. Nobody has been able to reach consensus on the syntax, semantics, or usefulness of it.

You can start with these two:

Mapping unpacking assignment

Dict unpacking assignment

It seems a shame that you have to write the names of the target variables when they are already written once in the dict.

Especially in the common case where you want all the names in the dict.

It also seems extremely limiting that you have to use the same names for the dict keys as for the variables, or vice versa. For example, your data is coming in from another language:

kwargs = {'Dies': 1, 'sind': 2, 'ungeordnete': 3, 'Werte': 4}

but you want to use English variable names. Or some of your keys are not legal identifiers:

kwargs = {'of': 1, 'if': 2, '$who': 3}

I think the first item is irrelevant. Returning a dict is no different with or without this proposal: you just say return d. Likewise, this doesn't make it easier to construct the dict in the first place.

The second item is arguable. Much easier to write, sure. But easier to read? That depends on who is doing the reading. This would be yet another terse shortcut syntax, and beginners and inexperienced users will have no idea what it means.

Python mostly looks like executable pseudo-code. For most features, you don’t need to be a Python expert to guess what it does. Unpacking is an exception, it has to be learned and memorised.

“complex functions returning a bunch of values…” – that would be a code smell. I’m not saying that they are badly designed, but only that they smell like they could be badly designed.

But I guess it depends on whether you think of the function as returning a single value which is a dict, or many values which happen to be collected in a dict.

I think that the advantage of a dict is when you don't know what the keys will be. If you do know what the keys will be, and they are all legal identifiers, a much better data structure is a named tuple, or a SimpleNamespace, or some other object with named fields/attributes such as a dataclass.

And with those, you don’t need to destructure the dict into individual variables.
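For example, a quick sketch of the named-field alternatives (the names here are only illustrative):

from dataclasses import dataclass
from types import SimpleNamespace

ns = SimpleNamespace(these=1, values=4)
print(ns.these, ns.values)   # named access, no destructuring needed

@dataclass
class Result:
    these: int
    values: int

r = Result(these=1, values=4)
print(r.these, r.values)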

“already existing antagonism between mapping and iterator” – I don’t understand this. What sort of antagonism?

“question of updating locals()” – this doesn’t solve the problem of updating locals. We’ve always had the ability to update locals with direct assignment to a local variable. What we don’t have is the ability to manipulate locals as a namespace like we can do to globals. This proposal doesn’t change that.

Quite verbose unfortunately: just for a bunch of assignments you need 3 lines.

Thanks for pointing out these posts.
Well, the only conflicts I can see between our 3 proposals are:

  • use braces or not: {target, variables} versus target, variables
  • allow versus disallow the existence of keys not matching target variables

Not difficult to decide after discussion, whatever the result is.

It seems a shame that you have to write the names of the target variables when they are already written once in the dict.

I can agree, but for execution speed the Python interpreter must know the set of local variables before execution. Also, specifying the extracted keys allows extracting only what is needed and prevents overriding other variables in the scope.

your data is coming in from another language … but you want to use English variables. Or some of your keys are not legal identifiers.

You can already say that about unpacking dictionaries as function arguments. Since it is accepted behavior in the case of function arguments, I see no point against the same for variable assignment.

This point has already been discussed recently, don't you agree?

Python mostly looks like executable pseudo-code. For most features, you don’t need to be a Python expert to guess what it does. Unpacking is an exception, it has to be learned and memorised.

You are right on this, but since users already need to memorize the tricks of iterable unpacking and dict unpacking to arguments, I expect they will find dict unpacking to variables quite intuitive.

I think that the advantage of a dict is when you don’t know what they keys will be. If you do know what the keys will be, and they are all legal identifiers, a much better data structure is a named tuple, or a SimpleNamespace, or some other object with named fields/attributes such as a dataclass.

This is not the only use case for returning a dict: you can also use one when there are too many things to return for a simple tuple, since a tuple asks the user to memorize the precise order of the values. A dictionary, on the other hand, just asks the user to know the names of the fields they want, and lets them ignore the rest.

I don't consider a namedtuple to be a good option, since it must be declared beforehand: if you have N functions each with a different dictionary result, you first have to declare N namedtuples just to be used in one place each.
SimpleNamespace is more convenient, but provides no ability to extract a bunch of members, so in that respect it is just like a dictionary.

1 Like

I guess whether it stinks or not depends on what your function is for.
In my opinion, the following examples do not smell.

# case 1:  intermediary results
def computation():
    a = intermediate_result()
    b = long_operation(a)
    c = other_intermediate_result(a)
    d = final_result(b, c)
    # I can have many intermediary results, so I return a dict
    return dict(a=a, c=c, d=d)

a, d = **computation()   # I want d, but also want a which is anyway computed

And for functions using dictionaries (which could be the results of the previous one), I also have the following:

# case 2:  initialization of environments
def procedure(env1, env2):
    # env1 and env2 both contain a lot of variables created elsewhere
    # I need a lot of variables from env1 and env2, so I extract them for much better readability in the rest of the function
    the, variables, needed = **env1
    other, values = **env2
    do_something(needed, the, values)
    # still, we do not know in this function all the entries of env1 and env2 that other_procedure may need
    other_procedure(env1)
    other_procedure(env2)

In this case you could say I should use class instances for env1 and env2. But what's the point of creating a class with no methods just for use in two functions (one creating env1, the other using it together with env2)?

# case 3:  work with keyword args
def procedure(**kwargs):
    the, variables, needed, here = **kwargs
    some_work(the, variables)
    and_other_stuff(needed, here)
    other_procedure(**kwargs)

d = dict(a=0, b=2, c=4)
match d:
    case {'a':a, 'c':c, **rest}:
        pass
    case _:
        raise ValueError('Missing data')
print(a, c, rest) # 0 4 {'b': 2}

We can unpack d now with 5 lines of code; 1 would be better, if there is any interest.

1 Like

Also, with a, b, c = **dct, what happens if dct doesn’t have a b key? Presumably ValueError to match sequence unpacking. So robust code needs to take that into account. The match statement makes the “what if the dict doesn’t match” case explicit, which is probably better in “production quality” code.

For quick scripts a one-liner may be better (easier to read, and who cares if bad data causes a crash). But quick scripts have a bad habit of growing into production infrastructure, in my experience…

If you have examples from real code, they may be more convincing than the invented ones here.

In my opinion, your first example positively reeks.

The names are meaningless letters ‘a’, ‘c’ and ‘d’. Presumably this is because it is a made-up example, not a real one. Being meaningless, there is no advantage to using a dict.

You’re only returning three values. Just use a tuple.

If you need the value of a for something else, and c is not needed, you should compute it first, then pass it into computation() as an argument:


a = compute_a()
d = compute_d(a)  # No need to return c if it is not used.

I don’t understand your second example. What are these “environments”, and why do you have two of them, where you use some values from one environment and some values from the other?

Again, this seems made-up. Can you give a real-life example of this?

To me, a realistic example would involve merging the environments into a single environment object, then using it as a namespace. There are lots of ways of doing it, here’s one:


import collections
import types

chain = collections.ChainMap(env1, env2)
env = types.SimpleNamespace(**chain)
do_something(env.needed, env.the, env.values)

You still have env1 and env2 available if you need them.

Your third example is perhaps a bit better, but still quite artificial. I think this would be a better design:


# case 3:  don't work with keyword args when you don't need to
def procedure(the, variables, needed, here):
    some_work(the, variables)
    and_other_stuff(needed, here)
    other_procedure(the, variables, needed, here)

Named parameters are usually better than cramming everything into kwargs. If nothing else, they work much better with auto-complete in your IDE.

3.10 with its match statement has not even been out for a year yet. The majority of Python code is running on older versions, so we don’t yet have a lot of community experience with dict destructuring in match statements.

It would be good to get some more experience with it, to see how generally useful it actually is in practice, before adding a one-line version.

If the syntax was:

{a, b, c} = **dct

maybe you could provide a default like this:

{a, c, b='default'} = **dct
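For comparison, the spelled-out equivalent today (dct is illustrative; dict.get supplies the default):

dct = {'a': 1, 'c': 3}
a = dct['a']
c = dct['c']
b = dct.get('b', 'default')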
1 Like

Since a, b, c = [1, 2] raises a ValueError when there are not enough or too many values to unpack, I would expect the same (or possibly a KeyError) in the case of a missing key in the dict.
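For reference, the sequence-unpacking behavior being mirrored:

try:
    a, b, c = [1, 2]
except ValueError as e:
    print(e)   # not enough values to unpack (expected 3, got 2)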

A dict unpacking syntax doesn't seem to me any less "production ready" than iterable unpacking, so I wouldn't worry about it being used in production.

1 Like

This is indeed only made-up code. I did start thinking about all this while writing real-life code, where the existence of dict unpacking would have mattered for the choice of architecture. However, there were many different possible architectures, as you can imagine. A dict unpacking solution would be nice, but I don't think my real-life code would be the perfect example to prove the necessity of such syntax. It was just what triggered my thoughts about unpacking.

I can still share that original code if you want, but I'm afraid we would quickly go off topic by basing the debate on it.

I can make some of the above examples more complex to show the value of unpacking.

Real examples (whether simple or complex) are far more useful than made-up complex examples.

The question is not whether dict destructuring is useful. Of course it is useful. The question is whether it is useful enough to justify creating new syntax for it.

Every new syntax makes Python harder to learn. It makes it less like the beautiful executable pseudocode that made it popular in the first place. It is more code to maintain, and document, and test. If the new syntax adds little or no value to real code, then it is just bloat, and makes the language a little bit worse instead of better.

That’s why we want to see real examples of code that would be improved by the new feature, rather than merely hoping that the feature will (1) be used and (2) actually be an improvement.

1 Like

I will try to find real examples. But in the meantime, maybe case 3 is the most explicit.

# case 3:  complexified with named parameters
def procedure(the, variables, needed, here, **kwargs):
    # in fact we can have many more parameters to transmit to sub calls
    # but we don't know them all, so we are still using **kwargs
    some_work(the, variables)
    # those procedures might use 'the', 'variables', 'needed', 'here', but we're not sure, so we repack them again
    # that's a lot of repetition
    some_procedure(the=the, variables=variables, needed=needed, here=here, **kwargs)
    and_other_stuff(needed, here)
    # and we have plenty of calls to make with these parameters, so we repeat a lot and create a lot of intermediary dictionaries
    other_procedure(the=the, variables=variables, needed=needed, here=here, **kwargs)
    yet_and_other_procedure(the=the, variables=variables, needed=needed, here=here, **kwargs)

# case 3:  complexified with dict unpacking
def procedure(**kwargs):
    the, variables, needed, here = **kwargs
    # what uses those extracted values is not necessarily a function call,
    # so we cannot use unpacking to arguments; we would be obliged to
    # write `kwargs['variables']` in many places without dict unpacking
    some_work(the, variables)   
    # much shorter call
    some_procedure(**kwargs)
    and_other_stuff(needed, here)
    # and we have plenty of calls to make with these parameters, so we save a dict creation and a lot of repetition each time
    other_procedure(**kwargs)
    yet_and_other_procedure(**kwargs)

The sad thing about this is that we always have to repeat the namespace's name, as if we had to write kwargs['values'] everywhere instead of just values. It doesn't have the convenience of variables in the current namespace.

Well, but what if I have N values to return?

Then if the number of intermediary results is N, this asks the user to write at least N calls themselves, even if they are only interested in the final result of a few of them.
To be honest, requiring the user to do so is what I do in most of my libraries, because most of the time the intermediate computation steps have meaning in themselves. But sometimes they don't, or sometimes they have meaning but depend on other intermediary results we do not want to bother the user with. In such cases I want to expose the intermediary results in the return value, in addition to the final result.

Just for the case of functions returning a dict with intermediary results, let me take again the example of scipy functions: scipy.optimize.minimize() returns an OptimizeResult, which is nothing but a structure with optional fields, with no methods, used only as the return value of minimize().
Basically this could be a dict or a SimpleNamespace!
This structure could have had only 3 return values: (x, status, nit).
But since the other values are computed anyway by the solver, can often be useful to the user, and can be tedious to compute in user code, there are also the fields (fun, jac, hess, hess_inv, maxcv, ...).
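For illustration (assuming scipy is installed; which fields are present depends on the solver):

from scipy.optimize import minimize

res = minimize(lambda x: (x[0] - 1.0) ** 2, x0=[0.0])
print(res.x, res.fun, res.nit)   # solution, objective value, iteration count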

scipy.optimize.minimize() is not a corner case, I think, since I've seen a lot of similar designs in the rest of scipy, in pytorch, and in many data-science models. We can question the relevance of those designs, but it is an existing case.

I understand your point.
I will see if I can find real examples that are not too messy to be discussed.

I've only been skimming the discussion, but I'm now confused about your logic. The scipy.optimize.minimize example suggests that returning an object is better than returning a dictionary, because doing so makes it easier for the caller to access attributes.

And yet, you’re suggesting (I think?) that we should add new syntax because it’s clumsy to reference elements of dictionaries returned from functions. So why not return an object instead? One of your examples had return dict(a=a, c=c, d=d) - why not just replace that with return SimpleNamespace(a=a, c=c, d=d)?

And your “case 3” feels to me like something that’s either crying out for a redesign, or something that’s a rare case where no matter what you do, things are going to be a bit messy. It’s hard to tell without real world code, but my instincts suggest a redesign is likely the right answer. Unfortunately, nearly every time someone posts “real code” in a case like this, and people suggest redesigns, we end up in arguments about why “that won’t work” - and in truth, these are rarely productive, as both sides have a vested interest in their position to prove their point.

Overall, I feel that dict unpacking could be useful, but it’s rather niche, difficult to get the details of the design right, and will probably mostly end up being used in cases where it’s a “good enough” solution. In particular, I’d hate to see APIs being explicitly designed to return dicts so you can use dict unpacking on them. So overall I’m probably +0 on the idea.

2 Likes

In fact I have nothing against scipy.optimize.minimize returning a structure or a SimpleNamespace. It is just an example to highlight that there are existing APIs returning many more values than strictly necessary, including intermediate results.

If such a function returned a dict (like other APIs that I don't have in mind right now), that would make it just one case among many in which a dict unpacking syntax could be useful to the user. It is not the one case that requires the new syntax.

This is something I’m afraid of.

Since the "case 3" example makes no assumptions about how the functions called by this procedure are designed, I would say that in the general case they may well be well designed, so the person writing procedure might have no choice but to pass **kwargs or each parameter by name.
I agree that this could be a rare case.