Pre-PEP: Unpacking in Comprehensions

adqm · June 22, 2025, 5:51pm

Currently (thanks to PEP 448), we have a very nice shorthand for combining small numbers of iterables/dictionaries:

[*it1, *it2, *it3]
{*it1, *it2, *it3}
{**dict1, **dict2, **dict3}

I think a natural extension of this syntax would be to allow starred expressions in comprehensions and generator expressions, to provide easy ways to combine an arbitrary number of iterables/dictionaries:

[*x for x in its]  # list with the concatenation of all iterables in `its`
{*x for x in its}  # set with the union of all iterables in `its`
{**d for d in dicts}  # dict with the combination of all dictionaries in `dicts`
(*x for x in its)  # generator representing the concatenation of all iterables in `its`

These would effectively be shorthand for the following syntax, but more concise by avoiding the use and repetition of auxiliary variables:

[x for it in its for x in it]
{x for it in its for x in it}
{key: value for d in dicts for key, value in d.items()}
(x for it in its for x in it)
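
A minimal sketch of that equivalence using only syntax that is valid today. The sample data (`its`, `dicts`) is made up for illustration; the nested-`for` forms are checked against hand-written PEP 448 literals, which is what the starred comprehensions are meant to generalize.

```python
# Sample inputs, made up for illustration.
its = [[1, 2], [3], [2, 4]]
dicts = [{"a": 1}, {"b": 2}, {"a": 3}]

# Nested-for forms (valid today) ...
as_list = [x for it in its for x in it]
as_set = {x for it in its for x in it}
as_dict = {key: value for d in dicts for key, value in d.items()}
as_gen = (x for it in its for x in it)

# ... agree with spelling out each unpacking by hand (PEP 448).
assert as_list == [*its[0], *its[1], *its[2]]
assert as_set == {*its[0], *its[1], *its[2]}
assert as_dict == {**dicts[0], **dicts[1], **dicts[2]}
assert list(as_gen) == as_list
```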

Of course, there are alternative ways to do things like this. Taking the concatenation of a bunch of lists as an example, any of the following would result in the same output (though some are more efficient than others):

[x for it in its for x in it]
list(itertools.chain(*its))
sum((it for it in its), [])
functools.reduce(operator.concat, its, [])
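
For reference, a self-contained check that these spellings agree on a small, made-up input. Two hedged caveats: the `sum` and `reduce(operator.concat, ...)` forms rely on every element of `its` being a list, and they build a new list at each step, so they are likely to scale poorly with many inputs. `itertools.chain.from_iterable(its)` is included as an existing alternative to `chain(*its)` that does not unpack `its` into call arguments first.

```python
import functools
import itertools
import operator

its = [[1, 2], [3], [4, 5]]  # made-up sample; every element is a list

flat_nested = [x for it in its for x in it]
flat_chain = list(itertools.chain(*its))
flat_chain_lazy = list(itertools.chain.from_iterable(its))
flat_sum = sum((it for it in its), [])
flat_reduce = functools.reduce(operator.concat, its, [])

# All five spellings produce the same flattened list.
assert flat_nested == flat_chain == flat_chain_lazy == flat_sum == flat_reduce
print(flat_nested)  # [1, 2, 3, 4, 5]
```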

However, the proposed syntax of [*it for it in its] is both more concise and more intuitive (to me, at least) than any of these options. Given the existing unpacking syntax, the additional syntax proposed here feels like a natural way to build such a collection, more so even than, e.g., [x for it in its for x in it] (where, in my experience with teaching Python, the two for clauses often feel backwards for beginners, so the impulse is to swap their order).

Currently, the proposed syntax results in specific error messages:

>>> [*x for x in its]
    ...
SyntaxError: iterable unpacking cannot be used in comprehension
>>> {**d for d in dicts}
    ...
SyntaxError: dict unpacking cannot be used in dict comprehension
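
Since the outcome depends on the interpreter version, one way to probe it without crashing a script is to compile the snippet from a string. `accepts` is a hypothetical helper name; it only reports whether the running interpreter parses the expression and assumes nothing about the exact error message.

```python
def accepts(source):
    """Return True if this interpreter can parse `source` as an expression."""
    try:
        compile(source, "<probe>", "eval")
    except SyntaxError:
        return False
    return True

# Already valid thanks to PEP 448:
print(accepts("[*a, *b]"))
# The proposed forms; False on interpreters that reject them:
print(accepts("[*x for x in its]"))
print(accepts("{**d for d in dicts}"))
```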

My suspicion is that these error messages are mostly encountered by people who wishfully use this syntax and already have a correct intuition for how it would/should behave, rather than by people accidentally typoing a * or ** into those expressions. My confidence that this extension feels natural is inspired in part by students using this notation on written exams, assuming that it is already a part of the language.

Here are a few small example programs written several different ways, to show how this proposed syntax compares against what we can already do in Python. For each, the last version demonstrates the proposed syntax.

Finding all files contained within a directory and its subdirectories:

def get_all_files(path):
    all_files = []
    for _, _, files in os.walk(path):
        all_files.extend(files)
    return all_files

def get_all_files(path):
    return [file for _, _, files in os.walk(path) for file in files]

def get_all_files(path):
    return list(itertools.chain(*(files for _, _, files in os.walk(path))))

def get_all_files(path):
    return [*files for _, _, files in os.walk(path)]

Typical CS2-level Python exercise, finding the leaf values of a tree using recursion:

def leaf_values(tree):
    if not tree['children']:
        return {tree['value']}

    out = set()
    for child in tree['children']:
        out.update(leaf_values(child))
    return out

def leaf_values(tree):
    if not tree['children']:
        return {tree['value']}

    return {
        grandchild
        for child in tree['children']
        for grandchild in leaf_values(child)
    }


def leaf_values(tree):
    if not tree['children']:
        return {tree['value']}

    return {*leaf_values(child) for child in tree['children']}
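
To make the input shape concrete, here is the second (currently valid) version run on a small made-up tree. The dictionary layout with 'value' and 'children' keys is taken from the code above; the specific values are only for illustration.

```python
def leaf_values(tree):
    if not tree['children']:
        return {tree['value']}

    return {
        grandchild
        for child in tree['children']
        for grandchild in leaf_values(child)
    }

# Made-up tree: leaves hold 2, 4 and 2 again (duplicates collapse in a set).
tree = {
    'value': 1,
    'children': [
        {'value': 2, 'children': []},
        {'value': 3, 'children': [
            {'value': 4, 'children': []},
            {'value': 2, 'children': []},
        ]},
    ],
}

print(leaf_values(tree))  # {2, 4}
```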

Merging information across configuration dictionaries:

def merge_configs(configs):
    out = {}
    for conf in configs:
        if conf.get("enabled"):
            out.update(conf)
    return out

def merge_configs(configs):
    return {
        key: value
        for conf in configs
        if conf.get("enabled")
        for key, value in conf.items()
    }

def merge_configs(configs):
    return {**conf for conf in configs if conf.get("enabled")}

Filtering out values from an HTML document:

def matching_items(source):
    out = []
    for section in BeautifulSoup(source).find_all('ul', class_='mylist'):
        out.extend(section.find_all('li'))
    return out

def matching_items(source):
    return [
        item
        for section in BeautifulSoup(source).find_all('ul', class_='mylist')
        for item in section.find_all('li')
    ]

def matching_items(source):
    return [
        *section.find_all('li')
        for section in BeautifulSoup(source).find_all('ul', class_='mylist')
    ]

More formally, my friend/colleague Erik Demaine (@edemaine) and I have put together a draft PEP and a basic reference implementation:

Draft PEP: peps/peps/pep-9999.rst at comprehension_unpacking · adqm/peps · GitHub
Reference Implementation: GitHub - adqm/cpython at comprehension_unpacking

This isn’t the first time that this idea has been proposed. Erik and I previously proposed this extension in a thread on the python-ideas mailing list in 2021, where it was met with positive feedback (all replies to that thread were at least +0), but we were ultimately unable to find a sponsor at that time.

So I’m giving it another go here, hoping for feedback (on the proposal and/or the reference implementation) and, ultimately, hoping to find a sponsor for moving forward with the PEP process if there’s still enthusiasm behind this idea.

For additional reference, similar ideas were also presented in PEP 448 itself and in another mailing list thread from 2016, and more recently in another thread on this forum.

alexprengere · June 22, 2025, 7:13pm

I think this is a great idea.
I was actually also working on a PEP for this idea (not ready yet), and I have an implementation as well (ready-ish). I was about to post about it next week.
I will try to review your PEP, but after a quick read: do you think this should also work for async comprehensions? I see in your implementation it is handled, but there are no mentions of this in the PEP (personally, I lean towards yes for consistency).

elis.byberi · June 22, 2025, 7:52pm

Only the last config values will appear if the configs share the same set of keys:

data = [{'a': 1}, {'a': 2}, {'a': 3}]
flattened = {k: v for d in data for k, v in d.items()}
print(flattened)  # {'a': 3}

adqm · June 22, 2025, 8:01pm

Perfect timing, then, I suppose! Certainly happy to work together on this.

Yes, I do think it makes sense for this to work for async comprehensions for consistency. I also agree that it’s worth calling this out specifically in the PEP; I’ll try to add those words later.

Yes, that’s true (but it does/should work that way in all of the implementations of that program). The intention is that {**d for d in dicts} is equivalent shorthand to {**dicts[0], **dicts[1], ..., **dicts[-2], **dicts[-1]}, including the fact that later values for the same key override earlier values.

The draft PEP has some wording to this effect: “As usual with sets and dictionaries, repeated elements/keys replace earlier instances.” But I can try to see if I can make that part of the intended spec clearer.
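
A small sketch of that rule with today's syntax, using an explicit `update` loop as a stand-in for {**d for d in dicts} (the sample `dicts` is made up). It also shows a detail that follows from ordinary dict behaviour: a repeated key keeps its original insertion position while its value is replaced.

```python
dicts = [{"a": 1, "b": 2}, {"c": 3}, {"a": 4}]

merged = {}
for d in dicts:
    merged.update(d)  # later values win for repeated keys

# Same result as unpacking each dict by hand, or as the nested-for form.
assert merged == {**dicts[0], **dicts[1], **dicts[2]}
assert merged == {key: value for d in dicts for key, value in d.items()}

print(merged)        # {'a': 4, 'b': 2, 'c': 3}
print(list(merged))  # ['a', 'b', 'c']
```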

elis.byberi · June 22, 2025, 8:28pm

Yes, a subtopic explaining how merge conflicts are resolved would make this clearer. This differs from .update(), which explicitly defines how merge conflicts are handled.

A combination is a selection of items from a set where the order of selection does not matter.

jorenham · June 23, 2025, 12:43am

Will this also work for PEP 530 async comprehensions? So for example

[*x async for x in aits()]

instead of

[x async for ait in aits() for x in ait]
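
For anyone who wants to try the currently valid form, here is a runnable sketch. `aits` is written here as a hypothetical async generator that yields ordinary (synchronous) iterables, which is what the inner plain `for` in the example above implies.

```python
import asyncio

async def aits():
    # Hypothetical async generator yielding ordinary (sync) iterables.
    for chunk in ([1, 2], [3], [4, 5]):
        await asyncio.sleep(0)  # give the event loop a chance to run
        yield chunk

async def main():
    # Valid today: async outer loop, synchronous inner loop.
    return [x async for ait in aits() for x in ait]

print(asyncio.run(main()))  # [1, 2, 3, 4, 5]
```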

blhsing · June 23, 2025, 1:38am

I’ve always found it surprising and counter-intuitive that unpacking in comprehensions doesn’t work ever since I learned about generalized unpacking in container literals.

That said, I think the PEP needs to address the concern raised in the explanation of why such a feature wasn’t included in PEP-448, specifically that it would cause ambiguity when unpacking in an unbracketed generator expression in a call, since argument lists already support unpacking.

That is, which one of these is intended?

f(*x for x in it) == f((*x for x in it))

or:

f(*x for x in it) == f(*(x for x in it))

Original quote from PEP-448:

Unbracketed comprehensions in function calls, such as f(x for x in it), are already valid. These could be extended to:
f(*x for x in it) == f((*x for x in it))
f(**x for x in it) == f({**x for x in it})
However, it wasn’t clear if this was the best behaviour or if it should unpack into the arguments of the call to f. Since this is likely to be confusing and is of only very marginal utility, it is not included in this PEP. Instead, these will throw a SyntaxError and comprehensions with explicit brackets should be used instead.

adqm · June 23, 2025, 2:05am

That’s the intention, yes. I can add some words to that effect to the PEP.

That’s a good point. It may take a little while to find the right specific words to add to the PEP, but I can give the short version of my personal reasoning here, which is that if we’re expanding the definition of what a generator expression is, then the way to remain consistent with Python’s existing behavior (where f(x for x in it) passes a single argument to f) would be for f(*x for x in it) to be equivalent to f((*x for x in it)). That is, it seems to me like f(<some valid generator expression>) should always pass that generator as a single argument.

Further unpacking that generator to separate arguments to f could still be accomplished with f(*(*x for x in it)), which is perhaps a little clunky but is, I think, the way to remain consistent with what we already have.

A little bit of support in this direction, perhaps, comes from the way that the syntax error for f(*x for x in it) is reported in 3.13, which suggests that this is interpreted as f(<a single malformed generator expression>) rather than as f(*<something>):

>>> f(*x for x in its)
  File "<python-input-0>", line 1
    f(*x for x in its)
      ^^
SyntaxError: iterable unpacking cannot be used in comprehension
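
The existing behaviour being extended can be checked directly today. In this sketch `f` is a hypothetical function that just returns its positional arguments, and a nested-`for` generator stands in for the proposed (*x for x in it).

```python
def f(*args):
    """Return the positional arguments exactly as received."""
    return args

it = [[1, 2], [3]]

# A bare generator expression is passed as one argument.
single = f(x for x in it)
assert len(single) == 1

# Stand-in for the proposed reading f(*x for x in it) == f((*x for x in it)):
# still one argument, a generator over the flattened items.
flattened = f(y for x in it for y in x)
assert len(flattened) == 1
assert list(flattened[0]) == [1, 2, 3]

# Spreading the items into separate arguments needs explicit parentheses,
# the analogue of f(*(*x for x in it)).
spread = f(*(y for x in it for y in x))
assert spread == (1, 2, 3)
```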

blhsing · June 23, 2025, 2:25am

Oops I just realized that f(*(*x for x in it)) is perhaps a more fitting way to interpret PEP-448’s concern than my f(*(x for x in it)). I think your rationale sounds good enough to me. It may be just one of those rarer edge cases that users need to refer to the documentation for and shouldn’t significantly affect the benefits of the syntax much overall.

Nineteendo · June 23, 2025, 10:26am

We can instead think of out = (...x... for x in it) as equivalent to the following code regardless of whether or not ...x... uses *:

def generator():
    for x in it:
        yield from [...x...]
out = generator()
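
Filling in the ...x... placeholder gives two concrete cases that already run today. This is only a sketch of the mental model with made-up data; the function names `plain` and `starred` are illustrative.

```python
it = [[1, 2], [3]]

def plain():
    # Models (x for x in it): each step yields exactly one item.
    for x in it:
        yield from [x]

def starred():
    # Models the proposed (*x for x in it): each step may yield several items.
    for x in it:
        yield from [*x]

assert list(plain()) == list(x for x in it) == [[1, 2], [3]]
assert list(starred()) == [1, 2, 3]
```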

Paddy3118 · June 23, 2025, 2:39pm

… which is fine as we have:

assert {**{'a': 1}, **{'a': 2}, **{'a': 3}} == {'a': 3}

Paddy3118 · June 23, 2025, 2:47pm

I’d expect dict keys that compare equal (and therefore have the same hash), such as 0 and 0.0, to keep the first key.
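
This matches what existing PEP 448 unpacking does, as a quick check shows (the string values are just labels for illustration):

```python
# 0 == 0.0 and hash(0) == hash(0.0), so they are the same dict key.
merged = {**{0: "int"}, **{0.0: "float"}}
assert merged == {0: "float"}             # the later value wins ...
assert type(next(iter(merged))) is int    # ... but the first key object is kept

# Sets behave the same way: the first equal element is the one retained.
combined = {*[0], *[0.0]}
assert len(combined) == 1
assert type(next(iter(combined))) is int
```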

Paddy3118 · June 23, 2025, 3:01pm

It seems to me that we would create a special case for the generator case which doesn’t sit right with me.

# We could have:
f([*x for x in it])
f({*x for x in it})
f({**x for x in it})
# It seems more consistent to have:
f((*x for x in it))

And leave f(*x for x in it) as an error

(Consistent as all cases would then use an extra inner bracket pair).

pf_moore · June 23, 2025, 3:14pm

Paddy3118:

It seems to me that we would create a special case for the generator case which doesn’t sit right with me.
# We could have:
f([*x for x in it])
f({*x for x in it})
f({**x for x in it})
# It seems more consistent to have:
f((*x for x in it))
And leave f(*x for x in it) as an error

But we have

f([x for x in it])
f({x for x in it})
# And yet we have
f(x for x in it)

The rule that a generator expression doesn’t need to be enclosed in a second pair of parentheses unless needed to resolve ambiguity is pretty well established by now. f(*x for x in it) seems pretty clearly to me to mean f((*x for x in it)) by that rule, with parens added in f(*(x for x in it)) to disambiguate it from the normal case.

As with anything that’s a matter of what “looks right”, people’s opinions will differ, of course.

Paddy3118 · June 23, 2025, 3:24pm

I see your point - we already have a pattern/exception and the proposal should be thought of as a simple extension of a pre-existing pattern/exception.

I’ll withdraw my criticism then

tjreedy · June 23, 2025, 3:57pm

Currently, the generic <collection-opener> expr(x) for x in iterable <collection-closer> roughly translates to the generic

collection = collection-type()  # Not needed for generator comp.
for x in iterable:
    collection-add-one(expr(x))

The generic proposal is that generic <collection-opener> *iter-expr(x) for x in iterable <collection-closer> translate to similarly generic

collection = collection-type()  # Not needed for generator comp.
for x in iterable:
    collection-add-multiple(iter-expr(x))
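
Instantiating the generic names for the built-in collections gives the following sketch, with `append` as collection-add-one and `extend`/`update` as collection-add-multiple. The sample data is made up.

```python
its = [[1, 2], [2, 3]]
dicts = [{"a": 1}, {"a": 2, "b": 3}]

# Today's comprehension: one item added per iteration.
one_at_a_time = []
for x in its:
    one_at_a_time.append(x)

# Proposed starred forms: several items added per iteration.
as_list = []
for x in its:
    as_list.extend(x)

as_set = set()
for x in its:
    as_set.update(x)

as_dict = {}
for d in dicts:
    as_dict.update(d)

print(one_at_a_time)  # [[1, 2], [2, 3]]
print(as_list)        # [1, 2, 2, 3]
print(as_dict)        # {'a': 2, 'b': 3}
```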

Replacing collection-add-one with collection-add-multiple in the interpretation of a comprehension is a bit jarring to me, having used comprehensions for 2 decades. But the value is avoiding having to add an inner for loop, which a user would usually not write versus using the implied extend/update methods that we include with list/set/dict classes. Even though ‘jarred’, I can see this avoidance as being in the spirit of comprehensions.

Properly documenting the change should be part of the PEP and the initial implementation.

elis.byberi · June 23, 2025, 4:34pm

Yes, following the same behavior seems to be the simplest approach. In other words, simply stating that:

dicts = [{'a': 1}, {'a': 2}, {'a': 3}]
flattened = {**d for d in dicts}

is equivalent to:

flattened = {**{'a': 1}, **{'a': 2}, **{'a': 3}}

would be a sufficient and complete explanation.

What’s happening here is that we’re creating a single permutation determined by insertion order, where duplicate keys update only the value. However, this explanation seems more complex than the previous one.

mcepl · June 23, 2025, 6:56pm

-1 from me. My experience taught me that whoever said it (I think Guido, but I am not sure) was right: the moment you feel you want to do something complicated with comprehensions (and lambdas) is the moment you should write a properly named function or for-loop.

MegaIng · June 23, 2025, 7:24pm

Unpacking a series of sequences is not complicated. And the alternative that currently exists (another for-loop) is being used IRL (I myself have written it dozens of times at least) and is arguably harder to understand for new programmers.

adqm · June 23, 2025, 8:44pm

Indeed, that was the plan (re-using the existing implementation of DICT_UPDATE rather than writing anything new). I’ll try to improve the wording there to make that clearer.
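
For the curious, the bytecode behind the existing {**a, **b} display can be inspected with `dis`. This is a CPython-specific sketch: opcode names differ between versions, and on recent CPython releases the listing is expected to include DICT_UPDATE, but that is an implementation detail rather than a language guarantee.

```python
import dis

# Compile (without running) an ordinary PEP 448 dict display.
code = compile("{**a, **b}", "<example>", "eval")
opnames = [instr.opname for instr in dis.get_instructions(code)]

print(opnames)
```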