Pre-PEP: Unpacking in Comprehensions

Currently (thanks to PEP 448), we have a very nice shorthand for combining small numbers of iterables/dictionaries:

[*it1, *it2, *it3]
{*it1, *it2, *it3}
{**dict1, **dict2, **dict3}

I think a natural extension of this syntax would be to allow starred expressions in comprehensions and generator expressions, to provide easy ways to combine an arbitrary number of iterables/dictionaries:

[*x for x in its]  # list with the concatenation of all iterables in `its`
{*x for x in its}  # set with the union of all iterables in `its`
{**d for d in dicts}  # dict with the combination of all dictionaries in `dicts`
(*x for x in its)  # generator representing the concatenation of all iterables in `its`

These would effectively be shorthand for the following syntax, but more concise by avoiding the use and repetition of auxiliary variables:

[x for it in its for x in it]
{x for it in its for x in it}
{key: value for d in dicts for key, value in d.items()}
(x for it in its for x in it)
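
As a rough illustration of that equivalence using only syntax that already works today, the sketch below builds each result with the longer double-`for` form and checks it against fixed-size PEP 448 unpacking. The sample data (`its`, `dicts`) is made up for the example, and the proposed starred comprehensions themselves are deliberately not used, since they do not compile on current releases.

```python
# Hypothetical sample data, not taken from the proposal itself.
its = [[1, 2], (3, 4), range(5, 7)]
dicts = [{"a": 1}, {"b": 2}, {"a": 3}]

# The longer forms that the proposed syntax would abbreviate.
flat_list = [x for it in its for x in it]
flat_set = {x for it in its for x in it}
merged = {key: value for d in dicts for key, value in d.items()}
flat_gen = (x for it in its for x in it)

# The same results written with fixed-size PEP 448 unpacking.
assert flat_list == [*its[0], *its[1], *its[2]]
assert flat_set == {*its[0], *its[1], *its[2]}
assert merged == {**dicts[0], **dicts[1], **dicts[2]}
assert list(flat_gen) == flat_list
```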

Of course, there are alternative ways to do things like this. Taking the concatenation of a bunch of lists as an example, any of the following would result in the same output (though some are more efficient than others):

[x for it in its for x in it]
list(itertools.chain(*its))
sum((it for it in its), [])
functools.reduce(operator.concat, its, [])

However, the proposed syntax of [*it for it in its] is both more concise and more intuitive (to me, at least) than any of these options. Given the existing unpacking syntax, it feels like a natural way to build such a collection, more so even than, e.g., [x for it in its for x in it] (where, in my experience teaching Python, the two for clauses often feel backwards to beginners, so the impulse is to swap their order).
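
For anyone who wants to check that the existing alternatives agree, here is a small sketch with a made-up `its`. It also includes `itertools.chain.from_iterable`, which is not in the list above but avoids unpacking `its` into call arguments. The comments on cost reflect how `sum` and `reduce` behave with list concatenation in general, not any measurement made for this proposal.

```python
import functools
import itertools
import operator

its = [[1, 2], [3], [], [4, 5]]  # hypothetical input: a list of lists

via_comprehension = [x for it in its for x in it]
via_chain = list(itertools.chain(*its))
via_from_iterable = list(itertools.chain.from_iterable(its))

# sum() and reduce() build a brand-new list at every step, so the total
# work grows quadratically with the number of elements.
via_sum = sum(its, [])
via_reduce = functools.reduce(operator.concat, its, [])

assert (
    via_comprehension
    == via_chain
    == via_from_iterable
    == via_sum
    == via_reduce
)
```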

Currently, the proposed syntax results in specific error messages:

>>> [*x for x in its]
    ...
SyntaxError: iterable unpacking cannot be used in comprehension
>>> {**d for d in dicts}
    ...
SyntaxError: dict unpacking cannot be used in dict comprehension

My suspicion is that these error messages are mostly encountered by people who wishfully use this syntax and already have a correct intuition for how it would/should behave, rather than by people accidentally typoing a * or ** into those expressions. My confidence that this extension feels natural is inspired in part by students using this notation on written exams, assuming that it is already a part of the language.


Here are a few small example programs written several different ways, to show how this proposed syntax compares against what we can already do in Python. For each, the last version demonstrates the proposed syntax.

  • Finding all files contained within a directory and its subdirectories:

    def get_all_files(path):
        all_files = []
        for _, _, files in os.walk(path):
            all_files.extend(files)
        return all_files
    
    def get_all_files(path):
        return [file for _, _, files in os.walk(path) for file in files]
    
    def get_all_files(path):
        return list(itertools.chain(*(files for _, _, files in os.walk(path))))
    
    def get_all_files(path):
        return [*files for _, _, files in os.walk(path)]
    
  • Typical CS2-level Python exercise, finding the leaf values of a tree using recursion:

    def leaf_values(tree):
        if not tree['children']:
            return {tree['value']}
    
        out = set()
        for child in tree['children']:
            out.update(leaf_values(child))
        return out
    
    def leaf_values(tree):
        if not tree['children']:
            return {tree['value']}
    
        return {
            grandchild
            for child in tree['children']
            for grandchild in leaf_values(child)
        }
    
    
    def leaf_values(tree):
        if not tree['children']:
            return {tree['value']}
    
        return {*leaf_values(child) for child in tree['children']}
    
    
  • Merging information across configuration dictionaries:

    def merge_configs(configs):
        out = {}
        for conf in configs:
            if conf.get("enabled"):
                out.update(conf)
        return out
    
    def merge_configs(configs):
        return {
            key: value
            for conf in configs
            if conf.get("enabled")
            for key, value in conf.items()
        }
    
    def merge_configs(configs):
        return {**conf for conf in configs if conf.get("enabled")}
    
    
  • Filtering out values from an HTML document:

    def matching_items(source):
        out = []
        for section in BeautifulSoup(source).find_all('ul', class_='mylist'):
            out.extend(section.find_all('li'))
        return out
    
    def matching_items(source):
        return [
            item
            for section in BeautifulSoup(source).find_all('ul', class_='mylist')
            for item in section.find_all('li')
        ]
    
    def matching_items(source):
        return [
            *section.find_all('li')
            for section in BeautifulSoup(source).find_all('ul', class_='mylist')
        ]
    

More formally, my friend/colleague Erik Demaine (@edemaine) and I have put together a draft PEP and a basic reference implementation:

This isn’t the first time that this idea has been proposed. Erik and I previously proposed this extension in a thread on the python-ideas mailing list in 2021, where it was met with positive feedback (all replies to that thread were at least +0), but we were ultimately unable to find a sponsor at that time.

So I’m giving it another go here, hoping for feedback (on the proposal and/or the reference implementation) and, ultimately, hoping to find a sponsor for moving forward with the PEP process if there’s still enthusiasm behind this idea.

For additional reference, similar ideas were also presented in PEP 448 itself and in another mailing list thread from 2016, and more recently in another thread on this forum.

I think this is a great idea :slight_smile:
I was actually also working on a PEP for this idea (not ready yet), and I have an implementation as well (ready-ish). I was about to post about it next week :sweat_smile:
I will try to review your PEP, but after a quick read: do you think this should also work for async comprehensions? I see in your implementation it is handled, but there are no mentions of this in the PEP (personally, I lean towards yes for consistency).

Only the last config’s value for each key will appear if the configs share keys:

data = [{'a': 1}, {'a': 2}, {'a': 3}]
flattened = {k: v for d in data for k, v in d.items()}
print(flattened)  # {'a': 3}

Perfect timing, then, I suppose! Certainly happy to work together on this.

Yes, I do think it makes sense for this to work for async comprehensions for consistency. I also agree that it’s worth calling this out specifically in the PEP; I’ll try to add those words later.

Yes, that’s true (but it does/should work that way in all of the implementations of that program). The intention is that {**d for d in dicts} is equivalent shorthand to {**dicts[0], **dicts[1], ..., **dicts[-2], **dicts[-1]}, including the fact that later values for the same key override earlier values.

The draft PEP has some wording to this effect: “As usual with sets and dictionaries, repeated elements/keys replace earlier instances.” But I can try to see if I can make that part of the intended spec clearer.
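
A quick sketch of the "later values win" behavior using only what works today; the `dicts` list is a made-up example. It checks that the double-`for` comprehension, a loop of `.update()` calls, and fixed-size `**` unpacking all resolve conflicts the same way.

```python
dicts = [{"a": 1, "b": 1}, {"a": 2}, {"c": 3, "a": 3}]  # hypothetical data

via_comprehension = {k: v for d in dicts for k, v in d.items()}

via_update = {}
for d in dicts:
    via_update.update(d)

via_unpacking = {**dicts[0], **dicts[1], **dicts[2]}

# Later values override earlier ones in all three spellings.
assert via_comprehension == via_update == via_unpacking

# Key order follows first insertion, so "a" stays first even though
# its final value came from the last dictionary.
assert list(via_comprehension) == ["a", "b", "c"]
```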

Yes, a subtopic explaining how merge conflicts are resolved would make this clearer. This differs from .update(), which explicitly defines how merge conflicts are handled.

A combination is a selection of items from a set where the order of selection does not matter.

Will this also work for PEP 530 async comprehensions? So for example

[*x async for x in aits()]

instead of

[x async for ait in aits() for x in ait]
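
For reference, the longer form already runs today. Below is a minimal sketch in which `aits` is a hypothetical async generator that yields ordinary (synchronous) iterables; the proposed `[*x async for x in aits()]` would presumably produce the same list, but that part is an assumption about the draft, not something this code exercises.

```python
import asyncio


async def aits():
    # Hypothetical async generator yielding ordinary iterables.
    for chunk in ([1, 2], [3], [4, 5]):
        await asyncio.sleep(0)  # give the event loop a chance to run
        yield chunk


async def flatten():
    # Existing syntax: async outer loop, regular inner loop.
    return [x async for ait in aits() for x in ait]


result = asyncio.run(flatten())
```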

I’ve always found it surprising and counter-intuitive that unpacking in comprehensions doesn’t work ever since I learned about generalized unpacking in container literals.

That said, I think the PEP needs to address the concern PEP 448 gave for not including such a feature, specifically that unpacking in an unbracketed generator expression in a call would be ambiguous, since argument lists already support unpacking.

That is, which one of these is intended?

f(*x for x in it) == f((*x for x in it))

or:

f(*x for x in it) == f(*(x for x in it))

Original quote from PEP-448:

Unbracketed comprehensions in function calls, such as f(x for x in it), are already valid. These could be extended to:

f(*x for x in it) == f((*x for x in it))
f(**x for x in it) == f({**x for x in it})

However, it wasn’t clear if this was the best behaviour or if it should unpack into the arguments of the call to f. Since this is likely to be confusing and is of only very marginal utility, it is not included in this PEP. Instead, these will throw a SyntaxError and comprehensions with explicit brackets should be used instead.

That’s the intention, yes. I can add some words to that effect to the PEP.

That’s a good point. It may take a little while to find the right specific words to add to the PEP, but I can give the short version of my personal reasoning here, which is that if we’re expanding the definition of what a generator expression is, then the way to remain consistent with Python’s existing behavior (where f(x for x in it) passes a single argument to f) would be for f(*x for x in it) to be equivalent to f((*x for x in it)). That is, it seems to me like f(<some valid generator expression>) should always pass that generator as a single argument.

Further unpacking that generator to separate arguments to f could still be accomplished with f(*(*x for x in it)), which is perhaps a little clunky but is, I think, the way to remain consistent with what we already have.
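
To make the two readings concrete with code that runs today, here is a sketch in which `itertools.chain.from_iterable(it)` stands in for the proposed `(*x for x in it)`. The function `f`, the helper `flattened`, and the data are all made up for illustration.

```python
import itertools


def f(*args):
    # Report how many positional arguments were received.
    return len(args)


it = [[1, 2], [3, 4, 5]]  # hypothetical input

# Existing behavior: an unbracketed generator expression is ONE argument.
assert f(x for x in it) == 1
# Explicit unpacking of a generator passes each item separately.
assert f(*(x for x in it)) == 2


def flattened():
    # Stand-in for the proposed (*x for x in it).
    return itertools.chain.from_iterable(it)


one_arg = f(flattened())      # analogous to f(*x for x in it) as argued above
many_args = f(*flattened())   # analogous to f(*(*x for x in it))
```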

A little bit of support in this direction, perhaps, comes from the way that the syntax error for f(*x for x in it) is reported in 3.13, which suggests that this is interpreted as f(<a single malformed generator expression>) rather than as f(*<something>):

>>> f(*x for x in its)
  File "<python-input-0>", line 1
    f(*x for x in its)
      ^^
SyntaxError: iterable unpacking cannot be used in comprehension

Oops, I just realized that f(*(*x for x in it)) is perhaps a more fitting way to interpret PEP-448’s concern than my f(*(x for x in it)). I think your rationale sounds good enough to me. It may be just one of those rarer edge cases that users need to refer to the documentation for, and it shouldn’t affect the benefits of the syntax much overall.

We can instead think of out = (...x... for x in it) as equivalent to the following code regardless of whether or not ...x... uses *:

def generator():
    for x in it:
        yield from [...x...]
out = generator()
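
A runnable version of that mental model, with the `...x...` placeholder filled in both ways. The input `it` is made up, and `[*x]` materializes each inner iterable as a list first, which a real implementation would not necessarily do.

```python
it = [[1, 2], [3], [4, 5]]  # hypothetical input


def plain():
    # Models (x for x in it): the element expression has no star.
    for x in it:
        yield from [x]


def starred():
    # Models the proposed (*x for x in it): the element expression is *x.
    for x in it:
        yield from [*x]


assert list(plain()) == list(x for x in it)
assert list(starred()) == [x for inner in it for x in inner]
```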

… which is fine as we have:

assert {**{'a': 1}, **{'a': 2}, **{'a': 3}} == {'a': 3}

I’d expect dict keys that compare equal (and so hash the same), such as 0 and 0.0, to keep the first key (mentioned here).
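
That expectation matches how `**` unpacking in dict displays behaves today, as this small sketch shows: the first key object is kept while the value is replaced by the later one. The string values are placeholders chosen for the example.

```python
assert 0 == 0.0 and hash(0) == hash(0.0)

merged = {**{0: "from int"}, **{0.0: "from float"}}

# One entry survives: the value comes from the later dict...
assert merged == {0: "from float"}

# ...but the key object is the one that was inserted first.
first_key = next(iter(merged))
assert type(first_key) is int
```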

It seems to me that we would create a special case for the generator case which doesn’t sit right with me.

# We could have:
f([*x for x in it])
f({*x for x in it})
f({**x for x in it})
# It seems more consistent to have:
f((*x for x in it))

And leave f(*x for x in it) as an error

(Consistent as all cases would then use an extra inner bracket pair).

But we have

f([x for x in it])
f({x for x in it})
# And yet we have
f(x for x in it)

The rule that a generator expression doesn’t need to be enclosed in a second pair of parentheses unless needed to resolve ambiguity is pretty well established by now. f(*x for x in it) seems pretty clearly to me to mean f((*x for x in it)) by that rule, with parens added in f(*(x for x in it)) to disambiguate it from the normal case.

As with anything that’s a matter of what “looks right”, people’s opinions will differ, of course.

I see your point - we already have a pattern/exception and the proposal should be thought of as a simple extension of a pre-existing pattern/exception.

I’ll withdraw my criticism then :slight_smile: