Pre-PEP: Unpacking in Comprehensions

The exact quote is “Seems sensible to me” :slight_smile:

Could a poll help clarify the level of support of the community for this?
It is quite difficult to gauge from these discussions (and I am not saying a high level of support should mean that this PEP gets accepted), but perhaps a poll could help find a core dev to sponsor it.

3 Likes

How should it behave in this case?

x = 111
y = 222
z = 333
c = [x, y, z]
[a, b, *c] = [1,2,3,4,5]

Do we get a = 1, b = 2, c = [3, 4, 5] as it happens now, or do we get a = 1, b = 2, x = 3, y = 4, z = 5?


How would it be for [a, b, *c] = [a, b, *c]?
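For reference, here is a quick check of what current Python does with both snippets. Nothing in it depends on the proposal; it only uses starred assignment targets and list displays as they exist today.

```python
x = 111
y = 222
z = 333
c = [x, y, z]

# A starred target in an assignment collects the leftover items into a new list.
[a, b, *c] = [1, 2, 3, 4, 5]
print(a, b, c)   # 1 2 [3, 4, 5]
print(x, y, z)   # 111 222 333 (c is rebound; x, y, z are not written through)

# On the right-hand side *c spreads the items; on the left-hand side it collects them.
[a, b, *c] = [a, b, *c]
print(a, b, c)   # 1 2 [3, 4, 5]
```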


See here what I wanted to get to.

I’m personally more than fine just to keep gathering feedback/suggestions ad hoc for now and hope that one of the core devs is excited enough by the idea to offer to sponsor it.

This post has only been up for ~3 days so far, so I don’t think there’s any imminent risk of this idea fading into obscurity. Also, I imagine there are lots of people who only check here periodically, and I’d like to have an opportunity to hear from them, too. I’d prefer to take our time and gather more feedback and make sure we get the wrinkles ironed out and as many people happy as possible, rather than rushing and having it more likely to be rejected as a result.

That said, given my general ignorance of the process, if there’s something more proactive we should be doing to try to find a sponsor (either now or in the longer term), I’d appreciate it if someone more experienced could offer advice :slight_smile: (though I would imagine that that meta conversation might be better held somewhere else so we can keep this thread on topic).

2 Likes

Let me show you the Python equivalent. You need a new built-in class for the value of a *x expression, and some code that recognises objects of this class and acts on them:

class Unpackable:
    def __init__(self, iterable):
        self.iterable = iterable

def unpack(iterable):
    for it in iterable:
        if isinstance(it, Unpackable):
            yield from it.iterable
        else:
            yield it

To implement the new syntax, turn *x into Unpackable(x), and replace [...] with list(unpack(...)).

So this:
[*x if isinstance(x, Iterable) else x for x in its]

Becomes this:
list(unpack(Unpackable(x) if isinstance(x, Iterable) else x for x in its))
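Here is the emulation above assembled into a runnable sketch. The sample data in its is an assumption made for illustration, and Iterable is taken from collections.abc.

```python
from collections.abc import Iterable


class Unpackable:
    """Marker wrapper standing in for the value of a hypothetical `*x` expression."""

    def __init__(self, iterable):
        self.iterable = iterable


def unpack(iterable):
    for it in iterable:
        if isinstance(it, Unpackable):
            yield from it.iterable   # splice the wrapped items in place
        else:
            yield it                 # ordinary value, passed through unchanged


its = [[1, 2], 3, (4, 5)]  # sample data, assumed for illustration

result = list(unpack(Unpackable(x) if isinstance(x, Iterable) else x for x in its))
print(result)  # [1, 2, 3, 4, 5]
```

Note that strings are also Iterable, so a string element in its would be split into characters by this check.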

For the record, I’m -1 on all of this.

Can’t you just implement direct unpacking in the interpreter?

I’m just explaining how @blhsing’s further generalization would work under the hood. It can’t be implemented more directly, because the *x could be potentially nested deep in the target expression. You could make a special case implementation for when the * is the top level operator, but that’s about as much as you can do without it getting very complicated very quickly.

I don’t see why that makes it impossible. A comprehension like [x if cond else y for ...] is compiled like this:

    LOAD_NAME cond
    TO_BOOL
    POP_JUMP_IF_FALSE to L3
    LOAD_NAME x
    JUMP_FORWARD to L4
L3: LOAD_NAME y
L4: LIST_APPEND

Why couldn’t [*x if cond else y for ...] simply be compiled like this?

    LOAD_NAME cond
    TO_BOOL
    POP_JUMP_IF_FALSE to L3
    LOAD_NAME x
    LIST_EXTEND
    JUMP_FORWARD to L4
L3: LOAD_NAME y
    LIST_APPEND
L4: 

And similarly for more complex expressions…
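Read at the Python level, the second listing amounts to choosing between extend and append on each iteration. The sketch below spells that out as a plain loop. The helper name conditional_extend, the (x, y) pairs and the cond callable are all assumptions made for illustration, not part of any proposal text.

```python
def conditional_extend(pairs, cond):
    """Hypothetical helper: emulate [*x if cond else y for ...] with a plain loop."""
    out = []
    for x, y in pairs:
        if cond(x, y):
            out.extend(x)   # corresponds to the LIST_EXTEND branch
        else:
            out.append(y)   # corresponds to the LIST_APPEND branch
    return out


pairs = [([1, 2], "a"), ([3], "b"), ([4, 5], "c")]  # sample data
print(conditional_extend(pairs, lambda x, y: len(x) > 1))  # [1, 2, 'b', 4, 5]
```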

1 Like

There’s something very cool about the idea of something like [*x if cond else y for ...], and it’s one of the things that I had initially thought about when writing this up. But actually parsing and handling that in a general sense complicates the language too much for my taste (or makes it deviate too much from the existing language for me to be on board, at least for now).

Unless I’m missing something, it seems that both of the two proposed extensions here would require much more substantial changes to the language, either:

  • allowing starred expressions to themselves actually be a new kind of expression and then thinking through how to handle the effects of that everywhere, or
  • duplicating most of the current parser constructs with starred versions (or partially-starred versions) that can be used in expression lists but not elsewhere

Neither of those seems particularly simple to me, and I think that the complexity that either of those would introduce makes them much less attractive to me. That is to say, if we were going to propose something that would make that big of a change, I would need strong justification to feel like it was worth it.

I’ve tried in the current version of the PEP to show that there are examples in the standard library that could benefit from my original syntax (combining a bunch of iterables together into one). Are there enough similar examples of combining a bunch of different structures, some of which are iterable and some of which are not, and where the resulting “extended” comprehension syntax is clearer, to justify this extra complexity? If so, we should go for it; if not, I’ll add a note to the current PEP that these ideas were considered but deferred, and leave it to future PEPs to think more about them.

1 Like

You can special-case top-level ternary expressions, sure. I think it falls into the “if it’s hard to explain” bucket, but it’s possible. But if you want it to work for arbitrary Python expressions, you’d better have an answer for what this compiles into:

[random.choice([x, (*x)]) for x in its]

Edit: Changed *x to (*x) to avoid it being a list display.

+1 from me on keeping the proposal simple. And I’ll note that as the PEP author, you are perfectly within your rights to make that choice - the people wanting the more complex solution can always write their own alternative proposal, after all.

8 Likes

That’s already valid and dis.dis tells you what that compiles to.
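For anyone who wants to check, a minimal way to do that with the list-display form (*x directly inside the inner list) is below. The exact bytecode printed depends on the CPython version, so no output is shown.

```python
import dis

# The form in which *x sits directly inside a list display.
source = "[random.choice([x, *x]) for x in its]"

# dis.dis accepts a source string and compiles it first; names such as
# `random` and `its` do not need to exist because nothing is executed.
dis.dis(source)
```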

2 Likes

Since [x, *x] is already a valid list literal, wouldn’t the answer be that *x is unpacked into the list that is provided to random.choice?

(That said, I think we’re beating a dead horse as no one has defended the conditional unpacking proposal in the past several posts. It could also be deferred to another PEP if people actually wanted it.)

P.S. I have edited my original reply to acknowledge my misreading of @blhsing’s original proposal.

2 Likes

You are right. I have edited the post to make it not a list display. The challenge remains.
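To make the difference between the two spellings concrete, here is a small check against current Python. The list display accepts the star, while the parenthesized form is rejected at compile time, so there is no existing behavior to fall back on.

```python
# In a list display the star is accepted and splices the items in.
x = [1, 2]
print([x, *x])  # [[1, 2], 1, 2]

# A parenthesized starred expression on its own is rejected by the compiler.
try:
    compile("[random.choice([x, (*x)]) for x in its]", "<demo>", "eval")
except SyntaxError as exc:
    rejected = True
    print("SyntaxError:", exc.msg)
else:
    rejected = False
```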

The only case where the new * meaning might become confusing is when it’s used with PEP 3132: Extended Iterable Unpacking:

a = [*x for *x, y in [[1, 2], [1, 2]]]

x is assigned via unpacking, and then immediately re-unpacked. That’s double indirection, visually and mentally.
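A short trace of the two steps may help here. It runs on current Python and assumes the proposed leading *x is equivalent to adding an extra for clause, which is how the alternatives in this thread are written.

```python
data = [[1, 2], [1, 2]]

# Step 1: the for-target `*x, y` packs all but the last item into a list x.
bindings = [(x, y) for *x, y in data]
print(bindings)  # [([1], 2), ([1], 2)]

# Step 2: the leading `*x` would then splice each x into the result,
# assumed here to behave like an extra `for z in x` clause.
a = [z for *x, y in data for z in x]
print(a)  # [1, 1]
```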

3 Likes

In case anyone wants to try this out but doesn’t want to compile the reference implementation, I put together a little Emscripten-based demo, based on Katie Bell’s WASM REPL with some small changes (using Ace instead of a regular textbox for the code editor, and some tweaks to try to make arrow keys work for navigation with the REPL).

3 Likes

This is a very good point, and one that I hadn’t thought of. I definitely had to stop and think carefully about what the output would be in your example.

I guess there are maybe a few questions that come to mind in response:

  1. Is it the syntax that’s confusing things here, or is the semantic structure intrinsically confusing? That is, are either of the following substantially easier or harder to understand than the version quoted above?

    [z for *x, y in [[1, 2], [1, 2]] for z in x]
    
    out = []
    for *x, y in [[1, 2], [1, 2]]:
        out.extend(x)
    
  2. Is it less confusing if you’re not unpacking just a single value? That is, is the following easier to reason about since it’s clearer what the right-most *x is doing when it’s actually grabbing multiple values?

    [*x for *x, y in [[1, 2, 3], [1, 2, 3]]]
    

    or, more realistically, if we’re unpacking from variables, which is probably the more likely way that this would show up in real code?

    [*x for *x, y in list_of_lists]
    
    # or
    [*x for *x, y in [a, b, c]]
    
  3. Does this example illustrate a common pattern that people are going to want to use often in real code? If so, is this the natural structure that comes to mind, or would people likely turn to a different structure as their first choice?

These questions are all at least somewhat subjective, but I’m curious what people think about them.
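For comparison while thinking about these, the two spellings from question 1 can be run side by side on current Python. The contents of list_of_lists are sample data chosen for illustration, with sublists of different lengths so that the effect of *x, y is easier to see.

```python
list_of_lists = [[1, 2, 3], [4, 5], [6]]  # sample data, assumed for illustration

# Comprehension spelling from question 1.
flat = [z for *x, y in list_of_lists for z in x]

# Loop spelling from question 1.
out = []
for *x, y in list_of_lists:
    out.extend(x)

print(flat)         # [1, 2, 4]
print(flat == out)  # True
```

A one-element sublist contributes nothing, because *x then collects an empty list.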

3 Likes

FWIW this is clear to me. I would read the *x, y as re-packing the lists we’re iterating over and the *x for as unpacking the resulting lists. I suppose the confusion would be if someone imagines *x is a name in its own right and not an operation that considers whether the name is being read or written, but having used it in both contexts, it doesn’t strike me as confusing.

Given the existing one-liner alternatives:

[z for x in list_of_lists for z in x[:-1]]
[z for *x, _ in list_of_lists for z in x]

I think [*x for *x, _ in list_of_lists] is more readable. It even seems like a conveniently compact example to explain the difference between unpacking assignments and unpacking expressions[1].
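A minimal illustration of that distinction in current Python, using a made-up row value:

```python
row = [1, 2, 3]  # sample data

# Unpacking assignment: the star is on a target, so it collects items.
*x, _ = row
print(x)        # [1, 2]

# Unpacking expression: the star is on a value, so it spreads items.
spread = [0, *x]
print(spread)   # [0, 1, 2]
```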


  1. I’m not sure if we have terms for distinguishing these cases, but hopefully this is clear enough. ↩︎

7 Likes

I’d personally prefer

[*x[:-1] for x in list_of_lists]

(Note it’s not clear to me whether it’d be better to write *x[:-1] or *(x[:-1]).)
Just because notation can be used in an obscuring manner, that doesn’t mean it’s bad.

Currently I might write this as

sum((x[:-1] for x in list_of_lists), [])
or
list(itertools.chain.from_iterable(x[:-1] for x in list_of_lists))

because I don’t like having double for loops inside list comprehensions.
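A runnable sketch of those two spellings on current Python, with sample data assumed for illustration. Two details matter: a generator expression has to be parenthesized when it is not the only argument, and it is chain.from_iterable (not chain called with a single argument) that flattens one level.

```python
import itertools

list_of_lists = [[1, 2, 3], [4, 5], [6, 7]]  # sample data

# sum() with a list start value concatenates the slices one by one.
via_sum = sum((x[:-1] for x in list_of_lists), [])

# chain.from_iterable lazily flattens an iterable of iterables.
via_chain = list(itertools.chain.from_iterable(x[:-1] for x in list_of_lists))

print(via_sum)    # [1, 2, 4, 6]
print(via_chain)  # [1, 2, 4, 6]
```

The sum version builds a new list at every step, so the chain version is usually the better choice for long inputs.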

1 Like

I’ll take a shot at providing my answers here. For context, I have used Python extensively for personal and professional purposes for 10 years, but am not a software developer by trade and have ~no formal computer science training.

I find the first version to be about as confusing as the one with comprehension unpacking, but I already noted above that I find nested for loops in comprehensions to be counterintuitive.

The second is clearer, primarily because the explicit for loop syntax means the unpacking in *x, y is declared before x is used as an iterable.

I think that it’s slightly more clear with the longer list literals or when the outermost list is named something explicit like list_of_lists, as the context makes clearer that y is the last element of each sublist, and x contains everything else.

The last example is less helpful, as generic variable names like a, b, and c don’t convey the contextually important information that they are iterables with at least one item. As a general rule though, I only use unpacking when it’s clear in context that the item being unpacked has an appropriate structure. These toy examples lack that context.

This particular example is not common IME (in words, I would summarize it as “remove the last element from each list and concatenate the result”), and is complicated enough to warrant a more explicit for loop. Moreover, the fact that y is unused and the inputs are lists means I would default to using slices instead:

[*x[:-1] for x in list_of_lists]

Obviously, one can contrive alternatives that preclude trivial slicing (e.g. use a list of iterators).


Writing this up, I came up with my own variant to consider, which at least makes use of the unpacked y. It has no fewer than three unpackings, and yet may be more readable than the original:

[*(y, *x) for *x, y in [[1, 2], [3, 4, 5]]]
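Written with syntax that runs today, and assuming the leading star is equivalent to an extra for clause, this variant moves the last element of each sublist to the front before concatenating:

```python
data = [[1, 2], [3, 4, 5]]

# (y, *x) puts the last item first; the extra `for` clause flattens the tuples.
rotated = [z for *x, y in data for z in (y, *x)]
print(rotated)  # [2, 1, 5, 3, 4]
```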

(I’ll note none of these examples change my +1 on the PEP. There are plenty of intuitive uses of this syntax; confusing ones like these seem like edge cases that should be rejected in code review.)

3 Likes

I find all the variations equally confusing. That’s probably because I don’t use this sort of generalised unpacking at all, so *x, y = something is unfamiliar to me. Using 2-element lists (so x gets unpacked to a 1-element list) made it worse, but that’s likely to be very uncommon. Also, adding parentheses, as in [*x for (*x, y) in it], improves readability, so again you’ve picked the worst case as an example here.

I feel that the pattern of using *x to mean “unpack” is close to being overused at this point. In its simplest forms, it’s natural and clear. But the more uncommon or complex uses feel obfuscated, and I’d be cautious about allowing them in production code. Having said that, this is simply a case of “any syntax construct can be overused/misused”, so I don’t think it’s a fatal issue. And the simplest form of the proposal, [*x for x in it], is intuitive, so let’s not condemn the proposal because it can be abused.

I think that the simple [*x for x in it] pattern is fairly rare. So I wouldn’t miss this syntax if it doesn’t get added. But the alternatives are somewhat clumsy, so I can see the benefit of adding it. I wouldn’t call unpacking the “natural” approach (at least not for me) but I could easily get used to it.

I think that using unpacking in the for part of a comprehension using the new unpacking syntax (the [*x for *x, y in it] case) is going to be vanishingly rare. So IMO it’s not an example that’s worth focusing on. But if I did come across a case like this in real life, I’d definitely not try to handle it in a single comprehension - I’d break it down somehow, maybe by writing it procedurally:

result = []
for lst in list_of_lists:
    result.extend(lst[:-1])

or maybe by creating a named helper function if I felt like using unpacking:

def all_but_last(lst):
    *result, _ = lst
    return result

[*all_but_last(l) for l in list_of_lists]

Of those two, I’d consider the first to be most natural.
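Both alternatives can be checked on current Python. Since [*all_but_last(l) for l in ...] relies on the proposed syntax, the sketch below uses a nested for clause in its place, on the assumption that the two are equivalent. The sample data is made up.

```python
def all_but_last(lst):
    *result, _ = lst
    return result


list_of_lists = [[1, 2, 3], [4, 5]]  # sample data

# Procedural version.
result = []
for lst in list_of_lists:
    result.extend(lst[:-1])

# Helper version, written with a nested `for` in place of the proposed star.
via_helper = [item for l in list_of_lists for item in all_but_last(l)]

print(result)      # [1, 2, 4]
print(via_helper)  # [1, 2, 4]
```

One practical difference is that all_but_last also accepts iterators, which cannot be sliced.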

3 Likes