This post is about another idea stemming from thinking about PEP 798. I think I’m less enthusiastic about this idea than about making list.extend and dict.update variadic, but I figured it’d be worth bringing up anyway as part of this little family of ideas.
dict’s constructor and dict.update accept iterables of 2-element iterables as input:
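For reference, a minimal sketch of the existing behaviour described above. The variable names and data are only illustrative.

```python
# Illustrative data: a list of (key, value) tuples, i.e. an association list.
pairs = [("a", 1), ("b", 2)]

# The constructor accepts an iterable of 2-element iterables.
from_constructor = dict(pairs)

# dict.update accepts the same shape; the inner iterables need not be tuples.
updated = {"z": 0}
updated.update([["a", 1], "b2"])  # a 2-character string also counts as a pair

print(from_constructor)
print(updated)
```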
I wonder whether it might be worth considering allowing ** to unpack iterables of this form as well, to make dict.update and ** more similar, allowing, e.g.:
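As a point of comparison, here is a small sketch of what happens today when ** is applied to a list of pairs inside a dict display, along with the current workaround. The names d and pairs are placeholders.

```python
d = {"a": 1}
pairs = [("b", 2), ("c", 3)]

# Today, ** inside a dict display only accepts mappings, so this fails.
error = None
try:
    merged = {**d, **pairs}
except TypeError as exc:
    error = exc
    print(f"TypeError: {exc}")

# The current spelling needs an intermediate dict built from the pairs.
merged = {**d, **dict(pairs)}
print(merged)
```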
Not only would it make the behaviour more predictable (“pairs are ok everywhere”), but it would also remove some unnecessary intermediate representations, making things cleaner and more performant.
I have quite a lot of instances where I have pairs but then need to store them as a dict, so that they can later be used with {**d1, **d2, **d3, ...}, which is done many times later on.
I’m not a big fan of defining syntax specifically for “iterable of sequences of length 2”. That feels too magical to me.
Given dictionary comprehensions and the union operator, is there really anything this syntax could express more concisely and intuitively? In my mind, mixing those two types together with one operator is a higher burden when reading the code.
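A rough sketch of the existing alternatives mentioned here, assuming Python 3.9 or later for the union operators. It also shows that, as far as I know, the in-place form |= already accepts an iterable of pairs while the binary form | does not.

```python
d = {"a": 1}
pairs = [("b", 2), ("a", 10)]

# Dict comprehension over the pairs, combined with the union operator.
via_union = d | {k: v for k, v in pairs}

# In-place union accepts an iterable of pairs directly, like dict.update.
in_place = dict(d)
in_place |= pairs

# The binary form does not: dict | list raises TypeError.
try:
    d | pairs
    binary_union_accepts_pairs = True
except TypeError:
    binary_union_accepts_pairs = False

print(via_union, in_place, binary_union_accepts_pairs)
```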
The two types are already mixed up in the dict constructor. If dict.__iter__ aliased items rather than keys then it would all be one consistent type since a dict would be an iterable of pairs.
Agreed, this feels too much like a weird special case to me. I don’t think it’s justified, TBH - it’s not like it lets you do anything you can’t already do, and punctuation-heavy constructs are often less readable than simple calls to well-named methods.
I would expect it to be equivalent to (without making unnecessary intermediate dictionaries):
d3 = {**dict(d), **dict(pairs)}
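One way to get the same result today without the intermediate dictionaries is to call update for each source. This is only a sketch with made-up data, not a claim about how the proposal would be implemented.

```python
d = {"a": 1, "b": 2}
pairs = [("b", 20), ("c", 30)]

# Spelling from the post: builds two throwaway dicts before merging.
d3_with_copies = {**dict(d), **dict(pairs)}

# Same result without the intermediates: update accepts both shapes.
d3 = {}
d3.update(d)
d3.update(pairs)

print(d3 == d3_with_copies)
```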
IDK, it seems pretty straightforward to me to allow ** to take anything that could be passed to dict.update() (in fact {**x} produces the DICT_UPDATE bytecode). It would feel more consistent to me. That said, while I can imagine using this, I can’t recall a time I wanted to and was annoyed I couldn’t, so I’m a +0.5 on the idea.
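The bytecode claim can be checked with the dis module. This is a sketch that assumes a CPython version where the DICT_UPDATE opcode exists (I believe 3.9 and later); opcode names are an implementation detail and may change.

```python
import dis

# Compile the display as an expression; x is never evaluated here.
code = compile("{**x}", "<example>", "eval")
opnames = [ins.opname for ins in dis.get_instructions(code)]
print(opnames)

has_dict_update = "DICT_UPDATE" in opnames
```

Note that sharing a name with dict.update() does not mean the opcode accepts the same inputs today: unpacking a list of pairs with ** currently raises TypeError.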
The main use-case I had in mind when thinking about this was when I already had an association list in hand and wanted to treat it as a mapping, i.e., replacing **dict(x) with **x. I don’t feel like there’s a loss of clarity there, and there’s almost certainly a performance win.
Then again, I agree that I’m not sure that this use case is common enough to be overly concerned with; **dict(...) only shows up once in the whole standard library, for example. Mostly it just feels somewhat weird to me to have update and ** be different in this way.
I think this is a nice idea in principle, but in practice it is problematic because the “pairness” of the elements of an iterable of non-dict iterables cannot be enforced: some of the iterables may only have one element, e.g. (x,). The problem doesn’t exist for dicts as they are keyed pairs to begin with.
I suppose in the latter case (x,) could be mapped to (x, None)?
There are many places where this is the case and one needs to trace things back, e.g. needing to know whether something is an iterator to know whether it is going to be consumed. Or for v in iterable, where iterable can mean a variety of things.
Of course it would be great if everything was easily inferable, but Python has given a lot of that up for the sake of flexibility and other benefits. Although it is important not to go too wild with this, I don’t think this specific case is anywhere near the red line that would kill it automatically without any further consideration.
I think this is a fairly standard situation in the Python world (I mean… duck typing…), and in my experience, consistently applied protocols are one of the best remedies for it, alongside proper variable naming, comments, annotations…
Why would one need to know this so badly? With {**a, **b}, the result is a dict and the input is key-value pairs in one of the forms that are clearly defined by the dict.update protocol.
Well, funkiness is already there. Same funkiness everywhere might just be a bit easier to digest and remember as opposed to funkiness in one place and the need to remember where it does apply and where it doesn’t.
However, some evidence is needed for the usefulness of this.
It has performance benefits, but the use cases that I would apply this to are mostly top-level module definitions, where performance isn’t very important.
Yeah I don’t have any issues understanding the concept, I was thinking about encountering this code in the wild (probably without such descriptive names). I think it would be confusing that ** was doing two different things in the same expression. I might be wrong–it’s possible to get used to anything, and IDEs are helpful. It just doesn’t seem like something I need in the language.
I would want this to mirror the semantics of the existing dict.update(list_) or dict(list_) semantics exactly. Those both raise an exception in the case where one of the internal iterables doesn’t have length 2, so I would want:
Examples
>>> malformed_seq2 = [(1, 2), (3,), (4,5)]
>>> dict(malformed_seq2) # existing behavior
Traceback (most recent call last):
  File "<python-input-10>", line 1, in <module>
    dict(malformed_seq2)
    ~~~~^^^^^^^^^^^^^^^^
ValueError: dictionary update sequence element #1 has length 1; 2 is required
>>> malformed_seq2 = [(1, 2), (3,), (4,5)]
>>> d = {}
>>> d.update(malformed_seq2) # existing behavior
Traceback (most recent call last):
  File "<python-input-14>", line 1, in <module>
    d.update(malformed_seq2)
    ~~~~~~~~^^^^^^^^^^^^^^^^
ValueError: dictionary update sequence element #1 has length 1; 2 is required
>>> malformed_seq2 = [(1, 2), (3,), (4,5)]
>>> {**malformed_seq2} # i would want the same behavior here
Traceback (most recent call last):
  File "<python-input-13>", line 1, in <module>
    {**malformed_seq2}
ValueError: dictionary update sequence element #1 has length 1; 2 is required
I suppose that’s an approach. But then ** would produce different results for different input types: for dicts (and mapping types generally) it would always work, but not necessarily for other types of iterables. If you think of ** as an unpacking operator for any iterable of pair iterables I think you’d want consistent results on all possible inputs in its domain.
I would maybe argue that, e.g., [(1,2), (3,)] isn’t in the domain of ** under this proposal even if [(1,2), (3,4)] is; and that folks attempting to use ** on such an iterable should be told that it’s malformed, just like if they fed such a thing to dict.update.
Consistency with update and with dict’s initializer were the original motivation for bringing this up, and I would feel like something was being lost (and that we would be descending into arbitrary magic/funkiness) if all three of those things didn’t end up being consistent.
This is hard to pin down, unfortunately. Maybe someone can do a better job than I can of structuring such a search…
My first attempt was a search for /\*\*dict\([^=]*\)/. That search claims 52.5k results, but that regex has a lot of holes in it, and just looking at the first page shows several false positives (copying a dict, building and then unpacking a dict that has its own kwargs, examples where the type of the argument to dict is unclear, etc). And even if they were all true positives, that doesn’t seem like a very big number…
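To illustrate the kind of holes mentioned above, here is a small sketch that runs the same pattern over a few lines of code. The sample strings are invented for illustration and are not taken from the search results.

```python
import re

# The pattern from the post: "**dict(" followed by a run without "=" and a ")".
pattern = re.compile(r"\*\*dict\([^=]*\)")

# Invented sample lines, labelled by whether the proposal would apply.
samples = {
    "call(**dict(pairs))": "intended hit: pairs unpacked via dict()",
    "call(**dict(zip(keys, values)))": "intended hit: zip of keys and values",
    "copy = {**dict(other_dict)}": "false positive: just copying a dict",
    "call(**dict(a=1))": "excluded: dict() built from keyword arguments",
}

matches = {line: bool(pattern.search(line)) for line in samples}
for line, note in samples.items():
    print(f"{matches[line]!s:5}  {line}  ({note})")
```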
Searching for **dict(zip( gives 10.2k results, which wasn’t originally on my mind but which does seem like maybe it could represent a substantial subset of code that could change as a result of this idea by just using **zip(.
I still think I like this idea, but these quick searches don’t imply that it would be super widely-used.
Maybe one normalising constant could be the search /\=\s+\{.*\*\*.*\}/ Language:Python
307K files.
Say: 10K / 300K ~ 3%. Both numbers are incomplete though. If it were, say, 10%, maybe that would meet a “sufficient use cases” criterion… To me this suggests that maybe there are enough use cases to continue with this.
This is kind of the average case for iterable usage though… I don’t think it makes much difference whether it is an input to a function or a syntactic construct.
I made a guess at what you mean by “leaky abstraction”, which is, as you said, “The validity of the ** syntax depending on the contents of the iterable”.
In this case, it is the same as the case to which this thread is attempting to synchronize:
dict.update([(k0, v0), (k1, v1)])
If, say, the last item is not a pair, then it raises an exception. The same would be true with **pairs. I don’t see the difference.
Or if you meant something else, then my bad and I could use some clarification.
“Leaky abstraction” is new terminology to me. I read the Wikipedia article on leaky abstraction, and it more or less says “All non-trivial abstractions, to some degree, are leaky.” Is this more “leaky” than appropriate? Why?
We can’t determine whether there are key-value pairs without consuming the iterable; we only see that it’s an iterable. That’s the point where the ** operator is applied. Additionally, we have a specific data type for representing pairs: a mapping.
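A small sketch of that point using today's dict.update on CPython: the malformed element is only discovered while iterating, by which time earlier pairs have been stored and the iterator has been advanced. The generator pair_source is hypothetical.

```python
def pair_source():
    # Hypothetical generator: the second item is not a pair.
    yield ("a", 1)
    yield ("b",)
    yield ("c", 3)

source = pair_source()
target = {}
try:
    target.update(source)
except ValueError as exc:
    print(f"ValueError: {exc}")

# The first pair was stored before the malformed item was reached...
print(target)
# ...and the generator has been advanced past the malformed item.
remaining = list(source)
print(remaining)
```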