Frozen set literals

Nineteendo · May 16, 2024, 8:44am

Note: check the GitHub repository for the most up to date version: GitHub - nineteendo/frozenset-literals

Not sure if this is the right place to discuss this, but I found this suggestion for frozen set literals on the mailing list, which seems to have been overlooked:

As a half-baked alternative thought, what about using {{'a', 'b', 'c'}} for the syntax. It’s visually clearer, and still syntactically unambiguous, because a dict can’t have another dict as a key, and a (frozen)set can’t have a dict inside it. Thinking about possible issues–what if you come across a construct with three opening braces at the start? {{{...

It can’t be a dict within a dict within a dict, because a dict is not hashable, so can’t be a key.
It can’t be a dict within a frozenset, because a dict is not hashable.
It can (and must) be a frozenset within a dict, because a frozenset is hashable, and hence a valid dict key.

Four braces? {{{{...

I think this has to be a frozen set within a frozen set.

More?

The outer one is a dict if an odd number of braces, everything else is a frozen set.

Obviously, the more complex cases get less visually clear, but

they are not common cases
they are still unambiguous

And will often be made clearer by

the placement of closing brackets
decent syntax highlighting
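
The hashability facts that the quoted argument relies on can be checked in current Python. This is only a minimal sketch; the helper name `is_hashable` is made up for illustration and is not part of the proposal.

```python
def is_hashable(obj):
    """Return True if obj can be used as a dict key or set element."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


dict_ok = is_hashable({"a": 1})            # dicts are mutable, so not hashable
set_ok = is_hashable({"a", "b"})           # plain sets are not hashable either
frozen_ok = is_hashable(frozenset("ab"))   # frozensets are hashable

# A frozenset works as a dict key and as an element of another frozenset.
as_key = {frozenset({"a", "b"}): "value"}
nested = frozenset({frozenset({"a"})})
```

Only the frozenset passes the check, which is why a frozenset can sit inside a dict or another frozenset while a dict or a plain set cannot.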

We previously discussed this here: Alternative call syntax. ~~So, maybe those posts should be moved here?~~

blhsing · May 16, 2024, 8:52am

+1. I think the {{...}} syntax is fairly intuitive, and it saves a call to frozenset after loading it from co_consts.

There are currently 27 usages of frozenset({...}) in CPython, and 28.8k usages on GitHub, which, while not huge, are still meaningful numbers.

Nineteendo · May 16, 2024, 8:59am

I think the main reason they aren’t used more often is that frozenset({...}) is just too long.

Many of the usages of {...} could just be replaced with frozen sets.

(And the syntax would also work for frozendict when added).

blhsing · May 16, 2024, 9:08am

Very true. Currently, a set literal starts out with a BUILD_SET instruction; its values are then loaded onto the stack as a frozenset constant from co_consts, and finally an additional SET_UPDATE instruction has to be performed to turn it into a set. Defining sets that don’t need to be mutated as frozensets can save the overhead of this conversion.

from dis import dis
def f():
    s = {1, 2, 3}
dis(f)

outputs:

  2           RESUME                   0

  3           BUILD_SET                0 <= can be skipped with s = {{1, 2, 3}}
              LOAD_CONST               1 (frozenset({1, 2, 3}))
              SET_UPDATE               1 <= can be skipped with s = {{1, 2, 3}}
              STORE_FAST               0 (s)
              RETURN_CONST             0 (None)

Nineteendo · May 16, 2024, 9:22am

If you want a frozen set, it also needs to convert it back:

>>> from dis import dis
>>> dis('a = frozenset({1, 2, 3})')
  0           0 RESUME                   0

  1           2 PUSH_NULL
              4 LOAD_NAME                0 (frozenset)
              6 BUILD_SET                0
              8 LOAD_CONST               0 (frozenset({1, 2, 3}))
             10 SET_UPDATE               1
             12 CALL                     1
             20 STORE_NAME               1 (a)
             22 RETURN_CONST             1 (None)

This is all we need (like you already mentioned):

  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (frozenset({1, 2, 3}))
              4 STORE_NAME               0 (a)
              6 RETURN_CONST             1 (None)
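
For a rough feel of the overhead discussed above, the three variants can be timed with `timeit`. This is a sketch under assumptions: the name `FS` and the iteration count are arbitrary, and loading a prebuilt name only approximates what a real `LOAD_CONST` for a frozenset literal would cost. Timings vary by machine and Python version, so no numbers are given here.

```python
import timeit

N = 200_000  # arbitrary iteration count, chosen only for illustration

# Mutable set built from a literal.
set_literal = timeit.timeit("{1, 2, 3}", number=N)

# Frozenset built the way it has to be written today.
frozenset_call = timeit.timeit("frozenset({1, 2, 3})", number=N)

# Stand-in for a frozenset literal: just load an object that already exists.
prebuilt = timeit.timeit("FS", setup="FS = frozenset({1, 2, 3})", number=N)

for label, seconds in [
    ("set literal", set_literal),
    ("frozenset(...) call", frozenset_call),
    ("prebuilt frozenset", prebuilt),
]:
    print(f"{label:>20}: {seconds:.4f} s for {N} runs")
```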

barry-scott · May 16, 2024, 9:27am

Sure, some people seem to value brevity over clarity.
But a lot of people want clarity over brevity.

I would think that a stronger reason not to see frozenset is that it is often not that important to freeze a set.

Indeed only in the case of coding for robustness would use of frozenset be called out in code review.

Nineteendo · May 16, 2024, 9:32am

Same goes for sets and dicts: set([1, 2, 3]), yet we can still write {1, 2, 3}.

blhsing · May 16, 2024, 9:51am

Yes, the current frozenset definition is needlessly inefficient in that it first gets loaded as a frozenset constant, then gets converted to a mutable set, and finally gets converted back to a frozenset with a call to frozenset, when the first step would’ve been sufficient.

alexprengere · May 16, 2024, 10:01am

Such syntax is already legal Python, even though it will always raise a TypeError: you are building a set taking a single element, which is also a set.
For new syntax you will need to come up with something that was not legal Python before.
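
The point about existing legality can be seen with the `ast` module. This is a small sketch of current behaviour (before any change to the grammar): the expression parses, and only fails when it is evaluated.

```python
import ast

source = "{{1, 2, 3}}"

# Parsing succeeds: the grammar already accepts a set display nested in another.
tree = ast.parse(source, mode="eval")
outer = tree.body
inner = outer.elts[0]

# Evaluation fails because the inner set is not hashable.
try:
    eval(source)
except TypeError as exc:
    error = str(exc)
else:
    error = None

print(type(outer).__name__, type(inner).__name__, error)
```

Both nodes are `ast.Set`, so today the failure is a runtime `TypeError` rather than a `SyntaxError`.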

Nineteendo · May 16, 2024, 10:05am

Why is that a problem? You can’t make {'a', 'b', 'c'} hashable, like you can’t add new methods to string literals. So this would only be useful in code golf, which we don’t care about.

blhsing · May 16, 2024, 10:05am

No, all that needs to be done is for the parser to make {{...}} of higher precedence than {...}.

alexprengere · May 16, 2024, 10:31am

But you are still technically breaking backward compatibility policy, even though the breakage here is largely theoretical. My point is that, if you look at the new syntax introduced in the past 10 years, it was always something that was not legal Python before (PEP 695, PEP 701, PEP 654, PEP 634, PEP 572, PEP 498, PEP 515, PEP 526…).

IMO frozen set literals have benefits, mostly because the compiler would be able to optimize them in some cases, like what is done for tuples (constant folding, lookup optimization). The current ability to shadow the frozenset builtin prevents the compiler from doing that now.
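
Both halves of that argument can be illustrated on CPython. This is a sketch, assuming a CPython version that folds constant set displays in containment tests into a frozenset constant; the function names and the replacement globals dict are hypothetical and exist only for the demonstration.

```python
import types


def membership(x):
    # The set display is only used for a containment test, so CPython can
    # store it as a frozenset constant instead of rebuilding it each call.
    return x in {1, 2, 3}


def explicit():
    # Here `frozenset` is an ordinary name, looked up when the call runs.
    return frozenset({1, 2, 3})


# Frozenset constants the compiler stored for the containment test.
folded = [c for c in membership.__code__.co_consts if isinstance(c, frozenset)]

# Rebuild `explicit` with globals in which `frozenset` is shadowed.
# The same bytecode now calls `list` instead of the builtin.
shadowed = types.FunctionType(explicit.__code__, {"frozenset": list})

normal_type = type(explicit())
shadowed_type = type(shadowed())
```

Because `frozenset` appears in `explicit.__code__.co_names` as a plain name, the compiler cannot assume it refers to the builtin, whereas dedicated syntax would leave no name to shadow.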

In terms of syntax, I think a prefix would make more sense, like f{2, 3} or else.

Nineteendo · May 16, 2024, 10:37am

The same applies to this, although that now raises a syntax error (rather than doing something else):

[i+1for i in range(10)]

And @guido already rejected f{1, 2, 3}.

Nineteendo · May 16, 2024, 12:27pm

Does PEP 387 apply here?

assert {{1}} == frozenset({1})  # TypeError: unhashable type: 'set'

Proposal:

assert {{1}} == frozenset({1})  # OK

barry-scott · May 16, 2024, 1:42pm

Not quite; set(a_list) makes a set from a list and {1,2,3} makes the set directly.

Nineteendo · May 16, 2024, 1:51pm

If you want to clarify you’re creating a set and not a dict, that’s what you would do.

frozenset({1, 2, 3}) makes a frozenset from a set and {{1, 2, 3}} would make the frozenset directly.

Nineteendo · May 16, 2024, 4:35pm

As for ambiguity, couldn’t we simply make this a syntax error by making whitespace required?

foo = {{{}}}  # SyntaxError
bar = { {{}} }  # OK
baz = {{ {} }}  # TypeError: unhashable type: 'dict'

It might be a little annoying, but at least it will be clear to the reader and it makes parsing easier.

Stefan2 · May 16, 2024, 4:49pm

Just make the syntax foobar{...} mean foobar(...) except that foobar refers to the built-in. Then the human and the interpreter can both be sure the built-in gets used and the interpreter can optimize frozenset, all, any, etc accordingly. (I’m ~99% joking.)

Nineteendo · May 16, 2024, 4:54pm

In this case, it’s mostly the number of characters that need to be written. If it also allows us to reduce overhead, that’s a nice bonus.

Nineteendo · May 16, 2024, 7:15pm

Could a core developer maybe confirm this for me?