Frozen set literals

Note: check the GitHub repository for the most up-to-date version: nineteendo/frozenset-literals

Not sure if this is the right place to discuss this, but I found this suggestion for frozen set literals on the mailing list, which seems to have been overlooked:


As a half-baked alternative thought, what about using {{'a', 'b', 'c'}} for the syntax? It’s visually clearer, and still syntactically unambiguous, because a dict can’t have another dict as a key, and a (frozen)set can’t have a dict inside it.

Thinking about possible issues: what if you come across a construct with three opening braces at the start? {{{...

  • It can’t be a dict within a dict within a dict, because a dict is not hashable, so can’t be a key.
  • It can’t be a dict within a frozenset, because a dict is not hashable.
  • It can (and must) be a frozenset within a dict, because a frozenset is hashable, and hence a valid dict key.

Four braces? {{{{...

  • I think this has to be a frozen set within a frozen set.

More?

  • The outer one is a dict if an odd number of braces, everything else is a frozen set.

Obviously, the more complex cases get less visually clear, but

  • they are not common cases
  • they are still unambiguous

And will often be made clearer by

  • the placement of closing brackets
  • decent syntax highlighting
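
The hashability rules this argument relies on can be checked in current Python. This is a minimal sketch; the helper name is_hashable is made up for illustration and is not part of the proposal:

```python
def is_hashable(value):
    """Return True if hash(value) succeeds, False if it raises TypeError."""
    try:
        hash(value)
    except TypeError:
        return False
    return True

# Dicts and sets are mutable and unhashable; frozensets are hashable.
dict_ok = is_hashable({})
set_ok = is_hashable({1})
frozen_ok = is_hashable(frozenset({1}))

# A frozenset can therefore be a dict key or an element of another frozenset.
nested = {frozenset({"a", "b"}): "value"}
set_of_sets = frozenset({frozenset({1}), frozenset({2})})
```

Because only the frozenset is hashable, a run of opening braces has at most one reading that could succeed at runtime, which is the point the quoted post makes.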

We previously discussed this here: Alternative call syntax. So, maybe those posts should be moved here?

+1. I think the {{...}} syntax is fairly intuitive, and it saves a call to frozenset after loading it from co_consts.

There are currently 27 usages of frozenset({...}) in CPython, and 28.8k usages on GitHub, which, while not huge, are still meaningful numbers.

I think the main reason they aren’t used more often is that frozenset({...}) is just too long.

Many of the usages of {...} could just be replaced with frozen sets.

(And the syntax would also work for frozendict when added).

Very true. Currently, a set literal of constants starts with a BUILD_SET instruction that creates an empty set; its value is then loaded onto the stack as a frozenset constant from co_consts, and finally an additional SET_UPDATE instruction copies that constant into the set. Defining sets that don’t need to be mutated as frozensets would save the overhead of this conversion.

from dis import dis
def f():
    s = {1, 2, 3}
dis(f)

outputs:

  2           RESUME                   0

  3           BUILD_SET                0 <= can be skipped with s = {{1, 2, 3}}
              LOAD_CONST               1 (frozenset({1, 2, 3}))
              SET_UPDATE               1 <= can be skipped with s = {{1, 2, 3}}
              STORE_FAST               0 (s)
              RETURN_CONST             0 (None)

If you want a frozen set, it also needs to convert it back:

>>> from dis import dis
>>> dis('a = frozenset({1, 2, 3})')
  0           0 RESUME                   0

  1           2 PUSH_NULL
              4 LOAD_NAME                0 (frozenset)
              6 BUILD_SET                0
              8 LOAD_CONST               0 (frozenset({1, 2, 3}))
             10 SET_UPDATE               1
             12 CALL                     1
             20 STORE_NAME               1 (a)
             22 RETURN_CONST             1 (None)

This is all we need (like you already mentioned):

  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (frozenset({1, 2, 3}))
              4 STORE_NAME               0 (a)
              6 RETURN_CONST             1 (None)

Sure, some people seem to value brevity over clarity.
But a lot of people want clarity over brevity.

I would think a stronger reason frozenset is rarely seen is that freezing a set is often not that important.

Indeed, only when coding for robustness would the use of frozenset be called out in code review.

The same goes for sets and dicts: we could write set([1, 2, 3]), yet we can still write {1, 2, 3}.

Yes, the current frozenset definition is needlessly inefficient: the value is first loaded as a frozenset constant, then converted to a mutable set, and finally converted back to a frozenset by a call to frozenset, when the first step would’ve been sufficient.
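
The round trip described above can be checked on whatever interpreter is at hand. This is a rough sketch; the helper names opnames and has_frozenset_constant are invented here, and the exact opcodes vary between CPython versions:

```python
import dis

def opnames(source):
    """Return the opcode names CPython emits for a source snippet."""
    code = compile(source, "<example>", "exec")
    return [instruction.opname for instruction in dis.get_instructions(code)]

def has_frozenset_constant(source):
    """Report whether the compiled code stores a frozenset in co_consts."""
    code = compile(source, "<example>", "exec")
    return any(isinstance(const, frozenset) for const in code.co_consts)

snippet = "a = frozenset({1, 2, 3})"
names = opnames(snippet)

# A mutable set is still built before frozenset() is called on it.
builds_mutable_set = "BUILD_SET" in names

# On recent CPython versions the elements are also stored as a frozenset constant.
loads_frozen_constant = has_frozenset_constant(snippet)
```

If both flags are true, the interpreter holds a ready-made frozenset constant, copies it into a new set, and then calls frozenset on that set.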

Such syntax is already legal Python, even though it will always raise a TypeError: you are building a set containing a single element, which is itself a set.
For new syntax you will need to come up with something that was not legal Python before.
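
To make the "already legal" point concrete, here is a small check using the standard ast module. It only inspects current behaviour and assumes nothing about how the proposal would be implemented:

```python
import ast

source = "{{1, 2, 3}}"

# Today the parser reads this as a set display whose only element
# is another set display.
outer = ast.parse(source, mode="eval").body
inner = outer.elts[0]
parses_as_nested_sets = isinstance(outer, ast.Set) and isinstance(inner, ast.Set)

# Evaluating it fails at runtime because the inner set is unhashable.
try:
    eval(source)
except TypeError as error:
    runtime_error = str(error)
else:
    runtime_error = None
```

So the expression passes the parser and fails only when executed, which is why the change would alter the meaning of existing (if useless) code rather than use previously invalid syntax.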

Why is that a problem? You can’t make {'a', 'b', 'c'} hashable, like you can’t add new methods to string literals. So this would only be useful in code golf, which we don’t care about.

No, all that needs to be done is for the parser to make {{...}} of higher precedence than {...}.

But you are still technically breaking backward compatibility policy, even though the breakage here is largely theoretical. My point is that, if you look at the new syntax introduced in the past 10 years, it was always something that was not legal Python before (PEP 695, PEP 701, PEP 654, PEP 634, PEP 572, PEP 498, PEP 515, PEP 526…).

IMO frozen set literals have benefits, mostly because the compiler would be able to optimize them in some cases, like what is done for tuples (constant folding, lookup optimization). The current ability to shadow the frozenset builtin prevents the compiler from doing that now.
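
A short illustration of the shadowing problem mentioned above. Rebinding frozenset to list is a deliberately silly, hypothetical example:

```python
# Run the same expression in a namespace where the name `frozenset`
# has been rebound (hypothetical shadowing, for illustration only).
namespace = {"frozenset": list}
exec("value = frozenset({1, 2, 3})", namespace)
shadowed_type = type(namespace["value"])

# A tuple display involves no name lookup, so CPython can store the
# whole tuple as a constant at compile time.
tuple_code = compile("value = (1, 2, 3)", "<example>", "exec")
tuple_is_constant = (1, 2, 3) in tuple_code.co_consts
```

Because the name is resolved at runtime, the compiler cannot assume that frozenset(...) produces a frozenset, whereas dedicated syntax would carry no such lookup.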

In terms of syntax, I think a prefix would make more sense, such as f{2, 3} or something similar.

The same applies to this, although it now raises a syntax error (rather than doing something else):

[i+1for i in range(10)]

And @guido already rejected f{1, 2, 3}.

Does PEP 387 apply here? Currently:

assert {{1}} == frozenset({1})  # TypeError: unhashable type: 'set'

Proposal:

assert {{1}} == frozenset({1})  # OK

Not quite; set(a_list) makes a set from a list and {1, 2, 3} makes the set directly.

If you want to clarify you’re creating a set and not a dict, that’s what you would do.

frozenset({1, 2, 3}) makes a frozenset from a set and {{1, 2, 3}} would make the frozenset directly.

As for ambiguity, couldn’t we simply make this a syntax error by making whitespace required?

foo = {{{}}}  # SyntaxError
bar = { {{}} }  # OK
baz = {{ {} }}  # TypeError: unhashable type: 'dict'

It might be a little annoying, but at least it will be clear to the reader and it makes parsing easier.
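
For context, the current tokenizer does not distinguish these spellings. A quick check with the standard tokenize module; the helper name operator_tokens is made up for this sketch:

```python
import io
import tokenize

def operator_tokens(source):
    """Return the operator tokens the current tokenizer produces for source."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.OP
    ]

compact = operator_tokens("{{{}}}")
spaced = operator_tokens("{ {{}} }")

# Each brace is its own token today, so spacing between braces is invisible
# to the parser.
same_tokens = compact == spaced
```

This suggests (an assumption about implementation, not something stated in this thread) that a whitespace-sensitive rule would need either a new `{{` token or an extra check in the parser.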

Just make the syntax foobar{...} mean foobar(...) except that foobar refers to the built-in. Then the human and the interpreter can both be sure the built-in gets used and the interpreter can optimize frozenset, all, any, etc accordingly. (I’m ~99% joking.)

In this case, it’s mostly the number of characters that need to be written. If it also allows us to reduce overhead, that’s a nice bonus.

Could a core developer maybe confirm this for me?