Frozenset literals and comprehensions

About 10 months ago, I created a proof-of-concept implementation of frozenset literals and comprehensions, including tests and documentation. The chance of this getting accepted is very low, but I wanted to share it in case someone would like to play around with it.

Frozen sets

Frozen sets are immutable sets and can be used for constants. Double curly braces or the frozenset() function can be used to create them:

>>> {{1, 2, 3}}
{{1, 2, 3}}
>>> frozenset('foobar')
{{'f', 'r', 'a', 'b', 'o'}}

Notes:

  • To create an empty frozen set you have to use frozenset(), not {{}}; the latter is reserved for an empty frozen dictionary.

  • To use a frozen set as the first element of a set or in a set comprehension, you need to add whitespace:

    >>> { {{3}}, {{2}}, {{1}} }
    { {{3}}, {{2}}, {{1}} }
    >>> { {{c}} for c in 'cba' }
    { {{'c'}}, {{'b'}}, {{'a'}} }
    

As with sets, comprehensions are also supported:

>>> a = {{x for x in 'abracadabra' if x not in 'abc'}}
>>> a
{{'r', 'd'}}
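For comparison, here is a minimal sketch of how the same values are spelled on an unpatched interpreter, where the {{...}} syntax is not available. The variable names are only illustrative.

```python
# Stock CPython spellings of the proposed {{...}} forms (no patched build needed).
literal = frozenset({1, 2, 3})                # proposed: {{1, 2, 3}}
nested = {frozenset({c}) for c in 'cba'}      # proposed: { {{c}} for c in 'cba' }
comp = frozenset(                             # proposed: {{x for x in ... if ...}}
    x for x in 'abracadabra' if x not in 'abc'
)

# Frozen sets are hashable, so they can be set elements or dict keys.
lookup = {literal: 'constant'}
print(comp == frozenset({'r', 'd'}))
```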

PR (on my fork): "Embrace the frozen set" by nineteendo, Pull Request #19, nineteendo/cpython on GitHub
Documentation: https://nineteendo-cpython--19.org.readthedocs.build

{{/}} would be analogous to the {/} idea for empty sets.

I have implemented empty set literals on the linked PR, but I don’t think it’s worth the churn.

>>> {/}
{/}
>>> {{/}}
{{/}}

There are 160 occurrences of set() and frozenset() in Lib/ (excluding Lib/test; the count includes some false positives). Compare that to:

  • 17 occurrences of frozensets constructed from a list, tuple or set literal
  • 3 occurrences of frozensets constructed from a generator expression or set comprehension
  • 2 occurrences of frozensets constructed from a union of set literals and comprehensions

All of these become roughly 9 characters shorter (compared to 2 for empty sets and 6 for empty frozen sets).
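The post does not say how these counts were obtained. One way to reproduce this kind of tally is to walk the AST of each file, as in the rough sketch below. The helper name classify_frozenset_calls and the category labels are made up for illustration, and the sketch only looks at frozenset(...) calls, not set() or unions.

```python
import ast

def classify_frozenset_calls(source: str) -> dict:
    """Count frozenset(...) calls in `source` by the kind of their argument."""
    counts = {"empty": 0, "literal": 0, "comprehension": 0, "other": 0}
    for node in ast.walk(ast.parse(source)):
        is_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "frozenset"
        )
        if not is_call:
            continue
        if not node.args and not node.keywords:
            counts["empty"] += 1
            continue
        # Only a single positional argument can be rewritten as a literal.
        arg = node.args[0] if len(node.args) == 1 else None
        if isinstance(arg, (ast.List, ast.Tuple, ast.Set)):
            counts["literal"] += 1
        elif isinstance(arg, (ast.GeneratorExp, ast.SetComp)):
            counts["comprehension"] += 1
        else:
            counts["other"] += 1
    return counts

sample = (
    "a = frozenset()\n"
    "b = frozenset([1, 2])\n"
    "c = frozenset(x for x in y)\n"
    "d = frozenset(z)\n"
)
print(classify_frozenset_calls(sample))
```

Matching on the name alone still produces false positives, for example when frozenset has been rebound locally.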

Nice work, but it isn’t that simple: from reading your grammar description, it seems that it breaks the currently valid code {{1,2,3}.pop(),4}

Moreover, it also allows {{1,2,3,4} } as a frozen set, which is also not desirable.
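To make the compatibility concern concrete, the sketch below checks on a stock interpreter that {{1,2,3}.pop(),4} is already a valid set display, and that {{1, 2, 3}} also parses today but fails at runtime. It says nothing about how the patched build behaves.

```python
import ast

# Valid today: a set display whose first element is a method call on a set.
tree = ast.parse("{{1,2,3}.pop(),4}", mode="eval")
outer = tree.body
first = outer.elts[0]
print(type(outer).__name__, type(first).__name__)  # Set Call
value = eval(compile(tree, "<demo>", "eval"))

# Also syntactically valid today, but it fails at runtime: sets are unhashable.
try:
    eval("{{1, 2, 3}}")
    nested_error = None
except TypeError as exc:
    nested_error = exc
```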

Good catch, although a comparison seems more likely to occur in real code, e.g.:

>>> s = {4, 5, 6}
>>> {{1, 2, 3}.isdisjoint(s)}
{True}

But people usually don’t write yoda conditions:

>>> s = {4, 5, 6}
>>> {s.isdisjoint({1, 2, 3})}
{True}

Yeah, accepting {{1,2,3,4} } is unfortunate. It’s necessary to not break f-strings:

>>> f"{{{1}}}"
'{1}'

I wonder if I can use a soft keyword…
Edit: must be a literal.
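As far as I can tell, the f-string concern is that doubled braces already have a meaning there. A short sketch of the current behaviour on a stock interpreter (the variable names are illustrative):

```python
# In an f-string, "{{" and "}}" are escapes for literal braces.
escaped = f"{{{1}}}"     # literal "{" + value of 1 + literal "}"
literal_only = f"{{1}}"  # no replacement field at all, just braces around "1"

# With whitespace, the outer brace opens a replacement field instead,
# and the inner braces are an ordinary set display.
set_field = f"{ {1}.pop() }"

print(escaped, literal_only, set_field)
```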

You can apply the same trick that you have applied for the right side also for the left side, i.e., do not introduce a {{ token, but define the grammar for frozenset as '{' '{' star_named_expressions '}' '}' and analogous for frozensetcomp. In order for this to work, you have to put the rule for frozen sets before the rule for sets in order to take precedence:

atom[expr_ty]:
    ...
    | &'{' (dict | frozenset | set | dictcomp | frozensetcomp | setcomp)

This does the trick and fixes the issues with {{{3}}, {{2}}, {{1}}} (I only tried the parsing locally, not the rest of your PR.)
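The effect of putting the frozenset alternative first can be illustrated with a toy ordered-choice parser. This is only a sketch of PEG-style backtracking over a tiny made-up grammar (names, commas and braces, no method calls); it is not the CPython parser generator, and the function and label names are hypothetical.

```python
def parse(tokens, frozen_first=True):
    """Parse a token list with a toy grammar: atom: frozenset | set | NAME."""

    def expect(i, tok):
        # Return the next index if tokens[i] == tok, else None.
        return i + 1 if i < len(tokens) and tokens[i] == tok else None

    def name(i):
        if i < len(tokens) and tokens[i].isidentifier():
            return tokens[i], i + 1
        return None

    def elems(i):
        # atom (',' atom)*
        first = atom(i)
        if first is None:
            return None
        items, i = [first[0]], first[1]
        while expect(i, ",") is not None:
            nxt = atom(i + 1)
            if nxt is None:
                break
            items.append(nxt[0])
            i = nxt[1]
        return items, i

    def braced(i, depth, label):
        # `depth` opening braces, elements, then `depth` closing braces.
        for _ in range(depth):
            i = expect(i, "{")
            if i is None:
                return None
        body = elems(i)
        if body is None:
            return None
        items, i = body
        for _ in range(depth):
            i = expect(i, "}")
            if i is None:
                return None
        return (label, items), i

    def frozenset_(i):
        return braced(i, 2, "frozenset")

    def set_(i):
        return braced(i, 1, "set")

    def atom(i):
        # Ordered choice: the first alternative that matches wins.
        alts = (frozenset_, set_, name) if frozen_first else (set_, frozenset_, name)
        for alt in alts:
            result = alt(i)
            if result is not None:
                return result
        return None

    result = atom(0)
    if result is None or result[1] != len(tokens):
        raise SyntaxError("no parse")
    return result[0]

print(parse(list("{{a}}")))
print(parse(list("{{a}}"), frozen_first=False))
print(parse(list("{{a},b}")))
print(parse(list("{{{a}},{{b}}}")))
```

With the frozenset alternative first, {{a}} becomes a frozen set, while {{a},b} fails that alternative at the comma and falls back to a set containing a set. In the same toy grammar, {{{a}}} comes out as a frozen set containing a set.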

In order to forbid { {1,2,3}}, {{1,2,3} } and { {1,2,3} }, one could create hand-crafted SyntaxErrors. That would be possible, but somewhat of a hack. But the current grammar for relative imports is also a hack (Python tokenizes from ....foo import bar as an Ellipsis, followed by a dot). Sometimes, there are no simple solutions.
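The relative-import quirk can be observed with the standard tokenize module. A small sketch on a stock interpreter:

```python
import ast
import io
import tokenize

source = "from ....foo import bar\n"

# Keep only names and operators; drop NEWLINE/ENDMARKER bookkeeping tokens.
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type in (tokenize.NAME, tokenize.OP)
]
print(tokens)

# The parser still recovers the intended meaning: four leading dots.
level = ast.parse(source).body[0].level
print(level)
```

The four dots arrive as one "..." token plus one "." token, and the grammar adds them up into the import level afterwards.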

It looks more like this:

atom[expr_ty]:
    ...
    | &('{' '{') !('{' '{' (dict | set | dictcomp | setcomp) '}' '}') (frozenset | frozensetcomp)
    | &'{' (dict | set | dictcomp | setcomp)

Otherwise this fails:

>>> {{{1, 2, 3}}}
{ {{1, 2, 3}} }

OK, that also didn’t work, I needed recursion:

atom[expr_ty]:
    ...
    | &('{' '{') !invalid_frozenset (frozenset | frozensetcomp)
    | &'{' (dict | set | dictcomp | setcomp)

invalid_frozenset:
    | '{' '{' ~ invalid_frozenset '}' '}'
    | (dict | set | dictcomp | setcomp)

>>> {1, 2, 3}
{1, 2, 3}
>>> {{1, 2, 3}}
{{1, 2, 3}}
>>> {{{1, 2, 3}}}
{{{1, 2, 3}}}
>>> {{{{1, 2, 3}}}}
{{{{1, 2, 3}}}}
>>> {{{{{1, 2, 3}}}}}
{{{{{1, 2, 3}}}}}
>>> {{{{{{1, 2, 3}}}}}}
{{{{{{1, 2, 3}}}}}}
>>> {{{{{{{1, 2, 3}}}}}}}
{{{{{{{1, 2, 3}}}}}}}
>>> {{{{{{{{1, 2, 3}}}}}}}}
{{{{{{{{1, 2, 3}}}}}}}}

Thanks for catching my error above. I think it is really necessary to have a rule that there must be no whitespace between the two braces of {{ and of }}. Unfortunately, the PEG parser generator does not provide such a feature. I changed Parser/parser.c by hand to accept '{' '{' star_named_expressions '}' '}' only if there is no whitespace between the two opening and the two closing braces. In this case:

  • {{ {a,b,c} }} is a frozen set of a set (thus invalid)
  • { {{a,b,c}} } is a set of a frozen set (valid)
  • {{ {a,b,c}} } is a set of a set of a set (invalid)
  • {{{a,b,c}}} is parsed as a frozen set of a set (thus invalid) as I would usually expect.
  • {{{a,b}},{{c}}} is parsed as a set of two frozen sets (valid).
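The adjacency condition can at least be checked at the token level. Below is a minimal sketch using the standard tokenize module; touching_brace_pairs is a hypothetical helper, not the hand-edited Parser/parser.c. Touching braces are only a necessary condition, since the parser may still fall back to a plain set, as in the last bullet.

```python
import io
import tokenize

def touching_brace_pairs(source):
    """Return (pair, column) for every two identical braces with no gap between them."""
    ops = [
        tok
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.OP
    ]
    return [
        (a.string + b.string, a.start[1])
        for a, b in zip(ops, ops[1:])
        # Same brace twice, and the first ends exactly where the second starts.
        if a.string == b.string and a.string in ("{", "}") and a.end == b.start
    ]

for example in ("{{ {a,b,c} }}", "{ {{a,b,c}} }", "{{ {a,b,c}} }"):
    print(example, touching_brace_pairs(example + "\n"))
```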

Assuming the type of s is unknown, Yoda order is better. In Yoda order, the isdisjoint method is always known to exist and will tell the user what’s wrong with s, but that’s not the case in the second version.
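A small sketch of that difference on a stock interpreter, using None as a stand-in for an s of unexpected type:

```python
s = None  # stand-in for a value of unknown type

try:
    {1, 2, 3}.isdisjoint(s)   # Yoda order: set.isdisjoint always exists
except TypeError as exc:
    yoda_error = exc          # the error is about s not being iterable

try:
    s.isdisjoint({1, 2, 3})   # natural order: depends on s having the method
except AttributeError as exc:
    natural_error = exc       # the error is only about a missing attribute

# Yoda order also accepts any iterable, not only sets.
print({1, 2, 3}.isdisjoint([4, 5, 6]))
```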

Making this invalid makes the formatting required to keep this valid a lot more complicated (I just removed that code). It’s better to leave this up to a linter.

Ideally, no space should be allowed inside the pseudo-token, though.

You’re right, Ruff’s yoda-conditions rule (SIM300) only applies to ==, but this is now irrelevant because expressions like {{1,2,3}.pop(),4} now work.
