Frozen set literals

Sixteen years ago to reserve space for a hypothetical language extension that never happened. It’s possible that this might not be binding.


I’d much rather have the compiler spot and optimise the existing pattern. Is there any reason that it couldn’t do that? The only thing I can think of is a theoretical piece of code that is relying on each pass returning an object with a different identity, which seems like such a silly usage that I’d be fine breaking it…
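To make the identity concern concrete, here is a minimal sketch (the function name make_set is made up for illustration). On the CPython versions I'm aware of, each call builds a fresh object, so two results that are alive at the same time compare equal but are not the same object. Code would have to depend on that to notice the optimisation.

```python
def make_set():
    # Evaluated on every call: builds a set, then a frozenset from it.
    return frozenset({1, 2, 3})

first = make_set()
second = make_set()

print(first == second)  # same contents
print(first is second)  # identity would only match if the result were cached
```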

I think it only works in this case:

>>> import dis
>>> dis.dis("a in {1, 2, 3}")
  0           0 RESUME                   0

  1           2 LOAD_NAME                0 (a)
              4 LOAD_CONST               0 (frozenset({1, 2, 3}))
              6 CONTAINS_OP              0
              8 RETURN_VALUE

Assigning the set to a constant, or using it as a function default, doesn’t get the same treatment.
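A rough way to check this yourself, assuming CPython and using a hypothetical helper builds_set_at_runtime: look for a BUILD_SET instruction in the compiled code. Bytecode details vary between versions, so treat this as a sketch rather than a guarantee.

```python
import dis

def builds_set_at_runtime(source: str) -> bool:
    """Return True if the top-level compiled code contains BUILD_SET."""
    code = compile(source, "<example>", "exec")
    return any(ins.opname == "BUILD_SET" for ins in dis.get_instructions(code))

# Membership test: the set display is replaced by a frozenset constant.
print(builds_set_at_runtime("r = a in {1, 2, 3}"))

# Assignment and function default: a real (mutable) set is built each time.
print(builds_set_at_runtime("S = {1, 2, 3}"))
print(builds_set_at_runtime("def f(x={1, 2, 3}): pass"))
```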

Does this matter? They shouldn’t be in a hot loop; micro-optimizations won’t have any real impact.

I guess for a similar reason as why it doesn’t do tail call optimization.

I don’t know which examples you’re looking at, but there certainly are some places where the compiler WILL optimize it. Code like x in {1,2,3,5,8,13}, where the set is only used for an inclusion test, will be compiled to a frozenset constant (assuming the elements are all constants, of course). But if it’s NOT being optimized, that might be because the name frozenset is being used. That name might have been shadowed: you might be using your own subclass of frozenset, for example. Fundamentally, the compiler can’t make assumptions about names, only syntactic elements (like keywords and literals). I suppose what the compiler might do would be this:

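def f():
    # Original code: return frozenset({1,2,3,5,8,13})
    # Pseudocode: co_consts[1] holds the builtin frozenset, co_consts[2] the prebuilt result
    if frozenset is co_consts[1]:
        return co_consts[2]
    # The name was rebound, so fall back to building the set at runtime
    return frozenset(BUILD_SET(co_consts[2]))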

although of course the optimized version can’t be written in Python source code form. It would be interesting to see whether this would be of value, given that the only way to say “has frozenset been rebound?” would be to include a reference to it in the co_consts tuple, which can itself be changed.

>>> def f():
...     return frozenset({1,2,3,5,8,13})
>>> f.__code__ = f.__code__.replace(co_consts = (None, frozenset({1,4,2,8,5,7})))
>>> f()
frozenset({1, 2, 4, 5, 7, 8})

But that would be shenanigans of a level not worth supporting.
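The point about names is easy to see with a small sketch. LoudFrozenSet is a hypothetical subclass standing in for whatever a user might rebind frozenset to; the same source line gives a different type depending on what the name resolves to, which is why the compiler can't fold the call blindly.

```python
SOURCE = "result = frozenset({1, 2, 3})"

class LoudFrozenSet(frozenset):
    """Hypothetical user-defined replacement for the builtin."""

# Namespace where frozenset resolves to the builtin.
plain_ns = {}
exec(SOURCE, plain_ns)

# Namespace where the name frozenset has been shadowed.
shadowed_ns = {"frozenset": LoudFrozenSet}
exec(SOURCE, shadowed_ns)

print(type(plain_ns["result"]).__name__)     # frozenset
print(type(shadowed_ns["result"]).__name__)  # LoudFrozenSet
```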


AIUI the main reason to not optimize tail calls is the impact on tracebacks.
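As a small illustration of the traceback point (countdown is a made-up example function): without tail call elimination, every recursive call keeps its own frame, so all of them show up when an exception propagates. Eliminating the tail calls would collapse those entries.

```python
import traceback

def countdown(n):
    if n == 0:
        raise ValueError("reached zero")
    return countdown(n - 1)  # call in tail position

try:
    countdown(3)
except ValueError as exc:
    frames = traceback.extract_tb(exc.__traceback__)
    names = [frame.name for frame in frames]

# One traceback entry per countdown call (n = 3, 2, 1, 0).
print(names.count("countdown"))  # 4
```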

That makes sense, thanks. Another “potential JIT enhancement”, I guess.

Here’s a GitHub repository with more details: nineteendo/frozenset-literals.

I updated the repository with a summary of all previous threads. There are really only 3 viable notations:

  1. f{1, 2, 3}
  2. {{1, 2, 3}}
  3. {1, 2, 3}.freeze()

From what I can tell, notation 2 has the fewest issues, although apparently some people don’t think double punctuation is Pythonic.

If this were implemented, no one would waste their time looking for frozenset literals.

I updated the repository with the syntax and some more examples. Comprehensions are not supported as frozensets are immutable, but that can always be changed later.

I relaxed the syntax:

assert {{{{1, 2, 3}}}} == frozenset({frozenset({1, 2, 3})})

This still raises a type error to avoid ambiguity:

baz = {{{1, 2, 3}}}  # TypeError: unhashable type: 'set'

Technically and theoretically yes, but not practically. It’s hard to imagine any meaningful existing code that deliberately uses a {{...}} literal just to produce a TypeError, so introducing the new syntax shouldn’t cause any real backwards incompatibility. Please prove me wrong.
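For reference, this is what the double-brace spelling does in today's Python, next to the spelling that currently works. It's only a sketch of existing behaviour, not of the proposed syntax, and the exact error message differs between versions.

```python
# Today the inner {1, 2, 3} is a plain, unhashable set, so the outer display fails.
try:
    value = {{1, 2, 3}}
except TypeError as exc:
    error = exc
else:
    error = None

# The current way to get a frozenset that contains a frozenset.
nested = frozenset({frozenset({1, 2, 3})})

print(repr(error))
print(nested)
```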

You can also raise a TypeError with one character fewer, -'': see “Shortest way to get a TypeError” on Code Golf Stack Exchange.
