Frozen set literals

alicederyn · May 16, 2024, 7:44pm

Sixteen years ago to reserve space for a hypothetical language extension that never happened. It’s possible that this might not be binding.

alicederyn · May 16, 2024, 9:38pm

I’d much rather have the compiler spot and optimise the existing pattern. Is there any reason that it couldn’t do that? The only thing I can think of is a theoretical piece of code that is relying on each pass returning an object with a different identity, which seems like such a silly usage that I’d be fine breaking it…

Nineteendo · May 16, 2024, 9:44pm

I think it only works in this case:

>>> import dis
>>> dis.dis("a in {1, 2, 3}")
  0           0 RESUME                   0

  1           2 LOAD_NAME                0 (a)
              4 LOAD_CONST               0 (frozenset({1, 2, 3}))
              6 CONTAINS_OP              0
              8 RETURN_VALUE

Defining a constant, or function default doesn’t work.

alicederyn · May 16, 2024, 10:07pm

Does this matter? They shouldn’t be on a hot loop; microoptimizations won’t have any real impact.

kknechtel · May 17, 2024, 12:02am

I guess for a similar reason as why it doesn’t do tail call optimization.

Rosuav · May 17, 2024, 1:30am

I don’t know which examples you’re looking at, but there certainly are some places where the compiler WILL optimize it. Code like x in {1,2,3,5,8,13} where the set is only used for an inclusion test will be done with a frozenset literal. (Assuming they’re all constants, of course.) But if it’s NOT being optimized, that might be because the name frozenset is being used. That might have been shadowed - you might be using your own subclass of frozenset, for example. Fundamentally, the compiler can’t make assumptions about names, only syntactic elements (like keywords and literals). I suppose what the compiler might do would be this:

def f():
    # Original code: return frozenset({1,2,3,5,8,13})
    if frozenset is co_consts[1]:
        return co_consts[2]
    return frozenset(BUILD_SET(co_consts[2]))

although of course the optimized version can’t be written in Python source code form. It would be interesting to see whether this would be of value, given that the only way to say “has frozenset been rebound?” would be to include a reference to it in the co_consts tuple, which can itself be changed.

>>> def f():
...     return frozenset({1,2,3,5,8,13})
... 
>>> f.__code__ = f.__code__.replace(co_consts = (None, frozenset({1,4,2,8,5,7})))
>>> f()
frozenset({1, 2, 4, 5, 7, 8})

But that would be shenanigans of a level not worth supporting.

Rosuav · May 17, 2024, 1:30am

AIUI the main reason to not optimize tail calls is the impact on tracebacks.

alicederyn · May 17, 2024, 4:51am

That makes sense, thanks. Another “potential JIT enhancement”, I guess.

Nineteendo · May 17, 2024, 8:33am

Here’s a GitHub repository with more details: GitHub - nineteendo/frozenset-literals.

Nineteendo · May 17, 2024, 2:39pm

I updated the repository with a summary of all previous threads. There are really only 3 viable notations:

f{1, 2, 3}
{{1, 2, 3}}
{1, 2, 3}.freeze()

From what I can tell, notation 2 has the least issues. Apparently some people don’t think double punctuation is Pythonic.

If this were implemented, no one would waste their time looking for frozenset literals.

Nineteendo · May 18, 2024, 7:27am

I updated the repository with the syntax and some more examples. Comprehensions are not supported as frozensets are immutable, but that can always be changed later.

Nineteendo · May 18, 2024, 12:49pm

I relaxed the syntax:

assert {{{{1, 2, 3}}}} == frozenset({frozenset({1, 2, 3})})

This still raises a type error to avoid ambiguity:

baz = {{{1, 2, 3}}}  # TypeError: unhashable type: 'set'

blhsing · May 21, 2024, 9:31am

Technically and theoretically yes, but not practically. It’s unimaginable that there can be any meaningful existing code at all with such a deliberate usage of a {{...}} literal just to produce a TypeError such that introducing the new syntax would cause any actual backwards incompatibility. Please prove me wrong.

Nineteendo · May 21, 2024, 9:42am

You can also raise a type error with 1 character less: -'': code golf - Shortest way to get a TypeError - Code Golf Stack Exchange