`frozenset` and `frozendict` comprehensions

1. Introduction

Potentially complete core container set:

  1. list & tuple
  2. set & frozenset
  3. dict & frozendict

Given the above, these are “missing”:

  1. set - no empty set literal
  2. frozenset - object exists, but nothing else
  3. frozendict - nothing exists, not even the object

To address (1), there are couple of threads:

  1. Initializer for set, like dict {} and list []
  2. Implement {,} as an empty set literal

To address (2), has been one attempt (as far as I know):

  1. Frozenset literals and comprehensions

To address (3), the first step of object creation:

  1. PEP 603: Adding a frozenmap type to collections
  2. I think there have been more attempts, but it is not relevant to this thread.

2. Consideration

(1) above is just for complete picture, but this is mostly about (2) and (3).

My thought process:

  1. So (2) was attempted and proved to be tricky (at least that specific solution).
  2. If frozendict was implemented, then instead of 1 full set of literals/comprehensions, 2 are lacking. And even one seems to be difficult.
  3. Thus, maybe what could work is something similar to “string type prefixes”?

Given set literal exists. E.g. {/}:

{/}    # Empty Set
f{/}   # Empty Frozen Set

{}     # Empty Dict
f{}    # Empty Frozen Dict

{a for a in iterable}           # set
f{a for a in iterable}          # frozenset

{a: b for a, b in iterable}     # dict
f{a: b for a, b in iterable}    # frozendict

Not necessarily f, but the idea is not to make up new brackets, but instead use the same syntax, but with some frozen indicator. Maybe prefix, maybe suffix, maybe something else entirely.

3. To sum up

If frozendict is indeed coming and comprehensions for both frozenset and frozendict are desirable, then solution that would capture comprehensions/literals for them both while drawing sensible/consistent parallels could be convenient/smooth path forward.

Thoughts?

7 Likes

I’m happy with the current form: frozenset({a for a in iterable}). That is clear and reasonably fast.

3 Likes

And “{}” are redundant here, just frozenset(iterable) or frozenset(_.doit() for _ in iterable).

But I agree, there are clear workarounds without need for a new syntax, and this proposal lacks “Motivation“ section for lazy people like me, who (sometimes) don’t read previous discussions.

5 Likes

I like it!

1 Like

It seems what you mean with “literals” differs from the documentation (it’s just strings and some numbers, not even ()).

I am probably wrong with precise terminology.

I just mean “syntax for shorthand empty container and comprehensions”. However it is defined in strict terms.

1 Like

I’m gravitating towards the opposite point of view: That Python has too many comprehension variants already. That all we need is generator expressions passed to constructors.

5 Likes

I really like this idea, I was going to propose adding literally the same thing :slight_smile:

It looks like a such tiny change that will unblock so many useful features. However, there is a downside to this. f{} looks very close to f'{}', which are two completely different things.

It will also require some refactorings of INTRINSIC_BUILD_FROZENSET :thinking:

I would really want to push for this change. Working with frozendict and frozenset right now is not really fun :frowning:

3 Likes

Updated proposal:

frozenset

  • Empty: frozenset()
  • With items: f{1, 2}
  • Comprehension: f{x for x in range(10)}

frozendict

  • Empty: frozendict()
  • With items: f{1: 2, 3: 4}
  • Comprehension: f{x: 1 for x in range(1)}

Main motivation:

  • Simplicty: we want to endorse immutable objects usage, especially for free-threading and subinterpreters. Simple syntax propomotes this
  • Performance: we don’t want to create intermediate objects to create a builtin type. Special syntax allows to optimize this perfectly

In my opinion there are several important questions if we want to start working on this.

  1. What the prefix should be? I chatted with several core devs on Discord, they seem to prefer f as proposed here (my top pick as well). It seems nice! But, we can can have ${} form as the backup plan.
  2. Should we really go for empty object literals? Looks like set(), frozenset() and frozendict() calls can be good enough as the empty objects. I propose limiting the scope of this proposal to literals and comprehensions only. We can later address the empty objects separately in the next proposal. It would be easier to get the approvals for the smallest piece of new syntax :slight_smile:
  3. How it should be implemented technically. Currently, there’s an optimization built on top if INTRINSIC for INTRINSIC_BUILD_FROZENSET. Should we build new opcodes or go for INTRINSIC? Should we plan for any future optimizations beforehand?
  4. What C-API would we need? This is a very hot topic for PyFrozenDict API. Currently it does not have a way to build objects interactively. So, we would first likely need to add PyFrozenDict_FromDictSteal API to make a dict with existing APIs and convert it to frozendict the most efficient way. This is the case because PyDict_* mutation APIs can’t be used with PyFronzenDict. The same is not the case for frozenset, it can use mutating PySet_* APIs.
  5. Analyze potential syntax conflicts in lexer / parser with f prefix for strings / frozen objects (if we decide to go with this prefix)
  6. Provide perf numbers: how much faster we can optimize the workflow with our design, compared to older forms like frozendict({1: 2}) and frozenset(x for x in range(10))
  7. Find other core devs who would like and support this idea :slight_smile:

Everyone’s feedback is welcome! :waving_hand:

10 Likes

Does it need new syntax? AFAIK, a method like {1, 2, 3}.freeze() would be just as optimizable.

3 Likes

{1, 2, 3}.freeze()

You still need to construct a temporary copy before freezing a dictionary in this case. Or we need a PyFrozenDict_FromDictSteal API.

Since you cannot change attributes of a dict or a set, it is already known at compile-time what {1,2,3}.freeze() means. So, turning it into a frozenset is essentially constant folding and could be done either as an AST optimization or during the code generation phase.

4 Likes

I don’t quite like this solution for several reasons:

  1. It will be much more verbose: some(f{1, 2, 3}, f{4, 5}) vs some({1, 2, 3}.freeze(), {4, 5}.freeze())
  2. It will require us to add .freeze to many dict-like and set-like types like UserDict, do we actually need them on these types?
  3. It is harder to read. If a dict literal goes on several lines, you will only learn the type on the very last line, which is counter intuitive. For example:
data = {  # what is the type of `data`? is it `dict` or `frozendict`?
    "first": 1,
    "second": 2,
    "third": 3,
    # we still don't know the type :(
    "one more": ...,
    "last": ...,
}.freeze()  # oh, it is `frozendict`!

So, yeah. Syntax seems like a better solution here :slight_smile:

4 Likes

Has frozen{} been considered? It‘s not as compact as f{}, but on the plus side it literally reads „frozen dict“, and it’s still shorter than the current frozendict({}).

1 Like

I think the f was chosen to mirror the string prefixes (which are single characters as well). It would be also my preferred option if a new syntax is introduced. The word frozen in frozen{1,2,3} looks more like a name, which is misleading since no name lookup happens here.

2 Likes

3 could in theory be helped with by type hints.

However I also dislike the .freeze(), especially as it would have to be special cased by constant-folding or code-gen. Although I think f{} is (too?) close to f-strings, I currently don’t really see an alternative to it. Therefore I’d rather support f{} unless a better alternative is found.

1 Like

That would be only an implementation detail, like all the other constant-folding code that is already in place. And it would not even need to be a special case; the same code could also optimize code like "abc".upper(), which is not optimized at the moment.

Test case
Python 3.13.14
>>> import dis
>>> def foo():
...     return "abc".upper()
...     
>>> dis.dis(foo)
  1           RESUME                   0

  2           LOAD_CONST               1 ('abc')
              LOAD_ATTR                1 (upper + NULL|self)
              CALL                     0
              RETURN_VALUE
>>> def bar():
...     return 2+3
...     
>>> dis.dis(bar)
  1           RESUME                   0

  2           RETURN_CONST             1 (5)

3 could in theory be helped with by type hints.

It would still require you:

  • To add .freeze()
  • To add a typehint

Moreover, there are places where you can’t add a hint. For example in a function call argument.

I think that a new syntax a much more ergonomic and readable. It is also not much harder to implement and support. I have a small draft already. PR with the proposal will soon be available :slight_smile:

4 Likes

One more thing: we can reliably optimize dict literal + .freeze() call: {}.freeze(), however we can only optimize statistically d = {}; call(d.freeze()) if this pattern is used for some reason by the end user. We would have to teach people that it is not the same thing. And there will different runtime optimizations based on whether you use literal form vs variable name. While, it is quite obvious that {1, 2} and f{1, 2} are different: because they use different syntax.

1 Like

I can live with the f{...} syntax but wish that name{...} can be reserved for something more generalized like this.

I would instead propose that we make the unary + operator “harden” (that’s how I’d teach it–+ ~ positivity ~ fortitude ~ hardened ~ frozen) a set or a dict:

{/}    # Empty Set
+{/}   # Empty Frozen Set at compile time
+set()   # Empty Frozen Set at runtime

{}     # Empty Dict
+{}    # Empty Frozen Dict at compile time
+dict() # Empty Frozen Dict at runtime

{a for a in iterable}           # set
+{a for a in iterable}          # frozenset

{a: b for a, b in iterable}     # dict
+{a: b for a, b in iterable}    # frozendict

The usage is concise, does not require a new syntax, and can be easily optimized at compile time.