Add the ability to declare an empty dict as `{:}`

Rosuav · March 29, 2023, 12:15am

To put that “very long time” in perspective, Python used to support this syntax:

try:
    spam()
except Exception, e:
    pass

This change was proposed for Python 3 in PEP 3110, and to permit backward compatibility, Python 2.6 and 2.7 supported both syntaxes. That means that, depending on how you measure, the overlap period was from 2006 until 2020: fourteen years. That’s how a lot of Python 3000 changes were done (Python 2.6 was deliberately produced alongside 3.0).
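For comparison, here is a minimal sketch of the replacement spelling from PEP 3110, `except ... as ...`, which Python 2.6, 2.7 and 3.x all accept. The body of `spam()` and the `message` variable are made up for illustration.

```python
def spam():
    # Hypothetical stand-in for the spam() call in the snippet above.
    raise ValueError("no spam left")

try:
    spam()
except Exception as e:  # the comma form is a SyntaxError in Python 3
    message = str(e)

print(message)
```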

So when you contemplate these kinds of changes, remember that {} would have to continue to be supported until at least 2040. I hope that’s what you’re planning on when you say “a very long time”.

ucodery · March 29, 2023, 1:35am

That is exactly what I mean by “a very long time”. I would personally be okay if these changes were made but {} was never removed from the language, even without ever raising a DeprecationWarning. It is unambiguous to the language, probably unambiguous to experienced users, and even if we did remove it we could never reintroduce the syntax to mean something different at a later time.

IMO this would be a job for linters, to guide users new and old away from {}.

Rosuav · March 29, 2023, 1:49am

Good. Just making sure people aren’t saying “a very long time” and thinking “a couple of years”.

petersuter · March 29, 2023, 10:26am

Removing {} seems like a non-starter.
It is one of the most used constructs in Python, and as such is very convenient as {}.
{} is also “source compatible” with e.g. JSON.
Adding more redundant syntax goes against the usual principles (“one obvious way”).

Writing set() is already very clear and easy to understand.
So basically I see no benefit and quite large downsides to this idea.

ajoino · March 29, 2023, 10:40am

The upside is that, while set() is obvious it’s also slow (requires a name lookup), and the name builtins.set can be overwritten. I don’t see many downsides with adding a special syntax (brings sets in line with other python builtin containers) or special-casing {*()} in the compiler. I think the main question is if creating an empty set is so common that spending effort implementing and maintaining that syntax is worth it? Maintenance is the only real downside I see.
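A rough sketch of both points, assuming CPython: the name `set` is resolved when the call runs, so it can be shadowed, while the `{*()}` display involves no name at all. The function `shadowed` and the variable names are illustrative only.

```python
def shadowed():
    set = list            # a local name now hides builtins.set
    return set(), {*()}   # the call follows the name, the display does not

via_name, via_display = shadowed()
print(type(via_name).__name__, type(via_display).__name__)

# Names referenced by each expression, as recorded in the compiled code object.
names_for_call = compile("set()", "<example>", "eval").co_names
names_for_display = compile("{*()}", "<example>", "eval").co_names
print(names_for_call, names_for_display)
```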

Rosuav · March 29, 2023, 12:51pm

There aren’t very many downsides, but there’s one that’s really REALLY hard to pin down, and that’s how ugly or elegant the proposed syntax is. This is highly subjective and thus not easy to properly debate, but it is no less important for that; and none of the empty-set notations truly score on elegance, for me.

I do think that special-casing {*()} would be relatively unproblematic, since it’s not a semantic change in any way (we already have special handling for x in {1,2,3} and x in [1,2,3] which use frozenset and tuple literals respectively). This wouldn’t have major consequences to interoperability, since the only impact is performance [1], and then editors could choose to display that using “\N{EMPTY SET}” ∅ to encourage the idiom. This is less benefit than actual syntax, but it’s way WAY less cost, since there’s zero backward compatibility concerns, no ambiguity, and the biggest issues are social (will people adopt this idiom?).
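The existing special handling can be inspected, with the caveat that it is a CPython implementation detail and may differ between versions and implementations. This sketch only looks for a frozenset among the constants of the compiled expression; the variable names are illustrative.

```python
code = compile("x in {1, 2, 3}", "<example>", "eval")

# Where the optimization applies, the set display has been folded into a
# frozenset constant, so no set is built each time the test runs.
frozen_consts = [c for c in code.co_consts if isinstance(c, frozenset)]
print(frozen_consts)

# The observable result is the same with or without the optimization.
is_member = eval(code, {"x": 2})
print(is_member)
```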

Ultimately, the problem happened back when {a,b,c} was chosen as the syntax for a set (which was itself not fundamentally wrong, since a set can conceptually be perceived as being like a dictionary with no values). There’s no ideal solution. All we can do is look at different options and decide what the costs and benefits are, bearing in mind that “do nothing” is ALWAYS a viable option here.

[1] Side note to side note: There IS some interoperability significance to a performance optimization, and the question was asked about whether string appends with += should be optimized in CPython on the basis that other Python implementations might suddenly underperform if this idiom became standard. In the end, it was decided to optimize where possible, but still strongly recommend str.join() instead, as the optimization can be defeated by having another reference somewhere anyway.
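As a small illustration of the two idioms in that side note (the list `words` is made up): both give the same string, but only `str.join()` avoids depending on an implementation-specific optimization.

```python
words = ["spam", "eggs", "ham"]

# Repeated += may be optimized in CPython, but that is not guaranteed elsewhere.
by_append = ""
for word in words:
    by_append += word

# str.join() builds the result in one pass on any implementation.
by_join = "".join(words)

print(by_append == by_join)
```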

rockyoung · May 10, 2023, 8:14am

“since sets were added long after dicts”: this answers a question that has been in my head for a long time, which is simply “WHY?”. Thanks, bro!

Gouvernathor · May 10, 2023, 12:43pm

I’m +1 on that.
Like most other users here, I’m doubtful about deprecating {} or making it create a set. However, I think a non-shadowable syntax for empty sets (or an optimization for the existing one) would be a very good improvement, and allowing (,) too.

ntessore · May 10, 2023, 1:05pm

Coincidentally to this thread popping back up, earlier today I needed to hard-code a number of frozen sets, and had the thought (off-topic as it may be, perhaps someone should split the “empty set” discussion out of here?) that it would be nice if we could have different {} prefixes like we have different "" prefixes:

an_empty_set = s{}
a_frozen_set = f{1, 2, 3}

Of course then you have a zen-invoking discussion about {1, 2, 3} being the same as s{1, 2, 3} but it would be nice to write every collection in the standard library using syntax (and without the name lookup).
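The `s{}` and `f{}` forms above are proposed syntax and do not compile today. For reference, this is how the same two values are currently spelled, both going through a name lookup:

```python
an_empty_set = set()                  # no display syntax exists for this
a_frozen_set = frozenset({1, 2, 3})   # builds a set, then copies it into a frozenset

print(an_empty_set, a_frozen_set)
```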

ucodery · May 10, 2023, 8:33pm

This exact syntax has been proposed before, most recently in Make using immutable datatypes more pleasant by adding a little syntactic sugar.

I don’t disagree that it could be useful for some. And once display prefixes were accepted as a thing, other types could possibly follow (TypedDict, deque…).

PlaceReporter99 · July 23, 2023, 5:49pm

Maybe you can keep it as-is and do {,} for set?

yyq · November 7, 2023, 11:16pm

To maintain backwards compatibility, I propose that we actually keep {} as an empty dict, but ALSO allow it to be an empty set. Once the first operation is performed on it or it is typed, the specific type of {} can be determined.

Python already does this type of evaluation where comparisons can be performed intuitively on different types:

x = 1
x == 1.0  # True

We could allow an empty set to be equal to {}.

set() == {}   # proposed: True

We obviously can change the type of a variable, but specifically after an operation on it:

y = 2  # type(y) == int
y *= 1.0  # type(y) == float

It feels just as intuitive to let {} be a dict until known otherwise:

x = {}  # type(x) == dict
x.add(1)  # proposed: type(x) == set

There are two things I can think of that would not be backwards compatible (interested in hearing any others):

1) Reliance on {} being a dictionary and specifically breaking when {} is used as a set.

x = {}  # type(x) == dict
if some_check:
    x = set()  # type(x) == set
try:
    x.add(1)
except AttributeError:
    ...  # this would not have the same behavior if the proposed change was added

This feels like it could be bad code that we want to minimize, but interested in hearing if there is a strong reason to maintain this behavior.

2) Type checkers may have difficulty inferring the type or have issues with types of variables changing (which is typically not desired behavior).

x = {}  # inferred type: dict
x.add(1)  # type checker may not like that x is now a set

I think that if the type checker is used properly where x is specifically typed, this becomes a non-issue.

x: Set[int] = {}  # type(x) is a Set[int]
x.add(1)  # ok
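For reference, a short sketch of how current Python behaves on the snippets above, as opposed to the proposed behavior. The variable names are illustrative.

```python
x = {}
kind = type(x)                # dict, decided as soon as {} is evaluated

equal_today = (set() == {})   # False: a set never compares equal to a dict

try:
    x.add(1)                  # dict has no add method
except AttributeError as exc:
    error = str(exc)

print(kind.__name__, equal_today, error)
```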

kknechtel · November 8, 2023, 12:37am

Hi Michelle, welcome to the Python Discourse forums. Unfortunately, the idea you propose cannot work, and the way it’s presented here suggests some misconceptions about Python.

No; once we have written {}, we have created a value, and that value has a type.

No, this is fundamentally different - it’s a comparison between differently-typed values.

No; we cannot change the type of a variable, because variables do not have types - values do. We can, of course, assign a value of any type to any variable, at any time - there doesn’t need to be any relationship between the types, and it doesn’t matter how the value was created.

type(x) does not determine the type of the variable x; it determines the type of the value that x names. It cannot determine the type of the variable x for two reasons: first, variables don’t have types, and second, type is a callable which is evaluated at runtime (and thus is passed a value, not a variable). If we do x.add(1), it requires that the value that x names supports the add method. If that value is a dict, then making it work requires dicts to have an add method given to them. Such a method could conceivably return a set, and conceivably ensure that the dict is empty first; but it could not transform the object’s type in-place.
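A minimal sketch of that distinction: rebinding a name makes it refer to a different value and leaves the original value, and its type, untouched. The name `first_value` is illustrative.

```python
x = {}
first_value = x    # a second name for the same dict object

x = set()          # rebinds the name x; the dict itself is not changed
x.add(1)

print(type(first_value).__name__, type(x).__name__)
```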

yyq · November 8, 2023, 1:24am

Thanks for the response and clarification of value vs variable. I have two followup questions: Is there a way to have the interpreter rebind the variable for us automatically (i.e. insert an assignment statement if we find that the user meant a set)? And if not, why can we not change the object’s type in-place? For instance, you can rebind methods at runtime, so you could have the empty dictionary detect that it’s empty, and if so replace all of its methods with set’s methods and change its own type to a set.

kknechtel · November 8, 2023, 1:36am

If you mean “retroactively interpret x = {} to do the same thing as x = set() if subsequent code tries to use the value as if it were a set” - which is the only way I can understand it - then I’m sure it would be theoretically possible, but it would be very unlike anything else in Python. It would also have massive, cascading implications for the type system. Not to mention, anyone is allowed to create any other type that defines an add method with any signature and any semantics.

If you want a language that has static typing with type inference - such as Haskell - please use one. Python is not that language. Python’s types are relatively strong, but they are dynamic (checked at runtime). The only exception that can occur at compile time is SyntaxError.
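To illustrate the compile-time versus runtime split, here is a sketch using the built-in `compile()` and `exec()`. The first source string compiles without complaint and only fails when run; the second uses the `{:}` spelling from the thread title, which is not valid syntax today and is rejected at compile time.

```python
source = "x = {}\nx.add(1)\n"
code = compile(source, "<example>", "exec")   # no error: nothing is type-checked here

try:
    exec(code, {})
except AttributeError as exc:
    runtime_error = type(exc).__name__        # raised only when x.add(1) executes

try:
    compile("x = {:}", "<example>", "exec")
except SyntaxError as exc:
    compile_error = type(exc).__name__        # raised before any code runs

print(runtime_error, compile_error)
```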

Because dict and set are built-in types, which in the reference implementation are created directly in C. The underlying data structures have a specific layout, and thus specific semantics for the raw bytes that are part of the object; simply declaring that one such object is now of the other type cannot possibly work, because the bytes are in all the wrong places, with the wrong values, to represent the desired object state. An empty dict and an empty set, seen from an outside-Python perspective as raw bytes in a computer’s memory, look completely different.
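CPython refuses the attempt outright for built-in types, as this small sketch shows. The exact error message varies between versions, so only the exception type is relied on here.

```python
d = {}
try:
    d.__class__ = set      # try to turn an empty dict into a set in place
except TypeError:
    outcome = "TypeError"
else:
    outcome = "reassigned"

print(outcome, type(d).__name__)
```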

Because user-defined types (such as those created with the class statement) have a common structure, and because the reference implementation does not specifically forbid it, it’s possible to change the type of a user-defined object from one user-defined class to another, by setting the __class__ attribute (sometimes called “swizzling”):

>>> class Gun:
...     def fire(self): return 'bang'
... 
>>> class Employer:
...     def fire(self): return 'get out'
... 
>>> x = Employer()
>>> x.__class__ = Gun
>>> x.fire()
'bang'

However, in normal cases this is nearly useless and quite dangerous - because even though the object’s state (the attributes and their values) is valid (trying to use the object won’t result in following dangling pointers or other sorts of corruption like that), it won’t generally be meaningful (attributes may be missing, resulting in AttributeError, or wrongly set, resulting in TypeError or ValueError, etc. etc., when the method from the new class is tried).
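A sketch of how that goes wrong, extending the two classes above with hypothetical `__init__` methods and attributes (`rounds`, `staff`) that are not in the original example:

```python
class Gun:
    def __init__(self):
        self.rounds = 6

    def fire(self):
        self.rounds -= 1          # needs state that only Gun.__init__ sets up
        return "bang"


class Employer:
    def __init__(self):
        self.staff = ["alice"]

    def fire(self):
        return self.staff.pop()


x = Employer()
x.__class__ = Gun                 # allowed, but x still carries Employer's attributes

try:
    x.fire()
except AttributeError as exc:
    error = str(exc)

print(error)
```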

FelixFourcolor · November 14, 2023, 9:11pm

I agree that @yyq’s idea is problematic, but the idea of reinterpreting what value is bound to the variable (an empty set or an empty dict) after seeing how it’s used sounds pretty cool.

jamestwebber · November 14, 2023, 9:14pm

It’s pretty cool and can totally work… in a compiled language, which knows how things will be used before anything is executed. In an interpreted language it’s probably gonna be a mess.

kknechtel · November 14, 2023, 9:53pm

As a nitpick, Python absolutely is “compiled” just as much as Java or C# is. It just also supports a REPL that separately compiles and immediately executes individual statements, and supports executing its bytecode within custom “environments” of global variables. Practically nothing is “interpreted” in the classical sense nowadays - maybe shell scripts are, I’m not sure.
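Both steps are exposed through built-ins, so a rough sketch is easy to write: `compile()` produces a code object, and `exec()` runs it against whatever dictionary of globals is supplied. The names `source`, `env` and `base` are made up for the example.

```python
source = "result = base + 1"
code = compile(source, "<example>", "exec")   # compilation step: source to code object

env = {"base": 41}                            # a custom global environment
exec(code, env)                               # execution step: run the compiled code in env

result = env["result"]
kind = type(code).__name__
print(kind, result)
```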

Just like how pass-by-reference almost never happens by default and must be explicitly requested (such as with the ref, in and out keywords in C#) [1] by almost every language that supports it [2], and dynamic (rather than lexical) scoping is only seen in niches (again I think shell scripting languages qualify).

The issue for implementing something like this in Python is dynamic typing, i.e., types not being associated at compile-time with identifier names.

[1] C++'s reference types are a curious case, because a name with such a type doesn’t correspond to an object in the program’s storage. In function calls, the & can be seen as a marker for a different calling convention - even though the same syntax is used for local variable declarations.
[2] Python names, C# class types, and Java non-primitive types have reference semantics, but this is orthogonal to the calling convention. In a language with true pass-by-reference, it is possible to write a function that swaps the values of two of the caller’s variables. Python can’t do that; at best, if the values have the same type, it can swap their internal states.
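A small sketch of the swap point, using lists as the example values. The function names `swap` and `swap_contents` are illustrative.

```python
def swap(a, b):
    a, b = b, a               # rebinds the local names only


def swap_contents(a, b):
    a[:], b[:] = b[:], a[:]   # exchanges the lists' internal state in place


x, y = [1], [2]

swap(x, y)
after_swap = (list(x), list(y))      # the caller's x and y are unaffected

swap_contents(x, y)
after_contents = (list(x), list(y))  # same objects, contents exchanged

print(after_swap, after_contents)
```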