Initializer for set, like dict {} and list []

ajoino · December 5, 2024, 5:03pm

Could you elaborate on this? Since a single slash is not a valid expression, I don’t see how 1/2 being an expression matters, but I’m sure there’s something I don’t understand.

elis.byberi · December 5, 2024, 5:40pm

{/} would make / an empty sentinel. The same applies to : in the proposed empty dict {:}.

If we were to introduce an Empty sentinel singleton, similar to the None singleton, we would face the same problem of not distinguishing an empty set from an empty dictionary. The use of {Empty} would still be ambiguous.

On the other hand, having multiple empty sentinel values does not make sense.

mikeshardmind · December 5, 2024, 5:48pm

It would be possible to write a grammar that treats {/} as an empty set literal without giving / a meaning as a generic empty sentinel. That interpretation is much closer to what people have been discussing.

ajoino · December 5, 2024, 5:49pm

My understanding of the parser and compiler is somewhat limited (I have written a simple PEG parser but nothing as complicated as CPython’s), so I might be way out of my depth here. My point was not to make a single slash a sentinel, but rather to special-case the entire sequence of characters (and their corresponding tokens) into an AST node that can be used to emit the cheaper set construction bytecode.

@mikeshardmind posted what I wanted to say but more elegantly
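
For readers who want to see what "an AST node" and "set construction bytecode" look like today, here is a minimal sketch using only the standard ast and dis modules. It compares the existing {*()} spelling with set(); it does not implement {/}, and the helper name opnames is made up for illustration. The exact instruction names vary between CPython versions, so the lists are printed rather than asserted in comments.

```python
import ast
import dis


def opnames(source):
    """Return the bytecode instruction names for a single expression."""
    code = compile(source, "<example>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


# The display form parses to an ast.Set node, while "{}" parses to ast.Dict.
display_node = ast.parse("{*()}", mode="eval").body
dict_node = ast.parse("{}", mode="eval").body
print(type(display_node).__name__, type(dict_node).__name__)

# The display form builds the set with a dedicated instruction;
# set() has to look up the name "set" and then call it.
display_ops = opnames("{*()}")
call_ops = opnames("set()")
print(display_ops)
print(call_ops)
```

A dedicated {/} literal would presumably produce an ast.Set with no elements, but that is an assumption about a proposal, not something current Python does.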

elis.byberi · December 5, 2024, 6:14pm

There’s no doubt about that!

My focus is on the already ambiguous {} syntax. I believe adding another meaning won’t make it any less ambiguous. Also, it’s not that you could confuse an empty dictionary with an empty set; at most, you would get an exception.

d = {}
s = {*()}
# e = {/}

s[1] = 1  # TypeError: 'set' object does not support item assignment
a = s.get(1)  # AttributeError: 'set' object has no attribute 'get'
del s[1]  # TypeError: 'set' object doesn't support item deletion

Lucas_Malor · December 5, 2024, 8:40pm

Maybe {/} is not that bad for an empty set, but it can’t be used for a set comprehension.

zhangyx · December 5, 2024, 8:45pm

Double curly braces {{}} might be a viable syntax candidate.

Currently it is interpreted as an empty dict inside an empty set. This will cause TypeError because dict object is not hashable.

>>> {{}}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'

elis.byberi · December 5, 2024, 8:45pm

Regarding interpretation, how should we interpret this? {/} looks like a set literal, but it’s not—it represents an empty set symbol, similar to the symbol ∅. From my perspective, it is either a set literal with an empty sentinel / or an empty set symbol. If it is an empty set symbol, then there should be a better way to represent it.

Currently, in mathematics, we can represent an empty set using the symbols { } and ∅. If we put something inside curly braces, it no longer looks empty to me.

mikeshardmind · December 5, 2024, 8:49pm

For the purpose of discussing the idea, we should interpret it as discussed here: as the way to write an empty set literal in Python. (That does not currently exist; the closest thing is a set display unpacking an empty tuple, {*()}.) If we define that {/} is an empty set literal in Python, then that is what it is.

It doesn’t actually matter if it closely matches how other things represent it so long as once people learn it, it’s easy to remember what it is.

ajoino · December 5, 2024, 9:28pm

Is that really a problem? All set comprehensions must at minimum contain {elem for elem in iterable}, and that will be empty if iterable is empty. Is there something I’m not understanding?
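
A small sketch of that point: a set comprehension over an empty iterable already yields an empty set, so no special empty form is needed for comprehensions. The function name unique_lengths is hypothetical.

```python
def unique_lengths(words):
    """Hypothetical helper: the set of distinct word lengths."""
    return {len(word) for word in words}


print(unique_lengths(["a", "bb", "cc"]) == {1, 2})  # True
print(unique_lengths([]) == set())                  # True: empty input, empty set
```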

Nineteendo · December 5, 2024, 9:52pm

Yes, but using this for frozendict and frozenset literals would be more useful. [1]
Alongside {[]}, it’s the only combination of braces that’s still available.

[1] I have a reference implementation for frozen sets

blhsing · December 6, 2024, 1:17am

One detail to consider if we are to really move forward with the {/} idea is what the representation of an empty set should become. It’s currently 'set()', but should it become '{/}'?

Ideally we should change it to '{/}' so it would be more consistent with the representations of empty containers of the other built-in container types (e.g. repr([]) returns '[]' rather than 'list()'), but then it will break a lot of existing code, particularly tests with expected output of an empty set.
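
To make the inconsistency concrete, this short sketch prints the current repr of each empty built-in container. It only shows today's behaviour; nothing here depends on the proposal.

```python
# Empty instances of the built-in container types.
empty_containers = [list(), tuple(), dict(), set(), frozenset()]

# Map each type name to the repr of its empty instance.
reprs = {type(obj).__name__: repr(obj) for obj in empty_containers}

for name, text in reprs.items():
    print(f"{name:10} -> {text}")
```

list, tuple, and dict print as their literal syntax, while set and frozenset print as constructor calls.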

saaketp · December 6, 2024, 1:37am

It will break some code, but I doubt a lot of code compares the repr of a set instead of comparing the value directly (s == set()) or checking len or bool.

blhsing · December 6, 2024, 1:41am

It will break a lot of doctests for sure, where representations of objects are used as expected output.

Rosuav · December 6, 2024, 2:19am

That’s a fair point, but IMO not enough to block the change. If we never wanted ANY output of ANY program to change, we’d never be able to improve anything. I don’t have ancient Pythons to test on, but my reading of the relevant documents suggests that the {1,2,3} syntax wasn’t originally part of sets, and therefore the repr would have changed when that came in.

blhsing · December 6, 2024, 2:33am

Yeah, but Python isn’t in ancient times anymore. There is a lot more Python code out there that may be affected by a breaking change, so we should weigh the benefit against the cost more carefully.

I think the main point of introducing a {/} literal is to make writing an empty set easier and to produce leaner bytecode. Aligning the representation of an empty set to the new literal sounds nice but is not a must, especially when you consider that we would have to maintain two versions of doctests enclosed in conditional statements of if sys.version_info < ...: until the current Python versions are no longer supported in a rather distant future. This applies to even new code we write.

For example, the following doctest:

def f():
    '''The set constructor should return an empty set when given an empty list.

    >>> set([])
    set()
    '''

if __name__ == "__main__":
    import doctest
    doctest.testmod()

would then have to be refactored into:

import sys

if sys.version_info < (3, 15):
    def f():
        '''The set constructor should return an empty set when given an empty list.

        >>> set([])
        set()
        '''
else:
    def f():
        '''The set constructor should return an empty set when given an empty list.

        >>> set([])
        {/}
        '''

if __name__ == "__main__":
    import doctest
    doctest.testmod()

And that looks like a maintenance nightmare for a larger test.

effigies · December 6, 2024, 3:59am

I really don’t think most people would decide that the best way to handle a repr change breaking their doctest is to duplicate their function definitions. In many projects, doctests are run against the latest supported Python to make sure that the docs aren’t wrong, not to test that the results are correct across Python versions, so the result would just be updated when you make the switch to a version that uses the new repr.

If you do run doctests across a spread of dependencies, you’ll generally decide whether to skip the test or to use the standard trick:

>>> some_expr == expected_result
True

Also, doctests are not set in stone. Your test runner can do some interesting things. sphinx.ext.doctest has :skipif: and :pyversion: directives to help deal with cases where only certain Python versions produce the desired outputs. pytest-doctestplus has .. doctest-requires: to handle missing dependencies, and a FLOAT_CMP directive that evaluates the expected text and compares it to the expression result.

I think we could probably extend some of these ideas to make the transition easier for people, if this change was adopted.
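
As a rough illustration of the equality trick, here is a sketch of a doctest that does not depend on how an empty set is printed. The names common_ids and run_doctests are hypothetical; only the standard doctest module is used.

```python
import doctest


def common_ids(a, b):
    """Return the IDs present in both collections (hypothetical example).

    Comparing values instead of printed output keeps the doctest
    independent of the repr of an empty set.

    >>> common_ids([1, 2], [3, 4]) == set()
    True
    >>> common_ids([1, 2], [2, 3]) == {2}
    True
    """
    return set(a) & set(b)


def run_doctests(func):
    """Run the doctests in func's docstring and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    globs = {func.__name__: func}
    for test in finder.find(func, name=func.__name__, globs=globs):
        runner.run(test)
    return runner.summarize(verbose=False)


results = run_doctests(common_ids)
print(results.failed, results.attempted)
```

The trade-off is that the docstring no longer shows the reader what the return value looks like, which is part of what doctests are for.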

jamestwebber · December 6, 2024, 4:02am

Honestly I’d be surprised if there are a ton of doctests that are testing for an empty set as the return value.

Rosuav · December 6, 2024, 4:04am

Maybe, but I can imagine plenty of non-trivial functions that could return sets, and would need to return empty sets in certain circumstances. For example, “gather all user IDs matching these criteria” or “find the commonality between X and Y” (which would, as its final step, do a set intersection) etc.

But those sorts of tests are more likely to be proper unit tests, not doctests.

jamestwebber · December 6, 2024, 4:13am

They do exist, but a GitHub search for “/>>> .+\nset\(\)/ language:Python” gives me a shockingly low number of hits (61 files!). It’s possible the search is failing on the regex though.
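
One possible reason for the low count, checked here against Python's re module rather than GitHub's search engine (which may behave differently): as written, the pattern requires set() to start at the very beginning of the next line, so it misses doctests indented inside a docstring. The LENIENT variant below is an assumption about what a broader search could look like, not the query used above.

```python
import re

STRICT = re.compile(r">>> .+\nset\(\)")         # pattern from the post
LENIENT = re.compile(r">>> .+\n[ \t]*set\(\)")  # assumed variant: allow indentation

# A doctest indented inside a function docstring.
INDENTED = '''\
def f():
    """
    >>> set([])
    set()
    """
'''

# The same doctest with no leading whitespace, e.g. in a text file.
FLUSH_LEFT = ">>> set([])\nset()\n"


def count(pattern, text):
    """Number of non-overlapping matches of pattern in text."""
    return len(pattern.findall(text))


print(count(STRICT, INDENTED), count(LENIENT, INDENTED))      # 0 1
print(count(STRICT, FLUSH_LEFT), count(LENIENT, FLUSH_LEFT))  # 1 1
```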