Unicode Symbols in identifiers (again) to support readable logical assertions

Searching the archives, I am not optimistic about bringing this up, given past discussions, but it would be
Really Good to be able to use Unicode symbols, particularly those in the Mathematical Operators block,
for Python function names. One can then define useful functions like:

import functools

def ∀(lst):
    if lst:
        return functools.reduce((lambda x, y: x and y), lst)
    else:
        return True  # vacuously

def ∃(lst):
    if lst:
        return functools.reduce((lambda x, y: x or y), lst)
    else:
        return False  # vacuously

which let you write things like

assert( ∀([max >= n for n in a]) and ∃([n == max for n in a]) )

and other logical assertions that one would use in Science-of-Programming style coding
in the sort of logical notation that has been used in symbolic logic for centuries.

Other useful operator-functions can be similarly defined for ∏, ∑, etc. to make readable, terse
assertions.

As for the inevitable “but there are 4 different representations of Sigma in Unicode” response, there is only one
defined as the mathematical operator as opposed to the assorted letters, so using the mathematical operator symbol
to refer to the mathematical operation is LESS ambiguous than using one of the letter page characters.
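
To make the distinction concrete, here is a small sketch using the standard `unicodedata` module. It looks at only two of the sigma-like code points, the operator and the Greek capital letter; the variable names are mine, chosen for illustration.

```python
import unicodedata

# Two of the sigma-like code points: the operator and the Greek capital letter.
summation_operator = "\u2211"   # ∑
greek_capital_sigma = "\u03a3"  # Σ

# Print each code point together with its official Unicode character name.
for ch in (summation_operator, greek_capital_sigma):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The two characters have different names in the Unicode database, so a tool can tell them apart even when a font draws them almost identically.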

I also think that the fact that these discussions keep coming up, about allowing the traditional mathematical
operator symbols to be used as the Python notation for the corresponding operations, indicates that this is
something many users would find intuitive.

I’m a mathematician too, but aren’t your ∀ and ∃ already provided by the built-in functions all and any?
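
For comparison, the assertion from the original post can be written with the built-ins today. This is only a sketch: the list `a` is made-up sample data, and I use the name `largest` rather than `max` so the built-in is not shadowed.

```python
a = [3, 1, 4, 1, 5]  # sample data, purely illustrative
largest = max(a)     # named `largest` to avoid shadowing the built-in max

# The same assertion as in the original post, written with all() and any().
assert all(largest >= n for n in a) and any(n == largest for n in a)

# The vacuous cases agree with the hand-written versions.
assert all([]) is True
assert any([]) is False
```

Unlike a `reduce` over `and`/`or`, `all` and `any` accept generators directly and stop at the first deciding element.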

These particular functions may exist already, but the request I think is to allow a wider range of Unicode characters to appear in Python identifiers, so that those who want to can create aliases for them, or new functions, aligned to mathematical notations (and not just for humorous effect as in this memorable PyConUK Lightning talk).

The difficulty may be that the selection of characters could be debated for a long time. Also, it creates exceptional cases in the lexer, where now one need only examine Unicode properties to decide what starts or continues an identifier.
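
As a rough illustration of the property-based rule, `str.isidentifier()` reports whether a string is a valid identifier under the current rules, and `unicodedata.category()` gives the general category. The helper `describe` is a hypothetical name of mine, and the general category is only a proxy here: as far as I know the actual rule is expressed with the XID_Start and XID_Continue properties.

```python
import unicodedata

def describe(ch):
    """Return (general category, whether ch alone is a valid identifier)."""
    return unicodedata.category(ch), ch.isidentifier()

# A Greek letter followed by four characters from the Mathematical Operators block.
for ch in "Φ∀∃∑∏":
    print(ch, *describe(ch))
```

The letter is category `Lu` and is accepted; the operator symbols are category `Sm` (Symbol, math) and are rejected, so admitting some of them would mean listing exceptions by hand.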

I’m just anticipating the problems so I can say that in spite of those, it still seems like a useful feature, speaking as a mathematically-inclined engineer.

That’s fair enough. So a lot of non-ascii unicode code points are supported already in variable names. E.g. I quite like to define:

Φ = set()

But OP’s point is not about supporting all unicode, it’s about supporting the special math code points that are currently excluded.

A lot of thought has gone into deciding which code points are supported in variable names.

I assume math symbols were excluded to define what’s allowed in terms of Unicode blocks (without a huge list of exceptions to the rule), and because some math symbols, in some fonts, are visually similar to ascii symbols for standard Python operators (+, -, *, /, ^, %, =, ==, <, >, <=, >=), so could easily lead to confusion, or even malicious code passing review by humans.
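
A sketch of the look-alike problem, using three pairs I picked for illustration (they are my choice, not a list from any specification). It prints the Unicode names and checks whether NFKC normalisation would fold the math symbol onto the ASCII operator.

```python
import unicodedata

# Illustrative pairs: (ASCII operator, look-alike from the Mathematical Operators block).
lookalikes = [("-", "\u2212"), ("*", "\u2217"), ("/", "\u2215")]

for ascii_op, math_op in lookalikes:
    # NFKC is the normalisation form Python applies to identifiers.
    folded = unicodedata.normalize("NFKC", math_op)
    print(ascii_op, unicodedata.name(ascii_op), "|",
          math_op, unicodedata.name(math_op), "|",
          "folds to ASCII:", folded == ascii_op)
```

None of these three fold to their ASCII twin, so if they were allowed in identifiers, something like `a−b` could be a single name that a reviewer reads as a subtraction.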

Some people in cosmology like to do that, too. This can end badly because of misunderstandings about normalisation of the unicode names. One that comes to mind is this:

>>> ϕ = 0.1  # phi for one thing
>>> φ = 0.2  # varphi for another
>>> ϕ  # oops
0.2
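
What happens here, as far as I understand it, is that Python normalises identifiers to NFKC, and NFKC maps the phi symbol (U+03D5) onto the small letter phi (U+03C6). A minimal sketch that reproduces the session in a scratch namespace (the variable names are mine):

```python
import unicodedata

phi_symbol = "\u03d5"  # ϕ GREEK PHI SYMBOL
small_phi = "\u03c6"   # φ GREEK SMALL LETTER PHI

# NFKC maps the phi symbol onto the small letter phi.
assert unicodedata.normalize("NFKC", phi_symbol) == small_phi

# Run the two assignments from the session above in a scratch namespace.
namespace = {}
exec(f"{phi_symbol} = 0.1\n{small_phi} = 0.2", namespace)

# Ignore __builtins__, which exec() adds to the namespace.
names = [k for k in namespace if not k.startswith("__")]
print(names, namespace[small_phi])
```

Only one name survives, so the second assignment silently overwrites the first.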

Also, ∀ and ∃ are quantifiers, not variables, so I’d find their use as function names jarring if I were reading such code.

You are already using those symbols in quite a different way than they would be used in mathematical notation.

(∀n ∈ a . n ≥ max) ∧ (∃n ∈ a . n = max)

Is ∑(x**2 for x in a) really any better than sum(x**2 for x in a)? The symbol is just one part of the usual notation, the rest of which cannot be translated into Python.

I’m not against the use of Unicode in Python syntax, but one of the main objections to it has always been the ability to type them. Should we ever reach a point where that problem is solved satisfactorily, I think introducing new operators (and not just Unicode aliases for existing ASCII operators) would be more beneficial than defining affected function names.

The unicode standard has a suggestion for how “identifiers” should look, and AFAIK, python mostly follows this (plus as an extra allowing _). Adding the symbols you have listed above would have quite a few weird effects:

  • vs

are just a few characters that jumped out at me. None of these normalize to their visually-equivalent counterpart from what I can tell, which contrasts with most normal letters in the planes that are currently allowed for identifiers.

Also, IMO many of these would make good operators, not identifiers. So a custom operators proposal could make better use of these symbols.

Operators are beyond the OP’s intention. Allowing certain symbols as identifiers seems to serve a purpose as illustrated. However, the point is well made that this doesn’t turn Python into a declarative expression language because the rest of what you need to write is still Python syntax.

Other suggestions for creating a DSL over Python share this problem.

“Domain-Specific Language”, properly used, should mean a distinct language specific to a domain, but too often now means a domain-specific vocabulary and operator definitions used within an existing language. That doesn’t change the grammar, which may not be the ideal one for the domain.