Using math symbols for operators

pauljurczak · March 29, 2022, 8:29pm

Haskell, Julia and Wolfram allow using familiar and standard mathematical symbols for operators, e.g. × for cross product. Is there any ongoing work to provide this option for Python in the future? I’m assuming it’s not possible today.

brettcannon · March 29, 2022, 10:01pm

Nope, no work is being done to add support for this.

steven.daprano · March 29, 2022, 11:20pm

Coconut is a transpiler from a superset of Python to ordinary Python; it allows a small subset of Unicode maths symbols:

https://coconut.readthedocs.io/en/v1.1.0/DOCS.html#unicode-alternatives

As far as pure Python goes, the big problem with allowing non-ASCII symbols is that there is no good, cross-platform way of entering them, and many of them may not be available in common fonts.

So there is a strong reluctance to require non-ASCII symbols as part of the language definition.

Another solution which some people use is to have their editor (Emacs, I think) automatically convert the maths symbols into something that Python understands, while still displaying them as the fancy symbols.

If you search the archives for the Python-Ideas mailing list, you will find previous discussions on this, in particular I think it is David Mertz or maybe Christopher Barker who uses that Emacs(?) trick.

pauljurczak · March 30, 2022, 2:36am

I understand that it’s not smooth sailing, but other languages do it successfully. The attachment to ASCII and 80-column terminals in the 2020s has always puzzled me. I started programming with all-caps ASCII and 80-column punch cards, but that was in 1980. The technology has advanced a long way since then.

Also, AFAIK, Unicode is already allowed in Python identifiers, so it’s not that far off from having it for operators.
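
To illustrate the gap between identifiers and operators, here is a small sketch (the variable and function names are made up for the example). Current Python accepts non-ASCII letters such as δ in names, but × is a symbol rather than a letter, so the parser rejects it:

```python
# Non-ASCII letters are valid in identifiers (PEP 3131).
δ = 0.5
Δt = 2 * δ

def parses(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

ok_identifier = parses("Δt + δ")   # letters in names: accepted
ok_operator = parses("a × b")      # U+00D7 MULTIPLICATION SIGN: rejected
```

So supporting such operators would need a grammar change, not just a relaxation of the identifier rules.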

methane · March 30, 2022, 4:25am

There is another big problem.
There are many characters that look very similar. That makes code review hard.

ー
‐
‑
–
—
―
−
ｰ

This is not a full list of the characters that look like -.
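
A reviewer can tell such characters apart by asking Python for their Unicode names. The sketch below assumes the list above consists of the code points written out in the string; they are given as escapes so that they stay distinguishable in any font:

```python
import unicodedata

# ASCII hyphen-minus first, then the assumed code points of the lookalikes above.
dashes = "-\u30fc\u2010\u2011\u2013\u2014\u2015\u2212\uff70"

# Print each code point with its official Unicode name.
for ch in dashes:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Only the first character is the ASCII operator Python accepts for subtraction.
ascii_only = [ch for ch in dashes if ch.isascii()]
```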

pauljurczak · March 30, 2022, 5:57am

This is a problem not unique to Unicode. With plain ASCII you have I and 1 similarity. There is potential for abuse with any design. Coding standards and style will take care of it.

My motivation is to make Python more natural to use by scientists, who are accustomed to mathematical notation. Python use is quite high in this domain.

methane · March 30, 2022, 6:02am

Yes, this is not unique to Unicode. But using Unicode makes the issue bigger.

There are coding fonts that clearly distinguish similar characters in ASCII.
But most coding fonts cannot do the same for similar characters in Unicode.

pauljurczak · March 30, 2022, 6:12am

Currently you can write monstrosities like this one:

def OΟO(n):
    return n+1

which is a bit less murky with syntax highlighting:

def OΟO(n):
    return n+1

Again, any language design can be abused. A hammer can be used both for building houses and for cracking skulls…
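
A linter-style check can catch this kind of name. The following is a rough sketch, not an existing tool: it assumes the middle letter of the example is U+039F (Greek capital omicron) and uses the first word of each character's Unicode name as a stand-in for its script.

```python
import unicodedata

def scripts_in(identifier):
    """Rough script check: the first word of each character's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in identifier}

# Assumption: the middle letter is U+039F GREEK CAPITAL LETTER OMICRON,
# written as an escape here so it cannot be confused with the Latin O.
suspicious = "O\u039fO"
plain = "OOO"

mixed = scripts_in(suspicious)    # contains both "LATIN" and "GREEK"
same = suspicious == plain        # the two names are different identifiers
```

Flagging identifiers whose characters come from more than one script is a heuristic; it would also flag legitimate names that mix scripts or digits.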

pauljurczak · March 30, 2022, 6:19am

If a restrictive approach has to be followed, allowing only a (somewhat) carefully selected set of mathematical symbols would solve most real life use cases.

steven.daprano · March 30, 2022, 9:42am

Paul suggested:

“My motivation is to make Python more natural to use by scientists, who are accustomed to mathematical notation. Python use is quite high in this domain.”

You should see what people in those scientific communities think, e.g.
start with people in the numpy, scipy, pandas, Jupyter, etc. communities.

If you find that there is a strong desire in the scientific Python community to use Unicode symbols, then it might be worth writing a PEP (Python Enhancement Proposal). If they are lukewarm on the idea, then forget about it.

CAM-Gerlach · April 1, 2022, 3:35am

For what it’s worth, as a scientist, I’d find it very irksome to have to deal with all kinds of special characters for basic operators, and to debug programs where colleagues (who typically have much more limited programming experience than the average software developer) used the wrong one somewhere. It would be a nightmare. In LaTeX, where we typically use mathematical symbols the most, we don’t use literal symbols to represent complex equations, but rather special commands, which avoid ambiguity and allow all kinds of fancy, intelligent formatting on the output. I’d wager that’s how most scientists are used to entering symbols anyway, rather than literal x etc.

While it has valid use cases for internationalization, allowing nearly the full repertoire of Unicode in identifier names can already cause an array of potential issues in Python code. And @methane is one of the core developers who spends some of the most time working on (and sees the most value in) full support for Unicode and UTF-8, so if he says it’s not a good idea, I’d be inclined to agree.

pauljurczak · April 1, 2022, 4:32am

I would definitely prefer that method over entering raw Unicode. Julia does it in a similar fashion. From their documentation:

In the Julia REPL and several other Julia editing environments, you can type many Unicode math symbols by typing the backslashed LaTeX symbol name followed by tab. For example, the variable name δ can be entered by typing \delta-tab
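
IPython and Jupyter offer a comparable Tab completion for names such as \delta. The underlying idea is a lookup from LaTeX names to characters, which can be sketched as follows; the table and the function name are hypothetical and far smaller than what a real tool would ship:

```python
import re

# Hypothetical, deliberately tiny table; real tools ship thousands of entries.
LATEX_TO_UNICODE = {
    "delta": "δ",
    "alpha": "α",
    "times": "×",
    "le": "≤",
}

def expand_latex_names(text):
    """Replace a backslashed LaTeX name with its symbol when the table knows it."""
    def substitute(match):
        name = match.group(1)
        # Unknown names are left exactly as typed.
        return LATEX_TO_UNICODE.get(name, match.group(0))
    return re.sub(r"\\([A-Za-z]+)", substitute, text)

expanded = expand_latex_names(r"\delta = 0.1; a \times b; \unknown")
```

With this approach the user only ever types ASCII, which sidesteps the keyboard-entry problem, though not the lookalike problem once the symbol is in the file.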

abersheeran · April 1, 2022, 9:37am

You can fork abersheeran/mingshe on GitHub (described as “A better Python. It is also a template for you to create a superset of Python.”) and change a little code in the gram file to do this. It’s not that hard, but wouldn’t this complex symbolic input cause bigger problems?

pauljurczak · April 1, 2022, 6:19pm

I don’t know. I’m a new user of Python. That’s why I posted here, trying to find out practical constraints of implementing this feature.

steven.daprano · April 1, 2022, 7:04pm

What other languages are you using where your colleagues can use non-ASCII characters for maths symbols? And what do they get wrong?

My experience with non-ASCII maths symbols is a positive one. Back in the 1980s, Apple invented HyperCard. Today we would describe it as a combination of GUI application builder (based on the metaphor of a Rolodex) and hypertext system.

Much to Apple’s surprise, and I suspect horror, it became their most popular (free) application, creating a huge ecology of amateurs writing and swapping HyperCard stacks.

HyperCard itself was based on a scripting language, HyperTalk, and it allowed non-ASCII maths symbols. This was pre-Unicode, so it was based on Apple’s then “extended ASCII” 8-bit character set, which today we call the “MacRoman” encoding. From memory, the characters used included:

subtraction: − as an alias for -
multiplication: × as an alias for *
division: ÷ as an alias for /
not equal: ≠
less (greater) than or equal: ≤ and ≥ as aliases for <= and >=
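
In Python terms, aliases like these amount to a one-to-one mapping onto the ASCII operators. A minimal sketch of that idea as a source preprocessor follows. The table and function name are invented for illustration, the code points are assumed to match the symbols listed above, and Python itself offers no such hook:

```python
# Hypothetical alias table, modelled on the symbols listed above.
ALIASES = {
    "\u2212": "-",    # minus sign
    "\u00d7": "*",    # multiplication sign
    "\u00f7": "/",    # division sign
    "\u2260": "!=",   # not equal to
    "\u2264": "<=",   # less than or equal to
    "\u2265": ">=",   # greater than or equal to
}
TABLE = str.maketrans(ALIASES)

def to_ascii_operators(source):
    """Naive rewrite: it also changes symbols inside string literals and comments."""
    return source.translate(TABLE)

converted = to_ascii_operators("6 \u00d7 7 \u2212 2 \u2265 80 \u00f7 2")
value = eval(converted)   # evaluates the ASCII form of the expression
```

A real implementation would have to work on tokens rather than raw text, which is roughly what transpilers such as Coconut do.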

This worked great, and nobody had any problem recognising what the symbols meant or how to enter them.

The Mac had an advantage: those characters were standard in all the Apple fonts, and there was a standard keyboard and GUI method for entering them. I think I might even remember some of them!

Option-/ for ÷, option-shift-8 for ×, option-= for ≠

and if all else failed, there was the Keycaps desk accessory which not only showed you a GUI reproduction of the keyboard and all typeable characters, but could be used to enter those characters.

It still never fails to astonish me that in some ways, things which I could trivially do in 1988 require a major effort in 2022.

“rather than literal x etc.”

That’s just a plain ol’ ASCII x. Surely even scientists know how to type x on their keyboard?

pauljurczak · April 1, 2022, 7:18pm

I used Julia for a while, where math symbols work very well (see my post above). It could be a model for Python, but their advantage was language and tooling design with this feature present right from the start.

Again, I’m just a user without in-depth knowledge of how difficult implementing this feature would be.

CAM-Gerlach · April 1, 2022, 8:21pm

None, but several caveats were mentioned above by @methane and others, and as mentioned, PEP 672 describes a number of similar issues with Unicode identifiers.

Sure, but that was back in the days of 8-bit extended ASCII character sets like MacRoman. Nowadays with Unicode, as Inada-san mentioned, there are numerous lookalikes for each character, which greatly exacerbates the problem. And this would

Yeah, I miss them too from my younger days growing up on a Mac. The option codes were both easier and faster to type and easier to remember, since they usually logically corresponded to letters on the keyboard: Option-8 (*) was •, Option-hyphen was em dash (and with shift it was en dash, or the other way around), Option-c was ©, etc. Unfortunately, on most other machines, it’s done using far more cryptic Alt codes, of which I only have a common few memorized, e.g. Alt-7 is •, Alt-247 is ≈, Alt-0151 is em dash, Alt-0150 is en dash, Alt-0169 is ©, Alt-0153 is ™, etc. It’s enough of a hassle that I couldn’t see myself using this feature, all for slightly prettier output.

Sure I do, it’s \times. It just doesn’t work here, since Discourse ≠ LaTeX. And I rarely use it, since typically multiplication is indicated by a space, unless you want to mean a matrix product, vector cross product or some other special case. I never use that one and I was too lazy to look up the alt code, sorry.

fungi · April 1, 2022, 10:07pm

Yeah, I miss them too from my younger days growing up on a Mac.
The option codes were both easier and faster to type and easier to
remember, since they usually logically corresponded to letters on
the keyboard—Option-8 (*) was •, Option-hyphen was em dash (and
with shift it was en dash, or the other way around), Option-c was
©, etc. Unfortunately, on most other machines, it’s done using far
more cryptic Alt codes, of which I only have a common few
memorized, e.g. Alt-7 is •, Alt-247 is ≈, Alt-0151 is em dash,
Alt-0150 is en dash, Alt-0169 is ©, Alt-0153 is ™, etc. It’s enough
of a hassle that I couldn’t see myself using this feature, all for
a bit prettier output.

That must be a Microsoft problem. As a Linux/Xorg user with the
useless “Windows Key” remapped to compose, those are all trivially
enterable with sequences like “compose x x” and “compose > =” and so
on.

PeterL · April 4, 2022, 8:28am

An interesting couple of notes:

Some programmer fonts, such as Monoid, have mathematical-looking ligatures. There are issues with using them in many editors, though.
The IPython display machinery in a Jupyter Notebook can render generated LaTeX strings.
The interesting sympy module can convert a symbolic expression to LaTeX or to a value.

henryiii · April 4, 2022, 4:27pm

FWIW, I implemented this a few years ago for the ± symbol using IPython; see “Uncertainty extension for IPython” on ISciNumPy.dev. You could implement other symbols for interactive work similarly.
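
One way such an interactive extension could work is by rewriting each input line before Python sees it. The sketch below is an assumption about the general technique, not a description of the linked extension; the function name is hypothetical, and `ufloat` refers to the third-party `uncertainties` package, which the rewritten code would need at run time.

```python
import re

# Matches "<number> ± <number>", e.g. "1.5 ± 0.2".
PLUS_MINUS = re.compile(r"(\d+(?:\.\d+)?)\s*±\s*(\d+(?:\.\d+)?)")

def rewrite_plus_minus(lines):
    """Rewrite 'a ± b' as 'ufloat(a, b)' in each input line."""
    return [PLUS_MINUS.sub(r"ufloat(\1, \2)", line) for line in lines]

rewritten = rewrite_plus_minus(["x = 1.5 ± 0.2\n"])

# Assumption: inside IPython 7 or later, the function could be registered with
#     get_ipython().input_transformers_cleanup.append(rewrite_plus_minus)
# so that it runs on every cell before the code is parsed.
```

Because the rewrite happens only in the interactive shell, saved .py files stay plain ASCII Python.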