Add complex literals?

Currently, we have only unsigned imaginary literals with the following semantics:

±a±bj = complex(±float(a), 0.0) ± complex(0.0, float(b))

While this behaviour is well documented, most users would instead expect:

±a±bj = complex(±float(a), ±float(b))

i.e. that it follows the rectangular notation (see e.g. Complex number - Wikipedia) a+bi (or a+bj) for complex numbers. I think it’s a POLA violation in the Python language. Things are a little worse, because there is some “brain split” in the language itself: in the repr() output we do follow the rectangular notation.

Here are a few examples:
  1. signed zero in the real part
   >>> complex(-0.0, 1.0)  # (note funny signed integer zero)
   (-0+1j)
   >>> -0+1j
   1j
   >>> -(0.0-1j)  # "correct" representation with Python numeric literals
   (-0+1j)
   >>> -(0-1j)  # also "correct"
   (-0+1j)
  2. signed zero in the imaginary part
   >>> complex(1.0, -0.0)
   (1-0j)
   >>> 1-0j
   (1+0j)
   >>> -(-1 + 0j)  # "correct"
   (1-0j)

Apparently, complex.__repr__() uses a different meaning for the j symbol. It’s not the same as in the 1j literal. And we also have another (related) problem: the eval(repr(x)) == x invariant is broken for the complex type. Quoting from the docs:

For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(); otherwise, the representation is a string enclosed in angle brackets

But (-0+1j) is not an object with the same value as complex(-0.0, 1.0). Nor do complex(1.0, -0.0) and 1-0j have the same value.
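To make this concrete, here is a small sketch (plain stdlib, nothing proposal-specific): == cannot tell the two values apart, because -0.0 == 0.0, but math.copysign shows that the sign of the real zero is lost in the eval(repr(...)) round trip.

```python
import math

x = complex(-0.0, 1.0)
y = eval(repr(x))  # repr(x) is '(-0+1j)', which evaluates to complex(0.0, 1.0)

print(x == y)                      # True: -0.0 == 0.0 hides the difference
print(math.copysign(1.0, x.real))  # -1.0: the original real zero is negative
print(math.copysign(1.0, y.real))  # 1.0: the sign did not survive the round trip
```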

Yet another instance of this is in the Sphinx docs for the complex class, and in its docstring as well:

class complex(real=0, imag=0)

Return a complex number with the value real + imag*1j or …

Simple counterexamples
>>> complex(-0.0, -0.0)
(-0-0j)
>>> -0.0 + (-0.0)*1j
(-0+0j)
>>> complex(-0.0, 0.0)
(-0+0j)
>>> -0.0 + 0.0*1j
0j

Again, the docs here rest on the wrong assumption that we have complex literals and that real + imag*1j is a representation of the complex number in rectangular form.

At first sight, this is a very minor issue. Clearly, it affects only “corner cases”: when either the real or the imaginary part of the complex number is -0.0 (a signed zero). On the other hand, it’s a limitation that has already bitten us in the stdlib docs; see the note about branch cuts: we are forced to use verbose complex(-2.0, -0.0)-like constructions there, instead of literals (like -2-0j, which we could expect in mathematical texts). It’s not that we can’t express the same number with the current imaginary literals. But would an expression like -(-2+0j) be transparent to readers? Or -(-0.0 - 0j), where using a float in the real part is required? These “corner cases” are in fact common, because we want to talk about the behaviour of functions on branch cuts, and not surprisingly there is a long (not exhaustive) list of recurring issues:

Maybe we can do better?

Solution

Let’s use complex literals (as Scheme has, since R3RS) instead, i.e.

bj = complex(0.0, b)
±a±bj = complex(±a, ±b)

where a (nonzero) and b are floating point literals (or a decimal integer literal for b).
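For illustration, here is a rough sketch of those semantics (the regex and the function name are mine, and the grammar is deliberately simplified to plain decimal forms without exponents, unlike the full R7RS syntax): both signs go directly into the components, so signed zeros survive.

```python
import re

# <sign?>a<sign>bj: the imaginary sign is part of the literal,
# not a separate Add/Sub operation
_LITERAL = re.compile(r'^(?P<real>[+-]?(?:\d+\.?\d*|\.\d+))'
                      r'(?P<imag>[+-](?:\d+\.?\d*|\.\d+))j$')

def parse_complex_literal(s):
    m = _LITERAL.match(s)
    if m is None:
        raise ValueError(f'not a complex literal: {s!r}')
    # each sign lands directly in its component
    return complex(float(m.group('real')), float(m.group('imag')))
```

Under this reading -2-0j denotes complex(-2.0, -0.0), exactly the value the branch-cut notes in the stdlib docs need.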

While this will make tokenization more complex, with the above change we could fix the eval(repr) issue without changing the repr output at all (well, except maybe in the case of a signed-zero real component) or the arithmetic for mixed operands.

And this replacement for the imaginary literal will match the common mathematical notation. I believe it is the most transparent solution for the end users of the complex type (i.e. those doing math). No changes on their side, unless they are using the funny notation -(-0.0 - 0j) to represent the “corner case” complex(0.0, -0.0).

Edit: More detailed formalization of the above proposal, based on the discussion. With some code.

Perhaps it would be cleaner if I emphasize that the proposal is restricted to Add/Sub’s (BinOp) with special arguments (the second is an imaginary literal and the first is a ±int or ±float literal). (We could also discuss whether we can redefine unary Sub of an imaginary literal as well.) For n-ary ± we should keep the current evaluation rules, i.e. a±b±c±d=(((a±b)±c)±d). If you want to place a complex literal somewhere in between, use parentheses! After all, maybe they are there for a purpose in the complex.__repr__() output?

Here is an example of an AST transformation that does the above.
# cl-transform.py

from ast import *
from ideas import import_hook

class ComplexLiteralTransform(NodeTransformer):
    def visit_BinOp(self, node):
        match node:
            case BinOp(Constant(x), Add(), Constant(complex(imag=y))):
                match x:
                    case int(x) | float(x):
                        x, y = map(Constant, [float(x), y])
                        return Call(Name('complex'), [x, y], [])
            case BinOp(Constant(x), Sub(), Constant(complex(imag=y))):
                match x:
                    case int(x) | float(x):
                        x, y = map(Constant, [float(x), y])
                        return Call(Name('complex'), [x, UnaryOp(USub(), y)], [])
            case BinOp(UnaryOp(USub(), Constant(x)), Add(), Constant(complex(imag=y))):
                match x:
                    case int(x) | float(x):
                        x, y = map(Constant, [float(x), y])
                        return Call(Name('complex'), [UnaryOp(USub(), x), y], [])
            case BinOp(UnaryOp(USub(), Constant(x)), Sub(), Constant(complex(imag=y))):
                match x:
                    case int(x) | float(x):
                        x, y = map(Constant, [float(x), y])
                        return Call(Name('complex'), [UnaryOp(USub(), x), UnaryOp(USub(), y)], [])
        return self.generic_visit(node)

    def visit_UnaryOp(self, node):
        match node:
            case UnaryOp(USub(), Constant(complex(imag=x))):
                return Call(Name('complex'), [Constant(0.0), UnaryOp(USub(), Constant(x))], [])
        return self.generic_visit(node)

def transform_cl(tree, **kwargs):
    tree_or_node = ComplexLiteralTransform().visit(tree)
    fix_missing_locations(tree_or_node)
    return tree_or_node

def add_hook(**kwargs):
    return import_hook.create_hook(hook_name=__name__,
                                   transform_ast=transform_cl)

Alternative C version (a draft, no error checks, etc): GitHub - skirpichev/cpython at complex-literals-with-usub.

With André Roberge’s https://github.com/aroberge/ideas:

$ python -q -m ideas -a cl-transform
Ideas Console version 0.1.5. [Python version: 3.12.0rc1+]
ideas> 1-0j
(1-0j)
ideas> 1+0j
(1+0j)
ideas> -0j
-0j

In fact, I think we can consider (±a±bj) to be the true complex literal, whereas the fact that we can sometimes omit the parentheses (e.g. for a simple assignment like x=1+2j) is just syntactic sugar.

Alternative

We could also solve the problem using an additional complex subtype (see this): an imaginary class (as e.g. the C11 standard does, Annex G).
There would be new special rules for mixed arithmetic (see section 5 of Annex G for details), e.g.:

float + imaginary = complex(float.real, imaginary.imag)

The new rules, however, alter only cases where mixed operands have nans, infinities or signed zeros in their components.
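A toy sketch of that rule (class and method names are mine, not from the draft implementation): float + imaginary copies both components verbatim, so no addition with a zero real part is ever evaluated and signed zeros survive.

```python
class Imaginary:
    """Toy pure-imaginary number; only the Annex G-style mixed
    add/sub rules needed for the examples are implemented."""

    def __init__(self, imag):
        self.imag = float(imag)

    def __radd__(self, other):
        # float + imaginary = complex(float, imaginary.imag)
        if isinstance(other, (int, float)):
            return complex(other, self.imag)
        return NotImplemented

    def __rsub__(self, other):
        # float - imaginary = complex(float, -imaginary.imag)
        if isinstance(other, (int, float)):
            return complex(other, -self.imag)
        return NotImplemented
```

With it, 1.0 - Imaginary(0.0) yields complex(1.0, -0.0): the value that 1-0j fails to produce today.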

No new literal types, no changes in parsing of source code or alterations to the complex.__repr__() (just as in the above solution), but one “little” new thing:

>>> type(3.14j)
<class 'imaginary'>

On the other hand, as was mentioned by Serhiy Storchaka and Mark Dickinson in issue #84450, the new type could solve other “gotchas”. For example, currently in Python:

>>> complex(0, math.inf) * 1
(nan+infj)

will be

>>> complex(0, math.inf) * 1
infj

because multiplication of a complex by a real (or by an imaginary number) will be componentwise. For the same reason, ±1j will be a correct rotation in the complex plane (multiplying any complex number z, not just a finite one, by 1j 4 times exactly recovers z).
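To see the difference (the scale helper is my sketch of the componentwise rule, not an existing API):

```python
import math

z = complex(0.0, math.inf)
print(z * 1)  # (nan+infj) today: 1 is promoted to complex(1, 0),
              # and the inf*0 cross term injects a nan

def scale(z, r):
    # componentwise multiplication by a real, as the imaginary-type
    # rules would define complex * real
    return complex(z.real * r, z.imag * r)

print(scale(z, 1))  # infj: no nan appears
```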

Edit: a variant of the above is special treatment in arithmetic ops for complex(0, imag) instances, without introducing a new type.

Draft implementation: GitHub - skirpichev/cpython at complex-annex-g

No changes in string parsing, only minor changes in the repr output (a signed zero is used for the real component, if needed). The cost is some unusual rules for complex arithmetic in corner cases (signed zeros, infinities, nans in components), e.g. complex(-0.0, 1) + complex(0, 1) == complex(-0.0, 2).

Other

Finally, I would also mention attempts to solve only the eval(repr) issue for the complex type.

First, we could use the “verbose” form in the repr() output, like complex(real, imag) (obviously, this was too verbose for Guido). A variant of this: use that form of the repr() output only for complex numbers with signed zeros in their components.
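That variant can be sketched in a few lines (the function names are mine): fall back to the verbose form only when a component is a negative zero, so ordinary values keep the compact repr.

```python
import math

def _is_negzero(x):
    return x == 0.0 and math.copysign(1.0, x) < 0.0

def complex_repr(z):
    # use the round-trippable verbose form only for the corner cases
    if _is_negzero(z.real) or _is_negzero(z.imag):
        return f'complex({z.real!r}, {z.imag!r})'
    return repr(z)
```

Then complex_repr(complex(-0.0, 1.0)) gives 'complex(-0.0, 1.0)', which survives eval, while complex(1, 2) still shows as (1+2j).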

Alternatively, we could use a “hackish” form like -(-2+0j) for our “corner cases”, as Serhiy Storchaka did in PR #19593.

Both solutions make the repr() output even less uniform than now (currently we sometimes omit parens).


@skirpichev Thank you for bringing this discussion to python-ideas!

Under your proposal, I assume that 1.0 - 0j would be interpreted as complex(1.0, -0.0) (rather than complex(1.0, 0.0) as it is now). That’s all well and good, but how would each of the following be interpreted under your proposal?

(1.0) - 0j
+1.0 - 0j
0.0 + 1.0 - 0j
float(1) - 0j
x=1.0; x - 0j

If those aren’t all interpreted the same way as 1.0 - 0j then we’ve lost referential transparency and code becomes harder to refactor and reason about.

Python 3.11.4 (tags/v3.11.4:d2340ef, Jun  7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)] on win32
>>> eval(repr(-0.0j)) == -0.0j
True
>>> eval(repr(1-0.0j)) == 1-0.0j
True
>>> eval(repr(1-0j)) == 1-0j
True
>>> eval(repr(0.0-0j)) == 0.0-0j
True
>>> eval(repr(1-0j)) == 1-0j
True
>>> eval(repr(-0+1j)) == -0+1j
True
>>> eval(repr(-0.0+1.0j)) == -0.0+1.0j
True
>>> eval(repr(1.0-0.0j)) == 1.0-0.0j
True
>>> (-0+1j) == complex(-0.0,1.0)
True
>>> (1-0j) == complex(1.0,-0.0)
True
>>> complex(-0.0, -0.0) == (-0-0j)
True
>>> complex(-0.0, -0.0) == -0.0+(-0.0)*1j
True
>>> complex(-0.0, 0.0) == (-0+0j)
True
>>> complex(-0.0, 0.0) == -0.0+0.0*1j
True
>>> 0.11111111111111111123456789
0.1111111111111111
>>> 0.11111111111111111123456789 == 0.1111111111111111
True

It seems the invariant holds and the values are considered the same. That it doesn’t always look the same seems unavoidable and is also the case for other literals like float numbers.


Would 1 - 2 + 3j become 1 - complex(2, 3)?

What’s your evidence for that?


True. BTW, I think we shouldn’t omit the alternative proposal (the imaginary class) either. It wasn’t clearly rejected in the mentioned issues, and it is supported beyond the C and C++ languages (e.g. in Go too).

I think they will be the same.

  1. (1.0) - 0j will be parsed as a Sub of the float 1.0 and 0j == complex(0.0, 0.0).
  2. +1.0 - 0j as a complex literal == complex(1.0, -0.0).
  3. 0.0 + 1.0 - 0j as Sub(Add(0.0, 1.0), 0j).
  4. float(1) - 0j - a variant of (1): there is an integer literal 1 and 0j.
  5. x=1.0; x - 0j - again a variant of (1): the same literals.
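The n-ary rule behind (3) is easy to check against today’s grammar with the ast module:

```python
import ast

# 0.0 + 1.0 - 0j parses left-associatively as Sub(Add(0.0, 1.0), 0j);
# the Sub's left operand is itself a BinOp, not a literal, so under the
# proposal this expression would keep its current meaning
tree = ast.parse('0.0 + 1.0 - 0j', mode='eval').body
print(ast.dump(tree))
```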

The formal syntax can be found in the R7RS standard, sec. 7.1.1, p. 62 (obviously, we would exclude the @ notation for the polar form and the special handling for nan/inf literals).

In some sense it’s true: they are the same wrt the == op. Yet complex(-0.0, 0.0) and complex(0.0, 0.0) are not the same objects, just as -0.0 and 0.0 are not (they behave differently):

>>> from math import copysign
>>> copysign(1.0, +0.0) == 1.0
True
>>> copysign(1.0, -0.0) == 1.0
False
>>> 0.0 == -0.0
True

Rather Add(Sub(1, 2), 3j).

Clearly, this is a subjective judgment. One argument is the number of recurring issues in our bug tracker. Another argument is that users of the complex type (and the cmath library) come with some background in mathematics, while we can’t assume they are familiar with some other computer language.


How about 1 - 2+3j or z - 2+3j? Wouldn’t you argue that the user meant 2+3j as one complex literal?

That’s no different from any other misuse of whitespace, like writing 2 * 3+4 and expecting a result of 14.

Don’t forget that the repr for a complex number includes surrounding parentheses. The order of operations would remain correct:

>>> x = 3+4j
>>> y = 5
>>> x * y
(15+20j)
>>> eval("%r * %r" % (x, y))
(15+20j)

So this is only going to be a problem when the parens are omitted, and no worse a problem than anywhere else.


But 3+4 is not a literal. And to be clear: I’m not saying that 2+3j should be treated as one literal there. I’m asking whether Sergey thinks it should. Based on what they wrote, I think they might. And I’m wondering what criteria they apply to decide.

I would agree with @Rosuav in the first case. The second should be parsed as Add(Sub(z, 2), 3j). I admit, parsing with imaginary literals (the present state of the art) is much simpler.

Evaluation order in +?

You forgot my leading sentence: “Clearly, this is a subjective judgment.” :slight_smile: Would you argue instead that people will learn complex analysis from the Python docs? What is your guess about user expectations?

I didn’t forget that. But you stated it not as a “subjective judgement” or a “guess”, but as a fact.

What’s my guess? I don’t have one. I don’t have enough data to make one.

These arguments also seem similar to the usual misunderstandings with float numbers. Users will have to learn a bit about Python and numerical computing at some point if they are curious about such details.

Even 1000 is not the same object as eval(repr(1000)): id(1000) gives a different result.

Ok actually I do have a guess now: most users … don’t care :-). Or never even notice.

Thank you for the good presentation, @skirpichev.

I am personally a great fan of the imaginary class. It looks simpler and more coherent in comparison with alternative solutions. Most changes are local to 1-2 classes:

  1. A new imaginary class. Making it a subclass of complex makes many things easier. It needs to overload a bunch of methods: __new__, __repr__, __reduce__, __neg__, __add__, etc.
  2. The complex class itself only needs a tiny tweak in __repr__ (to represent a negative real zero as -0.0 instead of -0) and specialized arithmetic operations when the other operand is a real number.
  3. The parser needs an update to produce imaginary instead of complex numbers.
  4. A few parts of the compiler that expect an exact complex type need updates (such as _PyCode_ConstantKey()).
  5. A new marshal protocol to support imaginary values. The marshal format was not changed for many years, it is a good opportunity to add other minor features, such as the support of slice objects.

And most of the rest should just work.
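Item 1 can be sketched in pure Python (my own toy version, nowhere near the real patch): subclass complex and override just enough for repr and negation to behave.

```python
class imaginary(complex):
    """Toy pure-imaginary subtype; a real patch would overload many
    more methods (__reduce__, __add__, etc.)."""

    def __new__(cls, imag=0.0):
        return super().__new__(cls, 0.0, imag)

    def __repr__(self):
        return repr(self.imag) + 'j'

    def __neg__(self):
        # negate only the imaginary component; complex.__neg__ would
        # also turn the zero real part into -0.0
        return imaginary(-self.imag)
```

The parser change in item 3 would then produce imaginary(3.14) for the token 3.14j, and existing isinstance(..., complex) checks keep working.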

I see the main obstacle to this idea in the fact that the imaginary type will not survive being stored in array.array or NumPy arrays. Accordingly, the results of some operations on bare Python numbers and on NumPy arrays will differ.


Don’t you think it will break more things? I.e. not just float - 0j, but also other cases of mixed operands (real op complex, imaginary op complex).

That’s a simple (few-line) change in parsenumber_raw(); that’s why it wasn’t mentioned.

Not sure I got you. The stdlib’s array type doesn’t currently support complex numbers.

My major concern with this solution is that this new class looks to be alien to the numeric tower (PEP 3141)…

Surely most users who care about using complex numbers in Python do have some expectations of how this stuff should work…

While that PEP is relevant and useful prior art, it’s not clear to me that it conflicts with the addition of Imaginary.

Decimal is noted by the PEP as already existing outside of the tower. And part of what’s being defined there is how to make new, well-behaved numeric types, like Imaginary.


Can the change be made easier on NumPy if Imaginary is added separately, before it is used by complex? Addition of Imaginary with any other numeric can be defined to produce a complex number as the result.
And then complex can change to use imaginary internally later.

I’m not sure if that helps, or helps enough, to be worth the added complexity of introducing it more slowly.

Sounds like trading the astonishment that there are no complex literals for the astonishment that + and - in a+bj are no longer the operators + and -.

And unlike in 1.0e+3, there is no e that tells you ahead of time that the + is not the operator.
In a+bj one would need to wait until the j at the end.


Isn’t 2*1+1j the problem with complex literals?

Perhaps it would be cleaner if I emphasize that the proposal is restricted to Add/Sub’s (BinOp) with special arguments (the second is an imaginary literal and the first is a ±int or ±float literal). (We could also discuss whether we can redefine unary Sub of an imaginary literal as well.) For n-ary ± we should keep the current evaluation rules, i.e. a±b±c±d=(((a±b)±c)±d). If you want to place a complex literal somewhere in between, use parentheses! After all, maybe they are there for a purpose in the complex.__repr__() output?

An AST transformation to formalize this a little (and/or to play with):
# cl-transform.py

from ast import *
from ideas import import_hook

class ComplexLiteralTransform(NodeTransformer):
    def visit_BinOp(self, node):
        match node:
            case BinOp(x, op, Constant(y)) if isinstance(op, (Add, Sub)) and isinstance(y, complex):
                y = y.imag
                y = Constant(y) if isinstance(op, Add) else UnaryOp(USub(), Constant(y))
                match x:
                    case Constant(x) if isinstance(x, (int, float)):
                        return Call(Name('complex'), [Constant(x), y], [])
                    case UnaryOp(op, Constant(x)) if isinstance(op, (UAdd, USub)) and isinstance(x, (int, float)):
                        if isinstance(x, int) and x == 0:
                            x = float(x)
                        x = Constant(x) if isinstance(op, UAdd) else UnaryOp(USub(), Constant(x))
                        return Call(Name('complex'), [x, y], [])
        return self.generic_visit(node)

def transform_cl(tree, **kwargs):
    tree_or_node = ComplexLiteralTransform().visit(tree)
    fix_missing_locations(tree_or_node)
    return tree_or_node

def add_hook(**kwargs):
    return import_hook.create_hook(hook_name=__name__,
                                   transform_ast=transform_cl)

With André Roberge’s https://github.com/aroberge/ideas:

$ python -q -m ideas -a cl-transform
Ideas Console version 0.1.5. [Python version: 3.12.0rc1+]

ideas> (1-0j)
(1-0j)

In fact, I think we can consider (±a±bj) to be the true complex literal, whereas the fact that we can sometimes omit the parentheses (e.g. for a simple assignment like x=1+2j) is just syntactic sugar.

That’s somewhat an implementation-dependent feature:

Python 3.9.16 (7.3.11+dfsg-2, Feb 06 2023, 16:52:03)
[PyPy 7.3.11 with GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>> id(1000)
16001
>>>> id(eval(repr(1000)))
16001

While signed zeros are a feature of IEEE 754.

As well as for imaginary literals :slight_smile:

I think not; see the formalization above with the AST transformation. It should be Add(Mult(2, 1), complex(0, 1)).


I was reading through this discussion trying to understand what the source of the problem might be. I’m not sure I have been able to follow everything so, if possible, I’d like to ask for a brief summary. Presumably all the arithmetic operations between floats are “fine”, so I struggle to understand how it is possible that suddenly things break when dealing with a pair of floats.

As for the idea of introducing a dedicated type for “imaginary numbers”, I don’t quite see what the need would be. Even in mathematics there isn’t generally a need to define/use the set of imaginary numbers. What problems would this new type solve?
