Include `math.sign`

I don’t believe it’s been suggested before that it always return a float. It must do so for floating-point types, but in all other cases I’d like to see it return an int in {-1, 0, 1}. A float is about the least useful result type imaginable for, say, a Fraction. You can’t even multiply the result by a Fraction then without risking total loss of information.
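To make the Fraction point concrete, here's a small stdlib-only demonstration of how a float result contaminates exact arithmetic while an int result does not:

```python
from fractions import Fraction

f = Fraction(1, 3)

# Multiplying by an int sign keeps the arithmetic exact:
exact = f * -1
print(exact, type(exact))    # -1/3 <class 'fractions.Fraction'>

# Multiplying by a float sign silently degrades everything to float:
lossy = f * -1.0
print(lossy, type(lossy))    # -0.3333333333333333 <class 'float'>
```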

Remains to be seen :wink:.

Right, that’s not going to happen :smile:.

Can’t speak for others, but I’ve reached my burnout threshold on this discussion. I’m frankly amazed - and impressed! - that you’re still pursuing it after nearly a year. It shouldn’t be this hard :frowning:.

1 Like

That’s a bit too harsh. While the standard requires signaling “InvalidOperation” when an ordered comparison is applied to a NaN (quiet NaN or not), the standard also requires that conforming environments not “trap” on “InvalidOperation” by default.

It’s Python’s decimal that violates the standard here, by enabling the InvalidOperation trap by default (also the DivisionByZero and Overflow traps).

That’s why, e.g., on most (all?) platforms,

>>> math.nan < math.nan
False

simply returns False. Most (all?) platform C implementations follow the standard by disabling all 754 traps by default. If you look at the hardware state, though, you’ll see that the above did raise the InvalidOperation signal.

Here’s how it works in decimal:

>>> import decimal
>>> decimal.getcontext().flags
{<class 'decimal.InvalidOperation'>:False, <class 'decimal.FloatOperation'>:False, <class 'decimal.DivisionByZero'>:False, <class 'decimal.Overflow'>:False, <class 'decimal.Underflow'>:False, <class 'decimal.Subnormal'>:False, <class 'decimal.Inexact'>:False, <class 'decimal.Rounded'>:False, <class 'decimal.Clamped'>:False}

# all False; turn off the Invalid trap

>>> decimal.getcontext().traps[decimal.InvalidOperation] = False

# try an ordered compare on a NaN; no exception now

>>> nan = decimal.Decimal("nan")
>>> nan < nan
False
>>> from pprint import pprint
>>> pprint(decimal.getcontext().flags)

# But the Invalid _flag_ is set now:
{<class 'decimal.InvalidOperation'>:True, <class 'decimal.FloatOperation'>:False, <class 'decimal.DivisionByZero'>:False, <class 'decimal.Overflow'>:False, <class 'decimal.Underflow'>:False, <class 'decimal.Subnormal'>:False, <class 'decimal.Inexact'>:False, <class 'decimal.Rounded'>:False, <class 'decimal.Clamped'>:False}

# try again after enabling the trap
>>> decimal.getcontext().traps[decimal.InvalidOperation] = True
>>> nan < nan
Traceback (most recent call last):
  ...
    nan < nan
decimal.InvalidOperation: [<class 'decimal.InvalidOperation'>]
1 Like

You obviously did not read the post you replied to properly:

My function was never intended as a general-purpose function to be provided by the standard library, but just as a simple function that programmers write themselves to fit their use case.
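For instance, a minimal version of such a use-case-specific function (a sketch, assuming the caller only ever passes ordinary real numbers) can be as short as:

```python
def sign(x):
    """Use-case-specific sign: fine when inputs are known to be real numbers."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    return 0

print(sign(-7), sign(0), sign(2.5))   # -1 0 1
```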

What does “dangerous” mean? If an exception is raised in Python code, it is just propagated through Python code. No danger there. But if C code mishandles exceptions, it can easily crash the interpreter or do other crazy stuff:

>>> import math
>>> class Mine():
...     def __lt__(self, other):
...         try:
...             return math.sqrt(7) < other
...         except:
...             print("Boom!")
...
>>> Mine() < 0
False
>>> signum.sign(Mine())
Boom!
Traceback (most recent call last):
  File "<python-input-43>", line 1, in <module>
    signum.sign(Mine())
    ~~~~~~~~~~~^^^^^^^^
TypeError: signum.sign: invalid argument `<__main__.Mine object at 0x7fbb0189c830>`. Type 'Mine' does not support order comparisons (>, <, ==) or NaN detection.

I wrote about this issue in your csignum-fast package here 3 days ago, but you chose to ignore it. So, yes, speed without brakes is actually dangerous.

I sort of explained this in another thread:

The issue is that == is what is used by dict and set so you have to choose between defining == in a way that makes the expressions efficiently hashable vs defining == as mathematical equality. SymPy chooses the hashable version but that means that == is not in the same semantic plane as < and > and the equality operator that corresponds to < is actually spelled as Eq(a, b) rather than a == b:

In [1]: e = log(sin(4)**2 + cos(4)**2)

In [2]: Eq(e, 0)
Out[2]: True

Note that if you want to use that reliably you have to spell it like

if Eq(e, 0) == True:

because bool(Eq(e, 0)) can blow up on you for some expressions.

The alternative choice that a == b be equivalent to Eq(a, b) would make == semiundecidable even for well-defined simple numeric expressions.

1 Like

I’m firmly against this. Converting to float will cause problems, and we know it will cause problems with even relatively simple numeric types; that’s why we aren’t doing it for the standard library types.

I’m not in favor of making a function do the wrong thing for the sake of “consistency”. It’s much more important that the functions do the right thing. Beyond that, it’s not actually consistent: the other non-continuous functions that already exist in math don’t convert to float to work; they use properties of the type (properties that, in their case, were defined for this purpose, but we don’t need a new dunder just to attempt a comparison to zero).

1 Like

It is this hard, I think, because there just isn’t a clear understanding of what is needed to make these things well defined. On the face of it, it might seem like Python is a good language for writing code that works generically with different mathematical types, e.g.:

def area(width, height):
    return width * height

There you have a function that can work for int, float, Fraction, numpy arrays and so on and you didn’t have to do anything complicated to make it work. The problem is that as you go beyond __add__ and __mul__ you start to get into trouble and more is needed but I don’t think people have a clear understanding of what that more is.
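To illustrate how far the duck typing carries this, the same two-line function handles every stdlib numeric type unchanged:

```python
from decimal import Decimal
from fractions import Fraction

def area(width, height):
    return width * height

print(area(3, 4))                             # 12
print(area(1.5, 2.0))                         # 3.0
print(area(Fraction(1, 3), Fraction(3, 5)))   # 1/5
print(area(Decimal("1.5"), Decimal("2")))     # 3.0
```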

On the face of it sum only uses __add__ but it still runs into trouble:

In [13]: type(sum([Fraction(1,2), Fraction(1,6)]))
Out[13]: fractions.Fraction

In [14]: type(sum([]))
Out[14]: int

This is why sum has a second argument. The caller has to provide “the type’s idea of zero”:

In [15]: type(sum([], Fraction(0)))
Out[15]: fractions.Fraction

I think if this was well understood by everyone then math.sumprod would have been given the same parameter.

A better option here would be for the fractions module to provide a sum function:

>>> fractions.sum([])
Fraction(0)

Then the implementation could use an algorithm that is designed for rational numbers (the trivial algorithm used by sum is not good here). It would in fact also be useful that you could pass in floats and get a Fraction back out so that you have an exact counterpart to math.fsum.

If you look inside the code for math.prod then it is (simplifying a bit):

from functools import reduce
from operator import mul

def prod(nums, *, start=1):
    if all(isinstance(n, int) for n in nums):
        ...  # fast path for integers
    elif all(isinstance(n, float) for n in nums):
        ...  # fast path for floats
    else:
        return reduce(mul, nums, start)

To me that just looks like it should be two or three separate functions. It is useful to have a function that gives a fast product of floats or one for a fast product of ints. The generic prod is convenient but the actual reason this function is more useful than just writing your own is because of the fast paths for int and float but those could be two separate functions (and would not need the start parameter).
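To check that claim: for types outside the fast paths, the generic branch really is equivalent to a bare reduce:

```python
import math
from functools import reduce
from operator import mul
from fractions import Fraction

nums = [Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)]

# math.prod's generic path and a plain reduce agree exactly:
result = math.prod(nums)
print(result, result == reduce(mul, nums, 1))   # 1/4 True
```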

The people designing the array API understood what was needed to write generic code with different array types:

In [23]: a = np.array([1])

In [24]: ns = a.__array_namespace__()

In [25]: ns is np
Out[25]: True

In [26]: ns.cos(a)
Out[26]: array([0.54030231])

A function can take an array from numpy, tensorflow etc and call its __array_namespace__ method to get the module that provides the functions for that type. That is what is needed to write code that works polymorphically with different types in Python: the set of functions for each type has to be provided somehow.
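Here's a toy sketch of the mechanism (the FakeArray/FakeNamespace names are invented for illustration; real code would get numpy, tensorflow, etc. back from __array_namespace__):

```python
class FakeNamespace:
    """Stands in for a real array module such as numpy."""
    @staticmethod
    def sign(a):
        return FakeArray([(v > 0) - (v < 0) for v in a.data])

class FakeArray:
    """Minimal array type implementing the array API discovery hook."""
    def __init__(self, data):
        self.data = data
    def __array_namespace__(self):
        return FakeNamespace

def generic_sign(x):
    # Ask the array for its own function namespace, then use it:
    xp = x.__array_namespace__()
    return xp.sign(x)

print(generic_sign(FakeArray([-5, 0, 3])).data)   # [-1, 0, 1]
```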

1 Like

While I can at least somewhat sympathize with this view, I don’t think it’s a good fit for the current layout of the standard library. Very few people who aren’t deeply entrenched are going to look at math and instantly think “I should only expect correct behavior for floats”, and even though the math library has its roots with floats, there’s nothing preventing us from ensuring correct behavior for more so that user expectations can be reasonable.

I do think that if this is your stance on it, you of all people should be firmly against __float__ ever being used implicitly though, so I’m having a very hard time reconciling various things you’ve said in this thread.

I think the only good options that actually ensure the standard library is properly supported and that don’t cause other predictable wrong results are:

  1. Only accept the standard library types, do the right things for each.
  2. Duck type an attempt at a comparison to zero, and document it as such. Advanced users should know to use their library’s sign rather than math’s.
  3. Add a __zero__ dunder that must only be implemented by numeric types for which a numeric comparison to zero via rich comparison makes sense.
    3b. Also add a __is_nan__ dunder that must only be implemented by types that have a nan concept.

While 3/3b allows this to work for more without the duck typing, it also requires yet another new dunder for math operations, and would still likely involve rejecting sympy numeric values because of the way sympy uses __eq__.
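For illustration, here's a sketch of what option 3/3b might look like (note that __zero__ and __is_nan__ are the hypothetical dunders proposed above, not anything that exists in Python today):

```python
def sign(x):
    """Sign via a hypothetical opt-in __zero__ dunder (option 3 above)."""
    cls = type(x)
    if not hasattr(cls, "__zero__"):
        raise TypeError(f"{cls.__name__} does not support sign()")
    if getattr(cls, "__is_nan__", None) and x.__is_nan__():
        return x                  # pass NaNs through (option 3b)
    zero = x.__zero__()           # the type's own idea of zero
    return (x > zero) - (x < zero)

# A toy type that opts in:
class MyNum:
    def __init__(self, v):
        self.v = v
    def __zero__(self):
        return MyNum(0)
    def __lt__(self, other):
        return self.v < other.v
    def __gt__(self, other):
        return self.v > other.v

print(sign(MyNum(-3)), sign(MyNum(0)), sign(MyNum(9)))   # -1 0 1
try:
    sign(5)                       # int has no __zero__ in this sketch
except TypeError as e:
    print(e)                      # int does not support sign()
```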

1 Like

I should further clarify why __zero__ rather than __sign__: if we’re going the route of more dunders, it would be better to pick one that enables what we want and more. Dunders aren’t free to add: there are costs for the implementation of the standard library, and on third-party types.

1 Like

I’m not sure how productive it is to dive into the history and reasons behind Python’s design, given that we ultimately need to work with what we’ve got, not what we wish we had, but I do think it’s useful to remember that originally, duck typing was Python’s approach to generic code.

This is an excellent example of a duck typed generic function - it just works for any type that has a * operator that works sensibly.

Dunders weren’t originally about providing hooks for generic functions, they were for allowing types to implement the built in operators of the language[1]. But they were useful and convenient, so they got used for other cases as well.

So things got a bit muddy, but they worked just fine, in a “consenting adults” fashion[2].

But people started writing more critical software in Python, and static type checkers came along, and “good enough” was no longer good enough. Which is where we are now. Writing 100% robust generic code in Python is an extremely difficult task, especially if you want it to be typed, and usually requires the participating types to follow some agreed conventions to make it work. But on the other hand, “practicality beats purity” is a hard sell for people who insist on consistency and extensibility. Because at the core, the language simply isn’t that consistent/extensible - look at the problems with the numeric tower if you don’t believe me.

Personally, I’d like a traditional, practical solution - return integer -1, 0, 1, except in cases where we need signed zeroes or NaNs to handle real-world use cases. And to heck with problems expressing the resulting type signatures cleanly, or fitting unconventional “numerical” types into the model. Write a wrapper using singledispatch or overloads if you care about those cases.

But what I want isn’t what dictates what happens here, and as you say, people don’t all want the same thing. So the question becomes, do we implement something that’s a 90% solution, covering the commonest use cases[3], or do we give up and implement nothing because we can’t reach a consensus?


  1. More generally, operations - see, for example, __bool__. ↩︎

  2. i.e., sometimes they didn’t work, but you were expected to take responsibility for that and not expect too much of the language :slightly_smiling_face: ↩︎

  3. Whatever they are :slightly_smiling_face: ↩︎

6 Likes

This is the last post I hope to make about semantics. Following numpy is almost always the best idea when adding a new number-crunching function to Python, and I don’t think this is an exception, except for Python’s own Decimal type.

  • For numpy’s own floating types, it returns numpy floats, including passing through NaNs, but losing the sign bit of 0.
    • But @mdickinson pointed to a numpy issue suggesting they’ll be changing to preserve a zero’s sign. We should do that from the start then.
  • For types other than numpy floats, it does not appear ever to convert to its own float types. Instead it returns ints in {-1, 0, 1}.

Which it appears to do by “the obvious” compare-to-int-0 method.

>>> import numpy
>>> import datetime
>>> numpy.sign(datetime.timedelta(0))
Traceback (most recent call last):
   ...
TypeError: '<' not supported between instances of 'datetime.timedelta' and 'int'

For floating types it doesn’t know about, that usually works fine too, but the result is not a float:

>>> import mpmath
>>> numpy.sign(mpmath.mpf(-3.1))
-1
>>> type(_)
<class 'int'>
>>> import gmpy2
>>> numpy.sign(gmpy2.mpfr(-3.1))
-1
>>> type(_)
<class 'int'>
>>> import decimal
>>> numpy.sign(decimal.Decimal(-3.1))
-1
>>> type(_)
<class 'int'>

For NaNs of floating types it doesn’t know about, it raises an exception, presumably by detecting that trichotomy fails on its attempts to compare with int 0:

>>> numpy.sign(gmpy2.mpfr("nan"))
Traceback (most recent call last):
    ...
TypeError: unorderable types for comparison

But it’s worse for Decimal, because Python’s Decimal deviates from IEEE-754 by enabling the invalid operation trap by default (and trying to do an ordered compare with a NaN IS an invalid operation by the standard’s rules):

>>> numpy.sign(decimal.Decimal("nan"))
Traceback (most recent call last):
  ...
    numpy.sign(decimal.Decimal("nan"))
    ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
decimal.InvalidOperation: [<class 'decimal.InvalidOperation'>]

For Python’s other numeric types (ints and Fraction), it “just works”, and also for many other packages’ numeric types (e.g. gmpy2.mpq, its rational type).

Python’s version should certainly work seamlessly with Python’s own Decimal type, but other than that I’m perfectly happy with what numpy does in all cases. I don’t even care enough about the timedelta failure to complain.

WRT Decimal, the “compare” method almost does the whole job by itself (just losing the sign of 0), including not raising InvalidOperation for comparing a NaN:

>>> for base in -2, 0, 4, "nan":
...     X = decimal.Decimal(base)
...     r = X.compare(0)
...     print(X, r, type(r))
-2 -1 <class 'decimal.Decimal'>
0 0 <class 'decimal.Decimal'>
4 1 <class 'decimal.Decimal'>
NaN NaN <class 'decimal.Decimal'>

I don’t see any good reason for why Python should be less capable than numpy here, nor any particularly good reason for why it should try to be more capable. If you have a sane type that can be compared to int 0, it will “just work”. Else it’s garbage in, garbage out. Use a sign implementation for that type’s oddball idea of “sign”.

The solution to the real but apparently very rare “well, it would work if you coerced to float first”[1] case is obvious: apply float() yourself before calling math.sign(). Then if unanticipated overflow or underflow gives you a bad outcome, it’s entirely on you too. You asked for it; the implementation didn’t force it on you.
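An explicit float() also makes the failure mode visible at the call site rather than hidden inside a sign() implementation, e.g.:

```python
from fractions import Fraction

huge = Fraction(10**400, 3)

# Coercing to float yourself surfaces the overflow explicitly:
try:
    float(huge)
    raised = False
except OverflowError as e:
    raised = True
    print("you asked for it:", e)
```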

And now I’ll make a note to check in again here after another year passes :wink: .


  1. a small subset of sympy expressions are all I’ve seen in this class ↩︎

2 Likes

FYI, by adding prints to my own toy “numeric class”, I observe that numpy.sign() applied to x of a type T numpy doesn’t know about:

  • Never invokes float().

  • Never invokes T(0).

  • Tries, in this order: x < 0, x > 0, x == 0. That’s the Python int 0.

  • The first of those that returns True determines the result.

    • And tests after it in the list aren’t tried.
    • So it doesn’t care if more than one (or even all) return True.
  • If all 3 return False, it raises

      TypeError: unorderable types for comparison
    

That’s all fine by me too. It has no pretensions to “exhaustive” sanity-checking (that’s not sign’s job), just enough to stop it from making up a result that flatly contradicts what x claimed was true.

Putting it together, this is a Python prototype for a sign() that works as much like numpy’s as reasonable. But - shudder - has cases! :smirking_face:

    from decimal import Decimal as _Decimal
    from fractions import Fraction as _Fraction

    _Decimal0 = _Decimal(0)

    def sign(x):
        typ = type(x)
        if typ is float:
            if x < 0.0:
                return -1.0
            elif x > 0.0:
                return +1.0
            else:
                # pass through NaNs and signed zeroes
                return x

        elif typ is _Decimal:
            if x == _Decimal0:  # pass through signed zeroes
                return x
            else: # .compare() does everything else "right"
                return x.compare(_Decimal0)

        elif typ in (int, _Fraction):
            return (x > 0) - (x < 0)

        else:
            # Follow the bare-bones definition of signum.
            # This will work fine for any type that supports
            # a sensible notion of comparing to int 0.
            if x < 0:
                return -1
            elif x > 0:
                return +1
            elif x == 0:
                return 0
            else:
                raise TypeError(f"can't be ordered against int 0: {x!r}")

Deviations from numpy:

  • Preserves the sign of a 0 for Python’s float types (as it appears numpy will be changing to do too).
  • Doesn’t do weird things for Python’s Decimal.

But, just as Decimal is a “weird type” to numpy, numpy’s float types are “weird” to this. There’s no attempt at duck-typing “is a float type” here, and so they fall into the NaN-unaware and signed-zero-unaware catch-all “unknown type” case:

>>> import numpy
>>> numpy.sign(float("nan"))
np.float64(nan)
>>> sign(_)
Traceback (most recent call last):
  ...
TypeError: can't be ordered against int 0: np.float64(nan)

I don’t care. numpy users are going to use numpy.sign on numpy’s floats, not Python’s.

Types that are plain nuts to use with this generally blow up in other ways:

>>> sign("abce")
Traceback (most recent call last):
    ...
TypeError: '<' not supported between instances of 'str' and 'int'
1 Like

Yes, and I touched on some of the subtleties in my own follow-up later - I think the original left the impression that sympy may be incoherent. But it isn’t, not at all.

The larger point is that even when external types do implement dunder methods (like __eq__), we still have no idea what they were written to do. Granted that “equality” is an especially slippery concept, though.

Can you link this again? I can’t find any reference for that. The Array API defines sign here. Sign of zero is very clearly zero for at least the numpy floating types. I would find it really odd if it worked differently for float.

It is here:

It features a discussion similar to this one for the dtype=object case, where similarly ill-advised attempts at type checking (isinstance(obj, Real)) or duck-typing trichotomy etc. are considered. It would have been better if np.sign had never attempted to handle dtype=object, and along similar lines here it would be better not to add math.sign if it can only be done with poorly defined duck typing or some ad-hoc special-casing of particular types.

As a float function it makes sense and is well defined but it seems no one agrees with me that a float version of sign would be reasonable in the math module. The alternatives to a float version of sign that have been suggested all sound to me like setting bad precedents that would be worse than any benefit the sign function itself would bring.

In what way is “For types other than the two special cases (float, decimal) in the standard library, use a rich comparison with zero” poorly defined?

Sign only has any weirdness at all because of float-like types. All of the special case behavior needed for numeric types is due to floats not being well-behaved numerically.

1 Like

This implementation actually lets through some bizarre things via the implicit truthiness of if x < 0: et al. I should have specifically called out why I used match-case in my implementation: it more neatly represented rejecting things that weren’t returning a clear indication of numeric comparability, since unfortunately we have types that use the rich comparison dunders to create a DSL rather than having macros.