Include `math.sign`

There are definitions (multiple!) for the complex signum.

Though the same seems to be true for the sign on floats (not on real numbers!).

This is rather an arbitrary decision. Why not prefer -0.0 instead? Signed zeros matter!

Ah, indeed. Latest revision looks different.

This definition looks most natural as a representation of the mathematical function sign() in IEEE 754-compatible arithmetic. But it’s just a special case of copysign() and thus not suitable for the stdlib.
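For illustration, copysign() already honors the sign of zero:

>>> import math
>>> math.copysign(1.0, -0.0)   # the sign bit of -0.0 is respected
-1.0
>>> math.copysign(1.0, 5.0)
1.0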

This one-liner makes sense to me, based on the presented examples:

import math
from typing import SupportsFloat

def sign(x: SupportsFloat) -> float:
    x = float(x)
    return x if x == 0 or math.isnan(x) else math.copysign(1.0, x)
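A quick check of the edge cases with that definition:

>>> sign(-0.0), sign(2.5), sign(float("nan"))
(-0.0, 1.0, nan)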

(And we want the float return type just to pass NaNs through.) But I still don’t think we have enough practical use cases to warrant something like this in the stdlib.

Excuse me… I’m not an expert, but IMHO the majority of users will expect a simple 1, 0 or -1.

I read the thread carefully, but I really don’t understand why math.sign(x) should coerce x to a float and return 1.0, 0.0 or -1.0. I don’t see any practical advantage. It’s a really simple function; it’s really different from, for example, math.sin().

About NaN, IMHO the majority of users will expect a ValueError. For @hprodh’s use case, I suppose math.sign() can have a kwarg err or whatever, as also suggested before.

To sum up… it’s a dead simple function. I don’t think it’s worth overcomplicating. Just return 1, 0 or -1.
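A minimal sketch of that variant; the exact NaN behavior and error message are just my assumption:

def sign(x):
    if x != x:  # only a NaN compares unequal to itself
        raise ValueError("sign() of a NaN")  # placeholder error message
    return (x > 0) - (x < 0)  # the bools subtract to a plain -1, 0 or 1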

Just my 0.0.1 cents.


Don’t you feel any reluctance to say that a function that erases the difference between, say, 0.123 and 1.23e200 should nevertheless preserve the difference between -0 and +0? Why should it? The purpose of the function is to reduce the number of cases, and drastically so.

And, as best I can tell, only JavaScript’s Math.sign() preserves the difference. What’s the use case?

It’s enough for me that “almost everyone else does offer it”. Even frickin’ JavaScript :wink: A minor function to be sure, but with small implementation burden and possibly zero maintenance cost. The “cost” part of “cost/benefit ratio” is as close to zero as it gets.

Essentially all the math module functions work that way. That’s why, e.g., you can pass not only floats to math.tan(), but also ints, fractions.Fraction, decimal.Decimal, and any number of custom numeric types the core knows nothing about. There’s no special code needed to support this: it’s just a general truth that appropriate arguments to math functions are magically converted to float.

Not to say that’s a “good” way, but it’s the way things have worked from the start.
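For instance (using sqrt rather than tan, so the results are exact):

>>> import math
>>> from fractions import Fraction
>>> from decimal import Decimal
>>> math.sqrt(Fraction(9, 4))   # converted via __float__ first
1.5
>>> math.sqrt(Decimal("2.25"))
1.5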

Ok, but I suppose there’s a reason math.sin(), tan and company coerce all inputs to floats, ints included: these kinds of functions use an underlying, already existing C function that will not work with a giant 10**10000.

Furthermore, their outputs are not a simple 1, 0 or -1. The output is a float, of course. That’s the nature of these functions, and moreover the underlying C function returns a float.

But math.sign() has no C counterpart. There’s no reason to require that it return a float. Furthermore, it’s simple to also accept a giant int without coercing it to a float and getting an overflow error.

You said “special cases are not special enough”. But also “practicality beats purity”. I know they are not real rules. They serve to make you ponder.

I pondered and, IMHO, it’s really practical and easy to coerce everything but ints for sign, and return dead simple ints. We can’t compare a function as simple as sign to tan and the other ones. From my point of view, it’s not a special case, it’s a completely different one.

Honestly, I don’t see any “heresy”, overcomplication or side effects in doing that. But maybe I don’t have enough experience in this field.

0.0.2_alpha_0 cents.


A practical problem is that Python is an OO language, but math supplies a functional interface. That’s not a perfect fit, and it shows.

Coercing to float is practical, even for custom numeric types Python knows nothing about, because custom types can supply a special __float__ method. Then they work without math having to know anything about them. The same applies to core types (like int, Fraction, and Decimal). They also have __float__ methods, and math doesn’t generally require special code to accept them either.

There are rare exceptions, such as math.log(). That does make a special case of ints, because logs of giant ints are useful, and “just blowing up” is no good then. But that is special-case code, and custom numeric types have no way to participate in that.
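For example (values shown approximately in comments):

import math

math.log(10**10000)   # works: ints are special-cased here (~= 23025.85, i.e. 10000 * ln 10)
math.sqrt(10**10000)  # OverflowError: int too large to convert to float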

As to the return type, to the extent that we have “use cases” here, they all appear to work with floats, and go on to multiply the result of sign() with a float. Returning an int would require coercing the result to a float anyway.

That said, some other implementations of sign() do return ints. Arguing about it seems pointless, because there’s no QED to be had. Pick one and move on. Since numpy picked float, that carries a lot of weight with me. I don’t anticipate ever using the function anyway :wink:

For this context where you want NaNs to pass through (and all other contexts that I can think of where I would want NaNs to pass through[1]), don’t you already have np.sign available? And furthermore, wouldn’t you want to apply np.sign to the whole array?


  1. I do see the utility of having nans in float arrays ↩︎
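To make that concrete, np.sign is elementwise and passes NaNs through (output shown in the comment):

import numpy as np

np.sign(np.array([-2.0, 0.0, 3.5, np.nan]))   # -> array([-1.,  0.,  1., nan])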


I don’t see obvious use cases. It just happens to be yet another arbitrary decision whether the returned value is expected to be a float. But any other choice introduces asymmetry.

But not C. And not Fortran. And not Go. Are we still sure about everyone? :wink:

Almost every one-liner, probably, would fit in this camp.

I’m fascinated to see the conflict between ISO/IEC 10967-1 and other draft standards regarding signed zeros. It confirms that a ‘simple’ signum function is an architectural choice rather than a mathematical definition. For now, my csignum-fast stays with the (x > 0) - (x < 0) logic and an integer result for switching or indexing, as a robust baseline for all Python types. Many thanks to all; this discussion has given me a lot of ideas for the upcoming v1.3.0!


Yes, it is the kind of numpy usage my case is inspired by.
(Side note: usually I do not even use np.sign, but np.where(x > 0, ...).)

I wanted to underline that it is convenient to distinguish mathematically expected errors from computation errors with NaNs.

I tested this out for completeness. I doubt that in practice this matters, because there is so much other overhead going on before and after the code in question, like:

  • Looking up the sign function name in globals.
  • The interpreter preparing the args and invoking the function call.
  • Checking the arg is a float or otherwise calling __float__.
  • Actually calling the C function.
  • Taking the return value and putting it into a heap allocated Python float.
  • Probably a bunch of refcount twiddling along the way.

I find it hard to imagine that the actual implementation here could possibly have a noticeable effect on the runtime performance as long as it is roughly as simple as I have suggested.

(With csignum-fast it is different because it does way more in the body of the function.)

Take these three C implementations:

#include <math.h>

double sign(double x)
{
    if (x > 0.0)
        return 1.0;
    else if (x < 0.0)
        return -1.0;
    else if (x == 0.0)
        return 0.0;
    else
        return x;
}

double sign_cs(double x)
{
    if (x == 0.0)
        return 0.0;
    else if (isnan(x))
        return x;
    else
        return copysign(1.0, x);
}

double sign_np(double x)
{
    return x > 0 ? 1 : (x < 0 ? -1 : (x == 0 ? 0 : x));
}

The first one is what I showed in the diff to mathmodule.c above. The last one is what numpy uses as the inner loop of sign’s float ufunc. I would have expected the compiler to generate the same code for sign and sign_np at least, but all three get different code. The compiler (gcc -O2) is smart enough not to actually call the copysign function, but otherwise I don’t see how to choose one over another from the generated assembly.

None of them comes out branchless or seems to have generated code for a non-branching common case where x is an ordinary positive or negative value. I was hoping for something like

  1. Some clever bit operation that puts 1.0 or -1.0 somewhere and sets some flags.
  2. A rarely used jump to somewhere else that handles zeros and nan.
  3. A quick return of 1.0 or -1.0.

It doesn’t look like any of them got that though so maybe there just aren’t the instructions available to do that.

Yes, in some situations you would want T=int, but then you want a whole set of functions that do int -> int. The math module provides a whole set of functions that do float -> float.

This isn’t just about some sense of mathematical or type-theoretic purity but also about things being discoverable and understandable. The docs for the math module are nowhere near detailed enough for someone to understand these things and would benefit IMO from showing the types explicitly. When you look at the type annotations for the math module, it is very clear what the general model is and which functions are the outliers.
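Roughly, the model looks like this (my paraphrase, not the exact typeshed stubs):

from typing import SupportsFloat

# the general pattern: anything float-convertible in, float out
def sin(x: SupportsFloat) -> float: ...
def tan(x: SupportsFloat) -> float: ...

# a few outliers return int instead
def floor(x: SupportsFloat) -> int: ...
def isqrt(n: int) -> int: ...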

Ok, I understand the point. Furthermore, making an exception only for int as I proposed doesn’t make sense, since you can also have a big Fraction or a big Decimal.

But since we only need to check whether x > 0, x == 0 or x < 0 (and maybe whether the input x is a NaN), is it really necessary to convert the input to a float?

Take a look at my comment over here:

It just isn’t well defined to try to duck-type these things. There are many different numeric and non-numeric types in Python that implement operators like < and == in many different ways, returning varying types and having varying semantics.

If you want well-defined predictable behaviour then you need a well-defined and predictable way of getting the inputs to known numeric types and __float__ is the math module’s way of doing that.

It is not needed; it is just what the functions in the math module generally do (see also Tim’s answer). Functions from math can be applied to floats and to numbers that can be converted to floats:

>>> class Foo:
...     def __float__(self):
...         return 42.
...         
>>> f = Foo()

I can apply functions from math, like

>>> import math
>>> math.pow(f,2)
1764.0

but not the corresponding functions from __builtins__

>>> pow(f,2)
Traceback (most recent call last):
  File "<python-input-11>", line 1, in <module>
    pow(f,2)
    ~~~^^^^^
TypeError: unsupported operand type(s) for ** or pow(): 'Foo' and 'int'

since they usually delegate to a dunder, like __pow__ here. This is what functions in __builtins__ generally do. So, a hypothetical __builtins__.sign should delegate to a __sign__ dunder.
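Purely hypothetical, since no __sign__ dunder exists today, but the delegation would look roughly like what abs() does with __abs__:

def sign(x):
    try:
        method = type(x).__sign__  # looked up on the type, like other dunders
    except AttributeError:
        raise TypeError(f"bad operand type for sign(): {type(x).__name__!r}") from None
    return method(x)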

As opposed to a conversion to float, which requires only one function call (__float__), this would need three function calls (__lt__, __gt__ and __eq__; from C via PyObject_RichCompareBool). In order for this to work, the type would need to implement those three comparison operators for mixed-type (type(x) and int) inputs.

Dunno. The only ones that come to mind are numpy arrays, and they can’t be converted to a float anyway.

Well, of course I read it; I quoted it. But maybe it didn’t seem so :slight_smile: I answer below.

Ok, and this is also Tim’s point. In theory, we could add an abstract sign() method to Number and its implementations to Rational, Real and Integral. I suppose it’s overkill… ^^

The general approach of math is pragmatic: just convert to float and do your calculations. This is a strong point in favor if the function is something like sin, because you don’t have to reinvent the wheel. C already has a sin() function; it has worked for dozens of years and it’s fast.

EDIT: and I forgot to say what you and Tim already said: that you only need __float__.

But for something like sign, honestly, I don’t expect it to require more than 50 lines of C code. And I don’t know if speed really matters here.

Currently, comparing against an integer (zero) works for int, float, Fraction and Decimal. If that’s not enough, we can add the conversion to float as a fallback.
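For instance, the comparisons all work out of the box:

>>> from fractions import Fraction
>>> from decimal import Decimal
>>> Fraction(-3, 7) < 0
True
>>> Decimal("0.00") == 0
True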

But it doesn’t work for everything, and you don’t have a way of validating whether the input type is something it can work for. Converting to float, on the other hand, works for all of those types as well.

It is useful not just that the function works for some types but that it is known what types it works for and what the caveats and failure modes are. The C implementation I showed above only has one failure mode: it fails if __float__ or __index__ fails or if the type does not have those methods. If the type has those methods and those methods succeed in returning valid int/float objects then the function succeeds.

Whether just calling __float__ results in the correct behaviour is a context-dependent question but if you know that it calls __float__ then you can reason about whether it does what you want. If the implementation randomly calls other methods then it is much harder to reason about whether that does actually do what you want for the type that you are passing in or how it might fail when calling the different methods.

It is also very useful from the perspective of the people maintaining the implementation if the scope of the function is clearly defined. When you declare the type as SupportsFloat you are making that clear: we assume that you give us __float__ and from there we guarantee correct behaviour.


If consistency with the other math functions is the main criterion, math.sign should have the same behavior as math.ceil, math.floor and math.trunc, i.e. return ints and raise on NaNs.
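For reference, that is what those functions do today:

>>> import math
>>> math.floor(-2.5)
-3
>>> math.floor(float("nan"))
Traceback (most recent call last):
  ...
ValueError: cannot convert float NaN to integer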

I was in favor of passing NaNs through… but raising on NaN is also ok; the user can check for them beforehand, if necessary. My main concern was that sign(nan) should not return 0.


Pretty much, yup. Don’t know beans about Go. C is so minimal that it still doesn’t even have max/min. Fortran recognized the importance for its audience from the start, with the mondo weird “arithmetic if”:

IF (expr) label_lt0, label_eq0, label_gt0

It was a 3-way GOTO branching based on “the sign” of expr. That’s been removed in the latest version of Fortran, sacrificed to the “structured programming” gods.

But most one-liners are not so very widely offered in computing environments. And few one-liners bring up so many points of disagreement (should the sign of 0 matter? what should be done with a NaN? what should the return type be?). Standardizing on some choice for each spares users from the cognitive burden of making their own guesses, and a plethora of incompatible implementations across Python packages.

That it’s easy to implement doesn’t drive this. But I don’t accept it as “a reason” for not adding it either. If it required heroic efforts to implement, that could disqualify it. But it doesn’t.
