Include `math.sign`

+1 on that!

No, the sign of a small positive number is still 1, while the sin of a small positive number x is also a small positive number (in fact smaller than x, as the series expansion shows), so 0 is a good approximation for it.

1 Like

Which is why I went on to suggest different code to catch underflow. At least for all of Python’s numeric types convertible to float: if x itself says it’s non-zero, but converting to float yields a 0, it underflowed; otherwise it didn’t.
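For concreteness, a minimal sketch of that underflow check (a hypothetical helper, not the code actually proposed in this thread) might look like:

```python
from fractions import Fraction

def sign_catching_underflow(x):
    # Hypothetical sketch: convert to float first, but detect underflow.
    # If float(x) is 0.0 while x itself claims to be non-zero (truthy),
    # the conversion underflowed, so fall back to an exact comparison.
    f = float(x)
    if f > 0.0:
        return 1
    if f < 0.0:
        return -1
    if f == 0.0 and x:
        return 1 if x > 0 else -1
    return 0 if f == 0.0 else f  # NaN falls through unchanged

print(sign_catching_underflow(Fraction(1, 10**400)))  # 1, despite float underflow
```

As the next post notes, flint’s arb type defeats even this check, since an arb can be truthy while all its comparisons against 0 are false.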

Flint’s arbs are different in this way:

>>> import flint
>>> a = flint.arb(0, 1)
>>> a
[+/- 1.01]
>>> bool(a) # it's truthy
True
>>> float(a) # but converts to 0 anyway
0.0

But in that case, the “always use float(x)” approach will also return 0. flint.arbs don’t work well with any approach here.

@acolesnicov’s implementation works fine:

>>> from signum import sign
>>> x = [1, -2, 3]
>>> abss = [abs(y) for y in x]
>>> signs = [sign(y) for y in x]
>>> abss
[1, 2, 3]
>>> signs
[1, -1, 1]
>>> [a*b for a, b in zip(signs, abss)]
[1, -2, 3]
>>> _ == x
True

Yes, checking for potential errors in all cases is much more tedious in C code than in Python.

1 Like

Of course, you can apply signum.sign componentwise, but I was referring to a direct application:

>>> import numpy as np
>>> from signum import sign
>>>
>>> x = np.array([1,-2,3])
>>> abs(x)
array([1, 2, 3])
>>> sign(x)
Traceback (most recent call last):
 File "<python-input-5>", line 1, in <module>
   signum.sign(x)
   ~~~~~~~~~~~^^^
TypeError: signum.sign: invalid argument `array([ 1, -2,  3])` (type 'numpy.ndarray'). Inner error: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In order for sign to work as well as __builtins__.abs, it would need to delegate to a designated method of the type and not to the rich comparisons.

Cut me a break :wink:. You know the definition of the mathematical signum function, and its application to non-zero rationals is dead obvious.

Not reasonable to me. And we have two fleshed-out implementations now (numpy.sign and @acolesnicov’s) that do behave reasonably for all rationals.

It’s the convert-to-float approach that’s harder to explain. Why introduce a conversion at all? The obvious implementation is

return 1 if x > 0 else -1 if x < 0 else 0 if x == 0 else x

Quite arguably so :wink: But at least so far nobody is forced to use them in their own code. Python wasn’t designed with static typing in mind.

Of course it will matter to some code. The infinitely precise value of math.sin(tiny_fraction) is a tiny real that can’t be represented as a machine float. The 0 result is as close to infinitely precise as it’s possible to get, correctly rounded to machine precision.

But the infinitely precise value of sign(tiny_fraction) is exactly 1 or -1. And it’s exactly representable as a float. That it can return 0 instead is unacceptable to me.

Do the right thing, or restrict math.sign() to inputs where it can’t get every result bit wrong.

And you happened to pick a function where a small rational underflowing to 0 doesn’t matter - but only because log(1 + abs(x)) approaches 0 as x approaches 0, even with infinite precision. At x = 0, it doesn’t matter what sign returns, because its result goes on to be multiplied by 0.

That doesn’t imply that sign’s result doesn’t matter in all contexts. In your specific example, the jump discontinuity is hidden by the multiply.

You, of all people, should be sensitive to this. For example, someone uses a rational process that delivers increasingly good approximations to “the true” result. After a number of partial results that appear to be moving toward 0, they want to know whether it ever reaches, or crosses, 0. sign just isn’t usable for that if it can deliver nonsense results.

Yes, arbs “appear to be” NaNs by failing trichotomy here. mpmath’s intervals don’t support conversion to float (ValueError) unless an interval contains only a single point. I understand why arbs “pick the midpoint” instead, and that’s fine. I’m happy to cater to arb’s quirks too, but not really at the cost of making things weird for the very widely used core types.

2 Likes

Things like math.cos() can’t be applied to numpy arrays either. None of the math functions know anything about numpy arrays. @acolesnicov’s sign() implementation is no better or worse in this respect.

1 Like

I should clarify that I’m not “a purist” here either. “Heroic efforts” also count.

>>> import math
>>> math.tan(math.pi / 2)
1.633123935319537e+16

math.pi is a little bit less than the infinitely precise value of \pi, and so the tangent of math.pi / 2 is “really large”, but nowhere near +\infty, or even near the limit of what a float can express. It is the correctly rounded result of tangent applied to the machine approximation math.pi / 2.

It’s certainly possible to create rationals whose true tangents are very much larger (there’s no upper bound), but math ignores that possibility. Rationals are converted to floats, and you get what you get then.

I don’t recall that it was ever discussed, but that was a deliberate choice by me long ago. And I don’t recall anyone disputing it (or, for that matter, even noticing it :wink:).

sign() is different. It’s not a continuous function, and has a very clear and simple definition for all of Python’s numeric types except complex. Rounding of the result is not an issue at all, because all possible infinitely precise results are exactly representable. There is nothing at all about limits or approximations in its definition.

Getting that wrong is unacceptable to me. You get the correct result, or you don’t. There is no sense in which, e.g., 0 is a “good approximation” to 1. To machine precision, 1.633123935319537e+16 is a “good approximation” to \tan{\pi/2}.

2 Likes

Yes, because math functions do not do duck typing, but generally convert inputs to floats.

A duck-typed sin function that also works for numpy arrays could be naively implemented as follows:

def sin(x):
    # Taylor series for sin(x): each term is built from the previous
    # one, so only +, -, *, and / on x's own type are needed.
    factor = x
    result = x
    for i in range(1, 30):
        factor = -(factor * x * x) / (2 * i) / (2 * i + 1)
        result = result + factor
    return result

Neither does __builtins__.abs, but it delegates to the type.

But it fits the expectations of neither math nor __builtins__.

1 Like

Yes, and it was noted long ago that doing this “absolutely right” would require adding a new dunder method (like __sign__(), akin to __abs__()) that types could supply to do whatever they like best.

But nobody wants all that pain for such a relatively minor function, and no way do people want to elevate the function to a built-in.

Because it is exactly defined for most types, it’s also a strain to stuff it into math.

So it goes - not everyone will be 100% happy. But that math is the “least bad” place to put it doesn’t mean it has to deliver dead wrong results in some cases.

3 Likes

To be a bit nit-picky: +\infty is only the left-sided limit \lim_{x\uparrow \pi/2} \tan x, the right-sided limit \lim_{x\downarrow\pi/2} \tan x is -\infty.

So, a sufficiently good lawyer could argue that inf and -inf, and everything in between, was a good answer.

1 Like

Not in an IEEE-754 world :wink:. math.pi is a specific representable float, and the tangent of it has a uniquely defined result under nearest/even rounding. Python does the best possible job of it (well, on my platform - the libm in use really drives it).

2 Likes

That is actually a very common feature of the way that sign is used and not a coincidence or something that is contrived. Note how the defining equation for sign(x) has the same property and does not care what the value of sign(0) is:

x = sign(x)*abs(x)

I do care about this but if you really want to do this sort of thing reliably (more reliably than float) with rational or exact numbers then you need more machinery than the math module provides. I would like Python as an ecosystem to provide many bits of machinery that are useful for different things so that people can choose the right tool for the job. I think that 99% of the time for scalar calculations the math module is exactly what is needed and is entirely good enough. When it isn’t you should use something else.

For Fraction it would be better if the fractions module provided the relevant functions:

>>> from fractions import sign
>>> sign(2.5)
Fraction(1, 1)

That is always the model that makes (numeric) things work reliably in Python.

1 Like

Except I’m not questioning how the function works for floats. Even if it’s 100% benign for all uses with floats, that doesn’t address the obvious failures when a “force everything to a float first” approach can destroy correct results for types other than float.

I chafe at calling that a “defining equation”, because it’s not. It’s an identity satisfied by (a correctly defined) sign().

As you say, it’s of no use in defining the result of sign(0). Well, beyond ruling out an infinite result. And it says nothing at all about sign(NaN).

So, no, not a definition. You could try to define the result as x/abs(x) - but then that blows up for zero. At least it gets sign(NaN) right :wink:

So it’s a cute identity, but loses too much information. The actual definition involves comparing an input to 0. Which floats certainly support, but so do many other numeric types.

We already have implementations that work correctly for all int and Fraction arguments. But they don’t always convert to float first. Which is why they can work correctly.

Why you’re fighting that escapes me. I haven’t finished thinking about Decimal, though. “Because the math module is inherently sloppy” doesn’t sway me: the definition of sign() relies on nothing about floating-point arithmetic, rounding, representation error, or ULP errors that are central to almost all the other math functions. It’s a function that can be applied to floats, but has no other actual connection with the business math is in. math is just the “least bad” place to put it, but it’s a strained fit all the same.

Exact numbers are a different beast entirely. They don’t even fit the “compare against 0” definition. Never mind that such comparisons are formally undecidable. In real life, packages for exact arithmetic use a cutoff on how many digits to compute before giving up. So sets of results are, in general, the best they can hope for: “Well, after computing 10000 digits, I can say for sure it’s not negative, but still can’t say for sure whether it’s 0 or strictly positive”.

I don’t care about them at all here. They need their own model for what “sign” means.

And I’m a “perfect is the enemy of the good” guy. A function that works correctly for Python’s own numeric types is more “minimal acceptable competence” than “perfection”.

An alternative I already said I could live with: cut math.sign() back to only accepting floats. I expect that’s the only way we’ll ever see other types offer their own sign() functions. I would prefer that math.sign() spare everyone the bother.

3 Likes

If math.sign can’t do the correct thing for a standard library numeric type, and we know it can’t, the function shouldn’t accept that type.

I think the only options here are to do the right thing for each standard library type, or to accept only floats. Underflow from float conversion doesn’t give “float imprecision”; it gives very large discrete differences that are extremely wrong.

3 Likes

Trying to see if I’m following the discussion.

@tim.one you’re bothered that if we always coerce inputs to float using __float__ at the start before doing the 0 comparison then there are some failures that can occur for inputs from standard library numerical types. This is surprising because it is painfully obvious what the right answer is. The specific failures are:

  • Large integers raise an exception
  • Small fractions return 0.0 when they should return +/- 1.0
  • Maybe some similar issues with Decimal but I don’t think those have appeared in the thread yet.

You’re suggesting we put in code to handle these edge cases in a special way while still coercing to float on non-edge cases, so that users get the least-surprising answer for the stdlib numeric types.

Have I got this right?


@oscarbenjamin you are bristling at the idea of adding in this special-casing behavior and seem to have zero sympathy for users trying to use math (in your mind float-centric) functions with non-float types. It sounds like your ideal, given how Python is now, would be to have a math.sign that fails on these edge cases, but also a math.integer.sign() that for sure gives correct behavior on int (even big ints), a fractions.sign() that for sure gives correct behavior for all Fractions, and a decimal.sign() that for sure gives correct behavior for all Decimals.

Have I got this right?

Is your proposal that we introduce all 4 of these functions for this idea? I don’t think it makes sense to introduce math.sign() with the level of type purity that you’re advocating for WITHOUT introducing these 3 other functions. The one alternative would be to use __sign__, like is done with e.g. __ceil__ so that math.ceil can work well even on Fraction inputs, but you seem more opposed to that and think it was a mistake to engineer those functions that way, so let’s keep that ruled out.


Let’s put the question this way:

@tim.one would you entertain the possibility of ALSO adding math.integer.sign(), fractions.sign() and decimal.sign() in the same PR that adds math.sign()?

@oscarbenjamin If it’s decided that math.integer.sign(), fractions.sign(), and decimal.sign() are definitely not going to be added, then would you be ok with math.sign() doing a little bit of special casing beyond just casting to float so that it gives correct behavior for all (even non-float) standard library type inputs?

1 Like

Nope :wink: But partly. The definition of signum has nothing to do with real arithmetic, just comparison against 0. It’s forcing things to float first that’s the weird special case.

Every type I care about (including float) could be handled with this Python one-liner:

def sign(x):
    return 1 if x > 0 else -1 if x < 0 else 0 if x == 0 else x

which includes sign(NaN) -> NaN. That’s the heart of it. Very simple. For extra credit, people could argue about the return type(s), and add more code to verify that at most one of the tests succeeds (else the type is insane from the POV of signum’s definition). Coding it in C also requires more cruft to check for error returns from comparison attempts.
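For concreteness, here’s that one-liner exercised on exactly the cases a float-first approach gets wrong (nothing beyond the stdlib is assumed):

```python
import math
from fractions import Fraction

def sign(x):
    return 1 if x > 0 else -1 if x < 0 else 0 if x == 0 else x

assert sign(10**5000) == 1                 # huge int: no OverflowError
assert sign(Fraction(-1, 10**5000)) == -1  # tiny fraction: no underflow to 0
assert sign(-0.0) == 0
assert math.isnan(sign(math.nan))          # NaN passes through
```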

I’m not clear on where the zeal for using float arithmetic comes from. It’s certainly not a correctness issue. Quite the contrary.

I assume it’s for speed. C-level compares of native machine doubles are certainly much faster than invoking the C API’s spelling of all-purpose comparisons.

I don’t care about speed here, but others might. So I’m also fine with using floats when possible, as an optimization, but not at the cost of correctness.

I expect the Steering Council would reject that, for raising the cost/benefit ratio too high. And @oscarbenjamin is correct that Python people will expect math.sign() to figure out what’s needed by itself. But I think he’s incorrect about people just brushing off grossly wrong behaviors when they occur. This is very far from a “typical” math function, which is continuous and returns an approximation to a real number. This function’s infinitely precise results are less than a handful of tiny mathematical integers, and it’s a “piecewise jump function”. It doesn’t really belong in math, because it’s so unique - but there’s no better place to put it.

That said, I can live with a float-only math.sign() too. It would just be ugly to leave other types out in the cold at first. What I can’t live with is a function of this kind that does wrong things.

7 Likes

OK, I can live with that too. I have no veto power anyway.

So you would prefer sign(x: T) -> int | T behavior rather than throwing an exception on values where all comparisons with 0 are false?

In the end, it boils down to how it’s documented, along the lines of: Returns the sign of x. This is 1 if x > 0, otherwise -1 if x < 0, and otherwise 0 if x == 0. If none of those comparisons hold (as for math.nan), …. This would make clear that the object needs to be comparable to the integer 0 and that dubious cases (like objects that are both > 0 and < 0) are not specially handled.

1 Like

For reducing friction and surprise, I think we should follow numpy.sign to the extent possible. Which means:

  • For a float argument, return a float (possibly including NaN), and drop a zero’s sign bit.
  • For seemingly everything else, return an int.
  • Which includes Decimal! But I think it should not. See below. That’s the one thing I would change.
>>> from decimal import Decimal as D
>>> import numpy
>>> numpy.sign(D(.5)) # returns an int
1
>>> type(_)
<class 'int'>
>>> numpy.sign(D('inf'))
1
>>> numpy.sign(D('-inf'))
-1
>>> numpy.sign(D('-0')) 
0
>>> numpy.sign(D('1e2000')) # huge is no problem
1
>>> numpy.sign(D('-1e-2000')) # neither is tiny
-1
>>> numpy.sign(D('nan')) # error
Traceback (most recent call last):
    ...
    numpy.sign(D('nan'))
    ~~~~~~~~~~^^^^^^^^^^
decimal.InvalidOperation: [<class 'decimal.InvalidOperation'>]

I don’t know where that’s coming from:

>>> D('nan') # no problem
Decimal('NaN')
>>> float(_) # or converting to float
nan
>>> int(D('nan')) # which certainly can't be converted to int
Traceback (most recent call last):
    int(D('nan'))
    ~~~^^^^^^^^^^
ValueError: cannot convert NaN to integer # but that's a different exception

I am not keen on having, e.g., sign(Fraction) return a Fraction. It seems unnecessarily novel, and the type has no “NaN” concept, so there’s no need to return anything other than an int. I don’t mind SCTS’ing (special-casing the snot) out of float and Decimal to force them to return results of their own types, but it would be nice if that fell out of a more general approach - one that didn’t annoy people by forcing them to endure silly spellings of -1, 0, and 1 (like Fraction(-1, 1)).
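A rough Python sketch of the numpy-like convention above (floats in, floats out; everything else maps to an int) might read as follows. This is purely illustrative and deliberately skips the Decimal special-casing under discussion:

```python
def sign(x):
    # Sketch of the numpy-like convention: float input yields a float
    # result (NaN passes through), all other types yield a plain int.
    if isinstance(x, float):
        if x != x:
            return x  # NaN -> NaN
        # Returns 0.0 for both 0.0 and -0.0, dropping the sign bit.
        return 1.0 if x > 0.0 else -1.0 if x < 0.0 else 0.0
    return 1 if x > 0 else -1 if x < 0 else 0
```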

3 Likes

Fortran users :wink:. They generally weren’t CompSci or “system programmer” types.

I started my career at Cray Research, working mostly on their Fortran compilers, so it has a special place in my heart. My first boss was a rocket scientist - literally!

When Fortran 77 was being designed, he really disliked the fancy new “block IF”. He told me it always tied his brain into knots trying to remember whether “the jump” happened if the expression was true or false.

BTW, for years their compiler was written entirely in Cray assembly language. You really haven’t lived until you’ve spent years trying to write a massive symbol-processing program in the assembly language for a RISC architecture almost entirely aimed at vectorizing floating-point operations, and for which “a byte” (in the C sense of smallest addressable storage unit) was 64 bits.

When the maintenance burden became unbearable, we did start over, rewriting in a homegrown variant of Pascal. My old boss wanted to write it in Fortran instead, but by then Cray had hired enough “CompSci types” who realized that things would go better in a language with first-class support for strings :wink:

Still, I remember my first encounter with Fortran in college days. “A DIMENSION statement? What kind of dark magic is that?! Waaay cool, dude” :rofl: .

4 Likes

Personally, my expectations would be as follows. I don’t have actual use cases right now, so feel free to take these comments with that in mind, but I do work on problems where I could imagine the need for a sign function coming up, and these are what I would expect for that. I should note that Tim’s

return 1 if x > 0 else -1 if x < 0 else 0 if x == 0 else x

would suit my needs perfectly, and it’s what I would write[1] if there was no stdlib implementation.

  1. It would always give correct answers given correct input. So “convert to float and don’t worry about underflow” is a non-starter for me.
  2. It would work on at least floats and integers, including integers too large to convert to a float.
  3. I would naturally assume that indexing with the result of sign() would work. Specifically something like lst[sign(x)+1]. I’d be OK with this failing on NaN input - if it mattered, I’d test the input and bail early, though, so the return value of sign(nan) doesn’t matter to me.
  4. I would expect it not to support complex numbers.
  5. I wouldn’t really care about whether it supported other stdlib types (decimal and fraction) although I’d be pleasantly surprised if it did - as long as it followed the previous rules.

I would prioritise correctness over performance - I’m obviously happy to have a faster implementation, but not at the cost of abandoning the expectations above. So by all means have a fast path for float input, but don’t coerce non-floats to float “because it’s faster”.

Regarding point (3) on indexing, I could live with having to explicitly convert to an integer - int(sign(x)) - but I’d consider it a code smell, as sign, in my mind at least, returns an integer value in all cases[2] anyway.
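For what it’s worth, the indexing idiom in point (3) is just (a sketch, assuming an int-returning sign like the one-liner quoted above):

```python
def sign(x):
    return 1 if x > 0 else -1 if x < 0 else 0 if x == 0 else x

labels = ["negative", "zero", "positive"]
# sign(x) + 1 maps the results -1/0/1 onto list indices 0/1/2
assert labels[sign(-7) + 1] == "negative"
assert labels[sign(0) + 1] == "zero"
assert labels[sign(42) + 1] == "positive"
```

A float-returning sign would break this idiom, which is why the return type matters here.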

Unlike Tim, I wouldn’t be OK with a float-only version. If that’s what we ended up with, I’d have to write my own implementation for any realistic use cases I’d have. Support for integers (including long integers) is too important for me to ignore.


  1. Possibly without the final else x because I never use or think about nan :slightly_smiling_face: ↩︎

  2. ignoring NaNs ↩︎

6 Likes

If we do end up with a float to float operation, then I believe there’s a case for preserving the sign of a zero. Suppose you’re modelling a continuous odd function f on the reals - i.e., a continuous function f from \mathbf R to \mathbf R satisfying f(-x) = -f(x) for all real inputs x. (Examples are \sin(x), \tan(x), \sinh(x), x^{1/3}, the sinc function, and many many more.) Note that if f is odd and well-defined at zero then we must have f(0) = 0.

For the float version f of f you’ll typically want f(0.0) and f(-0.0) to have opposite signs - e.g., sin(0.0) is 0.0 and sin(-0.0) is -0.0. This convention for handling the sign of zero is well established in Annex F of the C standards and in IEEE 754.

An implementation of f might find it convenient to do the hard parts of the computation (Taylor series, argument reductions, piecewise Chebyshev polynomial approximation, whatever other tricks) only for “positive” inputs (i.e., those with sign bit 0) and then use symmetry to fill in the implementation of f for negative inputs. I’ve used this approach myself many times (there may even be some examples still in the math or cmath source).

With math.sign available and mapping float to float, this approach would be conveniently easy to spell as:

import math

def f(x: float) -> float:
    return math.sign(x) * f_for_positive_inputs(math.fabs(x))

where f_for_positive_inputs does the hard numerical stuff, but now only needs to do that for inputs with zero sign bit.

Now if math.sign preserves the sign of a zero, then this code will automatically do the right thing for an input of -0.0, preserving the usual IEEE 754 conventions. If instead math.sign drops the sign of a zero, there’ll be a corner case to fix up.

Without math.sign available, I might spell the implementation of f instead as:

def f(x: float) -> float:
    return math.copysign(1, x) * f_for_positives(math.fabs(x))

Not quite the same, since for zero x, math.copysign(1, x) is giving 1.0 with an appropriate sign, rather than 0.0, but it does automatically do the Right Thing for negative zero; it would be nice if the math.sign version continued to have this property. As @oscarbenjamin observed, it’s often the case that the value of math.sign at 0 doesn’t matter because we’re multiplying by zero; that’s the case here. However, if we care about signs of zeros in the results, then the sign of math.sign at 0 does matter.

Special case: for non-NaN x, I’d like math.sign(x) * math.fabs(x) to recover x, including giving it the correct sign.
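A minimal float-only sketch with that zero-sign-preserving behavior (hypothetical; this is one possible spelling, not a proposed implementation):

```python
import math

def sign(x: float) -> float:
    # Float-only sign that preserves the sign bit of a zero:
    # +/-1.0 for nonzero x, +/-0.0 for zero, NaN for NaN.
    if x != x:        # NaN passes through
        return x
    if x == 0.0:
        return x      # keeps -0.0 distinct from 0.0
    return math.copysign(1.0, x)

# sign(x) * fabs(x) recovers x, including the sign of a zero
for x in (3.5, -3.5, 0.0, -0.0):
    assert sign(x) * math.fabs(x) == x
    assert math.copysign(1.0, sign(x) * math.fabs(x)) == math.copysign(1.0, x)
```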

See also the discussion at np.sign(-0.0) should return -0.0, not +0.0 · Issue #29485 · numpy/numpy · GitHub

[Meta: I’m staying out of the wider discussion of what the input and output types of math.sign should be: I don’t have anything to add that hasn’t already been said. The above only applies in the case that the community settles on a float -> float solution.]

3 Likes