The Inner Logic of csignum-fast
From the very beginning, this project was designed to be type-agnostic. You won’t find any literal < or > operators in my C code. All comparisons are performed using the internal methods of the argument itself—its native __gt__, __lt__, and __eq__ implementations. In CPython, this is achieved via PyObject_RichCompareBool(x, Py_zero, Py_GT) and its counterparts for LT and EQ.
This approach works for any numeric type right out of the box. My sign function seamlessly handles Decimal, Fraction, and sympy numbers, even though I didn’t write a single line of specialized code for them. If someone implements an UltraDecimal or ExtraFraction a hundred years from now, this program won’t need a single change, as long as those types follow the numeric protocol.
The Trade-off: There is a minor overhead for converting Py_zero (Pythonic int(0)) to the type of x. A small price to pay for universal type independence.
The “Ternary Logic” Challenge
If you use PyObject_RichCompareBool for comparisons, I have two pieces of news for you.
The Good News: The function returns a standard C int. Therefore, the calculation of (x > 0) - (x < 0) is performed without additional type casting or complex conversions.
The Not-So-Bad News: The result can be 1 (True), 0 (False), or -1 (Error). This is tricky because, without proper handling, you might get results like +2 or -2, or even conclude that sign("error") = 0. However, it’s “not-so-bad” because the argument itself handles the type validation. If the argument cannot be compared to zero, it will report the error to you.
To handle all possibilities, I implemented an Index of State (state_idx):
- I normalize the comparison result to 2 (True), 1 (False), or 0 (Error) by simply adding
1 to the output of RichCompareBool.
- I multiply the results of three comparisons (GT, LT, EQ). The result is either 0 or a power of two: 0, 1, 2, 4, 8.
- I then apply a bitwise
& 3 operation (which is equivalent to % 4 but significantly faster) to determine the final state.
State Index Interpretation:
- Index = 2: The ideal case. One “True” (2) and two “False” (1). The argument is a valid number; the result is
gt - lt.
- Index = 1: Three “False” results. This suggests the input might be
NaN. I perform an additional x != x check to confirm.
- Index = 0: (The
mod 4 result of 0, 4, or 8). This indicates either a comparison error (0) or a logical contradiction (multiple “True” flags). This is treated as a non-numeric type error, and an exception is raised.
Micro-Optimizations: Switches vs. Multiplications
Modular arithmetic tells us that (A * B * C) mod 4 == ((A * B) mod 4) * C mod 4. In the code, I replaced “heavy” multiplications with a switch, bitwise shifts (<< 1), and bitwise AND (& 3).
There are two such switches: for lt and for eq. If the value of the multiplier is 0, I go directly to error handling. If the value is 1, I don’t need to multiply, and I simply issue break;. To multiply by 2, I use bitwise shift, and then bitwise AND to truncate index mod 4.
Multiplication consumes more CPU cycles than these bitwise operations and jumps. By providing explicit optimization hints (accounting for 12% of the total code), I ensure the compiler generates highly optimized machine code. Furthermore, this branchless/low-jump sequence is more likely to stay in the CPU cache, further accelerating execution.
For more technical details, please refer to the comments in the csignum-fast source code.