sincos(x) from math.h is missing

C has a nice function, sincos(x), which returns sin(x) and cos(x).

For Python's math module, we could add the equivalent of:
from math import sin, cos

def sincos(x):
    return sin(x), cos(x)

In many cases (e.g. building a rotation matrix) you need both sin and cos of the same angle.
The advantage would be a small speedup, for two reasons: only one function call instead of two, and x86 and x64 FPUs can compute sincos in about the same time as sin or cos alone (about 100 clock ticks each). The number of function calls in particular should hurt more in Python.
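To make the rotation use case concrete, here is a minimal sketch assuming a pure-Python sincos helper like the one proposed above. The names sincos, rotation_matrix and rotate are hypothetical and not part of the math module; the sketch only shows where both values of one angle get consumed and says nothing about speed.

```python
from math import sin, cos, pi


def sincos(x: float) -> tuple[float, float]:
    """Return (sin(x), cos(x)); a hypothetical pure-Python stand-in, not a math API."""
    return sin(x), cos(x)


def rotation_matrix(theta: float) -> list[list[float]]:
    """2D matrix for a counterclockwise rotation by theta radians."""
    s, c = sincos(theta)  # one call yields both values for the same angle
    return [[c, -s],
            [s, c]]


def rotate(point: tuple[float, float], theta: float) -> tuple[float, float]:
    """Apply the rotation matrix to a 2D point."""
    (a, b), (d, e) = rotation_matrix(theta)
    x, y = point
    return (a * x + b * y, d * x + e * y)


# Rotating (1, 0) by 90 degrees gives approximately (0.0, 1.0).
print(rotate((1.0, 0.0), pi / 2))
```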

Yes, I checked before:

Thanks for the feedback.

Of course, to gain anything we would have to do this at a deeper level (the underlying C or assembly code), not just add a sincos() at the Python level.

For reference: StackOverflow question and NumPy issue.


A nitpicky note: C doesn’t have sincos (it’s not part of standard C). The GNU C library does provide it as an extension.

Rather than messing with sincos (detecting whether it exists in autoconf, doing #ifdefery to switch between platforms that have it and platforms that don’t), the right way to do this sort of thing at the C level is probably simply to call sin(x) and cos(x) in close proximity; platforms that are capable of doing so would likely optimize this into an internal sincos call with no additional help. Here’s gcc 12.2 / x64 with “-O2”, for example: Compiler Explorer

For posterity: input is

#include <math.h>

double do_something_with_sin_and_cos(double x) {
    double r = cos(x);
    double s = sin(x);
    return 3.0*r + 4.0*s;
}

and the assembly output is

do_something_with_sin_and_cos:
        sub     rsp, 24
        mov     rsi, rsp
        lea     rdi, [rsp+8]
        call    sincos
        movsd   xmm0, QWORD PTR .LC0[rip]
        movsd   xmm1, QWORD PTR .LC1[rip]
        mulsd   xmm0, QWORD PTR [rsp]
        mulsd   xmm1, QWORD PTR [rsp+8]
        add     rsp, 24
        addsd   xmm0, xmm1
        ret

A 2x performance improvement isn’t a foregone conclusion here; some profiling would be needed. Note that (back in Python-land)

y, x = sincos(theta)

would involve a tuple packing (in the sincos implementation) and a tuple unpacking operation compared to

x = cos(theta)
y = sin(theta)

And I’m not convinced that performance alone would be sufficient reason to add this to the math module; an argument that it made code more natural or more readable would help. (I’m not yet convinced on that front, either.)
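One way to see the extra tuple work without timing anything is to inspect CPython bytecode with the standard dis module. This sketch assumes a hypothetical pure-Python sincos helper, and the opcode names (BUILD_TUPLE, UNPACK_SEQUENCE) are a CPython implementation detail that may change between versions, so treat it as an illustration only. A C-level math.sincos would presumably still have to create a tuple object to return two values, just not via a bytecode instruction.

```python
import dis
from math import sin, cos


def sincos(x):
    # Hypothetical pure-Python helper: the result tuple is packed here.
    return sin(x), cos(x)


def with_sincos(theta):
    y, x = sincos(theta)  # one call, then the returned tuple is unpacked
    return 3.0 * x + 4.0 * y


def with_two_calls(theta):
    x = cos(theta)  # two separate calls, no intermediate tuple
    y = sin(theta)
    return 3.0 * x + 4.0 * y


def opnames(func):
    """Set of bytecode opcode names used by func."""
    return {ins.opname for ins in dis.get_instructions(func)}


print("BUILD_TUPLE" in opnames(sincos))              # True: tuple packing
print("UNPACK_SEQUENCE" in opnames(with_sincos))     # True: tuple unpacking
print("UNPACK_SEQUENCE" in opnames(with_two_calls))  # False: no tuple involved
```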


FWIW: on macOS, your do_something_with_sin_and_cos function compiles to a call to the equivalent function there (__sincos [1]). __sincos is a public API, but its name is in the implementation's reserved namespace because it isn't a standard API (as you already mentioned).

[1] Nitpicky detail: the assembly code contains a call to __sincos_stret, which is an expansion of the inline definition of __sincos.


IMO it would help its use case (most code dealing with rotation) about as much as divmod helps its use cases (things like conversions of total seconds to minutes & seconds).

But it’s easy to define in user code, and the performance overhead is most likely insignificant. If you care enough to need this, you’re probably already using something like NumPy :)

NumPy doesn’t have it (yet), which is a good indication that CPython can do without it.

Also, function call overhead is exactly the thing I expect faster-cpython to address over the next few releases.


Yes, that seems like a very apt comparison, and also a telling one. In principle, I love that divmod exists: using it gives me a warm fuzzy feeling that I’m not wasting cycles by computing quotient and remainder twice. But in practice, I find it often represents a readability cost compared to an inline // and % pair, often requiring an extra statement for the divmod result unpacking and making me write code in a way that’s more procedural than functional. And in the end, many of the places that I use divmod don’t actually care much about performance anyway.


Hm, I don’t think there’s a readability cost. I definitely prefer

mins, seconds = divmod(end_time - start_time, 60)

to

mins = (end_time - start_time) // 60
seconds = (end_time - start_time) % 60

The second has the usual non-DRY issues, e.g. if one of the repeated arguments is misspelled, I might not notice it as easily.

But of course, sincos only takes one argument, so its two-line version is not that bad. I’ve definitely written things like:

y = sin(radians(angle))
x = cos(radians(angle))
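Following the DRY point above: with a user-level helper, the repeated radians(angle) expression appears only once. A minimal sketch, where sincos is the hypothetical pure-Python helper from this thread, not an existing math function.

```python
from math import sin, cos, radians, isclose


def sincos(x):
    # Hypothetical helper (not in the math module): both values of one angle.
    return sin(x), cos(x)


angle = 30.0  # degrees

# Two-line version: radians(angle) is written (and computed) twice.
y = sin(radians(angle))
x = cos(radians(angle))

# Combined version: the argument expression appears only once.
y2, x2 = sincos(radians(angle))

print(x == x2 and y == y2)  # same computations, so identical results
print(isclose(y2, 0.5))     # sin(30 degrees) is 0.5 up to rounding
```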

In C, the speedup can be almost 100% (nearly 2x) if the trigonometric functions are slow and the compiler is not clever enough to do this optimization for you. But in Python, the cost of cos() and sin() themselves is small compared to the overhead Python adds for bytecode interpretation, operator dispatch, memory allocation for float objects, and reference-count management. The benefit of a combined function may be much smaller, perhaps only 10% or 20%. The net effect may even be negative, due to the cost of tuple packing/unpacking and the use of local variables. It needs to be tested with microbenchmarks, but performance gain is a weak argument here.
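A microbenchmark of the kind suggested could start from the standard timeit module. This is only a sketch under stated assumptions: sincos is a hypothetical pure-Python helper, so it adds an extra Python-level call and measures a worse case than a C implementation in the math module would. Timings depend on the machine and CPython version, so no expected numbers are given.

```python
import timeit
from math import sin, cos


def sincos(x):
    # Hypothetical pure-Python helper; a C version would avoid this extra
    # Python-level call but would still build and return a tuple.
    return sin(x), cos(x)


def bench(stmt: str, number: int = 100_000, repeat: int = 5) -> float:
    """Best-of-`repeat` time in seconds for `number` executions of `stmt`."""
    env = {"sin": sin, "cos": cos, "sincos": sincos, "theta": 0.7}
    return min(timeit.repeat(stmt, globals=env, number=number, repeat=repeat))


two_calls = bench("x = cos(theta); y = sin(theta)")
combined = bench("y, x = sincos(theta)")

print(f"two calls : {two_calls:.4f} s")
print(f"sincos()  : {combined:.4f} s")
print(f"ratio     : {combined / two_calls:.2f}")  # below 1.0 means sincos() was faster
```

Taking the minimum over several repeats is the usual timeit advice, since the slower runs mostly reflect interference from other processes rather than the code being measured.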
