I think it’s easy to write the math protocol and implement the methods, e.g. CoPilot seems to work from a quick play. The math protocol can be incrementally implemented, e.g. start with float and int. It doesn’t break old code.
Perhaps something similar to NumPy’s __array_ufunc__ (“universal function”, ufunc) or __array_function__ protocols could be considered?
https://numpy.org/doc/2.1/user/basics.subclassing.html#array-ufunc-for-ufuncs
https://numpy.org/doc/2.1/user/basics.interoperability.html#the-array-function-protocol
I’ll be honest, I don’t exactly understand all the various proposals here. But by way of example I want to share the uncertainties package. This package exposes a UFloat type which models a random variable. The UFloat has a mean value and a standard deviation. Users can perform mathematical operations like addition, multiplication, trig functions, etc. on and between UFloats. The resulting UFloats will have the mean values you expect based on the math functions, but the new standard deviation will be the standard deviation calculated according to the rules of linear error propagation.
uncertainties basically patches each function from the math module into a function in an uncertainties.umath module. If users want to take the sqrt they have to know about and import this special umath module: from uncertainties.umath import sqrt. Fortunately uncertainties.umath.sqrt behaves nicely when called on a regular float, so users can get by replacing from math import sqrt with from uncertainties.umath import sqrt, but it’s a little bit annoying.
On a feature branch, I was able to get UFloat to play nice with np.sqrt by defining a sqrt method on the UFloat class. This is pretty nice.
However, numpy is only an optional dependency of uncertainties. So if we want to support users without numpy we have to define umath.sqrt. But if we want to support users with numpy to use np.sqrt, we have to define UFloat.sqrt. It is annoying that we have to make the definition in these two places. It is also annoying that there is no way to get math.sqrt to work.
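For anyone unfamiliar with the mechanism being described: NumPy ufuncs applied to inputs of object dtype fall back to calling a method of the same name on each element. A toy sketch of the idea (Measured is a made-up stand-in here, not the real UFloat):

```python
import math
import numpy as np

class Measured:
    """Toy value-with-uncertainty type (illustrative only, not the real UFloat)."""
    def __init__(self, value, stddev):
        self.value = value
        self.stddev = stddev

    def sqrt(self):
        # Linear error propagation: d/dx sqrt(x) = 1 / (2 * sqrt(x))
        v = math.sqrt(self.value)
        return Measured(v, self.stddev / (2 * v))

# np.sqrt sees an object it doesn't recognise and calls its .sqrt() method:
m = np.sqrt(Measured(4.0, 0.2))
print(m.value, m.stddev)  # 2.0 0.05
```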
I’m coming here to say that, for me, it would be nice if we could define a hook like __sqrt__ that would allow math.sqrt, np.sqrt, scipy.sqrt, etc. to just work with UFloat like they already work for float.
And for concreteness, yes it would basically be helpful for me to see hooks for each of the functions in the math module documentation under the
- Power, exponential and logarithmic functions
- Angular conversion
- Trigonometric functions
- Hyperbolic functions
- Special functions
headings.
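To make the shape of the idea concrete, here is a rough sketch of what a protocol-aware sqrt could look like. Everything here is hypothetical: __sqrt__ is the proposed hook (it does not exist today), and the Interval class is just for illustration:

```python
import math

def sqrt(x):
    # Hypothetical protocol: prefer a __sqrt__ hook looked up on the type,
    # mirroring how other special methods are resolved.
    hook = getattr(type(x), '__sqrt__', None)
    if hook is not None:
        return hook(x)
    return math.sqrt(x)  # fall back to the current float-coercing behaviour

class Interval:
    """Toy interval type (illustrative only)."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __sqrt__(self):
        # sqrt is monotonic, so apply it to both endpoints.
        return Interval(math.sqrt(self.lo), math.sqrt(self.hi))

print(sqrt(4.0))  # 2.0 -- plain floats still work
iv = sqrt(Interval(1.0, 4.0))
print(iv.lo, iv.hi)  # 1.0 2.0
```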
This is what the __array_ufunc__ and __array_function__ protocols are for: see “Standard array subclasses” in the NumPy manual.
From a practical point of view, delegating the implementation of math functions to special methods would add significant overhead, because it would need to look up the method name in the type’s dict and perform a general function call. For many math functions that are simple wrappers around C floating-point functions, the overhead can be tens of percent.
For builtins and operators which use special methods (__add__, __bool__, etc.) this is mitigated by using “slots” in the type object. But filling slots is the main contributor to class creation time. Adding 40 new slots to the existing 83 slots could increase the time to create any class by 50%. This would correspondingly increase import times.
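For context, the slot mechanism is also why special methods are looked up on the type rather than the instance; a quick illustration using only builtins:

```python
class A:
    def __add__(self, other):
        return "type add"

a = A()
# Operators bypass the instance dict and go straight to the type's slot:
a.__add__ = lambda other: "instance add"

print(a + a)         # type add      (slot lookup on the type)
print(a.__add__(a))  # instance add  (ordinary attribute lookup)
```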
For NumPy arrays this is mitigated by the fact that operations are usually performed on large homogeneous arrays.
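That is, one dispatch covers every element. A small sketch of the amortization (the Logged subclass is invented for illustration):

```python
import numpy as np

class Logged(np.ndarray):
    """ndarray subclass that counts ufunc dispatches (illustration only)."""
    calls = 0

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        Logged.calls += 1
        # Drop back to plain ndarrays to avoid re-dispatching to ourselves:
        inputs = [i.view(np.ndarray) if isinstance(i, Logged) else i
                  for i in inputs]
        return getattr(ufunc, method)(*inputs, **kwargs)

a = np.arange(1_000_000, dtype=float).view(Logged)
np.sin(a)
print(Logged.calls)  # 1 -- a single dispatch for a million elements
```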
If this was done for the math module then it could of course have a fast check for float or int. I assume that it already does a fast check before calling __float__. The current math module calls __float__ on everything and changing that would be a compatibility break. It would need to be a new module for generic functions rather than the existing math module.
Note also that it is very commonplace to access math functions from a module object like np.sin, sympy.sin etc. The reason that this is done rather than from numpy import sin is because there is a need to distinguish the different incompatible sin functions from different modules. Mixing these up is a very common source of headaches for novice users. That means though that every time you call any mathematical function you have the runtime and visual overhead of the lookup on the module object:
import numpy as np
x = np.exp(np.real(y)) * np.sin(np.imag(y))
If you had generic functions that could work for all the different types then you would not need to use the module namespace like this:
from genericmath import exp, sin, real, imag
import numpy as np
from sympy import Symbol
y = np.array([1, 2, 3])
x = exp(real(y)) * sin(imag(y)) # array
y = Symbol('y')
x = exp(real(y)) * sin(imag(y)) # symbolic expression
Also, talking about performance here misses the point. Python is slow for these things when working with individual floats rather than arrays, so anything that uses the math module is slow. If you want fast scalar calculations you will need to somehow compile the code, and there are lots of approaches for this like numba, numexpr, etc. The compilation step would be far easier if you could analyse the function or mathematical expression symbolically rather than analysing the Python code as numba does. What that means is that you want to have something like:
from genericmath import exp, sin, real, imag

# This function would be defined in numba:
def numba_compile(func):
    y = numba_symbol('y')
    result = func(y)  # symbolic representation
    return numba_llvm_compile(result)

@numba_compile
def func(y):
    return exp(real(y)) * sin(imag(y))
Various libraries have implemented different workarounds for the lack of generic mathematical functions. Currently numba does this by analysing the AST, which is fragile, and then by special-casing things like np.sin. The autodiff library, for example, has implemented a context manager that monkey-patches the numpy API. With jax you have to use the special jnp module object.
SymPy has various ways to provide the capability to make a symbolic expression and then compile that into code or into a function that can be evaluated e.g.:
import sympy as sym
y = sym.Symbol('y')
# Symbolic expression:
eq = sym.exp(sym.real(y)) * sym.sin(sym.imag(y))
# Callable function:
f = sym.lambdify([y], eq, modules='numpy')
print(f(1))
f2 = sym.lambdify([y], eq, modules='mpmath')
This is supposed to make it easy to evaluate the same mathematical expression with different numerical backends and can be used in combination with numba etc. The way that this works is flaky and not scalable though: it assumes that your module object has the right functions and needs a lot of special-casing to handle differences between numpy, mpmath, etc.
If there were generic math functions then this could be done more reliably and extensibly. That is exactly how it works with symbolic expressions in Julia. The symbolic expression exp(x)*sin(x) really just represents the generic functions exp and sin applied to some as yet unspecified object x. You can substitute arrays, numbers, etc in for x and the generic exp and sin functions can dispatch to the appropriate methods. This provides a scalable, extensible approach for reusing functions, expressions etc across automatic differentiation, compiling to machine code, evaluation over arrays, GPUs, multiprecision, etc.
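Python can already express a single-dispatch version of this idea in userland; a minimal sketch with functools.singledispatch (SymExpr is an invented toy type, and a real solution would need a shared protocol rather than one library’s registry):

```python
import math
from functools import singledispatch

@singledispatch
def sin(x):
    # Default behaviour: act like math.sin on ordinary numbers.
    return math.sin(x)

class SymExpr:
    """Toy symbolic expression (illustrative only)."""
    def __init__(self, text):
        self.text = text
    def __repr__(self):
        return self.text

@sin.register
def _(x: SymExpr):
    # For symbolic input, build a new unevaluated expression instead.
    return SymExpr(f"sin({x.text})")

print(sin(0.0))           # 0.0
print(sin(SymExpr("x")))  # sin(x)
```

This is only single dispatch on the first argument; the Julia approach described above uses full multiple dispatch, which Python would have to emulate.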
I find this to be a pretty strong argument, but there’s nothing about it that I can see which prevents the genericmath library from being a 3rd party project on PyPI. Did I miss something that means this needs to be in the stdlib? If not, then why isn’t it already available? Is it simply because no-one has thought of it before, or are there fundamental issues?
I don’t have a personal need for this, I just think generic functions in general are cool, and I’d like to see more use of them.
It doesn’t need to be in the standard library; someone needs to define and publicize the protocol and then libraries need to adopt it. For NumPy, doing the work of defining the __array_function__ protocol was worth it, because suddenly external libraries could work around numpy limitations their users had been hitting for years. Things like np.where(astropy_quantity_array) returning a numpy array with units stripped, with no way for users to fix it without manually re-adding the units on every np.where call. And then repeated for lots of other functions in the numpy API that aren’t ufuncs.
For overriding the math module or adding a genericmath module the benefit is a little less clear, since users are already working around this limitation by using np.sin or sympy.sin as needed and there isn’t any big missing feature that adding the protocol would allow besides improving code clarity sometimes.
Still, as a NumPy maintainer I’d probably approve a PR adding a __math_namespace__ or whatever protocol gets decided on to support this, assuming the overhead is acceptably small for existing users.
I guess making it “official” and supported by CPython makes it easier to publicize and get projects on-board, but that’s not a good reason for CPython to do it.
That already exists: It’s the Array API.
100% agree with you.
I think the compilation looks a bit nicer in Jax:
import jax

@jax.jit
def eq(y):
    return y.real * y.imag

print(eq(1))  # Compiles and runs. Subsequent calls use compiled code
Totally agree with you. I think this is just the Array API.
But this isn’t what Matt’s proposal above is asking for.
SciPy and therefore SciPy’s distribution library already has plans to be built on the Array API. That means that they will benefit from the various JITs, and various Array API clients. What “built on” means here is that the various functions (e.g., pdf) would have their implementations use the Array API.
I think what Matt wants is another set of functions that can be applied to distribution objects. E.g.,
from scipy.distribution import Gamma, InverseGamma
g = Gamma(alpha, beta)
h = pow(g, -1)
reveal_type(h) # InverseGamma
You cannot use Array API’s pow function here because these distribution objects don’t implement the Array API’s minimum interface.
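By contrast, the builtin pow already dispatches through __pow__, which is the kind of hook the proposal wants for the math functions too. A toy sketch with invented classes (not the real scipy API):

```python
class Gamma:
    """Toy distribution object (hypothetical, not scipy's)."""
    def __init__(self, alpha, beta):
        self.alpha, self.beta = alpha, beta

    def __pow__(self, exponent):
        # If X ~ Gamma(alpha, beta) then 1/X ~ InverseGamma(alpha, beta).
        if exponent == -1:
            return InverseGamma(self.alpha, self.beta)
        return NotImplemented

class InverseGamma:
    def __init__(self, alpha, beta):
        self.alpha, self.beta = alpha, beta

h = pow(Gamma(2.0, 3.0), -1)
print(type(h).__name__)  # InverseGamma
```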
Brénainn suggested having a separate API for these functions. This is the simplest solution.
The proposal above suggests hooking into Python’s functions. This has the benefit of being able to write functions that accept both numbers and distribution objects.
Maybe a more idealistic solution would be to see if the Array API could incorporate a more basic type than the “array”, which would allow distribution objects to be passable to some of the Array API’s math functions. This would allow you to write functions that accept numbers, arrays, or probability distributions (since probability distributions can be vectorized according to their design, it would make sense to mix them in computations with arrays).
I don’t think that it needs to be in the stdlib but it would be better if it was. Regardless, it cannot practically enter the stdlib without existing first and proving itself as a third-party package.
I think it just requires a lot of coordination. It needs to begin with some agreement between maintainers of different libraries that the idea is worthwhile. Historically the focus has been all about wrapping numpy or having compatibility with numpy.
Then there needs to be a well designed and specified interface that projects can agree on. Historically that kind of coordination has not generally been there except via the stdlib and the various dunder methods that exist there. The recent example of the array API shows how this kind of third party coordination can work though although it is all focused on arrays (i.e. numpy compatibility).
If you don’t think that there is a big missing feature then I think that you might be lacking the imagination to see the kinds of things that this would end up being used for. Here is a simple example:
In [19]: import sympy
In [20]: import numpy as np
In [21]: x, y = sympy.symbols('x, y')
In [22]: a = np.array([x, y])
In [23]: a
Out[23]: array([x, y], dtype=object)
In [24]: np.sin(a)
...
TypeError: loop of ufunc does not support argument 0 of type Symbol which has no callable sin method
I notice that you have suspiciously not called any mathematical functions like sin here unlike the example I showed. Since it is relevant let me show how this is done:
import jax
import jax.numpy as jnp

@jax.jit
def eq(y):
    return jnp.exp(y.real) * jnp.sin(y.imag)
I agree that this looks nice but notice that as soon as we need mathematical functions we have to write the code in a way that only works with jax. You can’t write out a big formula in your eq function and then reuse the same function for many different things because it either needs to be a jax function or a numpy function or a sympy etc function.
The sympy interface is different because it has a different purpose, and so it hits the interoperability problem in a different way. You want to derive the expressions that would go in the function eq rather than writing them explicitly in the code, and then you want to use the expression for many different things: display it, evaluate it, etc.:
In [2]: e = integrate(exp(-x)*sin(2*x + 1), x)
In [3]: e
Out[3]: -exp(-x)*sin(2*x + 1)/5 - 2*exp(-x)*cos(2*x + 1)/5
In [4]: f = lambdify([x], e, 'jax')
In [5]: f(1)
Out[5]: Array(0.13529614, dtype=float32, weak_type=True)
It is difficult though for sympy to make this work across all backends like jax, numpy, mpmath etc because there is not a consistent API or agreement on the exact naming of functions and their arguments and where to import them from. Each of these backends needs to be hard-coded explicitly in the sympy codebase.
The analogue of lambdify for SymbolicUtils.jl in Julia is almost trivial to implement because every function like exp, sin, cos etc dispatches. In Julia it would not be necessary for either jax or sympy to know anything about each other but it would still be easy to turn a sympy expression into a jax function just by substituting a jaxpr for x into the expression. The only thing connecting sympy and jax in that situation would be the generic functions that they share the use of.
That’s not true anymore thanks to the Array API! Your example could be written:
import jax
import numpy as np
from array_api_compat import get_namespace

def eq(y):  # One definition to rule them all!
    yp = get_namespace(y)
    return yp.exp(yp.real(y)) * yp.sin(yp.imag(y))

print(jax.jit(eq)(1.0))  # With Jax.
print(eq(np.ones(())))  # With NumPy.
The Jax code actually produces the same expression graph as SymPy. It just specifies it in a different way. It’s a matter of taste which is better.
There is a consistent API: it’s the Array API. You can use it with all of its adopters:
Traceback (most recent call last):
File "t.py", line 3, in <module>
print(get_namespace(1))
^^^^^^^^^^^^^^^^
File ".venv/lib/python3.12/site-packages/array_api_compat/common/_helpers.py", line 560, in array_namespace
raise TypeError(f"{type(x).__name__} is not a supported array type")
TypeError: int is not a supported array type
The problem is that Python scalars are not an adopter of the Array API. You need to use an array from one of the adopters. You could use numpy.asarray(1) if you want a scalar integer.
I think the point is that Oscar knows all of this, and this thread isn’t about the array API. It’s about scalars.
I’m not sure where you’re getting that? I think this topic is about operations on probability distribution objects that can polymorphically handle scalars and arrays.
I’m responding to recent comments that have been talking about “generic functions that could work for all the different types”, which seem to be neglecting the existence of the Array API that has (in my opinion) solved writing universal functions that process arrays.
I think we should refocus the thread on writing functions that support both arrays and probability distributions. I liked the proposed simple solution (have separate interfaces for now), and I suggested an idealistic solution that might be a good target for later (unify the interfaces):
Yes, exactly. The array API is a big improvement.
It is, however, only for arrays: it does not work with int and float, and it has a fixed list of dtypes for arrays, all of which are fixed-width machine types. Using dtype=object was intentionally left out. There is no scope for symbolics, multiprecision, validated numerics, etc. Going beyond basic types brings in other questions like context management and so on.
The set of mathematical functions in the Array API is also very limited compared to those that are provided by scipy, sympy, mpmath, etc. It doesn’t even include all the functions that are in the math module. Many of the incompatibilities across different math APIs arise when you start looking at less commonly used mathematical functions, where the arguments might be in a different order or the domain (integer vs real vs complex) might not be compatible across different implementations.
I don’t mean this as a criticism of the array API: it has a clear scope and purpose and as a result it brings something useful. However many things that were out of scope lead to very similar reasons for wanting extensibility and interoperability.
The other thing that is missing is the actual generic functions like sin and cos. Where do I import those from without depending on any particular array library?
Yes, right.
It does not have a fixed list of dtypes. Those are the minimum types that must be supported by adopters. Plenty of libraries add their own dtypes. Jax, for example, adds bfloat16.
As I already mentioned, the Jax code I showed builds a symbolic graph. (It does this by converting the parameters into tracer objects that build an expression graph.) Most of the adopters can build symbolic graphs (TensorFlow and PyTorch definitely can). You can absolutely have symbolic computations with the Array API.
If you think it would be fruitful to extend the Array API, you could propose your ideas on their Github?
Yes, however:
- The Array API is evaluating a special function extension that looks like it will be accepted eventually, and
- SciPy will eventually become a provider of its functions to all (or maybe just many) Array API adopters.
The Array API is new, and while it doesn’t have all the features you may want, the benefit of its patience is an (in my opinion) phenomenal design.
sin and cos are already in the specification. You can use them like so:
from array_api_compat import get_namespace
import numpy as np

def f(x):
    xp = get_namespace(x)
    return xp.square(xp.sin(x)) + xp.square(xp.cos(x))

print(f(np.asarray(0.4)))  # 1.0
I don’t think excluding scalars in general, and non-fixed-width scalars as array items in particular, is unreasonable given the purpose of the Array API. With that said, it’s clear that what exists isn’t enough to cover everything people need.
I don’t think this is going to be resolvable without functions that do multiple dispatch, though, because there are some fundamental problems with Python’s numeric tower that will prevent this from having good outcomes if the hooks need to live on the objects themselves. This is one of the places where Python automatically promoting specific types in some calculations, rather than raising a domain error (which would force the developer to explicitly opt into that domain), complicates matters.
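A couple of concrete examples of that automatic promotion, using only builtins:

```python
# bool is a subclass of int, so booleans silently participate in arithmetic:
print(True + 1)  # 2

# int / int promotes to float rather than staying in the integers:
print(1 / 2)  # 0.5

# ** with a negative base and float exponent promotes to complex
# instead of raising a domain error:
print((-1) ** 0.5)  # approximately 1j
```

A dispatch system keyed on these types inherits all of these implicit promotions, which is why hooks on the objects themselves get messy.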
That is not what I mean by symbolics. In those expression graphs the symbols are all arrays. I want to have an array of symbolic expressions rather than a symbolic representation of array operations: I want the dtype of the array to be “symbolic scalar expressions”.