`math.exp` and `math.log` delegate to `__exp__` and `__log__` methods?

This sounds very much like extension methods in C#, and Rust traits. So I think it probably could be done in a type-safe way, but maybe not without significant changes to Python’s type system (I have no idea if extension methods have even been discussed by the typing specialists).

Unfortunately, it can’t be done in a type-safe way within Python’s type system once you look at Python’s numeric types and the cases where the type system intentionally disregards incompatibility. (See prior discussions about `int | float` and where this causes problems for people writing code that interacts with native libraries, or the fact that type checkers allow LSP violations. There are also issues with bool now that `~` no longer works on bools even though bools are ints.)

This wouldn’t be properly expressible in python’s type system without those being fixed or at least adding type negation, and then manually excluding all the types that pretend to be compatible and aren’t.

It should at least be possible to do something like the following, although I’ve had to use a `type: ignore`:

from __future__ import annotations
from typing import Protocol, Self
import math

#######################
# genericmath:
#######################

class Context[T](Protocol):
    ...

class TransContext[T](Context[T], Protocol):
    """Common transcendental functions"""
    def sin(self, x: T, /) -> T: ...
    def cos(self, x: T, /) -> T: ...

class HasTransContext[T](Protocol):
    def __get_context__(self) -> TransContext[T]:
        ...

class _math_context:
    def __init__(self):
        self.sin = math.sin
        self.cos = math.cos

math_context = _math_context()

def get_context[T: float | HasTransContext](x: T) -> TransContext[T]:
    if isinstance(x, (float, int)):
        return math_context # type: ignore
    else:
        return x.__get_context__()

def sin[T: float | HasTransContext](x: T) -> T:
    ctx = get_context(x)
    return ctx.sin(x)

To me the ambiguity about ints and floats is more a question of what behaviour you would actually want from a generic function, e.g. when should an integer be treated as an exact number and when should it just be converted to a float? Using explicit contexts makes it type-safe to mix ints with other types like symbolic expressions, rationals or floats without ambiguity. Otherwise you need to think carefully about what should happen for each individual function.

It is nice to have a generic function like sin(obj) but there will always be cases where it doesn’t do the right thing and the proper thing is to use a context explicitly. Consider e.g. this:

a = np.array([obj1, obj2])
b = np.sin(a)

Should numpy use this generic sin function on each object? What if one object is an int and another is a mpmath.mpf or a symbolic expression or something? Should this convert the int to a float?

I would prefer to be able to arrange it more like this:

a = np.array([obj1, obj2])
b = ctx.sin(a)

Or perhaps even this as Neil suggested before:

a = np.array([obj1, obj2], context=ctx)
b = np.sin(a)

Now I have full robust control over what the behaviour should be through the context. If I am doing this in a function that takes an explicit context argument then a caller of that function can propagate their control through and so on. I can also change the behaviour without needing to change the types e.g. I can use types that I don’t control with a context that I make myself.

My point is that while there should be sin(obj) I think that should mostly be thought of as a convenience. The basis of robust type-safe generics should be interfaces that allow the use of explicit contexts in the first instance.
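
To sketch what a context-first interface might look like (this is illustrative only; `FloatContext` and `rms_of_sines` are invented names, not any proposed API): functions accept the context explicitly, so a caller can propagate its choice of semantics all the way down.

```python
import math


class FloatContext:
    """Plain float semantics backed by the math module."""

    def sin(self, x: float) -> float:
        return math.sin(x)


def rms_of_sines(values, ctx):
    # The caller controls the semantics by choosing ctx; this function
    # never has to guess how ints, floats or symbolic values should mix.
    total = sum(ctx.sin(v) ** 2 for v in values)
    return (total / len(values)) ** 0.5
```

A context-free `sin(obj)` could then be a thin wrapper that selects a default context, matching the “convenience” framing above.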

Sorry, I didn’t mean to imply that it is not thread-safe or anything like that.

The point I was trying to convey is that even np is a context object in some sense. You don’t have the option to create multiple contexts but it is still an object with mutable state much like a decimal context. The fact that you can use np.sin etc without thinking about contexts is at least in part because global variables are used instead of context objects.

With decimal you can create multiple contexts and use them side by side like:

>>> from decimal import *
>>> ctx_dn = Context(rounding=ROUND_DOWN)
>>> ctx_up = Context(rounding=ROUND_UP)
>>> ctx_dn.divide(1, 3)
Decimal('0.3333333333333333333333333333')
>>> ctx_up.divide(1, 3)
Decimal('0.3333333333333333333333333334')

That gives us an upper and a lower bound for 1/3. It is useful for this sort of thing to be able to have the multiple contexts there without thinking about mutating any global (even if thread-local or async-safe) state.

The options provided by np.errstate are more limited, so there is less of a need for this, but in generic programming you ideally don’t want to have to think about the differences between the possible realisations of the generic types. Having local independent context objects and pure functions helps with this.

Even better than having multiple contexts is if they are each immutable. Most Python libraries that have contexts with configuration options do not have immutable contexts even though it would be very useful (e.g. for caching).
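
A minimal sketch of what an immutable context might look like, assuming a frozen dataclass (this is not any particular library’s API). Because the context is hashable, results can be cached on (context, argument) pairs, which is the caching benefit mentioned above:

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class RoundingContext:
    """An immutable, hashable context holding one configuration option."""
    digits: int

    def quantize(self, x: float) -> float:
        return round(x, self.digits)


@lru_cache(maxsize=None)
def cached_quantize(ctx: RoundingContext, x: float) -> float:
    # Safe to cache: the context can never be mutated after creation,
    # so (ctx, x) fully determines the result.
    return ctx.quantize(x)
```

Two contexts built with the same options compare equal and hash equal, so caches and lookup tables keyed on contexts behave predictably.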

It would be nice if adoption of this “scalar” API allowed automatic interoperability with the array API. So that

import numpy as np
from uncertainties import ufloat
from genericmath import sin, sqrt

u_arr = np.array([ufloat(1, 0.1), ufloat(2, 0.2), ufloat(3, 0.3)])
result = np.mean(sqrt(sin(10 * (u_arr)**2 + 0.2 * u_arr)))

“just works”. This makes me wonder if there should be a generic mean function (which accepts an iterable or sequence of scalars) in the genericmath scalar API.
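
As a rough sketch of what such a generic `mean` could look like (invented here, not an existing genericmath function): it only relies on the elements supporting `+` and division by an int, so it would work for floats, Fractions, Decimals, or a ufloat-like type alike.

```python
import operator
from fractions import Fraction
from functools import reduce


def generic_mean(values):
    """Mean of an iterable of 'scalars' supporting + and / len."""
    values = list(values)
    if not values:
        raise ValueError("mean of empty sequence")
    total = reduce(operator.add, values)
    return total / len(values)
```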


Does it not currently work?

Okay, I see np.sin fails for this array:

In [11]: u_arr
Out[11]: array([1.0+/-0.1, 2.0+/-0.2, 3.0+/-0.3], dtype=object)

In [12]: np.sin(u_arr)
...
TypeError: loop of ufunc does not support argument 0 of type Variable which has no callable sin method

Yes, in the current version of uncertainties it doesn’t work. In some working branches I have found that

from uncertainties import umath


class UFloat:
    ...
    def sin(self):
        return umath.sin(self)

gets it to work which is good but a bit clunky. I actually wonder if there is an existing way to do this better currently, short of the existence of a generic math package and scalar API.

This is what I came up with after researching numpy ufuncs. But I have been confused about how to apply the numpy documentation on ufuncs and subclassing numpy arrays to this case.

If your goal is just to get np.sin etc working then numpy looks for a .sin method. You said that works so is there something else missing?

No, it’s good. I’m just curious if what I’ve come up with, defining a .sin method on the object, is the “proper” way to do this.

And then I’m also calling out that it would be useful if any proposed scalar API/generic math package (together with the array API) explicitly supported interoperability between scalars and arrays of this type.

The answer is yes and no. Yes, it will make your thing work right now. No, it isn’t a “proper” mechanism because e.g. (1).sin() doesn’t work and nothing apart from numpy will use your sin method.
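
A minimal demonstration of the mechanism just described: for object-dtype arrays, np.sin falls back to calling a `.sin()` method on each element. The `Angle` class here is invented for the example:

```python
import math

import numpy as np


class Angle:
    """Toy scalar type; np.sin will call its .sin() method elementwise."""

    def __init__(self, radians: float):
        self.radians = radians

    def sin(self) -> float:
        return math.sin(self.radians)


arr = np.array([Angle(0.0), Angle(math.pi / 2)], dtype=object)
result = np.sin(arr)  # object-dtype ufunc loop calls Angle.sin per element
```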

Agreed. We have the array API; any new convention along these lines should definitely align with it and be compatible with it.


FWIW, if you wrote a dtype using the DType API (a tall order, I know) this example would work.

Part of the DType API is that each DType has a scalar type it’s associated with. So items of arrays with a dtype set to an uncertainty dtype would be instances of your ufloat type.

A lot of things that I have referred to could potentially work using custom dtypes. No one has suggested that yet, though, so I presume there are some reasons why that might not be a good approach.

Are there limitations or complexity issues that make the custom dtype approach difficult?

I think I would still like something more general than numpy’s dtype system (at least I want to use it separately from numpy) but if something more general was there then I assume it could easily be adapted to provide a dtype and then work with numpy.

I just wanted to provide a suggestion that could work today and would also be substantially faster than object arrays.

I think the big issue with the DType system is that there’s a bit of a complexity cliff, since there’s no Python API. Most people aren’t comfortable using the NumPy and Python C APIs directly. In principle there could be a Python API (NEP 42 is written assuming there will one day be one) but getting it working would be a fair amount of work.

That said ultimately I suspect that for uncertainties a custom DType really is what you want. Items of the array would be a pair of floats (the value and uncertainty), so everything would be contiguous in memory too.

For a custom DType you really need a good understanding of the memory layout of the data you want to store in the array.

That said, it occurs to me that this is all also related to the issue I’ve run into where there’s no good way to share metadata about custom DTypes. The buffer protocol does this for NumPy’s builtin DTypes but the list of DTypes supported by the buffer protocol is set in stone and would require a PEP to change.

Your context idea is a bit more extensible and general. The DType metadata could live in the context that gets passed around with the data.

Okay, so to be clear, this approach does not extend to the array elements being arbitrary Python objects. That matches my previous understanding that even custom dtypes have to resolve down to, at most, a structured array of elementary dtypes (no dtype=object here).

You could have an array of references (StringDType works like this), but that introduces a lot of headaches because you become responsible for the heap bookkeeping.

If you really want the scalar to be an arbitrary python object then a custom DType only really buys you runtime type checking over just using object dtype.

That said, adding a Python API to NumPy that lets people define a sub-dtype of object dtype — more or less object dtype but with runtime type checking and proper dispatch — would probably be useful even if it wouldn’t be fast.

The idea of a custom DType DID cross my mind when reading through the numpy documentation, but I’m not “comfortable using the NumPy and Python C APIs directly.” Specifically, I do not know C. So yes, the complexity cliff stops that idea for me (and likely for the other maintainers of uncertainties too).

I think that is where we are going here. Note that when you say “it wouldn’t be fast” I’m thinking of a very different performance regime from the one you are probably thinking of: the size/shape of the arrays is small, but the elements are potentially very large (e.g. megabyte/gigabyte/blow-all-the-memory large).

I created a NumPy issue to track this idea: Python API for object dtype subtypes · Issue #28037 · numpy/numpy · GitHub

I would happily mentor someone who wants to work on this or any other improvement related to dispatch or the DType API. It will take some C knowledge though.
