Static typing: Specify if function is pure

This thread is an good example of how people speak of differing qualities of what they consider to be a pure function:

  • Does not mutate its input arguments
  • Returns the same value every time given the same input
  • Does not use a cache
  • Allows optimization by the interpreter
    (and I’m sure I’ve missed many more)

Many of these are orthogonal to each other and the notion of pure versus non-pure doesn’t have enough “nuance” to capture all of them. One might say practicality beats purity, but what counts as practical depends on which use case one is looking at.

2 Likes

You only need one :

  • Does not alter its context

You haven’t defined this and there is a wide range of what it can mean for different people/purposes. Take into account that to evaluate a function at least some state of the machine will have to change.

2 Likes

Is not altered by changes to its context.

3 Likes

I just ran into the list vs Sequence issue again adding annotations for this in python-flint:

>>> import flint
>>> p = flint.fmpq_poly([1, 2, 3])
>>> p
3*x^2 + 2*x + 1

The signature here is something like:

class fmpq_poly:
    def __init__(self,
       other: list[fmpq | fmpz | int] | fmpq_poly | fmpq | ...
    ):
        ...

The runtime implementation in Cython actually does require a list here and checks for that to distinguish from other possible input types:

cdef class fmpq_poly:
    def __init__(...)
        ...
        if typecheck(p, fmpq_poly):
            fmpq_poly_set(self.val, (<fmpq_poly>p).val)
        elif typecheck(p, fmpz_poly):
            fmpq_poly_set_fmpz_poly(self.val, (<fmpz_poly>p).val)
        elif isinstance(p, list):
            fmpq_poly_set_list(self.val, p)
        elif (v := any_as_fmpq(p)) is not NotImplemented:
            fmpq_poly_set_fmpq(self.val, (<fmpq>v).val)
        else:
            raise TypeError("cannot create fmpq_poly from input of type %s", type(p))

cdef fmpq_poly_set_list(fmpq_poly_t poly, list val):
    cdef long i, n
    n = PyList_GET_SIZE(val)
    fmpq_poly_fit_length(poly, n)
    for i from 0 <= i < n:
        c = val[i]
        x = any_as_fmpz(c)
        if x is not NotImplemented:
            fmpq_poly_set_coeff_fmpz(poly, i, (<fmpz>x).val)
            continue
        x = any_as_fmpq(c)
        if x is not NotImplemented:
            fmpq_poly_set_coeff_fmpq(poly, i, (<fmpq>x).val)
            continue
        raise TypeError("unsupported coefficient in list")

Note that the Cython type declaration list val actually does require a list. Even a subclass of list will raise an error there. There is no duck typing with C functions like PyList_GET_SIZE.

The parameter annotation list[fmpq | fmpz | int] is incompatible with passing e.g. a list[int] argument due to invariance even though we don’t mutate the list. Given a union of N types you would need to spell out 2**N - 1 possibilities for the parameter:

list[fmpq] | list[fmpz] | list[int] | list[fmpq | int] | ...

In this case N = 3 so there are 7 combinations but there are other poly types not yet annotated where it would be something like N = 8 so 255 combinations. I also want to add a generic Protocol for these types in which the parameter would be generic like list[T_coerce] meaning that it is impossible to enumerate the combinations.

I considered whether this could be changed to Sequence[fmpq | fmpz | int] but there needs to be a way to check for a Sequence and then convert it to a list. There is no good way to check for a Sequence at runtime though. If we are going to convert the Sequence to a list then all we need is __iter__ so I considered using Iterable[fmpq | fmpz | int] but then I realised that e.g. dict or set would be allowed and those don’t make sense here. Also for some other poly types it could make sense that the coefficients (e.g. fmpq) might be of an iterable type (or they might even be a sequence) so checking iterability could be ambiguous at runtime.

There is sort of a way to check for a Sequence using collections.abc.Sequence which compares against a list of registered types. It is quite slow though because of the metaclass __instancecheck__:

In [6]: %timeit isinstance((), tuple)
39.3 ns ± 0.0968 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [7]: %timeit isinstance((), Sequence)
804 ns ± 2.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

That is 20x slower in Python but in the Cython generated C code I think isinstance(p, list) would be quite a bit faster while isinstance(p, Sequence) would still be almost as slow.

In the end I decided to use Sequence[fmpq | fmpz | int] as the type annotation even though the runtime will only accept list. That means a type checker will allow some things that would fail at runtime (false negatives) but it gets across the covariance that is needed here. There should be a better way to handle this though.

There is a way to handle that situation. If you want list[X] with X behaving covariantly you can do,

T = TypeVar('T', bound=X)

def foo(x: list[T]):
  ...

Or more specifically T = TypeVar('T', fmpq | fmpz | int) will handle your use case.

3 Likes

While this works, there’s enough confusion over variance that calling this “behaving covariantly” is likely to lead to more headaches than useful definition. Within foo, it’s still invariant.

To actually treat it covariantly, while only allowing the operations that are safe while doing so requires more expressiveness than python has. There’s prior art in Kotlin, specifically Kotlin’s generic variance annotations (relevent here, the out modifier) and type projections

3 Likes

Thanks, I gave that a very quick test and it seemed to work.

Yes, so this is still not quite capturing the meaning. The code in foo does not use the methods that make list invariant. There is a subset of the interface of list that is covariant in the element type and only that is being used. It should be possible to express that somehow. I think that type variables should not be needed to do this for something as common as having a list as a function parameter. Yes, Sequence is sort of this but it is not well defined at runtime and sometimes the type should actually just be the nominal list type.

1 Like

This thread gave me the idea of creating a POC mypy plugin to check statically if a function has side-effects. Of course, given the Python dynamic nature you could bypass this check, but it is a matter of documenting more than enforcing. Please check it out!

2 Likes