Typing custom integer class

I am looking for guidance on how to correctly type a custom integer type for static type checkers.

I (or to be precise the SageMath project) have a Cython-implemented Integer class that behaves like an int and is intended to be usable wherever an int is accepted, but it does not actually inherit from int. Declaring it as class Integer(int) in a .pyi stub would therefore be dishonest.

I have considered several options, none of which feel entirely satisfactory:

  • Using SupportsInt/SupportsIndex is honest, but usually too weak: it does not allow Integer to be accepted by existing APIs annotated as int.

  • Using numbers.Integral would imply nominal inheritance that does not exist unless I explicitly register the type at runtime, and there are various issues with the numeric tower (see e.g. “Avoid the builtin `numbers` module”, Issue #144788, pytorch/pytorch on GitHub).

  • Defining a custom Protocol (e.g. an “IntLike”) is structurally correct, but does not help with third-party APIs expecting int.

  • Similarly, using a union alias (int | Integer) is explicit but only helps with our own code, not third-party APIs.
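To make the trade-offs in this list concrete, here is a minimal sketch. `FakeInteger` is a hypothetical stand-in for the Cython class (it is not SageMath code), and `IntLike` is only an illustrative protocol name. It shows that the structural checks pass, the nominal ones fail, and that registering with numbers.Integral changes the runtime answer only.

```python
import numbers
from typing import Protocol, SupportsIndex, runtime_checkable


class FakeInteger:
    """Hypothetical int-like class that does not subclass int."""

    def __init__(self, value: int) -> None:
        self._value = value

    def __index__(self) -> int:
        return self._value

    def __int__(self) -> int:
        return self._value

    def __add__(self, other: "FakeInteger | int") -> "FakeInteger":
        return FakeInteger(self._value + int(other))


@runtime_checkable
class IntLike(Protocol):
    """Illustrative custom protocol listing only what our own code relies on."""

    def __index__(self) -> int: ...
    def __add__(self, other, /): ...


x = FakeInteger(3)

# Structural checks succeed without any inheritance.
print(isinstance(x, SupportsIndex))     # True
print(isinstance(x, IntLike))           # True

# Nominal checks fail, which is why an `int` annotation rejects the type.
print(isinstance(x, int))               # False
print(isinstance(x, numbers.Integral))  # False (not registered yet)

# Registration flips the runtime check; static checkers generally do not see it.
numbers.Integral.register(FakeInteger)
print(isinstance(x, numbers.Integral))  # True
```

None of these makes `FakeInteger` acceptable where a third-party signature says `int`, which is the gap the question is about.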

Is there an established or recommended way to model “int-like but not an int” types so they interoperate well with static typing? If I accept that Integer will not be accepted by external APIs annotated with int, which of the other options (numbers.Integral, custom protocol or custom alias) is the preferred option?


Does it actually work to pass Integer to these APIs? If so, is it because they call int(x) or operator.index(x)?

If those APIs actually accept SupportsIndex then they should use that as the annotation.

If the APIs actually expect an int without validation or conversion then you should pass the actual int type and the conversion goes on the caller side:

def api_func(x: int) -> int:
    ...

def my_func(x: Integer) -> Integer:
    return Integer(api_func(int(x)))
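The two conversions mentioned above are not interchangeable, so it may help to see the difference. This is a small sketch with a hypothetical `FakeInteger` class standing in for an int-like type: `int()` also accepts floats and truncates them, while `operator.index()` only accepts objects that define `__index__`.

```python
import operator


class FakeInteger:
    """Hypothetical int-like class; stands in for a type such as Sage's Integer."""

    def __init__(self, value: int) -> None:
        self._value = value

    def __index__(self) -> int:
        return self._value


x = FakeInteger(7)

# Both conversions produce a real int for an object that defines __index__.
assert type(operator.index(x)) is int
assert type(int(x)) is int  # int() falls back to __index__ (Python 3.8+)

# int() silently truncates floats, while operator.index() refuses them.
assert int(2.9) == 2
try:
    operator.index(2.9)
except TypeError as exc:
    print("rejected:", exc)
```

For a caller-side conversion of a value that is known to be an exact integer, `operator.index` is the stricter choice.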

Hi Tobias. I’m a big fan of SageMath.
How much other Integer-specific machinery is there? Can many functions in SageMath be typed as generic (with a bound to an int-like protocol)?

When you use the SageMath frontend the preparser turns int literals into Integer:

>>> type(1)
<class 'sage.rings.integer.Integer'>
>>> type(1/2)
<class 'sage.rings.rational.Rational'>
>>> preparse("1/2")
'Integer(1)/Integer(2)'

So the Integer type is everywhere.


Oh yeah.

@tobiasdiez I know it’s a huge, mature library so I’m not suggesting a breaking change at this point in time. But class Integer(int) should be a quick change to make for the purposes of experimentation. Does the test suite pass in a fork with that change? And do Python’s tests pass with int=Integer? If not, then should Integer really be typed as int at all?


Thanks for the comments.

We do have to handle Integer as a valid parameter in almost all places (as Oscar said, the preparser converts literal integers into Sage’s Integers), but this could be handled by either using SupportsXyz or Union[int, Integer].

But currently at least, I’m more wondering how to handle external APIs that expect int in their type annotations. This problem is also not restricted to SageMath, as the following example using NumPy shows:

import numpy as np

n = np.int8(2)

for k, v in enumerate(['a', 'b', 'c'], n):  # both ty and pyright complain that n is not of type 'int'
    type(k) # int
    print(k, v)

(Perhaps this is actually a “bug” in the typeshed declaration, as start can most likely be SupportsIndex.)
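The runtime side of that observation can be checked without NumPy. In this sketch `FakeInteger` is a hypothetical class that defines only `__index__`; `enumerate` accepts it as `start` and yields plain ints, even though a checker following the current stubs would flag the call.

```python
class FakeInteger:
    """Hypothetical int-like class that defines only __index__."""

    def __init__(self, value: int) -> None:
        self._value = value

    def __index__(self) -> int:
        return self._value


# A type checker will likely complain here, but the call works at runtime.
pairs = list(enumerate(["a", "b", "c"], FakeInteger(2)))

print(pairs)              # [(2, 'a'), (3, 'b'), (4, 'c')]
print(type(pairs[0][0]))  # <class 'int'>
```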

Or suppose an API declares

def func(y: int) -> str:

    return str(y + 1)

then calling it with a NumPy int works fine without any conversion, but the type checker will complain. It’s also not straightforward to annotate that function with full flexibility (y needs to support adding an int literal, and the result of that addition has to support str(), i.e. some kind of “SupportsStr”).


I should also say that the typing annotations in the SageMath project are at a very early stage and pretty minimal. So this question is at the moment more theoretical, and we don’t actually know how often the type checker will complain that we try to pass a Sage integer to an external library…


I have no experience with SageMath and don’t know how compatible Integer is with int (i.e. does isinstance(x, int) work? Does type(x) == int hold?). That said, Integer(int) might be the most pragmatic solution, even though it is a lie (and I don’t like this kind of lie). Another solution might be lying about __new__:

class Integer:
    def __new__(cls, *args, **kwargs) -> int: ...

Yes, I would say that is a bug in the typeshed stubs. You can see that it very deliberately uses index in the CPython code:

Unfortunately that kind of thing is common but I assume that a simple PR to typeshed would be welcome.

This can go several different ways. Suppose I am the author of libfoo and you want to call libfoo.func and pass in an Integer which apparently works right now. If I have annotated that as int then it is potentially because I don’t want you to pass Integer in. I am potentially reserving the right to make changes in future that would break your unsupported use of the function.

The fact that the function currently happens to work with any y that can add with an int and then convert to str is not necessarily a guarantee that I want to provide. If I did want to allow more types then I would have picked a suitable type and used that:

from operator import index
from typing import SupportsIndex

def func(y: SupportsIndex) -> str:
    y = index(y)  # normalise to a real int up front
    return str(y + 1)

Now we are all clear about what the contract is: you can pass in Integer and I agree that I will always call index.

It is of course difficult to know though whether or not an int annotation was intended to be strict like this or not so in the grand transition to type annotations there is a long process over time to work this out for all interfaces such as enumerate. Part of this transition is not just adding the annotations but also adding things like y = index(y) so that the runtime behaviour consistently handles the types as well. In SageMath’s case that may mean a lot of some_func(int(x)) in many places.

I think that the answer for what SageMath should use in its own annotations is that they should accurately reflect whatever is accepted or returned. For parameters it can be better to be generous with SupportsIndex but for return types it is better to be strict like -> Integer. I would not use Integer | int in any case:

  • SupportsIndex is better for public API parameters
  • Otherwise choose between Integer and int for parameters on internal functions and for return types on all functions.

Also I would not use SupportsIndex for anything other than function parameters e.g. a class attribute should never have that type:

class A:
    data: SupportsIndex

Instead

class A:
    data: Integer

    def __init__(self, data: SupportsIndex):
        self.data = Integer(data)

If it turns out that a function is called with a parameter that is sometimes an int and sometimes an Integer there are two options:

  • Change the parameter type to SupportsIndex and immediately convert with index/Integer so that the type is not ambiguous after the first line of code in the function.
  • Choose what the parameter type should be (Integer or int) and go back and fix the callers.

With public API you can’t fix the callers so only the first option is suitable.

If it turns out that a function sometimes returns an Integer and sometimes an int then I would say that should be considered a bug and the fix is to choose what the type is, make the runtime match and annotate it accurately.

Focus more on getting the parameter and return types to be correct within SageMath and on its public API. If the type checker throws a false positive on enumerate then open a bug report with the type checker or the stubs and either:

  • Add # type: ignore.
  • Add an actual runtime conversion int(x).

The middle option of using cast gets a bit of the bad parts of the two above approaches (does not make the type correct at runtime and disables the checker).
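A short sketch of why cast is the weakest of the three: it returns its argument unchanged, so the object that reaches the callee is still not an int. `FakeInteger` and `api_func` are hypothetical names used only for this illustration.

```python
from typing import cast


class FakeInteger:
    """Hypothetical int-like class that does not subclass int."""

    def __init__(self, value: int) -> None:
        self._value = value

    def __index__(self) -> int:
        return self._value


def api_func(x: int) -> bool:
    """Hypothetical third-party function that reports whether it got a real int."""
    return type(x) is int


x = FakeInteger(5)

# cast() is a no-op at runtime: the checker is silenced, the object is unchanged.
assert cast(int, x) is x
assert api_func(cast(int, x)) is False

# A real conversion makes the annotation true at runtime as well.
assert api_func(int(x)) is True
```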


Another option here, with other tradeoffs, at least assuming the intent is for such a class to support either your Integer or the standard library int:

class A[T: (int, Integer)]:
    data: T

    def __init__(self, data: T):
        self.data: T = data

This will appropriately surface any APIs that are being used that aren’t typed as compatibly as your intent when used with A
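For readers on Python versions before 3.12, the same idea can be sketched with an explicit constrained TypeVar. `FakeInteger` and `Box` are hypothetical names; the constraint means a checker solves `T` to exactly one of the two types per instance rather than to a union.

```python
from typing import Generic, TypeVar


class FakeInteger:
    """Hypothetical int-like class that does not subclass int."""

    def __init__(self, value: int) -> None:
        self._value = value

    def __index__(self) -> int:
        return self._value


# Constrained type variable: T is solved to exactly int or exactly FakeInteger.
T = TypeVar("T", int, FakeInteger)


class Box(Generic[T]):
    def __init__(self, data: T) -> None:
        self.data: T = data


a = Box(1)               # a checker should infer Box[int]
b = Box(FakeInteger(1))  # a checker should infer Box[FakeInteger]
```

Passing `a.data` to an int-annotated API type-checks, while passing `b.data` does not, which is how the incompatible call sites get surfaced.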


This pretty much encapsulates the tension between static typing and duck typing. Static types make explicit statements about what is supported (as distinct from what might currently work), whereas duck typing essentially takes the view that “if it works, it’s OK”, and relies on a certain level of “consenting adults” about how strictly to interpret that as constraints on how much the private implementation details can be changed.

In practice, I believe that for many libraries, when they add type information, they tend to increase the strictness of their guarantees, often without particularly meaning to or thinking about it. Partly this is unintentional, and partly it’s because correctly capturing duck typed interfaces is very complex with Python’s current typing machinery[1].

In this case, I suspect it’s inevitable that unless you lie and say that Integer is 100% compatible with int, you’re bound to have 3rd party APIs that will need a cast to be used with the Integer type.


  1. This is the case for one of my libraries, where I haven’t added type information because doing so correctly would be more complex than the actual code being annotated :slightly_frowning_face:

In practice, when it comes to numeric types, you could never really rely on duck typing as a general approach. I’m sure it is the same in SageMath as in other libraries such as SymPy and NumPy, where every public function has to convert every potentially numeric input parameter to the library’s own numeric types:

def public_func(x):
    x = convert_to_known_types(x)
    # Now do stuff with x

With or without static typing that is needed because you have to validate the inputs and the different numeric types that might be passed in just aren’t interchangeable (otherwise we wouldn’t have so many of them!).
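One possible shape for such a conversion step is sketched below. `convert_to_known_types` is only the placeholder name from the snippet above, and this body is an assumption about what a library might do, not how SageMath, SymPy or NumPy actually implement it: it accepts anything with `__index__` and rejects everything else.

```python
from operator import index
from typing import Any


def convert_to_known_types(x: Any) -> int:
    """Hypothetical normaliser: accept exact integers only, return a plain int."""
    try:
        return index(x)
    except TypeError:
        raise TypeError(
            f"expected an integer-like value, got {type(x).__name__}"
        ) from None


def public_func(x: Any) -> int:
    x = convert_to_known_types(x)
    # From here on x is a plain int, so `x + 1` has one well-defined meaning.
    return x + 1


print(public_func(3))  # 4
try:
    public_func(2.5)
except TypeError as exc:
    print("rejected:", exc)
```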

Take the function Tobias showed:

def func(y: int) -> str:
    return str(y + 1)

Now

In [3]: func(255)
Out[3]: '256'

In [4]: func(np.uint8(255))
Out[4]: '0'

(It also prints an overflow warning but I omit that to show the output clearly.) Now there are situations where you do actually want to get '0' out there but clearly there are also many situations where it would be totally wrong.

If the author of func wants it to work predictably then they either need to have control over what y + 1 means by controlling the type of y or otherwise it needs to be clear that the caller’s type is allowed to control that behaviour and therefore defines the meaning of func itself (some kind of generic code).

The Integer type we are talking about here is an example of this. Division with Integer gives Rational not float. You don’t actually want to write code that does arithmetic with a variable of type Integer | int because the whole point of Integer is that it behaves differently to int even in basic arithmetic.
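The division point can be illustrated without SageMath. In this sketch `ExactInt` is a hypothetical class whose true division returns a `fractions.Fraction`, mimicking the Integer-to-Rational behaviour described above; the same function body then means two different things depending on the argument type.

```python
from fractions import Fraction


class ExactInt:
    """Hypothetical stand-in: true division yields an exact rational."""

    def __init__(self, value: int) -> None:
        self._value = value

    def __truediv__(self, other: "ExactInt | int") -> Fraction:
        other_value = other._value if isinstance(other, ExactInt) else other
        return Fraction(self._value, other_value)


def third(n):  # imagine the annotation n: ExactInt | int
    return n / 3


print(third(1))            # 0.3333333333333333 (a float)
print(third(ExactInt(1)))  # 1/3 (an exact Fraction)
```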


Good point, numeric types are weird. My experience is with other situations, so I’d forgotten that. Sorry for the noise.