I am looking for guidance on how to correctly type a custom integer type for static type checkers.
I (or to be precise the SageMath project) have a Cython-implemented Integer class that behaves like an int and is intended to be usable wherever an int is accepted, but it does not actually inherit from int. Declaring it as class Integer(int) in a .pyi stub would therefore be dishonest.
I have considered several options, none of which feel entirely satisfactory:
Using SupportsInt/SupportsIndex is honest, but usually too weak: it does not allow Integer to be accepted by existing APIs annotated as int.
Defining a custom Protocol (e.g. an "IntLike") is structurally correct, but does not help with third-party APIs expecting int.
Similarly, using a union alias (int | Integer) is explicit but only helps with our own code, not third-party APIs.
Is there an established or recommended way to model "int-like but not an int" types so they interoperate well with static typing? If I accept that Integer will not be accepted by external APIs annotated with int, which of the other options (numbers.Integral, custom protocol or custom alias) is the preferred option?
Does it actually work to pass Integer to these APIs? If so, is it because they do int(x) or index(x)?
If those APIs actually accept SupportsIndex then they should use that as the annotation.
If the APIs actually expect an int without validation or conversion then you should pass the actual int type and the conversion goes on the caller side:
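A minimal sketch of caller-side conversion, assuming a stand-in Integer class and a made-up external_api function (neither is SageMath's or any real library's code):

```python
import operator


class Integer:
    """Hypothetical stand-in for an int-like class that does not inherit from int."""

    def __init__(self, value: int) -> None:
        self._value = int(value)

    def __index__(self) -> int:
        # Lossless conversion hook used by operator.index() and int().
        return self._value


def external_api(n: int) -> list[int]:
    """Hypothetical third-party function annotated with a strict int."""
    return list(range(n))


x = Integer(3)
# Convert on the caller side, so the callee receives a real int.
result = external_api(operator.index(x))
print(result)  # [0, 1, 2]
```

The type checker is satisfied because operator.index is annotated as returning int, and the callee never sees the custom type.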
Hi Tobias. I'm a big fan of SageMath.
How much other Integer specific machinery is there? Can many functions in SageMath be typed as generic (with a bound to an int like protocol)?
@tobiasdiez I know it's a huge, mature library so I'm not suggesting a breaking change at this point in time. But class Integer(int) should be a quick change to make for the purposes of experimentation. Does the test suite pass in a fork with that change? And do Python's tests pass with int=Integer? If not, then should Integer really be typed as int at all?
We do have to handle Integer as a valid parameter in almost all places (as Oscar said, the preparser converts literal integers into Sage's Integers), but this could be handled by either using SupportsXyz or Union[int, Integer].
But currently at least, I'm more wondering how to handle external APIs that expect int in their typing annotation. This problem is also not restricted to SageMath, as the following example using numpy shows:
import numpy as np
n = np.int8(2)
for k, v in enumerate(['a', 'b', 'c'], n):  # both ty and pyright complain that n is not of type "int"
    type(k)  # int
    print(k, v)
(Perhaps this is actually a "bug" in the typeshed declaration, as start most likely can be SupportsIndex.)
Or suppose an API declares
def func(y: int) -> str:
    return str(y + 1)
then calling it with a numpy int works fine without any conversions, but the typechecker will complain. It's also not straightforward to annotate that function with full flexibility (y needs to support adding an int literal, and the returned type has to be SupportsStr).
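To illustrate what a "fully flexible" annotation might involve, here is a rough sketch using a custom Protocol. SupportsAddInt is a made-up name, not something from typing or typeshed, and Fraction is used only as a convenient non-int argument:

```python
from fractions import Fraction
from typing import Protocol


class SupportsAddInt(Protocol):
    """Hypothetical protocol: anything to which an int can be added."""

    def __add__(self, other: int, /) -> object: ...


def func(y: SupportsAddInt) -> str:
    # Every object can be passed to str(), so only the addition needs a protocol.
    return str(y + 1)


print(func(1))               # 2
print(func(Fraction(1, 2)))  # 3/2
```

Whether a library author would want to commit to such a loose contract is a separate question.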
I should also say that the typing annotations in the SageMath project are at a very early stage and pretty minimal. So this question is at the moment more theoretical and we don't actually know how often the typechecker will complain that we try to pass a Sage integer to an external library...
I have no experience with SageMath and don't know how compatible Integer is with int (i.e. does isinstance(x, int) work? Does type(x) == int?). That said, Integer(int) might be the most pragmatic solution, even though it is a lie (and I don't like this kind of lie). Another solution might be lying about __new__:
Yes, I would say that is a bug in the typeshed stubs. You can see that it very deliberately uses index in the CPython code:
Unfortunately that kind of thing is common but I assume that a simple PR to typeshed would be welcome.
This can go several different ways. Suppose I am the author of libfoo and you want to call libfoo.func and pass in an Integer which apparently works right now. If I have annotated that as int then it is potentially because I don't want you to pass Integer in. I am potentially reserving the right to make changes in future that would break your unsupported use of the function.
The fact that the function currently happens to work with any y that can add with an int and then convert to str is not necessarily a guarantee that I want to provide. If I did want to allow more types then I would have picked a suitable type and used that:
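For instance, a sketch of such a more permissive func could use SupportsIndex and convert explicitly. The Integer class here is only a hypothetical stand-in so that the example runs on its own:

```python
import operator
from typing import SupportsIndex


class Integer:
    """Hypothetical int-like class that is not a subclass of int."""

    def __init__(self, value: int) -> None:
        self._value = int(value)

    def __index__(self) -> int:
        return self._value


def func(y: SupportsIndex) -> str:
    # Normalise to a real int first, so that `n + 1` always means int addition.
    n = operator.index(y)
    return str(n + 1)


print(func(41))           # 42
print(func(Integer(41)))  # 42
```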
Now we are all clear about what the contract is: you can pass in Integer and I agree that I will always call index.
It is of course difficult to know though whether or not an int annotation was intended to be strict like this or not so in the grand transition to type annotations there is a long process over time to work this out for all interfaces such as enumerate. Part of this transition is not just adding the annotations but also adding things like y = index(y) so that the runtime behaviour consistently handles the types as well. In SageMath's case that may mean a lot of some_func(int(x)) in many places.
I think that the answer for what SageMath should use in its own annotations is that they should accurately reflect whatever is accepted or returned. For parameters it can be better to be generous with SupportsIndex but for return types it is better to be strict like -> Integer. I would not use Integer | int in any case:
SupportsIndex is better for public API parameters
Otherwise choose between Integer and int for parameters on internal functions and for return types on all functions.
Also I would not use SupportsIndex for anything other than function parameters e.g. a class attribute should never have that type:
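As a sketch of that rule, the constructor below accepts SupportsIndex but the stored attributes are plain int. Matrix is a made-up class used only for illustration:

```python
import operator
from typing import SupportsIndex


class Matrix:
    """Hypothetical container, used only to illustrate the annotation style."""

    nrows: int  # the stored attribute has one concrete type
    ncols: int

    def __init__(self, nrows: SupportsIndex, ncols: SupportsIndex) -> None:
        # Be generous in what the constructor accepts, then convert once.
        self.nrows = operator.index(nrows)
        self.ncols = operator.index(ncols)


m = Matrix(2, 3)
print(m.nrows * m.ncols)  # 6
```

Code that later reads m.nrows can then rely on ordinary int semantics without re-checking the type.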
If it turns out that a function is called with a parameter that is sometimes an int and sometimes an Integer there are two options:
Change the parameter type to SupportsIndex and immediately convert with index/Integer so that the type is not ambiguous after the first line of code in the function.
Choose what the parameter type should be (Integer or int) and go back and fix the callers.
With public API you can't fix the callers so only the first option is suitable.
If it turns out that function sometimes returns an Integer and sometimes an int then I would say that should be considered a bug and the fix is to choose what the type is, make the runtime match and annotate it accurately.
Focus more on getting the parameter and return types to be correct within SageMath and on its public API. If the type checker throws a false positive on enumerate then open a bug report with the type checker or the stubs and either:
Add # type: ignore.
Add an actual runtime conversion int(x).
The middle option of using cast gets a bit of the bad parts of the two above approaches (does not make the type correct at runtime and disables the checker).
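Side by side, the three approaches look roughly like this. Integer is again a hypothetical stand-in, and whether a given checker actually flags the first call depends on its current stubs for enumerate:

```python
from typing import cast


class Integer:
    """Hypothetical int-like class that is not a subclass of int."""

    def __init__(self, value: int) -> None:
        self._value = int(value)

    def __index__(self) -> int:
        return self._value


n = Integer(1)
letters = ["a", "b", "c"]

# Keep the runtime behaviour and silence the checker on this line only.
pairs_ignored = list(enumerate(letters, n))  # type: ignore

# Convert at runtime: the argument really is an int, nothing to silence.
pairs_converted = list(enumerate(letters, int(n)))

# cast: the checker is told it is an int, but at runtime it is still an Integer.
pairs_cast = list(enumerate(letters, cast(int, n)))

print(pairs_converted)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

All three produce the same pairs here because enumerate calls __index__ on start, but only the int(n) version makes the annotation true at runtime.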
This pretty much encapsulates the tension between static typing and duck typing. Static types make explicit statements about what is supported (as distinct from what might currently work), whereas duck typing essentially takes the view that "if it works, it's OK", and relies on a certain level of "consenting adults" about how strictly to interpret that as constraints on how much the private implementation details can be changed.
In practice, I believe that for many libraries, when they add type information, they tend to increase the strictness of their guarantees, often without particularly meaning to or thinking about it. Partly this is unintentional, and partly it's because correctly capturing duck typed interfaces is very complex with Python's current typing machinery[1].
In this case, I suspect it's inevitable that unless you lie and say that Integer is 100% compatible with int, you're bound to have 3rd party APIs that will need a cast to be used with the Integer type.
In practice when it comes to numeric types you could never really just rely on duck-typing as a general approach. I'm sure it is the same in SageMath as in other libraries where for SymPy, NumPy etc every public function has to convert every potentially numeric input parameter to its own numeric types:
def public_func(x):
    x = convert_to_known_types(x)
    # Now do stuff with x
With or without static typing that is needed because you have to validate the inputs and the different numeric types that might be passed in just aren't interchangeable (otherwise we wouldn't have so many of them!).
Take the function Tobias showed:
def func(y: int) -> str:
    return str(y + 1)
Now
In [3]: func(255)
Out[3]: '256'
In [4]: func(np.uint8(255))
Out[4]: '0'
(It also prints an overflow warning but I omit that to show the output clearly.) Now there are situations where you do actually want to get '0' out there but clearly there are also many situations where it would be totally wrong.
If the author of func wants it to work predictably then they either need to have control over what y + 1 means by controlling the type of y or otherwise it needs to be clear that the caller's type is allowed to control that behaviour and therefore defines the meaning of func itself (some kind of generic code).
The Integer type we are talking about here is an example of this. Division with Integer gives Rational not float. You don't actually want to write code that does arithmetic with a variable of type Integer | int because the whole point of Integer is that it behaves differently to int even in basic arithmetic.
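A toy illustration of that difference, using fractions.Fraction as a stand-in for Sage's Rational and a made-up Integer class (this is not how SageMath implements it):

```python
from fractions import Fraction


class Integer:
    """Toy stand-in whose true division is exact, unlike int's."""

    def __init__(self, value: int) -> None:
        self.value = int(value)

    def __truediv__(self, other: "Integer | int") -> Fraction:
        divisor = other.value if isinstance(other, Integer) else other
        return Fraction(self.value, divisor)


def half(n):
    # The meaning of `/` here is decided by the type of n, not by this function.
    return n / 2


print(half(3))           # 1.5
print(half(Integer(3)))  # 3/2
```

A function annotated with n: Integer | int would return float on one branch and an exact rational on the other, which is rarely what the author intends.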