I have a function that can accept many types, in particular bool and int, and it should return different types for bool and int. This is a simple demonstration:
from __future__ import annotations
from typing import overload
class A: pass
class B: pass
@overload
def func(arg: bool) -> A: ...
@overload
def func(arg: int) -> B: ...
def func(arg: bool | int) -> A | B:
    if isinstance(arg, bool):
        return A()
    else:
        return B()
Both pyright and mypy reject this example code:
$ mypy a.py
a.py:9: error: Overloaded function signatures 1 and 2 overlap with incompatible return types [overload-overlap]
Found 1 error in 1 file (checked 1 source file)
$ pyright a.py
a.py:9:5 - error: Overload 1 for "func" overlaps overload 2 and returns an incompatible type (reportOverlappingOverload)
1 error, 0 warnings, 0 informations
What is the appropriate way to communicate in the type system that a function returns different types depending on whether the input is a bool or an int?
That's a fair ask, but it's quite an odd design for bool to do something different from int, given that bools are ints. It would be more reasonable if A < B (since bool < int), but this doesn't seem to be the case.
Consider that someone may have:
def f(x: int) -> B:
    return func(x)
Do you expect that to be okay? Because f(True) is rightly okay, and func(True) gives A, and based on your annotations, it seems that A is not a subtype of B.
Are you sure this isnât indicative of a design error?
Feel free to go back and read the history of the decision, but without this compromise Python might never have gotten a Boolean type and simply stuck with integers.
Just to add to this, though: even though we might prefer that bool not be a subclass of int, it is a subclass of int, so I think it's your design that should probably change here.
I wasn't there, but I think that in terms of usability having bools be ints was a bad choice in some contexts, similarly to how strs being iterables of strs is a bad design choice in some contexts. For the latter I most often like it, but I'm bitten by it every once in a while.
The function that I am referring to here is SymPy's sympify function. Its purpose is to convert Python objects into SymPy's symbolic mathematical representations:
In [2]: type(sympify(2))
Out[2]: sympy.core.numbers.Integer
In [3]: type(sympify(True))
Out[3]: sympy.logic.boolalg.BooleanTrue
This conversion function is used everywhere and makes it possible to use ordinary Python types like int and bool when creating and manipulating symbolic expressions:
In [6]: x, y, z = symbols('x, y, z')
In [7]: (x & ~y) | (z & True)
Out[7]: z ∨ (x ∧ ¬y)
In [8]: _.subs(y, False)
Out[8]: x ∨ z
This function maps between different type systems, and in the other type system Boolean is a fundamentally distinct type from Expr with different operations and methods, e.g. &, |, ^ vs +, -, *. Regardless of mathematical sensibilities, it isn't possible for BooleanTrue and BooleanFalse to be subclasses of Integer because operators like & and | are defined in incompatible ways for Boolean and for Integer (which implements the Integral ABC).
Right, this is the problem. You have two different inheritance trees, so it isn't going to be possible to resolve this issue easily.
Personally, I would give up on supporting True and False in SymPy expressions (raise an error if anyone tries), and introduce sympy.true and sympy.false, which have the appropriate behavior.
That may be uncomfortable, but I consider LSP errors much worse in the long run.
SymPy already has these objects as S.true and S.false: those are precisely what sympify(True) returns. Raising an error would be a major compatibility break and is not an option.
There has never been any problem in practice with treating bool and int differently like this. The fact that isinstance(True, int) returns True is easily solved by checking isinstance(obj, bool) first. Centralising this logic in a single function, sympify, ensures that this doesn't get confused anywhere. It just means that we now can't express the type hints for sympify itself in a way that type checkers will accept.
I think we will stick with type: ignore as I said above.
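For reference, a minimal sketch of where that type: ignore could go, using the overload-overlap error code that mypy printed above (pyright users would suppress its reportOverlappingOverload rule instead; the exact line the comment must sit on depends on which line your checker flags):

```python
from __future__ import annotations
from typing import overload

class A: pass
class B: pass

@overload
def func(arg: bool) -> A: ...  # type: ignore[overload-overlap]
@overload
def func(arg: int) -> B: ...
def func(arg: bool | int) -> A | B:
    # bool must be checked before int: isinstance(True, int) is True,
    # so the more specific check has to come first.
    if isinstance(arg, bool):
        return A()
    return B()
```

At runtime the overloads are erased, so func(True) dispatches through the isinstance check and returns an A, while func(1) returns a B.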
Okay. As you realize, Python isn't going to change its inheritance hierarchy for Booleans, so then you have to live with the LSP violation.
Personally, I think that's worse than forcing users to use sympy.true and sympy.false, but I understand that the compatibility break is too annoying.
The reason LSP violations can be extremely annoying is that they break invariants about how we expect functions to behave. It's not as easy as saying "just do some instance checks", since those instance checks have to go in all sorts of places that you're not expecting, and you often only discover those places after you spend time debugging odd behavior.
That's why I suggested that you avoid this mess in the first place, but I understand that your hands are tied. There may unfortunately not be an easy solution for you.
This is why everything uses the sympify function. Most public functions perform this conversion and then there are consistent types internally:
def func(arg):
    arg = sympify(arg)
    ...
It is much more complicated than just bool and int because there are np.int64, gmpy2.mpz, fractions.Fraction, decimal.Decimal, np.ndarray, mpmath.matrix, etc. Downstream code also defines classes with a ._sympy_() method that allows arbitrary types to be converted by sympify. It is important that this conversion be handled in a consistent way, and the only way to do that is to call this function at every public boundary, much like NumPy does, where every public interface accepts an ndarray or a list of ints, etc.
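As an illustration of that boundary pattern, here is a toy sketch (not SymPy's actual implementation; only the ._sympy_() hook name comes from the discussion, and the class names are stand-ins):

```python
class Basic:  # stand-in for SymPy's Basic
    pass

class Integer(Basic):
    def __init__(self, value):
        self.value = value

def toy_sympify(obj):
    # bool is checked before int because isinstance(True, int) is True.
    if isinstance(obj, bool):
        raise NotImplementedError("Boolean conversion elided in this sketch")
    if isinstance(obj, int):
        return Integer(obj)
    # Extension hook: downstream classes convert themselves via ._sympy_().
    convert = getattr(obj, "_sympy_", None)
    if convert is not None:
        return convert()
    raise TypeError(f"cannot sympify {obj!r}")

class MyNumber:
    """A downstream type that opts in to conversion."""
    def __init__(self, n):
        self.n = n
    def _sympy_(self):
        return Integer(self.n)

def public_func(arg):
    # Every public boundary converts first, so internal code
    # only ever sees Basic instances.
    return toy_sympify(arg)
```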
Where a type checker should be able to help is by distinguishing between sympified and unsympified types and verifying that sympify is used everywhere it is needed while not being used unnecessarily in internal code. I haven't yet figured out how to define a Sympifiable type that would be analogous to NumPy's ArrayLike, though. Ideally the signatures for sympify and for public functions would be like:
T = TypeVar('T', bound=Basic)

def sympify(arg: Sympifiable[T]) -> T:
    ...

def func(arg: Sympifiable[Expr]) -> Expr:
    arg = sympify(arg)
    if not isinstance(arg, Expr):
        raise TypeError
    ...
I don't know how exactly to define the Sympifiable[T] type so that a type checker understands that int (or any SupportsIndex) is Sympifiable[Integer], that float is Sympifiable[Float], that both are Sympifiable[Number] where Number is a superclass of Integer and Float, that anything with ._sympy_() -> T is Sympifiable[T], and so on.
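A partial sketch of what might be expressible today: the ._sympy_() -> T half of Sympifiable[T] can plausibly be captured with a covariant Protocol, while builtin types like int and float would still need their own overloads, since they cannot be made to satisfy the protocol (and those overloads would still hit the same overlap complaints from the type checkers). All class names here are illustrative stand-ins:

```python
from __future__ import annotations
from typing import Protocol, TypeVar, overload

class Basic: ...
class Expr(Basic): ...
class Integer(Expr): ...
class Float(Expr): ...

T_co = TypeVar("T_co", covariant=True, bound=Basic)

class SupportsSympy(Protocol[T_co]):
    """Anything with a ._sympy_() -> T method."""
    def _sympy_(self) -> T_co: ...

@overload
def sympify(arg: SupportsSympy[T_co]) -> T_co: ...
@overload
def sympify(arg: int) -> Integer: ...
@overload
def sympify(arg: float) -> Float: ...
def sympify(arg):
    # The ._sympy_() hook takes priority over the builtin conversions.
    convert = getattr(arg, "_sympy_", None)
    if convert is not None:
        return convert()
    if isinstance(arg, bool):
        raise TypeError("Boolean conversion elided in this sketch")
    if isinstance(arg, int):
        return Integer()
    if isinstance(arg, float):
        return Float()
    raise TypeError(f"cannot sympify {arg!r}")
```

This still falls short of a real Sympifiable[T]: it cannot say that int is Sympifiable[Integer] as a type, only special-case int in sympify's own overloads, so public functions annotated with Sympifiable[Expr] remain out of reach.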
We also have a similar typing problem here with int and float regardless of bool:
from typing import overload, Any
class Basic: pass
class Integer(Basic): pass
class Float(Basic): pass
@overload
def func(x: int) -> Integer: ...
@overload
def func(x: float) -> Float: ...
def func(x: Any) -> Basic:
    assert False
This case is accepted by mypy but rejected by pyright. In this case int is not an actual subclass of float but PEP 484 sort of specified that it should be treated as a subtype. Apparently mypy and pyright handle that differently with @overload. It would be necessary to use type: ignore for int vs float and float vs complex regardless of bool vs int (or even bool vs complex!).
Thanks for the suggestion. There might be other cases where someone wants to distinguish int and bool and that approach would work, but I don't think it works for my case.
In this case the types A and B really are incompatible and really are expected to be used in incompatible ways. They have different methods and there is no common protocol that allows each type to be used as intended.
You can start from PEP 285, and look in the mailing list archives (this was on email, before Discourse) to find the discussions that went on at the time.
You can start with PEP 285 and then scan the python-dev archives from around that time. Looks like it starts here but there may be more threads if you look in different months.
The main thing to understand is that Python did not previously have a bool type, so int was used instead. Introducing bool as a subclass of int was smoother for compatibility at the time because it meant that code like e = x > y could still treat e as an int even if it was changed to evaluate to the new bool type instead. The PEP notes that much code in CPython itself needed this.
I don't know if it was discussed for the Python 3 transition, but there was an opportunity there to make bool not be a subclass of int, especially since __nonzero__ was renamed to __bool__, so any old methods returning int would have needed updating anyway. It would have broken arithmetic with bools, but that is a far smaller breakage than changing integer division.
I'm sure that it was a prudent choice at the time even if I don't like the way it works out now. I just couldn't help replying to Neil's leading question: yes, this does result from a deep design error.
Thank you, and thanks to @pf_moore for the links. I just want to add a quote from PEP 285:
In an ideal world, bool might be better implemented as a separate integer type that knows how to perform mixed-mode arithmetic. However, inheriting bool from int eases the implementation enormously […]
I omit the rest since it's more or less what you said.