I have a function that can accept many types, in particular bool and int, and it should return different types for bool and int. This is a simple demonstration:
from __future__ import annotations
from typing import overload
class A: pass
class B: pass
@overload
def func(arg: bool) -> A: ...
@overload
def func(arg: int) -> B: ...
def func(arg: bool | int) -> A | B:
    if isinstance(arg, bool):
        return A()
    else:
        return B()
Both pyright and mypy reject this example code:
$ mypy a.py
a.py:9: error: Overloaded function signatures 1 and 2 overlap with incompatible return types [overload-overlap]
Found 1 error in 1 file (checked 1 source file)
$ pyright a.py
a.py:9:5 - error: Overload 1 for "func" overlaps overload 2 and returns an incompatible type (reportOverlappingOverload)
1 error, 0 warnings, 0 informations
What is the appropriate way to communicate in the type system that a function returns different types depending on whether the input is a bool or an int?
That's a fair ask, but it's quite an odd design for bool to do something different from int, given that bools are ints. It would be more reasonable if A < B (since bool < int), but this doesn't seem to be the case.
Consider that someone may have:
def f(x: int) -> B:
    return func(x)
Do you expect that to be okay? Because f(True) is rightly okay, and func(True) gives A, and based on your annotations, it seems that A is not a subtype of B.
Are you sure this isnât indicative of a design error?
Feel free to go back and read the history of the decision, but without this compromise Python might never have gotten a Boolean type and simply stuck with integers.
Just to add to this, though: even though we might prefer that bool not be a subclass of int, it is a subclass of int, so I think it's your design that should probably change here.
I wasn't there, but I think that in terms of usability having bools be ints was a bad choice in some contexts, similarly to how strs being iterables of strs is a bad design choice in some contexts. For the latter I most often like it, but I'm bitten by it every once in a while.
The function that I am referring to here is SymPy's sympify function. Its purpose is to convert Python objects into SymPy's symbolic mathematical representations:
In [2]: type(sympify(2))
Out[2]: sympy.core.numbers.Integer
In [3]: type(sympify(True))
Out[3]: sympy.logic.boolalg.BooleanTrue
This conversion function is used everywhere and makes it possible to use ordinary Python types like int and bool when creating and manipulating symbolic expressions:
In [6]: x, y, z = symbols('x, y, z')
In [7]: (x & ~y) | (z & True)
Out[7]: z ∨ (x ∧ ¬y)
In [8]: _.subs(y, False)
Out[8]: x ∨ z
This function maps between different type systems, and in the other type system Boolean is a fundamentally distinct type from Expr with different operations and methods, e.g. &, |, ^ vs +, -, *. Regardless of mathematical sensibilities, it isn't possible for BooleanTrue and BooleanFalse to be subclasses of Integer because operators like & and | are defined in incompatible ways for Boolean and for Integer (which implements the Integral ABC).
Right, this is the problem. You have two different inheritance trees, so it isn't going to be possible to resolve this issue easily.
Personally, I would give up on supporting True and False in SymPy expressions (raise an error if anyone tries), and introduce sympy.true and sympy.false, which have the appropriate behavior.
That may be uncomfortable, but I consider LSP errors much worse in the long run.
SymPy already has these objects as S.true and S.false: those are precisely what sympify(True) returns. Raising an error would be a major compatibility break and is not an option.
There has never been any problem in practice with treating bool and int differently like this. The fact that isinstance(True, int) returns True is easily solved by checking isinstance(obj, bool) first. Centralising this logic in a single function, sympify, ensures that this doesn't get confused anywhere. It just means that we now can't express the type hints for sympify itself in a way that type checkers will accept.
I think we will stick with type: ignore as I said above.
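For reference, a minimal sketch of where that type: ignore could go, using the overload-overlap error code that mypy printed above (pyright users would suppress its reportOverlappingOverload rule instead; the exact line the comment must sit on depends on which line your checker flags):

```python
from __future__ import annotations
from typing import overload

class A: pass
class B: pass

@overload
def func(arg: bool) -> A: ...  # type: ignore[overload-overlap]
@overload
def func(arg: int) -> B: ...
def func(arg: bool | int) -> A | B:
    # bool must be checked before int: isinstance(True, int) is True,
    # so the more specific check has to come first.
    if isinstance(arg, bool):
        return A()
    return B()
```

At runtime the overloads are erased, so func(True) dispatches through the isinstance check and returns an A, while func(1) returns a B.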
Okay. As you realize, Python isn't going to change its inheritance hierarchy for Booleans, so then you have to live with the LSP violation.
Personally, I think that's worse than forcing users to use sympy.true and sympy.false, but I understand that the compatibility break is too annoying.
The reason LSP violations can be extremely annoying is that they break invariants about how we expect functions to behave. It's not as easy as saying "just do some instance checks", since those instance checks have to go in all sorts of places that you're not expecting, and you often only discover those places after you spend time debugging odd behavior.
That's why I suggested that you avoid this mess in the first place, but I understand that your hands are tied. There may unfortunately not be an easy solution for you.
This is why everything uses the sympify function. Most public functions perform this conversion and then there are consistent types internally:
def func(arg):
    arg = sympify(arg)
    ...
It is much more complicated than just bool and int because there are np.int64, gmpy2.mpz, fractions.Fraction, decimal.Decimal, np.ndarray, mpmath.matrix, etc. Downstream code also defines classes with a ._sympy_() method that allows arbitrary types to be converted by sympify. It is important that this conversion be handled in a consistent way, and the only way to do that is to call this function at every public boundary, much like NumPy does, where every public interface accepts an ndarray or a list of ints, etc.
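As an illustration of that boundary pattern, here is a toy sketch (not SymPy's actual implementation; only the ._sympy_() hook name comes from the discussion, and the class names are stand-ins):

```python
class Basic:  # stand-in for SymPy's Basic
    pass

class Integer(Basic):
    def __init__(self, value):
        self.value = value

def toy_sympify(obj):
    # bool is checked before int because isinstance(True, int) is True.
    if isinstance(obj, bool):
        raise NotImplementedError("Boolean conversion elided in this sketch")
    if isinstance(obj, int):
        return Integer(obj)
    # Extension hook: downstream classes convert themselves via ._sympy_().
    convert = getattr(obj, "_sympy_", None)
    if convert is not None:
        return convert()
    raise TypeError(f"cannot sympify {obj!r}")

class MyNumber:
    """A downstream type that opts in to conversion."""
    def __init__(self, n):
        self.n = n
    def _sympy_(self):
        return Integer(self.n)

def public_func(arg):
    # Every public boundary converts first, so internal code
    # only ever sees Basic instances.
    return toy_sympify(arg)
```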
Where a type checker should be able to help is by distinguishing between sympified and unsympified types and verifying that sympify is used everywhere it is needed while not being used unnecessarily in internal code. I haven't yet figured out how to define a Sympifiable type that would be analogous to NumPy's ArrayLike, though. Ideally the signatures for sympify and for public functions would be like:
T = TypeVar('T', bound=Basic)

def sympify(arg: Sympifiable[T]) -> T:
    ...

def func(arg: Sympifiable[Expr]) -> Expr:
    arg = sympify(arg)
    if not isinstance(arg, Expr):
        raise TypeError
    ...
I don't know how exactly to define the Sympifiable[T] type so that a type checker understands that int (or any SupportsIndex) is Sympifiable[Integer], that float is Sympifiable[Float], that both are Sympifiable[Number] where Number is a superclass of Integer and Float, that anything with ._sympy_() -> T is Sympifiable[T], and so on.
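A partial sketch of what might be expressible today: the ._sympy_() -> T half of Sympifiable[T] can plausibly be captured with a covariant Protocol, while builtin types like int and float would still need their own overloads, since they cannot be made to satisfy the protocol (and those overloads would still hit the same overlap complaints from the type checkers). All class names here are illustrative stand-ins:

```python
from __future__ import annotations
from typing import Protocol, TypeVar, overload

class Basic: ...
class Expr(Basic): ...
class Integer(Expr): ...
class Float(Expr): ...

T_co = TypeVar("T_co", covariant=True, bound=Basic)

class SupportsSympy(Protocol[T_co]):
    """Anything with a ._sympy_() -> T method."""
    def _sympy_(self) -> T_co: ...

@overload
def sympify(arg: SupportsSympy[T_co]) -> T_co: ...
@overload
def sympify(arg: int) -> Integer: ...
@overload
def sympify(arg: float) -> Float: ...
def sympify(arg):
    # The ._sympy_() hook takes priority over the builtin conversions.
    convert = getattr(arg, "_sympy_", None)
    if convert is not None:
        return convert()
    if isinstance(arg, bool):
        raise TypeError("Boolean conversion elided in this sketch")
    if isinstance(arg, int):
        return Integer()
    if isinstance(arg, float):
        return Float()
    raise TypeError(f"cannot sympify {arg!r}")
```

This still falls short of a real Sympifiable[T]: it cannot say that int is Sympifiable[Integer] as a type, only special-case int in sympify's own overloads, so public functions annotated with Sympifiable[Expr] remain out of reach.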
We also have a similar typing problem here with int and float regardless of bool:
from typing import overload, Any
class Basic: pass
class Integer(Basic): pass
class Float(Basic): pass
@overload
def func(x: int) -> Integer: ...
@overload
def func(x: float) -> Float: ...
def func(x: Any) -> Basic:
    assert False
This case is accepted by mypy but rejected by pyright. In this case int is not an actual subclass of float but PEP 484 sort of specified that it should be treated as a subtype. Apparently mypy and pyright handle that differently with @overload. It would be necessary to use type: ignore for int vs float and float vs complex regardless of bool vs int (or even bool vs complex!).
Thanks for the suggestion. There might be other cases where someone wants to distinguish int and bool and that approach would work, but I don't think it works for my case.
In this case the types A and B really are incompatible and really are expected to be used in incompatible ways. They have different methods and there is no common protocol that allows each type to be used as intended.
You can start from PEP 285, and look in the mailing list archives (this was on email, before Discourse) to find the discussions that went on at the time.
You can start with PEP 285 and then scan the python-dev archives from around that time. Looks like it starts here but there may be more threads if you look in different months.
The main thing to understand is that Python did not previously have a bool type, so int was used instead. Introducing bool as a subclass of int was smoother for compatibility at the time because it meant that code like e = x > y could still treat e as an int even if it was changed to evaluate to the new bool type instead. The PEP notes that much code in CPython itself needed this.
I don't know if it was discussed for the Python 3 transition, but there was an opportunity there to make bool not be a subclass of int, especially since __nonzero__ was renamed to __bool__, so any old methods returning int would have needed updating anyway. It would have broken arithmetic with bools, but that is a far smaller breakage than changing integer division.
I'm sure that it was a prudent choice at the time even if I don't like the way it works out now. I just couldn't help replying to Neil's leading question: yes, this does result from a deep design error.
Thank you, and thanks to @pf_moore for the links. I just want to add a quote from PEP 285:
In an ideal world, bool might be better implemented as a separate integer type that knows how to perform mixed-mode arithmetic. However, inheriting bool from int eases the implementation enormously […]
I omit the rest since it's more or less what you said.