Runtime type checking using parameterized types

Updated: reorganized the post, summarized objections into “known issues”

AFAIK, parameterized types currently cannot be used directly for runtime type checking in Python.

For example:

>>> isinstance(["hello type check"], list[str])
TypeError: isinstance() argument 2 cannot be a parameterized generic

However, recursive type checking seems to be only one step away thanks to the current typing infrastructure. All it takes is a new hook method __type_check__ and a builtin function type_check() that calls it.
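
For example, with such a hook in place, the failing call above would instead look like this (hypothetical; assumes builtin containers implement __type_check__):

>>> type_check(["hello type check"], list[str])
True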

Benefits

Providing a standard type checking interface could maximize compatibility across 3rd party packages.

It will also simplify type checking for small scripts, removing the need to install an entire third-party package (like cattrs, pydantic, beartype, msgspec, etc.) just to check something simple.

  • Comparing type_check() against currently available approaches:

    # Suppose we have a deserialized object to check.
    DataType = list[tuple[float, str, float]]
    data: DataType = deserialize(buffer)
    
  1. The type_check() way – a one-liner

    type_check(data, DataType)
    
  2. Manual type check

    def check_data(data) -> bool:
      if not isinstance(data, list):
        return False
      for item in data:
        if not isinstance(item, tuple) or len(item) != 3:
          return False
        for el, t in zip(item, (float, str, float)):
          if not isinstance(el, t):
            return False
      return True
    
  3. 3rd-party package (using cattrs as an example)

    Disclaimer

    Please forgive me if I did not do this the right way – this is the best I could do after 30 minutes of playing with it.
    If you think I did not do it properly, then perhaps many others will run into the same situation – and that supports my point that we need a simpler, more standardized way to do type checking.
    In addition, this does not mean cattrs is bad – it is GREAT. My point is that such libraries might be too heavy for simple use cases, and their behavior does not always align with your expectations (especially if you are new to them).

    from cattrs import structure
    def check_data(data) -> bool:
      # bad:
      #   set([
      #     (1.0   , ""  , False),
      #     (int(1), None, True ),
      #   ])
      # will pass validation
      try:
        structure(data, DataType)
        return True
      except Exception:  # cattrs raises its own exception types
        return False
    

Known issues

  1. Type checking could be destructive for iterators and generators (thanks to @Nineteendo @mikeshardmind):

    This could be mitigated by throwing a TypeError in the builtin type_check implementation for generators (a sketch of this guard appears after the demo below).

  2. Each class in the inheritance chain has to correctly implement its own __type_check__ (thanks again to @mikeshardmind):

    For now I cannot think of a way to remove this maintenance burden. This is kind of the nature of type variables: you cannot possibly know what to check unless you’re the author of that class…

    But this burden seems to reside mostly on package maintainers, not end users. This is because type_check will attempt to invoke the nearest parent class in the inheritance chain if the object’s own class does not provide __type_check__.

    For example:

    class MyArray(np.ndarray[np.float32]):
        pass # No type check specified
    
    arr = MyArray((3,), dtype=np.int64) # bad
    type_check(arr, MyArray)
    # hasattr(MyArray, '__type_check__') is False,
    # so type_check() will invoke:
    ndarray.__type_check__(arr, np.float32)
    # Note: this behavior is possible but not implemented
    # in the demo below; a sketch follows this list.
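
    A minimal sketch of that fallback (hypothetical; the demo below does not implement it). It inspects __orig_bases__ to recover the type arguments recorded at subclassing time:

    from typing import get_args, get_origin

    def inherited_type_check(obj):
        # Hypothetical fallback for known issue 2: if type(obj) does not
        # define __type_check__ itself, look for a parameterized generic
        # base class whose origin provides the hook, and delegate to it.
        for base in getattr(type(obj), "__orig_bases__", ()):
            origin, args = get_origin(base), get_args(base)
            if origin is not None and hasattr(origin, "__type_check__"):
                return origin.__type_check__(obj, *args)
        return None  # no usable hook found in the bases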
    

A demonstrative implementation of type_check():

from typing import get_args, get_origin, Union
from types import UnionType

def type_check(obj, t: type) -> bool:
    origin, args = get_origin(t), get_args(t)

    if origin is None:
        # t is not a generic type, fallback to direct type check
        return isinstance(obj, t)

    if origin is UnionType or origin is Union:
        # Union type, check if any of the types match
        return any(type_check(obj, arg) for arg in args)

    if not isinstance(obj, origin):
        # Origin type mismatch
        return False

    if len(args) == 0:
        # No type hint, anything is allowed
        return True

    if hasattr(origin, "__type_check__"):
        # Use t's type checker whenever possible
        # e.g. 
        #   class B(A[T]): ...
        #   type_check(B[int](), A[int]) => should give True
        return origin.__type_check__(obj, *args)

    if hasattr(obj, "__type_check__"):
        # Type check supported by this object
        return obj.__type_check__(*args)

    # Type args specified but type_check not supported by this object
    return False
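
A sketch of the generator guard mentioned in known issue 1 (also not part of the demo above):

import types

def type_check_strict(obj, t: type) -> bool:
    # Hypothetical guard for known issue 1: refuse to validate objects
    # that would be consumed by iterating over them.
    if isinstance(obj, types.GeneratorType):
        raise TypeError("type_check() would consume this generator")
    return type_check(obj, t)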

Extending the builtin list to support __type_check__:

from builtins import list as builtin_list
from typing import TypeVar

T = TypeVar("T")

class list(builtin_list[T]):
    def __type_check__(self, *t: type):
        if len(t) == 0:
            # No type hint, anything is allowed
            return True
        elif len(t) == 1:
            # Single type hint, all elements must be of this type
            typ = t[0]
            return all(type_check(el, typ) for el in self)
        elif len(t) == len(self):
            # Type check each item with corresponding type hint
            return all(type_check(el, typ) for el, typ in zip(self, t))
        else:
            # Number of items mismatches with number of type hints
            return False

Usage:

(correctness checked on Python 3.13.0)

# Simple Examples
assert type_check(list([1, 2, 3])  , list[int]) is True
assert type_check(list([1, 2, 3.0]), list[int]) is False # 3rd element type mismatch

# Multiple types
assert type_check(list([1, "2", 3.0]), list[int, str, float]) is True
assert type_check(list([1, "2", "3"]), list[int, str, float]) is False # 3rd element type mismatch
assert type_check(list([1, "2"])     , list[int, str, float]) is False # element count mismatch

# Recursive type checking is automatically supported
assert type_check(list([list([1, 2]), list([3, 4])]), list[list[int]]) is True
assert type_check(list([list([1, 2]), list([3, "4"])]), list[list[int]]) is False # 2nd list fails

# Union types are also supported
L = list[list[int] | list[float]]
assert type_check(list([list([1, 2]), list([3.0, 4.0])]), L) is True
assert type_check(list([list([1, 2.0]), list([3, 4.0])]), L) is False # list[int | float] != list[int] | list[float]

P.S. I expected to find a lot of similar proposals or discussions, but my research did not turn up anything similar. Not sure what’s going on…

Do you have any use cases for this? And why can’t you use a TypeGuard / TypeIs for that?
There’s probably a reason for this behaviour though…

Edit: aha (PEP 585 – Type Hinting Generics In Standard Collections | peps.python.org):

This functionality requires iterating over the collection which is a destructive operation in some of them. This functionality would have been useful, however implementing the type checker within Python that would deal with complex types, nested type checking, type variables, string forward references, and so on is out of scope for this PEP.

I am working on socket based communication protocols.

The deserializer is supposed to check the correctness of packet content before returning the value. It would be great to reuse type annotations for type checking instead of hand-writing check code – handwritten checks might not strictly align with the type annotations.

For example:

# (timestamp: float, label: str, score: float)
CorrelationStamped = list[list[float, str, float]]

class CorrelationPipe(JsonProtocol):
  def decode(self, obj: object) -> CorrelationStamped:
    # Some data transform omitted
    ...
    # Manual type check
    for l in obj:
      for d, t in zip(l, (float, str, float)):
        assert isinstance(d, t)
        # potential bug: false positive if len(l) < 3
        # zip() will match the shortest iterable

Moreover, every time I change the definition of the type CorrelationStamped, I also have to change the manual type check code to match, which is prone to all kinds of mistakes.

Explicitly calling type_check() indicates the user is aware of the danger - and we can make Generator.__type_check__() throw an Exception to mitigate this.

There are already quite a few great and well-established libraries for runtime checking, like beartype, pydantic and cattrs, just to name a few. This is not a simple topic, so it’s better served by third-party libraries, and it seems like what you’re looking for is one of those.

There’s usually a lot more you want to validate in real-world data than what the static type system expresses, e.g. validating that a port number is in the valid range and not just an int. So you typically want to write your own runtime validation code anyway. Writing a standard implementation for validating a couple of additional static types therefore does not seem worth the maintenance burden, especially since typing is still evolving; it seems premature to nail down a runtime behavior that can’t easily be changed later.
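
For instance, a port check involves more than the static type (validate_port is a made-up helper, not from any of the libraries above):

def validate_port(value: object) -> int:
    # The static type says "int"; runtime validation also needs the range.
    if not isinstance(value, int) or isinstance(value, bool):
        raise TypeError(f"port must be an int, got {type(value).__name__}")
    if not 0 <= value <= 65535:
        raise ValueError(f"port out of range: {value}")
    return value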

Note that list[float, str, float] isn’t a valid type: mypy Playground
You would have to write tuple[float, str, float], which you could check like this:

from typing import TypeGuard, Any

def is_correlation_stamped(obj: Any) -> TypeGuard[list[tuple[float, str, float]]]:
    if not isinstance(obj, list):
        return False
    for l in obj:
        if not isinstance(l, tuple) or len(l) != 3:
            return False
        if not all(isinstance(d, t) for d, t in zip(l, (float, str, float))):
            return False
    return True
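
At a call site, the guard narrows the type for static checkers (sample data made up):

data: object = [(0.1, "a", 0.9), (0.2, "b", 0.8)]
if is_correlation_stamped(data):
    # Narrowed to list[tuple[float, str, float]] from here on.
    for timestamp, label, score in data:
        print(timestamp, label, score)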

Thanks for the information. I read through each of their introductions. pydantic and cattrs seem to work for the example I provided – and that is what I need for now.

However, as I browsed through their documentation, I found that the approaches they take are “intrusive” – it is hard to use them to validate typed objects from external libraries. For example:

from attrs import define
from cattrs import structure

from typing import Generic, TypeVar

T = TypeVar("T")

class Extern(Generic[T]):
    value: T
    def __init__(self, var: T):
        self.value = var

@define
class C:
    e: Extern[float]

instance = structure({'e': Extern(1.0)}, C)

# cattrs.errors.StructureHandlerNotFoundError:
# Unsupported type: __main__.Extern[float].
# Register a structure hook for it.

Suppose Extern comes from a third-party library; then it will be tricky to include it in your type-checked code.

As clearly shown in the error message, the cattrs package is also using type checking hooks to do its trick (and the other two are very likely doing the same thing) – so why not provide a standard way of type checking to maximize 3rd-party package compatibility?

I know I should not do this, but there are no tuples in the JSON world… and it adds a lot of burden to manually convert lists to tuples just to make the typing system happy. Thankfully the Python interpreter in my deployment environment does not throw the same error at me.

I am aware that I can assert len(l) == 3, as I mentioned when noting how the code can cause trouble. The point is that the separation of type definition and type checking implementation imposes unnecessary friction on coding.

It would be unsafe to check parameterized generics without specific knowledge. Even in some cases with specific knowledge, you have to be careful to construct checks so that they only work on the exact class being checked and no subclasses, see discussion about the unsafety of TypeIs here: Problems with TypeIs - #46 by mikeshardmind, more complex libraries avoid this by parsing into a structured type rather than checking if a type looks like another type.

Not sure if I understand your point completely, but here is a probably working version using type_check():

from typing import TypeVar, Generic, TypeIs
from type_check import type_check

X = TypeVar("X", str, int, str | int, covariant=True, default=str | int)

class A(Generic[X]):
    def __init__(self, i: X, /):
        self._i: X = i

    @property
    def i(self) -> X:
        return self._i
    
    def __type_check__(self, t: type) -> bool:
        return type_check(self._i, t)


class B(A[X], Generic[X]):
    def __init__(self, i: X, j: X, /):
        super().__init__(i)
        self._j: X = j

    @property
    def j(self) -> X:
        return self._j
    
    def __type_check__(self, t: type) -> bool:
        return type_check(self._j, t) and super().__type_check__(t)

def do_not_boom(x: A[int]) -> int | str:
    if type_check(x, B[int] | B[str]):  # Changed from: isinstance(x, B)
        b: B[int] | B[str] = x          # Make static analyzer happy
        return b.i + b.j
    return "addition not viable"

def bad(x: A) -> TypeIs[A[int]]: # Bad no more!
    return type_check(x, A[int])

def indirection(x: A):
    if bad(x):
        return do_not_boom(x)


# example:
b: B[int | str] = B(1, "this")
print(indirection(b)) # "addition not viable"

# bad() is actually no longer needed:
print(do_not_boom(b)) # "addition not viable"

P.S. I’ve updated the original type_check() demo so it invokes A.__type_check__(obj, *t) when user calls type_check(b, A[t])


Update: I turned on mypy and found something interesting: b: B[int] | B[str] causes both b.i and b.j to be inferred as int | str. This loses the information that they must either both be int or both be str.

I did not notice this because Pylance does not flag it by default.

Yeah, you’ve missed the point a little bit. Every subclass has to define __type_check__ and do so correctly (you can’t safely just inherit or generate this for runtime use), which means more repeating yourself and non-trivial behavior that may not be a good thing. There’s also the case of things like Iterator[T] where you’d have to consume the iterator to check, making it no longer useful.

You should look at msgspec. Especially since your example notes that this is JSON, you’ll get meaningful performance benefits as well.

Working with your example there:

>>> import msgspec

>>> class CorrelationStamped(msgspec.Struct, array_like=True):
...     timestamp: float
...     label: str
...     score: float

>>> data = CorrelationStamped(1730768750.610832, "example", 1)
>>> data
CorrelationStamped(timestamp=1730768750.610832, label='example', score=1)
>>> msgspec.json.encode(data)
b'[1730768750.610832,"example",1]'
>>> msgspec.json.decode(b'[1730768750.610832,"example",1]', type=CorrelationStamped)
CorrelationStamped(timestamp=1730768750.610832, label='example', score=1.0)

# or

>>> data = (1730768750.610832, "example", 1)
>>> msgspec.json.encode(data)
b'[1730768750.610832,"example",1]'
>>> msgspec.json.decode(b'[1730768750.610832,"example",1]', type=tuple[float, str, float])
(1730768750.610832, 'example', 1.0)

msgspec works recursively and can handle tagged unions as well as value constraints. cattrs does a good job here too. While some people like pydantic, I think it’s a very heavy dependency compared to other options.
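
For example, a value constraint can be attached with msgspec.Meta (a small sketch; the Port alias is made up):

from typing import Annotated
import msgspec

# Constrain the value range, not just the type.
Port = Annotated[int, msgspec.Meta(ge=0, le=65535)]

msgspec.json.decode(b"8080", type=Port)   # 8080
msgspec.json.decode(b"70000", type=Port)  # raises msgspec.ValidationError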

You also shouldn’t use assert for that.

Serialization to and from arbitrary types is complex and needs to do things the type system doesn’t cover anyway. Use a library made for it that works with the type system; the type system doesn’t need to do all the runtime work.

Thanks, looking into it.

I did not make it clear in the original example, but my outer wrapper (JsonProtocol.__iter__()) will catch decode exceptions and skip bad data points – exception-based control flow is intentionally used to indicate “no data returned”.

You still shouldn’t use assertions though since they’ll be ignored if someone runs using python -O.
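
For example, an explicit raise is never stripped, unlike an assert (checked_isinstance is a made-up helper):

def checked_isinstance(value: object, typ: type) -> None:
    # Unlike `assert isinstance(value, typ)`, this check still runs
    # when the interpreter is started with `python -O`.
    if not isinstance(value, typ):
        raise TypeError(f"expected {typ.__name__}, got {type(value).__name__}")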

Thanks for pointing that out. I did not know asserts could be turned off like this. I’ve removed those asserts from my code.

FYI: I created a package called “rttc” with two extra tools in the box: type_assert() and @type_guard. Check it out if you’re interested! GitHub | PyPi

Just save yourself a load of time and use pydantic. With pydantic you can define validators to transform your lists to tuples or do any kind of manipulation before validation, using BeforeValidator and Annotated on a BaseModel.
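
For instance, a minimal sketch of that (Packet and to_tuple are made up; pydantic v2 will also coerce lists to tuples on its own in lax mode):

from typing import Annotated, Any
from pydantic import BaseModel, BeforeValidator

def to_tuple(value: Any) -> Any:
    # JSON has no tuples; coerce incoming lists before validation.
    return tuple(value) if isinstance(value, list) else value

class Packet(BaseModel):
    row: Annotated[tuple[float, str, float], BeforeValidator(to_tuple)]

Packet.model_validate({"row": [1.0, "label", 0.5]})  # row=(1.0, 'label', 0.5)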

This is premature optimisation, and you don’t know the full use cases. For the majority of applications pydantic is performant, which is why it makes its way into FastAPI.

I agree with you that anyone who is looking for a data validation tool should reach for one of the well-maintained libraries listed above. I would also use one of those for production-grade code, not the toy I made.


With that said, in my little playground I am pushing the limits of what runtime type checkers can do – to the best of my knowledge this has not been well explored in any other project.

For example:

@type_guard
@dataclass
class B[T]:
    x: T

B[int](x=1)   # ok
B[int](x="1") # TypeCheckError: B.x = str("1") is not int
B[str](x=1)   # TypeCheckError: B.x = int(1) is not str

Is this supported by any existing runtime type checker?

If you are taking data from a JSON file, for example, and use pydantic, pydantic’s handling of types decides whether a value will be implicitly converted: Conversion Table - Pydantic

For example, you can tell pydantic that the string “1” is an int, and it will implicitly convert it. However, it will not convert a string into a list of ints (list[int]), because that could be a mistake.

Pydantic has a page describing the conversion process and what is not supported. Anything not supported you have to convert yourself, like so:

from typing import Annotated, Any
from pydantic import BaseModel, BeforeValidator
from annotated_types import MinLen

def parse_ints(value: Any) -> list[int]:
    assert isinstance(value, str)
    return [int(x) for x in value.split(",")]

class MyModel(BaseModel):
    my_prop: Annotated[list[int], BeforeValidator(parse_ints), MinLen(1)]

model = MyModel.model_validate_json("""{
    "my_prop": "1, 2"
}""")

print(model.my_prop)

Output: [1, 2]

Similarly for your case, this is possible (using TypeVar since I don’t have Python 3.12 yet):

from typing import Annotated, Any, Generic, TypeVar
from pydantic import BaseModel, BeforeValidator

T = TypeVar("T")

def parse_int(value: Any) -> int:
    if isinstance(value, str):
        return int(value)
    return value

MaybeIntMaybeStrNumber = Annotated[T, BeforeValidator(parse_int)]

class B(BaseModel, Generic[T]):
    x: MaybeIntMaybeStrNumber

print(B[int].model_validate({"x": 1}))   # ok
print(B[int].model_validate({"x": "1"})) # ok
#B[str](x=1)   # not ok
print(B[int](x=1))   # ok

Output:

x=1
x=1
x=1

You can’t, however, make T behave like this:

B[str](x=1)   # not ok

or this:

B[str](x=str(1))

You can use AfterValidator instead, if you want to convert the value after it has been validated…

from typing import Annotated, Any, Generic, TypeVar
from pydantic import AfterValidator, BaseModel

T = TypeVar("T")

def parse_int(value: Any) -> int:
    if isinstance(value, str):
        return int(value)
    return value

MaybeIntMaybeStrNumber = Annotated[T, AfterValidator(parse_int)]

class B(BaseModel, Generic[T]):
    x: MaybeIntMaybeStrNumber

print(B[int].model_validate({"x": 1}))   # ok
print(B[int].model_validate({"x": "1"})) # ok
print(B[str](x=str(1)))   # ok
print(B[int](x=1))   # ok

Output:

x=1
x=1
x=1
x=1