Type narrowing with a validation function that raises an exception?

I have data from a web API which returns JSON, and I want to make sure it has a particular shape before continuing with my code. I would like to do so by calling a validation function that throws if the data is the wrong shape.

Suppose the data can look like this. (This example has just one key in the TypedDict, but in practice there are many.)

from typing import NotRequired, TypedDict

class DataRaw(TypedDict):
    val: NotRequired[str]

class DataFull(TypedDict):
    val: str

def fetch_data_from_api() -> DataRaw | DataFull:
    # Some implementation here
    ...

After fetching the data from the API, I want to make sure it has the shape of DataFull before continuing.

One way to do this is to create a function called is_data_full1() which is a TypeGuard. Then I have a function, do_something1(), which fetches the data, makes sure it’s the right shape, and then does something with the data.

def is_data_full1(x: DataRaw | DataFull) -> TypeGuard[DataFull]:
    return "val" in x

def do_something1():
    x = fetch_data_from_api()
    if not is_data_full1(x):
        raise ValueError("val is not present in x")
    print(x["val"])

That works OK with the type hints that I’ve added – the type checker (pyright) knows that at the print(x["val"]), it must be a DataFull object, and so it doesn’t flag any errors.

Unfortunately, I can’t use this pattern, because I’m trying to integrate the types into an existing untyped code base that uses a validation function which just throws if the data does not have the right shape. The existing code looks something like this:

def validate_data_full2(x):
    if "val" not in x:
        raise ValueError("val is not present in x")

def do_something2():
    x = fetch_data_from_api()
    validate_data_full2(x)
    print(x["val"])

I would like to add types to the existing code so that, after the line where validate_data_full2(x) is called, the type checker knows that x is a DataFull object. Is this possible? I was hoping that I could do it with @overload and a NoReturn, but it doesn’t seem to work.

@overload
def validate_data_full3(x: DataRaw) -> NoReturn:
    ...

@overload
def validate_data_full3(x: DataFull) -> None:
    ...

def validate_data_full3(x: DataRaw | DataFull) -> None:
    if "val" not in x:
        raise ValueError("val is not present in x")


def do_something3():
    x = fetch_data_from_api()
    validate_data_full3(x)
    print(x["val"])   # <-- flagged by type checker

The type checker (pyright 1.1.355) highlights the line with print(x["val"]), because it still thinks that x is of type DataRaw | DataFull.

Is it possible to do what I’m asking for here? Again, I’m trying to add types to an existing code base without changing the logic.

What you’re looking for is a “type assert function”. TypeScript has the ability to define such functions using the asserts <type> return type annotation. There is currently no analogous mechanism in Python’s type system. There has been some discussion of adding a TypeAssert special form to support this use case, but it hasn’t gotten to the formal specification phase.

In the meantime, you can wrap your untyped validation function with a type guard function.

def validate_data(x) -> None:
    if "val" not in x:
        raise ValueError()

def is_data_valid(x: DataRaw | DataFull) -> TypeGuard[DataFull]:
    validate_data(x)
    return True

Of course, this completely fails to solve the problem as outlined in the OP.

There has been some discussion of adding a TypeAssert special form to support this use case, but it hasn’t gotten to the formal specification phase.

That would be a very useful feature.

So do you think that the best thing to do for now is to add in a cast after the validation, like this?

def do_something():
    x = fetch_data_from_api()
    is_data_valid(x)
    x = cast(DataFull, x)
    print(x["val"])

It always feels a little weird to me when I put a cast() in my code, since it is an actual function that gets called at runtime, and in this case it also involves an assignment back into x, but its only purpose is to help type checking which doesn’t happen at runtime.
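
As a small illustration of that runtime behaviour: typing.cast just returns its argument, so the assignment rebinds the name to the same object without checking or copying anything. A minimal sketch, with the TypedDicts redeclared so it runs on its own (DataRaw uses total=False here as a stand-in for NotRequired, and the sample value is made up):

```python
from typing import TypedDict, Union, cast

class DataRaw(TypedDict, total=False):
    val: str  # stand-in for NotRequired[str]

class DataFull(TypedDict):
    val: str

raw: Union[DataRaw, DataFull] = {"val": "hello"}

# cast() performs no validation and makes no copy at runtime.
full = cast(DataFull, raw)
same_object = full is raw
```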

One thing you could do is change the function to return DataFull, then return the input after checking it. That just hides the cast inside the function though.
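
A minimal sketch of that idea, assuming it is acceptable to change the validator's return type. The names validate_data_full and do_something are hypothetical, and DataRaw uses total=False as a stand-in for NotRequired so the snippet runs on its own.

```python
from typing import TypedDict, Union, cast

class DataRaw(TypedDict, total=False):
    val: str  # stand-in for NotRequired[str]

class DataFull(TypedDict):
    val: str

def validate_data_full(x: Union[DataRaw, DataFull]) -> DataFull:
    # Same runtime check as before; raises if the key is missing.
    if "val" not in x:
        raise ValueError("val is not present in x")
    # The cast is now hidden here; the same object is returned.
    return cast(DataFull, x)

def do_something(x: Union[DataRaw, DataFull]) -> str:
    data = validate_data_full(x)  # data is typed as DataFull
    return data["val"]
```

Call sites that ignore the return value keep working at runtime, but they only get the narrowed type if they use the returned object.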