Is type narrowing with assert a code smell?

abessman · March 27, 2023, 11:24am

Here is a (slightly contrived) example where I read some data from an external source, check that the data is of a type I expect, and then try to use an attribute of that specific data type:

from typing import Type, Union

def receive() -> bytes:
    ...

DTYPE_BYTE = 0

# DataType, VersionData, ChecksumData omitted for brevity.

PARSE_MAP: dict[int, Union[Type[VersionData], Type[ChecksumData]]] = {
    DataType.VERSION: VersionData,
    DataType.CHECKSUM: ChecksumData,
}

def read(expected: DataType) -> Union[VersionData, ChecksumData]:
    raw = receive()
    data = PARSE_MAP[raw[DTYPE_BYTE]](raw)
    if data.dtype == expected:
        return data
    raise RuntimeError

read(DataType.VERSION).version  # mypy error: Item "ChecksumData" of "Union[VersionData, ChecksumData]" has no attribute "version".

Since the read function can return types which do not have the version attribute, mypy does not like this. Mypy’s documentation suggests using assert to make the error go away:

data = read(DataType.VERSION)
assert isinstance(data, VersionData)
data.version

However, since I have already checked that the data does have this attribute, I’m wondering if there is any advantage to the above over just

read(DataType.VERSION).version  # type: ignore

?

The latter is less verbose and clearer (in my opinion, it’s not obvious what the assert is doing unless one is already familiar with mypy). So,

Is type narrowing with assert a code smell in this example?
Is type narrowing with assert always a code smell?

steven.daprano · March 27, 2023, 1:18pm

I don’t understand your design here. There may be many things I am missing, or miisunderstanding, but at first read this looks like MyPy has discovered a type bug in your code, and you are trying to silence it instead of fixing the bug.

You have a function that you claim always returns a VersionData object, so why do you declare that the function can return ChecksumData as well?

def read(expected: DataType) -> Union[VersionData, ChecksumData]:
    raw = receive()
    data = PARSE_MAP[raw[DTYPE_BYTE]](raw)
    if data.dtype == expected:
        return data
    raise RuntimeError

Why not use the return annotation VersionData instead of the union?

I cannot see any check in your code that the data object has a version attribute. The only check that I see is that you check whether the data object’s dtype attribute is equal to the passed in expected argument.

I don’t see how that check proves that (1) data is a VersionData object rather than ChecksumData, or (2) that it has a version attribute.

In other words, it seems to me that MyPy is correct to flag this as a type error. But maybe I am wrong.

If I am wrong, then I recommend that you follow the MyPy instructions and use an assertion:

data = read(DataType.VERSION)
assert isinstance(data, VersionData)
data.version

This has three powerful advantages over just disabling type-checking for that line:

It tells MyPy that data is a VersionData object.
It tells the human reader that data is a VersionData object.
And it checks that claim with a runtime test (unless you disable assertions).

Assertions are sometimes described as “checked comments” – they are comments to the reader that are actually executed to make sure that they are true.

Any time you have to fight with the type-checker it is a code smell. Sometimes that means your code is wrong and the type-checker is right, and sometimes it means your code is correct and the type-checker is wrong. Either way it is something that the human reader should look at it very carefully.

In otherwords, a code smell.

CAM-Gerlach · March 27, 2023, 1:30pm

For starters, I had to infer/guess (sometimes by trial and error) minimal implementations of DataType, VersionData and ChecksumData in order to reproduce this, which was rather tedious and may not perfectly accurate represent the relevant bits (though it should be close enough). In the future, make sure your examples are complete enough so that others can determine any relevant details and reproduce the issue you’re reporting. Here’s what I started with, which produces the same error above on the latest Mypy 1.1.1:

from enum import IntEnum
from typing import Type, Union

def receive() -> bytes:
    return b"01234"

DTYPE_BYTE = 0

class DataType(IntEnum):
    VERSION = 0
    CHECKSUM = 1

class VersionData:
    dtype = DataType.VERSION

    def __init__(self, version: bytes):
        self.version = version

class ChecksumData:
    dtype = DataType.CHECKSUM

    def __init__(self, checksum: bytes):
        self.checksum = checksum

PARSE_MAP: dict[int, Union[Type[VersionData], Type[ChecksumData]]] = {
    DataType.VERSION: VersionData,
    DataType.CHECKSUM: ChecksumData,
}

def read(expected: DataType) -> Union[VersionData, ChecksumData]:
    raw = receive()
    data = PARSE_MAP[raw[DTYPE_BYTE]](raw)
    if data.dtype == expected:
        return data
    raise RuntimeError

read(DataType.VERSION).version

Instead of passing doing the rather indirect, non-obvious and possibly error-prone check of .dtype, why not just directly pass the type you expect, check that normally with isinstance and return that type (using TypeVar)? This is more robust and correct, and mypy accordingly shows no errors:

diff --git a/temp3.py b/temp3.py
index 12adbbe..be527c5 100644
--- a/temp3.py
+++ b/temp3.py
@@ -1,5 +1,5 @@
 from enum import IntEnum
-from typing import Type, Union
+from typing import Type, TypeVar, Union

 def receive() -> bytes:
     return b"01234"
@@ -27,11 +27,13 @@ def __init__(self, checksum: bytes):
     DataType.CHECKSUM: ChecksumData,
 }

-def read(expected: DataType) -> Union[VersionData, ChecksumData]:
+T = TypeVar("T", bound=Union[VersionData, ChecksumData])
+
+def read(expected: Type[T]) -> T:
     raw = receive()
     data = PARSE_MAP[raw[DTYPE_BYTE]](raw)
-    if data.dtype == expected:
+    if isinstance(data, expected):
         return data
     raise RuntimeError

-read(DataType.VERSION).version
+read(VersionData).version

Full modified code

from enum import IntEnum
from typing import Type, TypeVar, Union

def receive() -> bytes:
    return b"01234"

DTYPE_BYTE = 0

class DataType(IntEnum):
    VERSION = 0
    CHECKSUM = 1

class VersionData:
    dtype = DataType.VERSION

    def __init__(self, version: bytes):
        self.version = version

class ChecksumData:
    dtype = DataType.CHECKSUM

    def __init__(self, checksum: bytes):
        self.checksum = checksum

PARSE_MAP: dict[int, Union[Type[VersionData], Type[ChecksumData]]] = {
    DataType.VERSION: VersionData,
    DataType.CHECKSUM: ChecksumData,
}

T = TypeVar("T", bound=Union[VersionData, ChecksumData])

def read(expected: Type[T]) -> T:
    raw = receive()
    data = PARSE_MAP[raw[DTYPE_BYTE]](raw)
    if isinstance(data, expected):
        return data
    raise RuntimeError

read(VersionData).version

abessman · March 27, 2023, 2:56pm

Steven D'Aprano:

I don’t understand your design here. There may be many things I am missing, or miisunderstanding, but at first read this looks like MyPy has discovered a type bug in your code, and you are trying to silence it instead of fixing the bug.

You have a function that you claim always returns a VersionData object, so why do you declare that the function can return ChecksumData as well?

Why not use the return annotation VersionData instead of the union?

I cannot see any check in your code that the data object has a version attribute. The only check that I see is that you check whether the data object’s dtype attribute is equal to the passed in expected argument.

I don’t see how that check proves that (1) data is a VersionData object rather than ChecksumData, or (2) that it has a version attribute.

The receive function receives data, in the form of an appropriate number of bytes, from some external source. The first of these bytes (DTYPE_BYTE) indicates the data type of the received data. I have created a dictionary to map the possible values of this data type indicator to various classes (PARSE_MAP). Let’s say that DataType is defined thus:

import enum

DataType(enum.IntEnum):
    VERSION = 10
    CHECKSUM = 20

Then, I could call the read function with either read(DataType.VERSION) or read(DataType.CHECKSUM). In either case, the function will return an instance of the corresponding class, or raise RuntimeError if the received data does not match the expected DataType.

abessman · March 27, 2023, 3:05pm

Sorry about that. I tried to keep my Minimally Reproducible Example as minimal as possible by eliminating some boiler plate, but in doing so I forgot to keep it reproducible

FWIW, your guess is pretty much spot on, for the purposes of this example.

C.A.M. Gerlach:

Instead of passing doing the rather indirect, non-obvious and possibly error-prone check of .dtype, why not just directly pass the type you expect, check that normally with isinstance and return that type (using TypeVar)? This is more robust and correct, and mypy accordingly shows no errors:

diff --git a/temp3.py b/temp3.py
index 12adbbe..be527c5 100644
--- a/temp3.py
+++ b/temp3.py
@@ -1,5 +1,5 @@
 from enum import IntEnum
-from typing import Type, Union
+from typing import Type, TypeVar, Union

 def receive() -> bytes:
     return b"01234"
@@ -27,11 +27,13 @@ def __init__(self, checksum: bytes):
     DataType.CHECKSUM: ChecksumData,
 }

-def read(expected: DataType) -> Union[VersionData, ChecksumData]:
+T = TypeVar("T", bound=Union[VersionData, ChecksumData])
+
+def read(expected: Type[T]) -> T:
     raw = receive()
     data = PARSE_MAP[raw[DTYPE_BYTE]](raw)
-    if data.dtype == expected:
+    if isinstance(data, expected):
         return data
     raise RuntimeError

-read(DataType.VERSION).version
+read(VersionData).version

I did try something similar first, except my attempt was

if isinstance(data, Union[VersionData, ChecksumData])

which did not work. Hence the roundabout .dtype check. I hadn’t thought to bind the Union to a TypeVar. That is much neater, thanks!

CAM-Gerlach · March 27, 2023, 3:18pm

Yeah, that won’t work at runtime (AFAIK); if you wanted that, you could instead just do isinstance(data, (VersionData, ChecksumData)). That would make the previous function work, but of course wouldn’t narrow the time all the way down to VersionData, just Union[VersionData, ChecksumData], unlike the above.

hauntsaninja · March 28, 2023, 6:18am

Note that you could potentially type this code more exactly using typing.overload

But to answer your question about using assert vs using # type: ignore. I think both can be reasonable in cases like this.

I typically prefer assert because:
a) if I’m wrong about something (now, or after future refactoring) I get an error in a context where it’s easy to debug, whereas if I type ignored I could potentially end up with an error somewhere far away
b) it more clearly states the condition which the type checker is unable to deduce, so I find it easier when re-reading much later

I only prefer # type: ignore (or some explicit Any type) in situations where something really dynamic is going on and it’s not worth proving to mypy or I’d need to assert several intermediate things (e.g. when unpacking a nested field in JSON).

I don’t think type narrowing with assert is really a code smell — or at least, it’s less of a code smell than Any or type ignore, since it clearly states the condition and doesn’t let your program proceed if the condition doesn’t hold. It does of course mean that you’ve constructed code that a type checker doesn’t fully understand. In most domains, I think that is perfectly acceptable.

CAM-Gerlach · March 28, 2023, 8:32am

Oh, is there something I missed that my TypeVar/isinstance-based solution incorrectly lets through that using overload correctly prevents? I’m a relative typing n00b (at least compared to you) and it took me a bit to put together a solution here, so I’d be grateful to learn more here. Thanks!

abessman · March 28, 2023, 8:37am

If my understanding is correct, using TypeVar tells mypy that the function accepts any one of the classes in the bound Union as input, and may return any one of them. However, it does not include the information that the input and return type must be the same. This additional constraint can be added using typing.overload.

In practice, there may not be a difference since mypy seems to recognize that the isinstance check narrows the possible return type down to the input type.

frostming · March 28, 2023, 8:55am

Not correct, the T appearing in both input arg and return value tells that whenever a type of either variant of the union is passed in, the function must return the instance of T of the same type. The type var is an element in generic types that can be substituted with any but a specific one of the defined types satisfying the bounds.

abessman · March 28, 2023, 8:58am

Thanks for the correction! I’ve only recently started using type annotations seriously, and I must admit I find it quite challenging to get the details right.

CAM-Gerlach · March 28, 2023, 9:22am

Yup, and if the function didn’t include the isinstance check, mypy would (correctly) point out that the actual return type didn’t match the expected, the type specified in the input argument.

Yeah, there’s definitely a learning curve. Personally, I found TypeVar to be one of the harder parts to grok in terms of the syntax at first; in fact, I still don’t reach for it as often as I should myself and learned quite a bit using it in the context of the answer to your question.

hauntsaninja · March 28, 2023, 7:38pm

Your solution works great, was mentioning more for completeness. Users sometimes find overload more intuitive than generics. In some cases it can be hard to change the code such that nominal typing just works.