A case for type intersection: duck typing file-like arguments with an ad-hoc set of small protocols

blhsing · March 8, 2024, 10:14am

Currently “file-like objects” in Python are perhaps the most extreme example of duck typing. A function can accept any object as a “file” or “stream” argument, as long as the object happens to have the set of “file-like” methods that the function requires.

While such a design is convenient and versatile in a small project, in a large project it makes it difficult to tell exactly which methods a file-like object must implement in order to satisfy a particular function.
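To make the point concrete, here is a minimal sketch (the function `read_shebang` and the class `OneLine` are hypothetical). The function never checks the type of its argument, so anything with a `readline()` method is accepted, and nothing in the signature tells the caller that `readline()` is the only method it needs.

```python
import io


def read_shebang(file):
    # Duck typing: no type check, we simply call the one method we need.
    first_line = file.readline()
    return first_line if first_line.startswith("#!") else None


class OneLine:
    # Hypothetical minimal "file": it implements nothing but readline().
    def readline(self):
        return "#!/usr/bin/env python\n"


print(read_shebang(io.StringIO("#!/bin/sh\necho hi\n")))  # a real text stream
print(read_shebang(OneLine()))  # not a file at all, but it has readline()
```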

@Jelle, a maintainer of the typing module, has addressed the issue in this discussion:

And indeed, a look at _typeshed/__init__.pyi does reveal a good variety of protocols covering most of the file-like methods:

github.com/python/typeshed/blob/52daae514ade4692756bc47d16d9579f70aea93f/stdlib/_typeshed/__init__.pyi#L244

FileDescriptor: TypeAlias = int  # stable
FileDescriptorLike: TypeAlias = int | HasFileno  # stable
FileDescriptorOrPath: TypeAlias = int | StrOrBytesPath

# stable
class SupportsRead(Protocol[_T_co]):
    def read(self, __length: int = ...) -> _T_co: ...

# stable
class SupportsReadline(Protocol[_T_co]):
    def readline(self, __length: int = ...) -> _T_co: ...

# stable
class SupportsNoArgReadline(Protocol[_T_co]):
    def readline(self) -> _T_co: ...

# stable
class SupportsWrite(Protocol[_T_contra]):
    def write(self, __s: _T_contra) -> object: ...
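Because these are structural protocols, a class never has to inherit from them to satisfy them. A minimal runtime sketch, assuming we copy two of them into an ordinary module: the `@runtime_checkable` decorator is added here only so that `isinstance()` can show the structural match (the stubs don't use it), and the old `__length` convention for positional-only parameters is written with `/` instead.

```python
import io
from typing import Protocol, TypeVar, runtime_checkable

_T_co = TypeVar("_T_co", covariant=True)
_T_contra = TypeVar("_T_contra", contravariant=True)


# Runtime copies of two of the stub protocols; @runtime_checkable is an addition.
@runtime_checkable
class SupportsRead(Protocol[_T_co]):
    def read(self, length: int = ..., /) -> _T_co: ...


@runtime_checkable
class SupportsWrite(Protocol[_T_contra]):
    def write(self, s: _T_contra, /) -> object: ...


# io.StringIO never subclasses these protocols, yet it matches them structurally.
print(isinstance(io.StringIO(), SupportsRead))
print(isinstance(io.StringIO(), SupportsWrite))
print(isinstance(42, SupportsRead))  # int has no read() method
```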

The problem, firstly, is that it’s a .pyi stub file meant only for type checkers, so _typeshed is not importable at runtime.
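A quick way to confirm this in a running interpreter (a minimal sketch): `_typeshed` ships only with the copy of typeshed that type checkers bundle, so the import fails with `ModuleNotFoundError`.

```python
# _typeshed exists only inside type checkers' bundled copy of typeshed,
# so importing it in a running program fails.
try:
    import _typeshed  # noqa: F401
    imported = True
except ModuleNotFoundError as exc:
    imported = False
    print(f"runtime import failed: {exc}")
```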

And secondly, even if these small protocols were made available (by copying and pasting the code from the .pyi file, or by maintaining them ourselves), they would still be clumsy to use, since we’d have to define a dedicated Protocol just to type hint the file argument of a particular function:

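from typing import Protocol

# SupportsRead and SupportsNoArgReadline copied from _typeshed, as above;
# Protocol must be listed explicitly for FooFile to be a protocol itself
class FooFile(SupportsRead[str], SupportsNoArgReadline[str], Protocol):
    pass

def foo(file: FooFile):
    if (first_line := file.readline()).startswith('#!'):
        return first_line + file.read()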
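For reference, a self-contained sketch of that pattern, assuming local copies of the two protocols (the stub versions can't be imported). Listing `Protocol` again among the bases matters: without it, `FooFile` would be an ordinary nominal class that only its own subclasses satisfy. Any object with `read()` and a no-argument `readline()`, such as `io.StringIO`, then fits.

```python
import io
from typing import Optional, Protocol, TypeVar

_T_co = TypeVar("_T_co", covariant=True)


# Local copies of the two stub protocols.
class SupportsRead(Protocol[_T_co]):
    def read(self, length: int = ..., /) -> _T_co: ...


class SupportsNoArgReadline(Protocol[_T_co]):
    def readline(self) -> _T_co: ...


# Combining protocols: Protocol must appear explicitly in the bases.
class FooFile(SupportsRead[str], SupportsNoArgReadline[str], Protocol):
    pass


def foo(file: FooFile) -> Optional[str]:
    if (first_line := file.readline()).startswith("#!"):
        return first_line + file.read()
    return None  # implicit in the original; spelled out here


print(foo(io.StringIO("#!/bin/sh\necho hi\n")))
```

Note that the dedicated `FooFile` class is exactly the boilerplate the post objects to.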

And then anyone using foo with a type checker needs to find the definition of FooFile in order to understand that foo expects a file-like object that provides read and readline methods.

Wouldn’t it be more convenient and clearer to allow type hinting with an ad-hoc intersection of protocols in this case?

def foo(file: SupportsRead & SupportsNoArgReadline):
    if (first_line := file.readline()).startswith('#!'):
        return first_line + file.read()

hauntsaninja · March 8, 2024, 10:25am

You can import many of these protocols directly from hauntsaninja/useful_types (“Useful types for Python”) on GitHub. Some folks are working on a draft proposal to add intersections to the type system, and I agree that easy intersection of protocols is a great use case.

Viicos · March 8, 2024, 1:12pm

I also encountered these protocols imported directly in library code:

from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from _typeshed import SupportsRead

I’m not sure whether this is good practice, but it seems fine, since type checkers usually bundle their own copy of typeshed.

chepner · March 8, 2024, 4:11pm

This is documented in typeshed’s stdlib/_typeshed directory (python/typeshed on GitHub).

My understanding is that _typeshed is “technically” an implementation detail of type checkers, but code guarded by if TYPE_CHECKING is likewise “lifted” into the type checker (it’s there for the type checker, not for your script at runtime).
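A minimal sketch of why the pattern is safe at runtime as long as the protocol appears only in annotations (`read_all` is a hypothetical function). With `from __future__ import annotations` (PEP 563), annotations are stored as strings and never evaluated, so the name that only exists for the type checker is never looked up.

```python
from __future__ import annotations

import io
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by the type checker; never executed at runtime.
    from _typeshed import SupportsRead


def read_all(file: SupportsRead[str]) -> str:
    # The annotation above is kept as the string "SupportsRead[str]",
    # so the missing runtime name is harmless here.
    return file.read()


print(read_all(io.StringIO("hello")))
print(read_all.__annotations__["file"])
```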

blhsing · March 11, 2024, 2:50am

Victorien:

I also encountered these protocols imported directly in library code:

from __future__ import annotations

if TYPE_CHECKING:
    from _typeshed import SupportsRead

I don’t know if this is a good practice? Seems to be fine as type checkers usually bundle a local version of typeshed

Thanks. The problem is that this only helps type checkers: at runtime TYPE_CHECKING is False, so SupportsRead is never defined, and any code that actually evaluates the name (for example, using it as a base class to combine protocols) fails with a NameError.
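A short sketch of where the NameError shows up: postponed evaluation covers annotations only, while base classes are evaluated as soon as the class statement runs. The combined class name `SupportsReadWrite` is just the example used in this thread.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from _typeshed import SupportsRead, SupportsWrite

error_message = None
try:
    # Bases are evaluated eagerly, so the real names are needed here.
    class SupportsReadWrite(SupportsRead, SupportsWrite):
        pass
except NameError as exc:
    error_message = str(exc)

print(error_message)
```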

After some experimentation I found a workaround that works both for type checkers and at runtime:

try:
    from _typeshed import SupportsRead, SupportsWrite
except ModuleNotFoundError:
    from unittest.mock import Mock
    SupportsRead = SupportsWrite = Mock()

class SupportsReadWrite(SupportsRead, SupportsWrite):
    pass
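One thing worth knowing about this workaround, shown in a minimal sketch: when the base is a Mock *instance*, the class statement calls that instance's type with the class name, bases and namespace. The result is therefore another Mock object rather than a class, which is harmless in annotations but means `SupportsReadWrite` can't be used like a real class at runtime.

```python
from unittest.mock import Mock

try:
    from _typeshed import SupportsRead, SupportsWrite
except ModuleNotFoundError:
    SupportsRead = SupportsWrite = Mock()


# At runtime this calls type(SupportsRead)("SupportsReadWrite", bases, namespace),
# i.e. it constructs a new Mock instead of a class.
class SupportsReadWrite(SupportsRead, SupportsWrite):
    pass


print(isinstance(SupportsReadWrite, Mock))
print(isinstance(SupportsReadWrite, type))
```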

chepner · March 11, 2024, 12:46pm

Ben Hsing:

After some experimentation I found a workaround that works both for type checkers and at runtime:

try:
    from _typeshed import SupportsRead, SupportsWrite
except ModuleNotFoundError:
    from unittest.mock import Mock
    SupportsRead = SupportsWrite = Mock()

class SupportsReadWrite(SupportsRead, SupportsWrite):
    pass

I don’t think you need Mock; making SupportsRead and SupportsWrite aliases for object would probably be sufficient.

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from _typeshed import SupportsRead, SupportsWrite
else:
    SupportsRead = SupportsWrite = object

(or the try/except version thereof).

blhsing · March 11, 2024, 1:55pm

Clint Hepner:

I don’t think you need Mock; making SupportsRead and SupportsWrite aliases for object would probably be sufficient.

if TYPE_CHECKING:
    from _typeshed import SupportsRead, SupportsWrite
else:
    SupportsRead = SupportsWrite = object

object was actually the first thing I tried too, but since both names then refer to the same class, the class statement fails with:

TypeError: duplicate base class object

And if I used an instance instead:

SupportsRead = SupportsWrite = object()

I’d get:

TypeError: object() takes no arguments

So I figured Mock() was the most convenient stand-in: when an instance is used as a base class, Python calls its type with the class name, bases and namespace, and a Mock accepts any arguments. Alternatively, one can do:

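# A stand-in whose constructor accepts (and ignores) any arguments
class M:
    def __new__(*args):
        return object.__new__(M)

try:
    from _typeshed import SupportsRead, SupportsWrite
except ModuleNotFoundError:
    SupportsRead = SupportsWrite = M()

# At runtime this calls M('SupportsReadWrite', bases, namespace)
class SupportsReadWrite(SupportsRead, SupportsWrite):
    pass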
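The mechanism that makes both Mock() and M() work can be observed directly. In this minimal sketch, `Recorder`, `base` and `Derived` are hypothetical names: an instance is used as a base class, and its type records the arguments Python passes when the class statement runs.

```python
# Hypothetical stand-in base that records how Python calls its type.
class Recorder:
    calls = []

    def __new__(cls, *args):
        Recorder.calls.append(args)
        return object.__new__(cls)


base = Recorder()  # an instance, not a class


class Derived(base):  # Python calls type(base)("Derived", (base,), namespace)
    x = 1


name, bases, namespace = Recorder.calls[-1]
print(name, bases == (base,), namespace["x"])
print(type(Derived) is Recorder)
```

This is also why plain object() fails: its type is object, and object() rejects the three arguments.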

At any rate, this feels like an ugly workaround, since it requires repeating the name of every protocol in use. It’s probably better to use @hauntsaninja’s useful_types, even though it’s an additional dependency (or maybe make it part of the stdlib?).
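For comparison, the “maintain the protocols ourselves” option from the first post can be combined with the TYPE_CHECKING split. A minimal sketch, assuming runtime stand-ins that mirror the typeshed stubs: it still repeats every protocol name, but at runtime the names are real Protocol classes, so the combined protocol is a genuine class rather than a Mock.

```python
from typing import TYPE_CHECKING, Protocol, TypeVar

if TYPE_CHECKING:
    from _typeshed import SupportsRead, SupportsWrite
else:
    # Runtime stand-ins mirroring the typeshed stubs, so the names exist
    # and can be used as Protocol bases when the program actually runs.
    _T_co = TypeVar("_T_co", covariant=True)
    _T_contra = TypeVar("_T_contra", contravariant=True)

    class SupportsRead(Protocol[_T_co]):
        def read(self, length: int = ..., /) -> _T_co: ...

    class SupportsWrite(Protocol[_T_contra]):
        def write(self, s: _T_contra, /) -> object: ...


class SupportsReadWrite(SupportsRead[str], SupportsWrite[str], Protocol):
    pass
```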