Constructing a `TypedDict` or structural record without importing its precise type

mmcgill · April 10, 2024, 10:21pm

Hi,

I’m working with a codebase with a lot of structured data being passed around, and I’m looking for a type-safe way to pass data to functions without having to import a bunch of types at every call site. I’m currently using TypedDicts, since it’s possible to call functions expecting TypedDicts with dictionary literals, but as soon as you store a literal-constructed dictionary in a variable, the type checker complains.

For example:

from typing import TypedDict

class CupcakeOrder(TypedDict):
    flavor: str
    spice_level: int

class RequestForCupcake(TypedDict):
    flavor: str
    spice_level: int

def bake(co: CupcakeOrder) -> None: ...

# OK
co = CupcakeOrder(flavor="Berry", spice_level=0)
bake(co)

# OK
rfc = RequestForCupcake(flavor="Berry", spice_level=0)
bake(rfc)

# OK
bake({"flavor": "Berry", "spice_level": 0})

# Causes a type error
untyped_dict = {"flavor": "Berry", "spice_level": 0}
bake(untyped_dict)

Is there any way to extend the type checker’s data shape tracking when variables get involved, either for TypedDicts or NamedTuples or some other structural-record-like category of types? e.g., something like:

pastry_petition = type_me_as_if_i_was_frozen(flavor="Berry", spice_level=0)
bake(pastry_petition)

erictraut · April 10, 2024, 10:40pm

You can use a type declaration for the local variable, but that requires importing the TypedDict type.

untyped_dict: RequestForCupcake = {"flavor": "Berry", "spice_level": 0}

I don’t see a way to avoid importing the type symbol unless you want to re-declare the full type in each file where it’s used.
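For illustration, the re-declaration option could look like the sketch below. `LocalOrder` is a hypothetical name, and `bake` returns a string here (the original returns `None`) only so the result can be inspected. Type checkers should accept the call because `TypedDict` compatibility is structural, as the `RequestForCupcake` example earlier in the thread shows.

```python
from typing import TypedDict


# Stand-in for the library side; in practice this would live in another module.
class CupcakeOrder(TypedDict):
    flavor: str
    spice_level: int


def bake(co: CupcakeOrder) -> str:
    # Returns a description instead of None, purely for demonstration.
    return f"{co['flavor']} cupcake, spice level {co['spice_level']}"


# Call-site side: re-declare the same shape locally instead of importing it.
# `LocalOrder` is a hypothetical name; only the fields have to match.
class LocalOrder(TypedDict):
    flavor: str
    spice_level: int


order: LocalOrder = {"flavor": "Berry", "spice_level": 0}
result = bake(order)
```

At runtime both classes produce plain `dict` objects, so the re-declaration only affects static checking. The cost is that every field has to be kept in sync by hand in each file that re-declares it.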

I’m curious why you’re reluctant to import the types into your code. A smart editor like VS Code or PyCharm will automatically insert the import statements for you, so it’s pretty painless — at least I find it to be so.

mmcgill · April 11, 2024, 12:01am

Thanks for the response. And yeah, the explicit need to import structural types is what I’m trying to avoid.

As for why, I’d say part of it comes down to the same reasons that people generally don’t go out of their way to add superfluous type annotations to variables storing objects in TypeScript, or (as far as I know) records in OCaml. For example, I often have a dozen or so short scripts whose only purpose is to construct a configuration object for some simulation or analysis pipeline, and then run it. For these, using dictionaries or `from types import SimpleNamespace as Ns` is more convenient and (arguably) more readable, but then you lose type checking.

Another reason is that I’m looking into the feasibility of procedurally generating typed Python bindings for a shared library based on a C header that I’d parse using the cffi library. And in this case, some of the generated types wouldn’t have the most straightforward names.

For example, this C API

typedef struct {
  float duration;
  float timestep_size;
  float squirrel_density;
  int *output_buffer;
} SimulationSpec;

extern void run_simulation(SimulationSpec s);

could be wrapped like this:

import ctypes
import typing
import numpy as np

class SimulationSpec(ctypes.Structure):
    # How the attributes are stored
    _fields_ = [
        ("duration", ctypes.c_float),
        ("timestep_size", ctypes.c_float),
        ("squirrel_density", ctypes.c_float),
        ("output_buffer", ctypes.POINTER(ctypes.c_int)),
    ]

    # What you actually get back from attribute reads
    duration: float
    timestep_size: float
    squirrel_density: float
    output_buffer: "ctypes._Pointer[ctypes.c_int]"

class _ConvertibleToSimulationSpec(typing.TypedDict):
    duration: float
    timestep_size: float
    squirrel_density: float
    output_buffer: np.ndarray

def run_simulation(arg0: _ConvertibleToSimulationSpec) -> None:
    # Convert `arg0` to a `SimulationSpec` and then call the C implementation.
    ...
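The conversion step left as `...` above might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual generated binding: `SpecDict` and `to_simulation_spec` are hypothetical names, the output buffer is a `ctypes` array rather than an `np.ndarray` so that the example needs only the standard library, and the call into the C library is omitted.

```python
import ctypes
from typing import TypedDict


class SimulationSpec(ctypes.Structure):
    _fields_ = [
        ("duration", ctypes.c_float),
        ("timestep_size", ctypes.c_float),
        ("squirrel_density", ctypes.c_float),
        ("output_buffer", ctypes.POINTER(ctypes.c_int)),
    ]


# Hypothetical dict-shaped input type; the buffer is a ctypes array here
# (an assumption made to avoid a NumPy dependency in this sketch).
class SpecDict(TypedDict):
    duration: float
    timestep_size: float
    squirrel_density: float
    output_buffer: "ctypes.Array[ctypes.c_int]"


def to_simulation_spec(arg0: SpecDict) -> SimulationSpec:
    # Copy the scalar fields and point the struct at the caller's buffer.
    return SimulationSpec(
        duration=arg0["duration"],
        timestep_size=arg0["timestep_size"],
        squirrel_density=arg0["squirrel_density"],
        output_buffer=ctypes.cast(
            arg0["output_buffer"], ctypes.POINTER(ctypes.c_int)
        ),
    )


buf = (ctypes.c_int * 4)()  # zero-initialised array of four C ints
spec = to_simulation_spec(
    {
        "duration": 1.0,
        "timestep_size": 0.5,
        "squirrel_density": 2.0,
        "output_buffer": buf,
    }
)
spec.output_buffer[2] = 7  # writes through the pointer into `buf`
```

The call site passes a dictionary literal directly, so no type needs to be imported there. With a NumPy array, the pointer could instead be obtained with something like `arr.ctypes.data_as(ctypes.POINTER(ctypes.c_int))`, provided the array's dtype matches a C `int`.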