Allow extracting/referring type from things like dictionary, dataclasses & Pydantic models

jd-solanki · August 7, 2024, 11:29am

Hi

I’m a full-stack developer regularly using FastAPI and Pydantic alongside TypeScript. One thing I’ve noticed while working with data-focused applications in Python is the need to duplicate type definitions. This becomes particularly evident when defining types in classes or dataclasses and then repeating the same types in function signatures. Here’s a simple example to illustrate:

from dataclasses import dataclass

@dataclass
class Person:
    name: str

def get_by_name(name: str) -> Person: ...

In this example, we’re duplicating the type for name:

Source: Defined in the Person dataclass.
Duplicate: Repeated in the get_by_name function signature.

As a TypeScript developer, I appreciate the ability to reference existing types directly, like Person['name'], which avoids duplication. This not only reduces the need to remember where a type is used but also helps prevent errors. For example, a developer unfamiliar with the context might mistakenly alter the type in the function signature without realizing it’s intended to correspond to the name attribute in the Person class.

Moreover, establishing a direct relationship between the function argument type and the existing class attribute enhances code clarity. It communicates that the function’s argument type is not arbitrary but is inherently tied to a specific entity, promoting better understanding and maintainability.

This issue often arises in API development. For instance, if the Person class were a Pydantic model, and I needed to define a query parameter in a FastAPI endpoint, I would prefer to reference the name type from the Person model directly, like this:

@app.get("/person/get")
async def get_by_name(name: Person['name']): 
    pass

Benefits of Implementing This Feature

Reduces Redundancy: By allowing developers to refer to existing types, Python can eliminate the need for repetitive type definitions, leading to cleaner and more concise code.
Enhances Code Clarity: Direct references to existing types make the codebase easier to understand, especially for new developers or when maintaining large projects.
Minimizes Errors: Avoiding duplication reduces the likelihood of introducing inconsistencies, such as accidentally mismatching types in different parts of the code.
Improves Maintainability: If a type needs to be updated, it only needs to be changed in one place, ensuring that the change is reflected wherever the type is used.
Boosts Developer Productivity: Developers can spend less time managing types and more time focusing on core functionality, improving overall productivity.

I believe introducing this feature would significantly enhance Python’s type system, aligning it more closely with the flexibility and convenience that TypeScript developers are accustomed to.

Looking forward to hearing your thoughts on this!

Best regards,
JD Solanki

tmk · August 7, 2024, 2:26pm

There was a proposal to bring this feature from TypeScript to Python, "KeyType and ElementType for TypedDicts (also, Map)" (python/typing Discussion #1412 on GitHub), though it was only proposed for TypedDict and not dataclasses. Your example would have been written like this:

# Note: ElementType is the proposed addition; it does not exist in typing today.
from typing import ElementType, Literal, TypedDict

class Person(TypedDict):
    name: str

def get_by_name(name: ElementType[Person, Literal["name"]]) -> Person: ...

Writing it as Person['name'] is not really possible, because square brackets are used for generics.
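To make the clash concrete, here is a minimal sketch of what subscripting a class means at runtime today. The Box class is a hypothetical generic introduced only for illustration; it is not part of the proposal.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, get_args

T = TypeVar("T")


@dataclass
class Box(Generic[T]):  # hypothetical generic class, for illustration only
    item: T


@dataclass
class Person:
    name: str


# On a generic class, square brackets mean "parametrize with a type argument".
box_of_int = Box[int]
box_args = get_args(box_of_int)

# On a non-generic class, subscription is not defined at all at runtime.
try:
    Person["name"]
    subscript_error = None
except TypeError as exc:
    subscript_error = exc

print(box_args)
print(type(subscript_error).__name__)
```

So a Person['name'] syntax would have to coexist with the meaning that square brackets already have for generic classes.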

beauxq · August 7, 2024, 5:08pm

A way that I would consider going if I wanted to link these types is to make a new nominal type.

from dataclasses import dataclass


class PersonName(str):
    """ name of a Person """


@dataclass
class Person:
    name: PersonName


def get_by_name(name: PersonName) -> Person: ...

That makes more work in some places:

person = Person(PersonName("Joe"))

instead of

person = Person("Joe")

But I think that doesn’t hurt error checking and code clarity.

bschubert · August 7, 2024, 5:40pm

FYI, instead of subclassing, you can use NewType to make a new nominal type:

from typing import NewType
PersonName = NewType("PersonName", str)

This has all the static typing benefits with none of the runtime issues [1] that creating an actual subclass can introduce.

Of course, if you don’t want PersonName to be distinct from str, there are also type aliases (TypeAlias), which may be closer to the behavior the OP is looking for:

type PersonName = str  # 3.12+
# or, on 3.10+, with `from typing import TypeAlias`:
PersonName: TypeAlias = str

[1] Like unexpected overhead (did you remember to handle __slots__ correctly?) or misbehaviors (e.g. breaking code that depends on type(name) is str).
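A small sketch of the runtime difference described in the footnote may help. The names PersonNameSubclass and PersonNameNewType are made up here so that both variants can sit side by side; they do not appear in the posts above.

```python
from typing import NewType


class PersonNameSubclass(str):
    """Name of a Person, as a real str subclass (no __slots__ declared)."""


# NewType only exists for the type checker; at runtime the call returns its argument.
PersonNameNewType = NewType("PersonNameNewType", str)

sub = PersonNameSubclass("Joe")
new = PersonNameNewType("Joe")

# Code that checks `type(name) is str` breaks for the subclass, not for the NewType.
sub_is_plain_str = type(sub) is str
new_is_plain_str = type(new) is str

# A subclass without __slots__ gives every instance its own __dict__.
sub_has_dict = hasattr(sub, "__dict__")
new_has_dict = hasattr(new, "__dict__")

print(sub_is_plain_str, new_is_plain_str)
print(sub_has_dict, new_has_dict)
```

Both variants are distinct from plain str for a type checker; only the subclass changes what exists at runtime.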

jd-solanki · August 8, 2024, 8:49am

This is just an example. Suppose I have 10 fields: I wouldn’t want to create 10 types. The whole application would become a mess.

If you consider the TypeScript example, it’s super simple, minimal and DRY.

UltimateLobster · August 8, 2024, 5:03pm

I also tried to propose a similar thing here, though I haven’t made any progress since then (I wanted to approach other package and type-checker maintainers to see if they would be willing to support this. Unfortunately, I was a bit busy and didn’t get back to it).

I also drafted a basic PEP with my version of this proposal.

Here are a few problematic points you may want to consider:

If the syntax uses literal strings, it may cause problems with stringified annotations. We need to be able to distinguish between "name" as the literal name of the field and "name" as a reference to a type (a class that happens to be called name and is available in the same scope).
We need to consider what should be accessible, and under what conditions. Person["name"] may refer to either an attribute or an item, and a class may support both. A class may also define methods in its body: should they be accessible as well? What about properties? There are a lot of things to consider here.
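The first point can be seen with today’s typing module. The sketch below assumes a class that is deliberately (and unusually) called name; the two function names are hypothetical and exist only to carry the annotations.

```python
from typing import Literal, get_args, get_type_hints


class name:
    """A class that happens to be called `name` (hypothetical, for illustration)."""


def takes_forward_ref(x: "name") -> None: ...


def takes_literal(x: Literal["name"]) -> None: ...


# A bare string in an annotation is resolved as a forward reference to a type.
forward_hint = get_type_hints(takes_forward_ref)["x"]

# Inside Literal[...], the same string stays a plain string value.
literal_hint = get_type_hints(takes_literal)["x"]
literal_args = get_args(literal_hint)

print(forward_hint)
print(literal_args)
```

This is presumably why the TypedDict proposal spells the key as Literal["name"] rather than as a bare string.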

I think if we’re able to answer these questions thoroughly, this has the potential to give a massive boost to type-checkers and package maintainers.

Dutcho · August 9, 2024, 8:41pm

Would def get_by_name(name: fields(Person).name.type) -> Person: ... be an acceptable syntax?

That could be accomplished by

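import dataclasses

class FieldsTuple(tuple[dataclasses.Field, ...]):
    def __getattr__(self, field_name: str, /) -> dataclasses.Field:
        # Look the field up by name; raise AttributeError (not StopIteration) if missing.
        for field in self:
            if field.name == field_name:
                return field
        raise AttributeError(field_name)

def fields(cls) -> FieldsTuple:
    return FieldsTuple(dataclasses.fields(cls))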

Ideally, this version of fields() could be a change to dataclasses, but I don’t know if fields() returning a specialised subclass of tuple would introduce any issues for backward compatibility.

And of course, type checkers wouldn’t automatically recognise the same.
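For reference, a self-contained sketch of how the fields() helper above behaves at runtime. It assumes annotations are evaluated eagerly (no from __future__ import annotations, which would leave the field’s type as a string), and it raises AttributeError for unknown names, a choice made for this sketch.

```python
import dataclasses
from dataclasses import dataclass


class FieldsTuple(tuple):
    """Tuple of dataclass fields that also allows lookup by attribute name."""

    def __getattr__(self, field_name: str, /) -> dataclasses.Field:
        # Only called when normal attribute lookup fails.
        for field in self:
            if field.name == field_name:
                return field
        raise AttributeError(field_name)


def fields(cls) -> FieldsTuple:
    return FieldsTuple(dataclasses.fields(cls))


@dataclass
class Person:
    name: str


# The annotation is an ordinary runtime expression that evaluates to `str`.
def get_by_name(name: fields(Person).name.type) -> Person:
    return Person(name)


resolved = get_by_name.__annotations__["name"]
person = get_by_name("Joe")
print(resolved, person)
```

One caveat: a field named count or index would be shadowed by the tuple methods of the same name, since __getattr__ only runs when normal lookup fails.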