How to verify if a type is part of a type annotation?

baderdean · August 8, 2023, 3:31pm

In the JSON-LD specs, a field can basically be * | set[*], where * could be a str, an HttpUrl, or many other things. When I’m parsing such JSON, to add a new value to a field, I first need to know whether the field is a set or something else, like a string. When it’s a string, I transform the field into a set before adding the new value. When it’s already a set, I add the new value. When the new value is itself a set, I update the old field with it. So first I have to check the field type before doing anything.

I decided to keep it simple and use TypedDict to model this object. Here is an example of the Person type:

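from typing import TypedDict

from pydantic import EmailStr, HttpUrl, constr

# RE_LANGUAGE and RE_COUNTRY are assumed to be regex pattern strings defined elsewhere.


class Person(TypedDict, total=False):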
    name: str
    email: EmailStr | set[EmailStr]
    homeLocation: str | set[str]
    alternateName: str | set[str]
    description: str | set[str]
    familyName: str
    givenName: str
    identifier: str | set[str]
    image: HttpUrl | set[HttpUrl]
    jobTitle: str | set[str]
    knowsLanguage: (
        constr(pattern=RE_LANGUAGE) |
        set[constr(pattern=RE_LANGUAGE)]
        )
    nationality: (
        constr(pattern=RE_COUNTRY) |
        set[constr(pattern=RE_COUNTRY)]
        )
    OptOut: bool
    sameAs: HttpUrl | set[HttpUrl]
    url: HttpUrl
    workLocation: str | set[str]
    worksFor: str | set[str]

First issue: it’s impossible to add methods to TypedDict (while I could inherit from dict and do so, so I don’t understand why). So I created a simple function to do it. I already feel that I’ve done something dirty:

def person_set_field(person: Person, field: str, value: str | set) -> Person:
    """Set a field, converting it to a set when its annotation allows a set
    WARNING: only works with set/str

    Args:
        person (Person): person's dict
        field (str): field name to set
        value (str | set): value

    Returns:
        Person: person's dict
    """
    # quirky hack to check whether one of the annotations could be a set of something
    # (search, not match: match only succeeds at the start of the string,
    # so it would miss "str | set[str]")
    is_dest_set = RE_SET.search(str(Person.__annotations__[field]))
    if is_dest_set:
        if field not in person:
            person[field] = set()
        elif not isinstance(person[field], set):
            # wrap the existing scalar value in a set
            person[field] = {person[field]}
        if isinstance(value, set):
            person[field] |= value
        else:
            person[field].add(value)
    else:
        person[field] = value

    return person

Here comes the most surprising thing: I find it horribly difficult to verify whether field annotations such as set, set | str, set[str] | str, or str | set[HttpUrl] could be a set. So I came up with this regexp:

RE_SET = re.compile(r"(\s|^)set\W")

Here I feel that I’ve done something very unpythonic, shame on me! So here is my question: what is the Pythonic way to do what I intend to do?

kknechtel · August 8, 2023, 11:57pm

Because TypedDict is magical. When you instantiate Person, it actually creates an ordinary base dict, which doesn’t have the methods you tried to add. They’re left behind in the Person class itself, which is disconnected from instances you create.

Well, the first thing is that if you’re trying to use type annotations at all, you’re already fighting the type system; if you then want to be able to check the runtime type of the input and coerce it, to accommodate types that don’t match your annotations… then you’re fighting against your previous decision, too.

That said: after retrieving the annotation with Person.__annotations__[field], if it’s a types.UnionType (which you get from combining types, as in str | set[str]) then it will have an __args__ attribute that lists the underlying types that were combined. Then, set[str] is a types.GenericAlias, which has an __origin__ of set.

baderdean · August 9, 2023, 10:06am

Hello,

thanks for the explanation about TypedDict. So what’s a good alternative: Dataclass or creating my own custom Dict class?

About the types thing, I’m not fighting against the type system, it’s quite the opposite. I’m trying to work out, based on the field’s annotation and its actual value, which one of these I should do:

field = value
field.add(value)
field.update(value)
field = {field, value, }
field = {field, } | value

I tried using __args__ and __origin__ but it’s tons of if/else; the regex match, which basically checks “could this type be a set?”, was much simpler to build, read, and maintain. What I would have loved is something like set in field.__annotations__

kknechtel · August 9, 2023, 10:20am

The type system is not designed, in the first place, to even care that an annotation exists. That is what I mean by “fighting against the type system”. Python is a dynamically typed language.

baderdean · August 9, 2023, 11:56am

Thank you for your patience. I understand that I’m doing something unpythonic. I’m just looking for the proper way to do it! Obviously, I’m not able to change the schema.org JSON-LD schema, which allows a field to be str | set[str] but also HttpUrl | set[HttpUrl] and so on. So I need a proper way to modify that data.

jamestwebber · August 9, 2023, 1:47pm

This schema seems badly designed. Does it allow for a set with zero or one element? If so, I would just restrict to a subset of the valid schema that makes more sense: always make it set[str] or set[HttpUrl], so you are always adding an element. That’s a much simpler design and far easier to work with.

baderdean · August 9, 2023, 2:19pm

I agree with you that this schema is not convenient in Python, yet I’m not the owner of the schema.org spec. In the real world, some platforms provide JSON in which a field is sometimes a string and sometimes a set of strings or HttpUrls. It’s the spec!

jamestwebber · August 9, 2023, 2:53pm

I guess it depends on what you are trying to do with it. If you need to parse this stuff into a model you can use, I would just always convert to a set and make life easier. If you need to reproduce whatever you were given, you have to deal with the design as is.
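A minimal sketch of the "always convert to a set" approach might look like the following. The function name normalize and the multi_valued argument are hypothetical, and the snippet assumes the record comes from json.loads, where multi-valued fields arrive as lists:

```python
def normalize(record: dict, multi_valued: set[str]) -> dict:
    """Return a copy of record where every listed field holds a set."""
    out = dict(record)
    for field in multi_valued:
        if field not in out:
            continue
        value = out[field]
        if isinstance(value, (list, tuple, set, frozenset)):
            out[field] = set(value)  # already a collection: just convert
        else:
            out[field] = {value}  # single value: wrap it
    return out


raw = {"name": "Ada", "email": "ada@example.org", "sameAs": ["https://a.example", "https://b.example"]}
person = normalize(raw, {"email", "sameAs", "jobTitle"})
print(person)
```

After this step, adding a value is always person[field].add(value) or person[field].update(values), with no need to inspect annotations at all.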