[Pydantic, classes] How should I design/structure these classes with type hints and validation?

lmaurelli · July 23, 2024, 8:31am

I’m learning fastapi and pydantic these days at work, and I’m interested in solving this design problem I have:

I have a general class MyEntry whose aim is to hold some values (in particular two important ones: a and b). For instance:

from pydantic import BaseModel
class MyEntry(BaseModel):
    # naive attempt to define type hints and some sort of validation
    # I wish to be more granular with the hints and to provide custom validation based on the type
    a: str | int | bool | list | None = None
    b: str | int | bool | list | None = None

    def is_filled(self):
        return ((self.a is not None) and (self.a != "")) or (
            (self.b is not None) and (self.b != "")
        )

Now I wish to define a custom entity/model for some business objects, for instance:

class MyPerson(BaseModel):
     name: MyEntry = MyEntry()
     age: MyEntry = MyEntry()

The problem is that I wish to have more specialized type hints and validation logic. For instance, name should be of type str and be checked to be shorter than 10 chars, and age should be of type int and be checked to be positive.
The class variables a and b both hold the same “business values”, but a will be written by a user, while b will be parsed by an LLM.
I wish to design and structure this properly from the start in order to stay flexible later. For instance, I might enforce validation on creation of such entities only on attribute a at first and on b later, or vice versa. How can I accomplish that with nice, pythonic code?

One idea of mine was to solve the problem like this, but I don’t know if I should define so many classes and stuff or I can be smarter with some pythonic concepts.

class Name(BaseModel):
     value: str | None = None

class NameEntry(MyEntry):
    a: Name = Name()
    b: Name = Name()

    def is_filled(self):
        # custom validation logic
        # still need to learn pydantic methods and concepts (:
        pass

ajoino · July 23, 2024, 12:17pm

I don’t fully understand the problem you’re trying to solve here so I will try to help with what I can.

If you want a and b to be class variables you should wrap their type in typing.ClassVar to tell pydantic that these are class variables:

from typing import ClassVar
from pydantic import BaseModel

class MyEntry(BaseModel):
    a: ClassVar[str | int | bool | list | None] = None
    b: ClassVar[str | int | bool | list | None] = None

Regarding custom validators I think you can write your own, like in this page, but I haven’t really tried that myself so I can’t say if it works well or not. @samuelcolvin is fairly active on this forum so maybe he has something to add?
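For reference, a custom field validator in pydantic v2 could look roughly like the following. This is only a minimal sketch: the `Person` model and the validator method names are made up for illustration, and the limits simply mirror the ones mentioned in the question (name shorter than 10 characters, positive age).

```python
from pydantic import BaseModel, ValidationError, field_validator


class Person(BaseModel):  # hypothetical model, for illustration only
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def name_is_short(cls, value: str) -> str:
        # Reject names of 10 characters or more.
        if len(value) >= 10:
            raise ValueError("name must be shorter than 10 characters")
        return value

    @field_validator("age")
    @classmethod
    def age_is_positive(cls, value: int) -> int:
        # Reject zero and negative ages.
        if value <= 0:
            raise ValueError("age must be positive")
        return value


try:
    Person(name="a-very-long-name", age=-1)
except ValidationError as exc:
    # Both fields fail, so pydantic reports both errors at once.
    print(exc.error_count())
```

Raising `ValueError` inside the validator is enough; pydantic collects the failures and wraps them in a single `ValidationError`.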

hansgeunsmeyer · July 23, 2024, 12:35pm

Jacob - I don’t think that’s a good idea in this case. Those variables are intended to function as instance variables, not class variables. I don’t know if it would break things to add ‘ClassVar’, but (in the best case) it can only add unnecessary complications to the code. Pydantic classes (BaseModel) already have code in place to deal correctly with the type hints - they work similarly to dataclasses.

ajoino · July 23, 2024, 12:49pm

Yeah I was not sure if that’s what OP meant, but the post explicitly said “class variables” so I based the reply on that.

hansgeunsmeyer · July 23, 2024, 1:20pm

Personally I would be strongly in favor of your last idea. I think that is a pythonic way of tackling this problem.

The problem is mainly a design problem. As I understand it, you mainly want to balance the (sometimes) conflicting values of simplicity (and clarity) versus flexibility (keeping options open for change later).
Other things being equal, I think it’s best to let simplicity win, since you never know if you will really need those extra options (or what exactly they may be). Simplicity also directly impacts code maintainability – which is a terribly important aspect often forgotten by people.

In this case, I think, you also want to offload as much work as possible to pydantic. That means you should minimize any custom validation code, and where you do need it, it may be best to compartmentalize it as much as possible (in separate helper classes derived from BaseModel, exactly as you did at the end). Even a relatively simple function like “is_filled” in your initial MyEntry code seems too complex to me, too “messy” (it also doesn’t deal with all the possible types for ‘a’ and ‘b’; moreover, it defeats the purpose of letting pydantic do the base validations). So, as I see it, those extra helper classes (which you considered as an alternative) may add some extra bulk and some extra layers to the code, but ultimately they will make the code simpler (and, because of that, also easier to maintain and change later).
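As a rough illustration of that idea, the helper classes can carry the constraints declaratively and the business model just composes them. This is a sketch under assumptions: `Name` and `MyPerson` come from the question, `Age` is a hypothetical counterpart, and the limits (at most 10 characters, strictly positive) are one reading of the stated requirements.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Name(BaseModel):
    # pydantic enforces the length limit; no hand-written check is needed.
    value: Optional[str] = Field(default=None, max_length=10)


class Age(BaseModel):  # hypothetical helper, mirroring Name
    value: Optional[int] = Field(default=None, gt=0)


class MyPerson(BaseModel):
    # default_factory builds a fresh helper object for every person.
    name: Name = Field(default_factory=Name)
    age: Age = Field(default_factory=Age)


# Nested dicts are validated against the helper classes.
person = MyPerson(name={"value": "Ada"}, age={"value": 36})
print(person.name.value, person.age.value)

try:
    MyPerson(age={"value": -5})
except ValidationError as exc:
    print(exc.error_count())  # one error, reported for age.value
```

All the validation lives in the field declarations, so `MyPerson` itself contains no custom logic.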

franklinvp · July 23, 2024, 1:38pm

These can be done with

from typing_extensions import Annotated
from typing import Optional

from pydantic import BaseModel, StringConstraints, ValidationError, Field

class ShortStr(BaseModel):
    bar: Annotated[Optional[str], StringConstraints(max_length=10)] = None


class PositiveInt(BaseModel):
     bar: Annotated[Optional[int], Field(strict=True, gt=0)] = None

if __name__ == '__main__':
    foo_0 = ShortStr()
    foo_1 = ShortStr(bar='aaa')
    try:
        foo_2 = ShortStr(bar='aaaabbbbccc')
    except ValidationError as _:
        print(_)

    foo_3 = PositiveInt()
    foo_4 = PositiveInt(bar=1)
    try:
        foo_5 = PositiveInt(bar=0)
    except ValidationError as _:
        print(_)

I didn’t understand what you are trying to do with the a and b, but guessing from your is_filled maybe what you want is to impose some conditions that relate both of these fields. This can be done with model_validator, for example

from typing import Optional

from pydantic import BaseModel, ValidationError, model_validator

class AtLeastOneSet(BaseModel):
    a: Optional[int] = None
    b: Optional[str] = None

    @model_validator(mode='after')
    def _a_or_b_set(self):
        # mode='after' validators run on the already-validated instance,
        # so they are written as instance methods that return self
        if self.a is None and self.b in (None, ''):
            raise ValueError(
                'At least one of `a` or `b` must be set.'
            )
        return self

if __name__ == '__main__':
    ab_1 = AtLeastOneSet(a=111, b='222')
    ab_2 = AtLeastOneSet(a=111)
    ab_3 = AtLeastOneSet(b='222')
    try:
        ab_4 = AtLeastOneSet(b='')
    except ValidationError as _:
        print(_)
    try:
        ab_5 = AtLeastOneSet()
    except ValidationError as _:
        print(_)

hansgeunsmeyer · July 23, 2024, 1:46pm

This is exactly what I had in mind - except I was too lazy to look up the technical details (and it has been too long for me to remember them). I think this is the way to base a custom validation model on pydantic and offload the grunt work to pydantic.

lmaurelli · July 24, 2024, 8:20am

Good morning!
@hansgeunsmeyer @ajoino I meant instance variables, sorry for using the wrong terminology.

@franklinvp Thank you for your insights.

What is left is how I should design the base class MyEntry. To give more context, this class aims to hold two instance variables: a and b. Depending on the context, the values of a and b may be ShortStr or PositiveInt or some other custom field class with its own validation logic and its own is_filled method.

I now wish to define a “base” class that abstracts these cases in order to define an “is_filled” method and other shared “protocols”. Moreover, I wish to be able to enable/disable the validation logic on an instance of this class. The idea is that today I wish to enforce the validation on a but not on b, or vice versa (for example, if a holds a human-edited value I wish to enforce the logic, whereas if I know the value is parsed from some LLM I wish not to do it, for now).

For instance, should I design like this:

from typing_extensions import Annotated
from typing import Optional
from pydantic import BaseModel, StringConstraints, ValidationError, Field

class ShortStr(BaseModel):
    value: Annotated[Optional[str], StringConstraints(max_length=10)] = None

    def is_filled(self):
        # use != here: `is not ""` tests identity, not equality
        return self.value is not None and self.value != ""

class PositiveInt(BaseModel):
    value: Annotated[Optional[int], Field(strict=True, gt=0)] = None

    def is_filled(self):
        return self.value is not None

class MyEntry(BaseModel):
    # is this needed if I want to overwrite the "types" in the below classes?
    a: ShortStr | PositiveInt | bool | list | None = None
    b: ShortStr | PositiveInt | bool | list | None = None

    def is_filled(self):
        # assumes a and b are set to objects that provide is_filled()
        return self.a.is_filled() or self.b.is_filled()

# does something like this exist? (hypothetical decorator, commented out so the snippet parses)
# @pydantic.enable_validation("a")
class ShortStrEntry(MyEntry):
    # a needs validation, b does not
    a: ShortStr | None = None
    b: ShortStr | None = None

    # can I remove this since the logic is taken from the parent class?
    def is_filled(self):
        return self.a.is_filled() or self.b.is_filled()

# does something like this exist? (hypothetical decorator, commented out so the snippet parses)
# @pydantic.enable_validation("b")
class PositiveIntEntry(MyEntry):
    # b needs validation, a does not
    a: PositiveInt | None = None
    b: PositiveInt | None = None

    # can I remove this since the logic is taken from the parent class?
    def is_filled(self):
        return self.a.is_filled() or self.b.is_filled()

ajoino · July 24, 2024, 9:13am

No worries, it’s easy to get them mixed up when working with pydantic, dataclasses, and similar tools.