PEP 712: Adding a "converter" parameter to dataclasses.field

PEP 712 has been merged.

It adds a converter parameter to dataclasses.field, which should help eliminate boilerplate when conversion semantics are warranted, and brings additional consistency and type-correctness for people using attrs and pydantic.

The two-ish large areas of discussion I expect are:

  1. The PEP proposes to run the converter on the default value. There are good use cases both for converting the default and for leaving it alone. Converting it matches attrs semantics and means the default value looks no different from user-provided values.
  2. The impact on typing

Thanks for reading, discussing, and reviewing!

5 Likes

This is an issue that has annoyed me as well in the past, and I’m glad it is getting some attention. However, I’d like to suggest a counter-proposal: add a @setattr decorator.

class Foo:

    @setattr
    def bar(self, obj: int | tuple[int, ...], /) -> tuple[int, ...]:
        return (obj,) if isinstance(obj, int) else tuple(obj)

which hooks into Foo.__setattr__ and, whenever key == "bar", passes the value through this function first, i.e. the above should be roughly equivalent to:

class Foo:
    def __setattr_bar(self, obj: int | tuple[int, ...], /) -> tuple[int, ...]:
        return (obj,) if isinstance(obj, int) else tuple(obj)

    def __setattr__(self, key, value):
        if key == "bar":
            super().__setattr__(key, self.__setattr_bar(value))
            return
        super().__setattr__(key, value)

A @dataclass decorated class could simply gobble up all @setattr decorated methods and bake them into the __init__. The advantage is that this would be multipurpose and could be used outside the context of dataclasses as well.
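To illustrate the multipurpose angle, here’s a minimal, hypothetical sketch of how such a decorator could work outside dataclasses (setattr_hook and ConvertsOnSet are invented names, not a proposed API):

def setattr_hook(func):
    # Mark a method as a per-attribute conversion hook.
    func._is_setattr_hook = True
    return func

class ConvertsOnSet:
    _setattr_hooks = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Merge hooks from parents with those declared on this class.
        hooks = dict(getattr(cls, "_setattr_hooks", {}))
        hooks.update(
            (name, f) for name, f in vars(cls).items()
            if getattr(f, "_is_setattr_hook", False)
        )
        cls._setattr_hooks = hooks

    def __setattr__(self, key, value):
        hook = self._setattr_hooks.get(key)
        if hook is not None:
            value = hook(self, value)
        super().__setattr__(key, value)

class Foo(ConvertsOnSet):
    @setattr_hook
    def bar(self, obj: int | tuple[int, ...], /) -> tuple[int, ...]:
        return (obj,) if isinstance(obj, int) else tuple(obj)

foo = Foo()
foo.bar = 3  # stored as (3,); the instance attribute shadows the method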

In particular, even for the current proposal one likely wants the converters to hook into __setattr__ anyway, in order to ensure that the conversion takes place when updating attributes later on.

I’m not sure what the right thing to do is, but some differences between the @setattr approach and the bare function approach:

  • bare functions don’t accept self. Having self is useful in case you want your converter to access member variables or even class methods (which could be very useful).
  • bare functions specify all of the field attributes in one place rather than having extraneous methods lying around.
  • the setattr approach allows the converter to be overridden in a derived class without re-specifying the entire field.

I think we want to have access to self if possible.

I’m mainly against the setattr approach because it isn’t how attrs does things. I want the dataclasses way to stay compatible with attrs, as that makes evolving code to attrs easier and makes dataclass_transform more useful. Introducing a new setattr mechanism, when other dataclass-like libraries tend to go with the converter approach, hurts type standardization.

3 Likes

setattr doesn’t work for frozen dataclasses, unfortunately: __setattr__ on the class raises an exception, so the internals must use object.__setattr__ directly.
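For example:

from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class P:
    x: int

p = P(1)
try:
    p.x = 2  # the generated __setattr__ raises on frozen instances
except FrozenInstanceError:
    pass
object.__setattr__(p, "x", 2)  # what the generated internals do instead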

Alternative to the setattr idea, you could just use properties (albeit at a cost).

I think in the end, the complexity likely won’t find a natural home in dataclasses.dataclass.

I’m +1 on this, but one note:

Under motivation, it doesn’t mention __post_init__ – that’s where I do this kind of thing now. Not a big deal, it’s also awkward :slight_smile:
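For readers who haven’t hit this, a minimal sketch of that __post_init__ pattern:

from dataclasses import dataclass

@dataclass
class Point:
    coords: tuple[int, ...]

    def __post_init__(self):
        # Normalize whatever iterable the caller passed into a tuple.
        self.coords = tuple(self.coords)

It works, but the conversion lives far away from the field it concerns.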

Also – I’m a heavy dataclass user, but not a static type hint user. And I think this proposal is a big plus for run-time type setting / enforcing.

I’m usually a big duck-typer, but there are times when I really need a particular type inside my function / class, but I don’t want to require user code to do the conversion first – most common for me is numpy arrays – I have a lot of code that does this:

input_arr = np.asarray(input_arr)

It’d be pretty cool to do that sort of thing automagically with a converter function in dataclasses.
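Something like this sketch, assuming the PEP’s converter parameter (it doesn’t exist in dataclasses today):

from dataclasses import dataclass, field
import numpy as np

@dataclass
class Signal:
    samples: np.ndarray = field(converter=np.asarray)

s = Signal([1.0, 2.0, 3.0])  # the list is converted to an ndarray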

2 Likes

Under “Impact on Typing”,

A converter must be a callable that accepts a single positional argument, and the parameter type corresponding to this positional argument provides the type of the synthesized __init__ parameter associated with the field.

What if the converter’s positional argument is not typed? That is,

In other words, the argument provided for the converter parameter must be compatible with Callable[[T], X] where T is the input type for the converter and X is the output type of the converter.

can T be specified without providing a typed function (something that can’t be done with a lambda expression)?


The PEP does not explicitly address the interaction of a converter and a default_factory; I assume they should be composable, such that

x: Foo = field(default_factory=df, converter=cv)

should be equivalent to

x: Foo = field(default_factory=lambda: cv(df()))

I’m still not convinced, though, that the converter should be applied to either the default or the default factory’s return value. Doing so changes the semantics of default and default_factory from providing field values to providing function arguments when a converter is present.

Having the converter applied to a default is a convenience that one could live without, but also something that could be difficult to work around. Consider existing code like

start: datetime = field(default=datetime.now())
stop: datetime = field(default=datetime.fromisoformat("2023-12-31"))

If the converter is not applied to the default, the class designer can still easily write, at the cost of a slight repetition,

cv = datetime.fromisoformat
start: datetime = field(default=datetime.now(), converter=cv)
stop: datetime = field(default=cv("2023-12-31"), converter=cv)

but if the converter is applied, there’s a problem finding a suitable default for start, and one has to write a more complicated converter.

cv = datetime.fromisoformat
# start: datetime = field(default="???", converter=cv)
start: datetime = field(default=None, converter=lambda x: datetime.now() if x is None else cv(x))
stop: datetime = field(default="2023-12-31", converter=cv)

Under the “Rejected Ideas” section,

  1. Compatibility with attrs. Attrs unconditionally uses the converter to convert the default value.

As someone who doesn’t use the attrs package, I don’t find this a strong argument against leaving default values unconverted.

  2. Simpler defaults. Allowing the default value to have the same type as user-provided values means dataclass authors get the same conveniences as their callers.

As demonstrated in my example above, I don’t think the convenience of having a converter automatically applied when feasible outweighs the consequences of being unable to avoid the converter.

1 Like

In the case of a lambda, the arguments are assumed to be of type Any, so that would hold true here as well.
However, you could do something like converter=cast("Callable[[str], int]", lambda x: 0), in which case the argument type must be a string. (Checked with Pyright version 1.1.306, which contains provisional support for this PEP.)
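Spelled out as a full example (a sketch assuming the PEP’s converter parameter; today only type checkers with provisional support understand it):

from dataclasses import dataclass, field
from typing import Callable, cast

@dataclass
class C:
    # The cast gives the lambda a typed signature, so the synthesized
    # __init__ parameter for x is treated as str rather than Any.
    x: int = field(converter=cast("Callable[[str], int]", lambda s: len(s)))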


Good flag. I’ll stew on that one. I suspect you are correct that we should be internally consistent. Also attrs exhibits this behavior:

>>> @attrs.define
... class C:
...     x = attrs.field(factory=lambda: "0", converter=int)
... 
>>> C()
C(x=0)

In your example, I suspect start’s field really should be field(default_factory=datetime.now), otherwise now() is evaluated once at class-definition time rather than per instance, right? Regardless, I see your point about the bug (and for posterity, this snippet is why it’s an issue):

>>> from datetime import datetime
>>> datetime.fromisoformat(datetime.now())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: fromisoformat: argument must be str

I personally started without default being converted because I like the type-security. Honestly it really seems damned-if-you-do, damned-if-you-don’t. I do lean towards converting the default for simplicity.

Ultimately though, the default must already be of the right type in Python today. Maybe we keep that trend?
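That is, static type checkers already require the default to match the field’s annotation; for example:

from dataclasses import dataclass, field

@dataclass
class C:
    x: int = field(default="0")  # already an error for type checkers: str is not int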

So in that case, in order to not absolutely hose attrs (which we should address, since it is very widely used even if you or I don’t use it), I think I’m going to suggest that we don’t convert the default, for consistency. BUT we add additional semantics to dataclass_transform which allow library authors to declare whether their field_specifiers convert the default value (including those generated by default_factory).

@erictraut Do you think that last paragraph is doable? Any ideas on whether it’s likely to be generally approved or rejected from a council perspective?

1 Like

I’ve had second thoughts. I may eat these words.

I assumed that a survey was done of existing dataclass-like libraries (including attrs) to understand the behavior of their converter parameter and how it interacts with other features. Was such a survey completed? If not, I think someone should do that work before this PEP is finalized and submitted for consideration. (And yes, that probably falls on your shoulders, Joshua, since you’re the PEP’s author.) If you do this, you may want to include the results of this survey in an appendix (similar to what I did for PEP 695). I think this would bolster the case for the PEP.

If existing libraries (especially one as popular as attrs) have established a precedent for the behavior of converters and how they interact with default factories, then I think this PEP should follow that precedent unless there’s a really compelling reason not to. Creating an incompatibility would undermine most of the value that this PEP claims to provide. Adding options to dataclass_transform to support different behaviors seems like an unnecessary complexity if we can avoid introducing such an incompatibility in the first place.

4 Likes

Yeah, admittedly that should’ve been done prior to this, even though I don’t use any of them. My bad.

attrs/pydantic + SQLAlchemy/Django will be surveyed. Any I’m forgetting?

FWIW I’m actually more sympathetic to the not-convert-the-default argument, since it changes the semantics when just converter is added. E.g. if the user had x: str | None = field(default=None) and added converter=str, the behavior is now that x would be the string "None", which very well could surprise some who aren’t familiar with attrs (reminder that I wrote this PEP coming from a codebase that doesn’t use attrs).
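Since attrs does convert the default, that surprise is observable there today:

>>> import attrs
>>> @attrs.define
... class C:
...     x: str | None = attrs.field(default=None, converter=str)
... 
>>> C()
C(x='None')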

I’m also going to throw out for discussion that attrs does convert attributes through __setattr__ semantics, so at the very least it should be addressed in this PEP. I’ll include that in the survey.

My recommended way of searching: look over libraries that use dataclass_transform and pick the five or so biggest ones.

1 Like

As I expressed in the previous thread I’m not a fan of the default getting special treatment to avoid the converter. In my opinion if you have a default value, then providing that default value explicitly should give the same output as if you did not provide it.

Using your field example:

@dataclass
class C:
    x: str | None = field(default=None, converter=str)

I would expect C() and C(x=None) to have the same output [1]. This is useful in places where you have another function to indirectly create instances (like how field itself creates instances of Field).


  1. Both converting None to 'None' which is probably not useful, but is consistent in general - I don’t think str is a good converter in this case. ↩︎

(So I think we’re getting dangerously close to colliding with the overall discussion of PEP 661).

In Python there’s really nothing you can specify as a default that can’t also be specified by the user. I suppose it’s both a boon and a bane.

In this case, I expect the semantics that the author intended was “if no value was provided” by the user. And in Python that largely translates to arg=None. So although the user can provide None as a value, it doesn’t translate to the intent of the class author, who wanted a sentinel “if nothing was provided” value.

I think we ought to try and make it easy(-er) for code authors to translate intent into code, and this is one way to accomplish that.

I’m fully aware that there are good counter-points and examples of converting the default (the pathlib one resonates very well with me). I truly believe there is no “right” choice. Just a which-is-less-wrong one. Luckily not converting the default leaves type-checkers to help folks who wrote code assuming their default gets converted.

This all could be for naught, though, if the survey shows all other usages of converters in libraries agree on converting the default. So let’s just hold our breath a bit and see what comes out of the other side.

1 Like

As defaults go, yes, that’s correct. Although you can get around it with *args and a forged signature.
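A hypothetical sketch of that trick:

from datetime import datetime

class Event:
    def __init__(self, *args):
        # *args makes "no argument at all" distinguishable from every
        # value a caller could pass, including None.
        if args:
            (start,) = args  # caller supplied a value (even None)
        else:
            start = datetime.now()  # caller supplied nothing
        self.start = start

A real implementation would also forge __signature__ (or provide overloads) so introspection and type checkers still see a single start parameter.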

I would argue that passing arg=None can be intended by a user to represent no value provided, in the same way as by the author. Otherwise the user has to check whether a value is None or some other sentinel when using the class, in order to create it correctly.

For example:

def make_classes(lx: list[SomeType | None]) -> list[C]:
    return [C(x=x) for x in lx]

Would have to be:

def make_classes(lx: list[SomeType | None]) -> list[C]:
    return [C(x=x) if x is not None else C() for x in lx]

Or something even more complicated when there are multiple arguments with converters as you have to check each one.

OK, I did my best to bang on attrs, pydantic, django, and SQLAlchemy:

| | Attrs | Pydantic | Django | SQLAlchemy |
| --- | --- | --- | --- | --- |
| Supports Converters | :white_check_mark: | :white_check_mark:* | :x: | :x: |
| Converts the “default” | :white_check_mark: | :x: | - | - |
| Converts the result of “default factory” | :white_check_mark: | :x: | - | - |
| Converts on attribute setting | :white_check_mark: | :x: | - | - |

(* The conversion is implicit via Pydantic scraping the annotation)

So the only library with an explicit user-provided conversion is attrs, which does conversion of all possible values.

I don’t like that the conversion semantics of Pydantic aren’t modeled by this PEP, but they are most likely out-of-scope (unless we want to start doing acrobatics for dataclass_transform)

1 Like

So with that in mind, the clear path forward is unconditional conversion semantics of every value going into the attribute. That includes the default value, the value produced by default_factory, and values specified by the caller in both __init__ and attribute setting.

Although I am sympathetic to not converting default values, the internal consistency makes this simple to remember and understand, and the compatibility with attrs is a big bonus.

I’ll update the PEP ASAP.

2 Likes

OK PR posted: PEP 712: Now convert all incoming values by thejcannon · Pull Request #3152 · python/peps · GitHub

The only thing I think that’s possibly a point of debate is that attribute setting is subject to conversion.
This makes the typing analogous to a property with a typed setter, where the argument type is deduced from the converter.
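Roughly, the analogy is with something like this (a sketch of the property pattern, not PEP syntax):

class C:
    @property
    def x(self) -> tuple[int, ...]:
        return self._x

    @x.setter
    def x(self, value: int | tuple[int, ...]) -> None:
        # The setter's parameter type plays the role of the converter's
        # input type: assignments to c.x are checked against it.
        self._x = (value,) if isinstance(value, int) else tuple(value)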

This goes out of the window if the user is specifying their own __setattr__, but:

  • That should be rare
  • This issue isn’t unique to this PEP. The interaction between property setters and __setattr__ is generic for any class. Therefore typecheckers already handle this (or don’t) in their own way.