I would find this useful. I have an application that takes JSON data and converts it to data classes, and I do type conversions from JSON representations to more specific Python representations in __post_init__, e.g. ISO date string to datetime, and literal 0 (which comes through as an int) to float.
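For anyone following along, the kind of __post_init__ conversion described above might look like the sketch below. The Reading class and its field names are hypothetical, made up purely for illustration; only the stdlib calls are real.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Reading:
    # Hypothetical record type; the field names are illustrative only.
    taken_at: datetime
    value: float

    def __post_init__(self):
        # JSON has no date type, so timestamps arrive as ISO 8601 strings.
        if isinstance(self.taken_at, str):
            self.taken_at = datetime.fromisoformat(self.taken_at)
        # A literal 0 in JSON is decoded as an int; normalise it to float.
        self.value = float(self.value)


payload = json.loads('{"taken_at": "2023-05-01T12:30:00", "value": 0}')
reading = Reading(**payload)
```

After construction, reading.taken_at is a datetime and reading.value is a float, even though the JSON carried a string and an int.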
Maybe I'm misunderstanding things, but I don't think attrs has the behavior of "converts on attribute setting":
>>> import attr
>>> @attr.s
... class C:
...     x: int = attr.ib(converter=int)
...
>>> c = C("10")
>>> c
C(x=10)
>>> c.x = "20"
>>> c
C(x='20')
I couldn't find this referenced in the attrs documentation, but maybe I wasn't looking in the right place.
It does for the new-style APIs (define, frozen). I remember this being pointed out in the docs but they must've deteriorated. We strongly encourage new-style APIs.
Ah, thanks @Tinche. Here's an example:
>>> from attrs import define, field
>>> @define
... class C:
...     x: int = field(converter=int)
...
>>> c = C("10")
>>> c
C(x=10)
>>> c.x = "20"
>>> c
C(x=20)
I think the PEP should be updated to note this, and save someone who's not as familiar with attrs (like me!) the time to research it.
Nothing like a miniature panic attack early in the morning. I knew I tested it!
I'm happy to edit the PEP (assuming that's kosher).
Another thing that occurs to me is interactions with pattern matching. What happens here?:
@dataclass
class Point:
    x: int = field(converter=int)
    y: int

match Point(x="0", y=0):
    case Point(x="0", y=0):
        print("Origin")
    case Point():
        print("Somewhere else")
    case _:
        print("Not a point")
Naively, I'd expect it to print "Origin", not "Somewhere else". I realize the Point in the case statement isn't creating an object, but clearly it's meant to parallel object creation, at least visually.
This should be mentioned in the PEP: the match statement ignores any converter. But it does, for example, respect init=False.
I think class patterns have a pretty clear rule: they match attributes of the subject. Dataclasses, with or without converters, are not special here, any more than custom user classes in which __init__ parameters don't correlate with instance attributes [1].
As a user, whenever I see a class pattern in a match statement, I mentally translate it to a sequence of checks like "isinstance + hasattr + the attr value matches the subpattern". In your example, I believe that case Point(x="0"): ... in general should be read as a shorthand for:
elif (
    isinstance(subject, Point)
    and hasattr(subject, 'x')
    and subject.x == "0"
):
    ...
This means that the behavior for dataclasses with converters should be clear enough: the result of accessing the attribute x on the instance Point(x="0") will be matched to the string "0". Since accessing the attribute returns an integer and 0 != '0', the pattern will not match.
[1] As far as I understand, dataclasses are only special in that they automatically generate a __match_args__, which I don't think is relevant for this discussion.
I disagree. The fact that the match statement uses syntax that looks like the initialiser is the important thing for me (and I believe it was a deliberate design choice as well). The discrepancy here wouldn't be a complete disaster, but I'm pretty sure it would be a source of confusion and possibly bugs, so I think @ericvsmith is right and this should be explicitly discussed in the PEP.
Although this trip-up can happen with any class/object that converts inputs during initialization. e.g. a contrived example:
match int('0'):
    case int(real='0'):
        print("I'm zero")
    case int(real=0):
        print("No, I'M zero")
    case _:
        print("another int")
It isn't ideal, and it's not really new behavior either, but maybe this PEP would make such bugs more common by making this type of code more tempting to write.
I believe this is a less contrived example that still exhibits the same unwanted behavior:
match int('0'):
    case int('0'):  # will not match
        print("I'm zero")
    case _:
        print("Another int")  # will be printed
Specifically, the case int('0') does not raise an exception: it's just a pattern that won't match, because what it means is the check isinstance(subject, int) and subject == '0'. I believe that this behavior of the class pattern is a ship that has already sailed.
I think what's at stake here is: should Python start avoiding __init__ arguments that don't correlate with instance attributes just because match is now part of the language? I don't think so, but that's definitely just my opinion.
All anyone is asking for is for it to be discussed in the PEP. It's a question for the "how do we teach this" section, at a minimum, as that is precisely where non-intuitive behaviour should be called out explicitly.
In the PEP discussion, it might be worth pointing out that most type checkers should be able to catch this kind of error. For example,
match int('0'):
    case int('0'):  # PyRight says: pattern will never be matched for subject type "int"
        print("I'm zero")
    case _:
        pass
I've just come across this PEP.
Overall I like it! I have two very nitty comments.
First of all, for reference: we use frozen dataclasses basically everywhere in Equinox, and we already have an extension to field that adds a converter argument. I think we're basically doing the same thing as this PEP in all cases.*
- The interaction with __post_init__ isn't specified. From experience we've found that conversion before __post_init__ is most useful.
- Whether to have it run inside cls.__init__ or type(cls).__call__ is not discussed. From experience having it run inside cls.__init__ is most useful, as this makes it substantially easier for runtime type checking libraries (we've developed jaxtyping) to perform their checks.
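To illustrate why the ordering in the first point matters, here is a minimal hand-written sketch in which conversion runs before __post_init__. The Interval class is hypothetical and is neither Equinox's implementation nor the PEP's; it only shows the order of the two steps.

```python
class Interval:
    # Hypothetical class: conversion first, then the post-init hook.
    def __init__(self, low, high):
        # Step 1: convert the raw inputs.
        self.low = float(low)
        self.high = float(high)
        # Step 2: the hook now sees converted values.
        self.__post_init__()

    def __post_init__(self):
        # Validation is only meaningful on numbers: as strings,
        # "9" > "10" is True because str comparison is lexicographic.
        if self.low > self.high:
            raise ValueError("low must not exceed high")


interval = Interval("9", "10")
```

If the hook ran before conversion, the check would compare the strings "9" and "10" and wrongly reject a valid interval.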
(* Actually, ever-so-technically there is one discrepancy: in custom __init__ methods on frozen dataclasses, we allow the self.foo = bar syntax, and in this case, unlike this PEP, we do perform conversion. We've found this important for usability, and it is our sole divergence from standard dataclasses, which normally mandate the use of object.__setattr__. I don't think this discrepancy really counts here, as this is already somewhere we've made a conscious choice to deviate from standard dataclasses.)
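For readers unfamiliar with the object.__setattr__ requirement, this is roughly what a custom __init__ on a standard frozen dataclass has to do. The Celsius class is a hypothetical example, and the float() call stands in for whatever conversion is wanted.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True, init=False)
class Celsius:
    # Hypothetical example class; init=False keeps the hand-written __init__.
    degrees: float

    def __init__(self, degrees):
        # Plain `self.degrees = ...` would raise FrozenInstanceError here,
        # so standard dataclasses require going through object.__setattr__.
        object.__setattr__(self, "degrees", float(degrees))


temperature = Celsius("21")

try:
    temperature.degrees = 5  # ordinary assignment is blocked
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True
```

The conversion has to be written by hand at every such assignment, which is the ergonomic cost being described.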
So, specifically the PEP says it runs during attribute assignment (or in __init__ for frozen data classes).
I think it's safe to assume attribute assignment happens in __init__ for non-frozen dataclasses. If it isn't, I don't think the PEP regarding value conversion would be the right place to specifically call out when it happens, since that's a more generic behavior.
Glad you like the PEP though, and thanks for the suggestions!
In Pantsbuild, we used to have something similar (still using standard dataclasses), but I switched us to using it the way the docs suggest.
I miss the ergonomics of normal attribute assignment. Now you got me pondering a PEP for frozen_after_init. I'll probably run it in a new thread once this PEP is done.
Python Steering Council hat: Thanks for the well written PEP and thorough discussion here. We have reviewed and discussed this PEP and we are unfortunately not finding ourselves leaning towards accepting it today. (The question the SC would like to see answered to change our future selves' minds is at the end.)
Reasoning:
- A dataclasses.field converter adds complexity (additional spooky action at a distance and a concept going further than a "just a struct" mental model).
- There are already multiple ways to do this even if they involve more lines (__init__, alternate constructors, etc).
- For users who really want converters rather than the additional lines of code: They can already use third party dataclass-like libraries (presumably attrs) providing the feature today instead of waiting for CPython 3.13+.
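As one illustration of the "alternate constructors" route mentioned in the second point, a conversion can live in a classmethod. This is only a sketch: the Release class and the from_strings name are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Release:
    # Hypothetical example class; the fields hold already-converted values.
    name: str
    released: date

    @classmethod
    def from_strings(cls, name, released):
        # Conversion happens here, before the generated __init__ runs,
        # so this also works for frozen dataclasses.
        return cls(name=name, released=date.fromisoformat(released))


release = Release.from_strings("demo", "2023-01-15")
```

The trade-off is that Release("demo", "2023-01-15") called directly still stores the raw string, so callers must remember to use the alternate constructor.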
One of our guiding general themes is that there is less reason for every feature to be done in the standard library now than in years past. Virtually all Python applications are built upon many third party packages from our ecosystem today.
Q: Is there a compelling reason for dataclass field converters to be in the standard library that we're just missing?
It is good to see people piping up on this thread who do want the feature. It would be interesting to know how important that is and if you already do, or why you don't, use something like attrs today just to have it.
We'll keep the steering-council pep-712 issue open for a while as a reminder to observe any further discussion (and let the next elected SC make the final decision).
-gps for the 2023 Python Steering Council
(Where's the broken heart emoji reaction?)
Thanks for the response, and especially the explanation. Although it's a bummer, I think the answer is very fair and understandable.
I hope others do chime in on their specific use-case, especially since the ones that said they'd benefit have already chosen standard library dataclasses over attrs despite not having this feature (even though that means more hurdles and pain). I'll chime in on ours (I don't remember if I have already) in a separate comment.
They can already use third party dataclass-like libraries (presumably attrs) providing the feature today instead of waiting for CPython 3.13+.
I'll say that since this PEP augments dataclass_transform, I don't think the choice is binary. On the project that motivated this PEP we'd likely define a dataclass transform that simply handles conversion semantics and then forwards to dataclass, with the expectation that once we upgrade to Python 3.13 we'll replace usage of the decorator with dataclass. The ability to do so lies in the fact that type-checkers support typing_extensions backports. This was intentional on my part (maybe I should've made it explicit in the PEP?).
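A rough sketch of what such a forwarding decorator could look like at runtime is below. converting_field and converting_dataclass are hypothetical names, and stashing the converter in field metadata is an assumption made for illustration, not the PEP's mechanism. Without the PEP's proposed converter support in dataclass_transform, type checkers would not understand the widened __init__ parameter type.

```python
import dataclasses
import functools

try:
    from typing import dataclass_transform  # Python 3.11+
except ImportError:
    from typing_extensions import dataclass_transform  # backport


def converting_field(*, converter, **kwargs):
    # Hypothetical field specifier: keep the converter in field metadata.
    return dataclasses.field(metadata={"converter": converter}, **kwargs)


@dataclass_transform(field_specifiers=(converting_field,))
def converting_dataclass(cls):
    # Forward to the stdlib decorator, then wrap the generated __init__.
    cls = dataclasses.dataclass(cls)
    converters = {
        f.name: f.metadata["converter"]
        for f in dataclasses.fields(cls)
        if "converter" in f.metadata
    }
    generated_init = cls.__init__

    @functools.wraps(generated_init)
    def __init__(self, *args, **kwargs):
        generated_init(self, *args, **kwargs)
        # Simplification: converts after __init__ (and any __post_init__),
        # at construction only, and only for non-frozen classes.
        for name, convert in converters.items():
            setattr(self, name, convert(getattr(self, name)))

    cls.__init__ = __init__
    return cls


@converting_dataclass
class Point:
    x: int = converting_field(converter=int)
    y: int = 0


point = Point(x="0")
```

At runtime point.x is the int 0, but a type checker would still flag Point(x="0") because it sees x: int in the synthesized __init__.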
And speaking of type-checking...
One gap this decision leaves open and unsolved is the type semantics of any dataclass transform providing conversion semantics. This escapade started off as a question of augmenting dataclass_transform to support this feature. Since type-checkers, without special-casing, don't support attrs's conversion semantics when defined as a dataclass_transform, it seems to me that if the answer to this PEP is "you should probably just use attrs", then we ought to plug the gap. Otherwise, we haven't moved the needle on any of the problems outlined in the "Motivation" section and have told the community to use a sub-optimal solution.
So, can I ask for a gut-feel strawman from the (current) Steering Council on, if this PEP is likely to be rejected, a similar PEP scoped just to dataclass_transform?
(Regardless, saying "no" isn't fun, but can certainly be necessary. I'm just glad the community is the way it is, from the strangers on these threads all the way up to the big bad Steering Council. Thanks for considering the PEP.)
The use-case that motivated this PEP, pantsbuild, uses frozen dataclasses everywhere (current count is ~1200 instances) [1].
Although our application already has third-party dependencies, I've been arguing hard for whittling them away (ideally towards 0) for two reasons:
- We're a build tool, so security is objectively more important than for other libraries/applications. Infect a build tool, and you can infect everything it builds.
- Every time our users want a new version, they are forced to download and install every one of our dependencies. That's wasteful.
So the standard library is our friend.
[1] As a side-note we use frozen dataclasses because we cache these objects to be re-used from multiple threads. PEP 703 is very very exciting to us!
Last thing I'll say (for now, I promise). Any rich type like this in the standard library quickly becomes a vocabulary type. pathlib.Path is my favorite example. Instead of APIs declaring they take paths as strings, and having to document that this string represents a path, we now have a type that expresses that intent.
dataclasses are such a powerful vocabulary type in themselves, and they allow you to further define very straightforward vocabulary types. Win-win. attrs has the same semantics, but it being third-party means it isn't in everyone's vocabulary. As an API author, if I'm going to define an API, I want the most users to be able to easily understand and easily use it; it doesn't get better than standard library types.