At least in my use of dataclasses, using the field’s type must make up more than 95% of conversions, and it is also featured prominently in the PEP itself. I find the field: tp = field(..., converter=tp) notation a little unfortunate in how far it separates the repetition of the type, particularly if converter=... is going to be commonly put at the end of the field() keywords. (For example, I will almost surely change the field type without changing the converter.) Perhaps it’s worth thinking about converter accepting a boolean or callable, where converter=True means “use the field type”?
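For concreteness, the idea is something like this sketch (converter=True is hypothetical syntax from this suggestion, not part of the PEP; SomeType is a placeholder):

@dataclass
class D:
    x: SomeType = field(converter=True)  # hypothetical: would mean converter=SomeType

rather than repeating the type as x: SomeType = field(..., converter=SomeType).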
I agree it is quite repetitive; however, this is an incremental step, and later PEPs can improve upon the design.
I don’t want this useful feature to get stuck on the right way to avoid specifying the type twice, since there are several solutions with merit for doing that.
So, all this to say: yeah, I agree, but let’s do that second.
I think it’s generally a bad idea to conflate type annotations with callables.
The allowable types for that field (before conversion) are given by the converter’s parameter specification. In the simplest case of:
@dataclass
class D:
    x: SomeClass = field(..., converter=SomeClass)
you can use SomeClass.__init__'s parameters. Using converter=True is reasonable here.
But consider what happens if you have instead a union or type variable:
@dataclass
class D[T]:
    x: T
    y: int | float
Neither of these is callable. (Type variables aren’t currently callable, and I think it would be a mistake to make them callable for reasons discussed in another thread.)
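(As a quick runtime illustration of that first point, with tracebacks abbreviated:)

>>> from typing import TypeVar
>>> T = TypeVar("T")
>>> T(3.5)
TypeError: 'TypeVar' object is not callable
>>> (int | float)(3.5)
TypeError: 'types.UnionType' object is not callable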
I think the redundancy is better than using a sentinel, though. Python should cater to code readers, not writers. The point you raise about preventing an error where the type is changed without changing the converter is fair, but I think the mental burden on readers of remembering what converter=True means is greater. Maybe if it were given a clearly named sentinel like converter=UseAnnotation it would be slightly clearer. I still think simplicity is better here, though.
While we’re on the subject, though: my 2c would be to deduce the field type from the return type of the converter. I think it strikes the right balance and allows for flexible types (like int | None or tuple[str, ...]).
I was actually thinking about that idea too. You mean something like this, right?
@dataclass
class D:
    x: FromConverter = field(..., converter=SomeCallable)
That avoids the problem of trying to call type annotations. My gut feeling is that it’s still too complex, but maybe someone will produce some motivating examples.
To make your idea less verbose, one could allow fields without type annotations if converter is specified. (In attrs, type annotations are optional, which makes me think that there is no runtime problem with doing this.)
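Concretely, the less-verbose form would be something like this sketch (hypothetical; not valid dataclasses code today):

@dataclass
class D:
    x = field(converter=SomeCallable)  # hypothetical: field type deduced from SomeCallable's return type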
That sounds like a good idea to me. It might be worth adding to the PEP, since type checkers should in any case check that the converter’s return type is narrower than or equal to the field type.
I think it’s worth noting that attrs has 2 modes, one where all fields must be annotated and one where every field must be declared with field(...). By default it uses the annotated mode unless there is an un-annotated field at which point it switches to requiring field.
So in this case, using a plain field with a converter means all annotated fields would also require the use of field(), which is potentially a lot more boilerplate than putting the ‘type’ in twice for fields with converters.
Example showing this behaviour:
@define
class X:
    annotated: str
    not_annotated = field()

>>> X("test")
X(not_annotated='test')
My guess is this is at least partly due to there not being a combined method for working out ordering for mixed declarations of attributes. Annotated plain values don’t have the .counter used to sort field(...) attributes and un-annotated field(...) attributes don’t appear in __annotations__.
Is there even a good way to work out the order in the case of trying to support both declaration methods?
Either way, dataclasses currently only looks at __annotations__ (via inspect.get_annotations in 3.12) to find the fields, so making it work for unannotated values would require rewriting the field-detection logic. I’m not convinced that would be worth it if it’s only for fields with converters.
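(You can see the current behavior directly: dataclasses explicitly rejects a field() assigned to an unannotated name, traceback abbreviated:)

>>> from dataclasses import dataclass, field
>>> @dataclass
... class D:
...     x: int
...     y = field()
...
TypeError: 'y' is a field but has no type annotation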
Yeah, I left it out of the PEP because we don’t have to arrive at the perfect solution in one go. This change already cuts down on boilerplate immensely, and trying to reduce the last 10% will likely cause much more discussion, hold up the PEP, and require additional code changes.
I can expand the rejected-ideas section to lay this out explicitly.
I don’t agree with that. It becomes substantially harder to implement the perfect solution (should there be one), or even just a good solution, if it turns out not to be backwards-compatible with the first one.
Given that the type hint and converter being one and the same is such a common occurrence, it might be worth trying out a couple more options and hearing a couple more thoughts. People are already bringing up good ideas here!
That’s because a and c aren’t attributes of the class; they only exist in the annotations dict. Equivalently, b IS an attribute of the class, but doesn’t show up in __annotations__.
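(For reference, the kind of class under discussion looks something like this; my reconstruction, not the original snippet:)

class C:
    a: int       # annotation only: listed in C.__annotations__, absent from C.__dict__
    b = field()  # assignment only: present in C.__dict__, absent from __annotations__
    c: str       # annotation only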
So, short of hacks like tokenizing the class, or changing how the Python runtime exposes annotated/unannotated attributes, I’m not sure we could even make this work.
Are the conversion arguments of attrs and others widely in use? The PEP only argues that other libraries have the capability, not whether anyone’s using them.
If they weren’t widely in use, would it change anything about the PEP? I’m trying to understand this in context; it seems like you might be suggesting something if not?
This PEP has downsides (added complexity in code and mental model, and in my opinion a step away from the simplicity of dataclasses), and I think significant usage is an argument against those downsides. The PEP itself doesn’t change, of course, but this information will inform the decision to accept the PEP.
In my code, distribution_parameter is a PEP 681 field specifier. The JaxRealArray type is just a jax.Array, which is similar to a NumPy array. By having such a narrow type, I can be sure that the parameters have a shape and dtype.
However, it would be very convenient for calling code to be able to pass in Python floats and integers. So, this is the perfect use case for a converter.
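With the PEP, that could look something like the sketch below (Normal, mean, and variance are invented names for illustration, jnp.asarray is just one plausible converter, and distribution_parameter’s real signature is simplified here):

import jax.numpy as jnp

@dataclass
class Normal:
    # Plain Python floats/ints passed by callers get converted to JAX arrays on assignment.
    mean: JaxRealArray = distribution_parameter(converter=jnp.asarray)
    variance: JaxRealArray = distribution_parameter(converter=jnp.asarray)

Then Normal(mean=0.0, variance=1.0) would accept plain Python numbers while the fields keep their narrow array type.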
We certainly can search, but I wouldn’t want us to incorrectly associate a low count with the behavior not being used/warranted. My own personal motivator for this doesn’t use attrs, but does need conversion semantics (we’d probably save a thousand lines of boilerplate code in one library). @NeilGirdhar just shared a similar case of not using attrs.
So I think it’d give us a lower bound, but not an upper one.
That being said, there are roughly 1.5k hits with the converter on the same line (I’m not sure if a hit is a file or a line of code). I’m not sure how to search multiline.
The function str_or_none that is used in one of the examples is not defined anywhere. I can probably guess the implementation from the name, but I think it’s probably better to include it?
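Presumably it’s something like this (my guess from the name, not the author’s actual definition):

def str_or_none(value):
    # Guessed behavior: stringify the value, but let None pass through unchanged.
    return str(value) if value is not None else None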