At least in my use of dataclasses, using the field’s type must make up more than 95% of conversions, and it is also featured prominently in the PEP itself. I find the field: tp = field(..., converter=tp) notation a little unfortunate in how far it separates the repetition of the type, particularly if converter=... is going to be commonly put at the end of the field() keywords. (For example, I will almost surely change the field type without changing the converter.) Perhaps it’s worth thinking about converter accepting a boolean or callable, where converter=True means “use the field type”?
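For concreteness, the idea is something like this sketch (converter=True is hypothetical syntax from this suggestion, not part of the PEP; SomeType is a placeholder):

@dataclass
class D:
    x: SomeType = field(converter=True)  # hypothetical: would mean converter=SomeType

rather than repeating the type as x: SomeType = field(..., converter=SomeType).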
I agree it is quite repetitive; however, this is an incremental step, and later PEPs can improve upon the design.
I don’t want this useful feature to get stuck on the right way to avoid specifying the type twice, since there are several solutions with merit for doing that.
So, all this to say: yeah, I agree, but let’s do that second.
I think it’s generally a bad idea to conflate type annotations with callables.
The allowable types for that field (before conversion) are given by the converter’s parameter specification. In the simplest case of:
@dataclass
class D:
    x: SomeClass = field(..., converter=SomeClass)
you can use SomeClass.__init__'s parameters. Using converter=True is reasonable here.
But consider what happens if you have instead a union or type variable:
@dataclass
class D[T]:
    x: T
    y: int | float
Neither of these is callable. (Type variables aren’t currently callable, and I think it would be a mistake to make them callable for reasons discussed in another thread.)
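(As a quick runtime illustration of that first point, with tracebacks abbreviated:)

>>> from typing import TypeVar
>>> T = TypeVar("T")
>>> T(3.5)
TypeError: 'TypeVar' object is not callable
>>> (int | float)(3.5)
TypeError: 'types.UnionType' object is not callable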
I think the redundancy is better than using a sentinel, though. Python should cater to code readers, not writers. The point you raise about preventing an error where the type is changed without changing the converter is fair, but I think the mental burden on readers of remembering what converter=True means is greater. Maybe if it were given a clearly named sentinel like converter=UseAnnotation it would be slightly clearer. I still think simplicity is better here, though.
While we’re on the subject, though: my 2c would be to deduce the field type from the return type of the converter. I think it strikes the right balance and allows for flexible types (like int | None or tuple[str, ...]).
I was actually thinking about that idea too. You mean something like this, right?
@dataclass
class D:
    x: FromConverter = field(..., converter=SomeCallable)
That avoids the problem of trying to call type annotations. My gut feeling is that it’s still too complex, but maybe someone will produce some motivating examples.
To make your idea less verbose, one could allow fields without type annotations if converter is specified. (In attrs, type annotations are optional, which makes me think that there is no runtime problem with doing this.)
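Concretely, the less-verbose form would be something like this sketch (hypothetical; not valid dataclasses code today):

@dataclass
class D:
    x = field(converter=SomeCallable)  # hypothetical: field type deduced from SomeCallable's return type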
That sounds like a good idea to me. It might be worth adding to the PEP, since type checkers should in any case check that the converter’s return type is narrower than or equal to the field type.
I think it’s worth noting that attrs has 2 modes, one where all fields must be annotated and one where every field must be declared with field(...). By default it uses the annotated mode unless there is an un-annotated field at which point it switches to requiring field.
So in this case, using a plain field with a converter means all annotated fields would also require the use of field(), which is potentially a lot more boilerplate than putting the ‘type’ in twice for fields with converters.
Example showing this behaviour:
@define
class X:
    annotated: str
    not_annotated = field()

>>> X("test")
X(not_annotated='test')
My guess is this is at least partly due to there not being a combined method for working out ordering for mixed declarations of attributes. Annotated plain values don’t have the .counter used to sort field(...) attributes and un-annotated field(...) attributes don’t appear in __annotations__.
Is there even a good way to work out the order in the case of trying to support both declaration methods?
Either way, dataclasses currently only looks at __annotations__ (via inspect.get_annotations in 3.12) to find the fields, so making it work for unannotated values would require rewriting the field-detection logic. I’m not convinced that would be worth it if it’s only for fields with converters.
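(You can see the current behavior directly: dataclasses explicitly rejects a field() assigned to an unannotated name, traceback abbreviated:)

>>> from dataclasses import dataclass, field
>>> @dataclass
... class D:
...     x: int
...     y = field()
...
TypeError: 'y' is a field but has no type annotation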
Yeah, I left it out of the PEP because we don’t have to arrive at the perfect solution in one go. This change already cuts down on boilerplate immensely, and trying to reduce the last 10% will likely cause much more discussion, hold up the PEP, and require additional code changes.
I can expand the rejected-ideas section to lay this out explicitly.
I don’t agree with that. It becomes substantially harder to implement the perfect solution (should there be one), or even just a good solution, if it turns out not to be backwards-compatible with the first one.
Given that the type hint and converter being one and the same is such a common occurrence, it might be worth trying out a couple more options and hearing a couple more thoughts. People are already bringing up good ideas here!
That’s because a and c aren’t attributes of the class; they only exist in the annotations dict. Equivalently, b IS an attribute of the class, but doesn’t show up in __annotations__.
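(For reference, the kind of class under discussion looks something like this; my reconstruction, not the original snippet:)

class C:
    a: int       # annotation only: listed in C.__annotations__, absent from C.__dict__
    b = field()  # assignment only: present in C.__dict__, absent from __annotations__
    c: str       # annotation only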
So, short of hacks like tokenizing the class, or changing how the Python runtime exposes annotated/unannotated attributes, I’m not sure we could even make this work.
Are the conversion arguments of attrs and others widely in use? The PEP only argues that other libraries have the capability, not whether anyone’s using them.
If they weren’t widely in use, would it change anything about the PEP? I’m trying to understand this in context; it seems like you might be suggesting something if not?
This PEP has downsides (added complexity in code and mental model, and in my opinion a step away from the simplicity of dataclasses), and I think significant usage is an argument against those downsides. The PEP itself doesn’t change, of course, but this information will inform the decision to accept the PEP.
In my code, distribution_parameter is a PEP 681 field specifier. The JaxRealArray type is just a jax.Array, which is similar to a NumPy array. By having such a narrow type, I can be sure that the parameters have a shape and dtype.
However, it would be very convenient for calling code to be able to pass in Python floats and integers. So, this is the perfect use case for a converter.
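With the PEP, that could look something like the sketch below (Normal, mean, and variance are invented names for illustration, jnp.asarray is just one plausible converter, and distribution_parameter’s real signature is simplified here):

import jax.numpy as jnp

@dataclass
class Normal:
    # Plain Python floats/ints passed by callers get converted to JAX arrays on assignment.
    mean: JaxRealArray = distribution_parameter(converter=jnp.asarray)
    variance: JaxRealArray = distribution_parameter(converter=jnp.asarray)

Then Normal(mean=0.0, variance=1.0) would accept plain Python numbers while the fields keep their narrow array type.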
We certainly can search, but I wouldn’t want us to incorrectly associate a low count with the behavior not being used/warranted. My own personal motivator for this doesn’t use attrs, but does need conversion semantics (we’d probably save a thousand lines of boilerplate code in one library). @NeilGirdhar just shared a similar case of not using attrs.
So I think it’d give us a lower bound, but not an upper one.
That being said, there are roughly 1.5k hits with the converter on the same line (I’m not sure if a hit is a file or a line of code). I’m not sure how to search multiline.
The function str_or_none that is used in one of the examples is not defined anywhere. I can probably guess the implementation from the name, but I think it’s probably better to include it?
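Presumably it’s something like this (my guess from the name, not the author’s actual definition):

def str_or_none(value):
    # Guessed behavior: stringify the value, but let None pass through unchanged.
    return str(value) if value is not None else None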