Would anyone use an optional dataclass kw_only __post_init__?

I’m relatively new to a lot of Python features, and found myself updating a codebase to use dataclasses’ InitVar.

When looking into how to pass the args, I came across this old issue. TL;DR: InitVars are passed to __post_init__ by position, not by keyword.
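A quick way to see this with the standard library as it is today is the sketch below (the class and parameter names are only illustrative). The generated __init__ hands the InitVars to __post_init__ positionally, in field order, so the parameter names don't have to match the field names at all.

```python
from dataclasses import InitVar, dataclass, field

@dataclass
class Point:
    x: InitVar[int]
    y: InitVar[int]
    # Regular field computed from the InitVars; excluded from __init__.
    total: int = field(init=False)

    def __post_init__(self, first, second):
        # `first` receives x and `second` receives y: matched by position only.
        self.total = first * 10 + second

print(Point(1, 2).total)      # 12
print(Point(y=2, x=1).total)  # 12, same positional call to __post_init__
```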

For backwards compatibility reasons, it’s probably too late to change the default behavior to pass by keyword. However, I excitedly implemented a simple way to tell dataclass to pass InitVars via keyword, using def __post_init__(self, *, initvar1, initvar2…

[EDIT] (added example for better explanation)

```python
from dataclasses import InitVar, dataclass

# new ability (proposed), notice the * in the def
# called as __post_init__(self, x=x, y=y)
# (with stock dataclasses this raises TypeError, since InitVars are passed positionally)
@dataclass
class Foo:
    x: InitVar[int]
    y: InitVar[int]
    def __post_init__(self, *, y, x):
        print(f"{x=}, {y=}")

# backwards compatible
# __post_init__ without the bare * marker
# called as __post_init__(self, x, y)
@dataclass
class Foo:
    x: InitVar[int]
    y: InitVar[int]
    def __post_init__(self, my_x, my_y):
        print(my_x, my_y)
```

[End Edit]
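Until something like this exists in dataclasses itself, roughly the same effect can be emulated in user code. The sketch below is a hypothetical class decorator (kw_post_init is not part of the standard library, and it is not the patch discussed in this thread): it runs after @dataclass, collects the InitVar names in field order, and swaps __post_init__ for a wrapper that re-sends the positional values by keyword. It assumes plain (non-string) annotations, i.e. no `from __future__ import annotations`, because it detects InitVars with isinstance(f.type, InitVar).

```python
from dataclasses import InitVar, dataclass

def kw_post_init(cls):
    """Hypothetical helper (not part of dataclasses): pass InitVars to
    __post_init__ by keyword instead of by position."""
    # InitVar pseudo-fields are kept in __dataclass_fields__ (fields() hides them),
    # in the same order the generated __init__ passes them to __post_init__.
    names = [name for name, f in cls.__dataclass_fields__.items()
             if isinstance(f.type, InitVar)]
    original = cls.__post_init__

    def __post_init__(self, *args):
        # Pair each positional value with its InitVar name, then call by keyword.
        return original(self, **dict(zip(names, args)))

    cls.__post_init__ = __post_init__
    return cls

@kw_post_init
@dataclass
class Foo:
    x: InitVar[int]
    y: InitVar[int]

    def __post_init__(self, *, y, x):
        # Parameter order no longer matters: values are bound by name.
        self.seen = (x, y)

print(Foo(1, 2).seen)       # (1, 2)
print(Foo(y=5, x=3).seen)   # (3, 5)
```

Patching the class after @dataclass works because the generated __init__ looks up self.__post_init__ at call time rather than capturing the original function.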

My question: would anyone actually use it, or is this a case of “leave good enough alone”?

On the plus side, you can safely reorder your InitVars without worrying about modifying __post_init__. Another (small) plus: throw a “*” into __post_init__ and compare a debug print to make sure your args are in the right order.

How important is that for your codebase?


I didn’t make this point on the issue, but I should have: I think a better design would have been to always use named args instead of positional. But sadly the decision was made to use positional (note my weasel words using passive voice!). If we can support both without paying too large a performance penalty, that’s good. Plus backwards compatibility, of course. What happens if someone uses __post_init__ names that don’t match the field names?

Not at a computer to retest, but it should be TypeError: __post_init__() got an unexpected keyword argument 'keyword'

That’s a backward compatibility change, then.

Even if the user has to add “*” to their __post_init__?

Like I wouldn’t expect a regular function to work if I did:


```python
def sum(*, a, b):
    pass

s = sum(a=1, c=2)  # raises TypeError (unexpected keyword argument 'c')
```

But also, it’s not like there are a ton of complaints about dataclasses’ current positional behavior, so I’m happy to accept it :slight_smile:

No, it wouldn’t be a concern if they added “*”. My concern is that you’d break existing code like:

```python
from dataclasses import InitVar, dataclass

@dataclass
class Foo:
    x: InitVar[int]
    y: InitVar[int]
    def __post_init__(self, my_x, my_y):
        print(my_x, my_y)
```

I’m sorry I wasn’t clearer earlier [added examples to the top post]. Current behavior stays the same: without the “*”, args are passed positionally.

It only passes them by keyword if:

  1. there are args to pass (other than self)
  2. the second parameter (the first one after self) is KEYWORD_ONLY, and I believe that’s only possible if you put a bare “*” right after self in the __post_init__ definition (otherwise its param.kind is POSITIONAL_OR_KEYWORD)
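The two conditions above can be sketched with inspect.signature. This is a hypothetical helper (passes_by_keyword is not a dataclasses function, and the actual patch may check this differently); it assumes it receives the plain function from the class body, so the first parameter is self.

```python
import inspect

def passes_by_keyword(post_init, initvar_names):
    """Hypothetical check: True if InitVars should be sent by keyword."""
    params = list(inspect.signature(post_init).parameters.values())
    # Condition 1: there are InitVars to pass besides self.
    # Condition 2: the first parameter after self is keyword-only (bare *).
    return (bool(initvar_names)
            and len(params) > 1
            and params[1].kind is inspect.Parameter.KEYWORD_ONLY)

def old_style(self, my_x, my_y): ...
def new_style(self, *, y, x): ...
def collector(self, *args, x): ...  # second param is VAR_POSITIONAL, not KEYWORD_ONLY

print(passes_by_keyword(old_style, ["x", "y"]))  # False
print(passes_by_keyword(new_style, ["x", "y"]))  # True
print(passes_by_keyword(collector, ["x"]))       # False
```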

Hey there,
Sorry for jumping in late.
I have been following this issue for a while, since I ran into this problem in my own code, which relies heavily on frozen dataclasses (and __post_init__ is very useful for working with them).
When I read @ericvsmith’s comment, I felt this was a somewhat ad-hoc solution and not very Pythonic.
I decided to implement another solution without breaking backward compatibility. It’s also not perfect, but it felt like the lesser evil.
I would highly appreciate your opinions and comments there.
I am looking forward to hearing back from you.
Best wishes,
Sergei

Hacking on the dataclass lib is fun!

One of the reasons I chose the def __post_init__(self, *, initvar1): convention is that it allows a very short diff, I didn’t need to change any tests, and (after re-measuring) I don’t see a noticeable performance hit.

I ran Python 3.10.9’s dataclasses (which just happens to be my IDE’s version) against @szobov’s version and my version, using this super simple benchmark I found:

| Implementation | define (μs) | init (μs) | equality (μs) | order (μs) |
|---|---|---|---|---|
| Python 3.10.9 dataclasses | 516.48 | 0.51 | 0.24 | 0.26 |
| @szobov dataclasses | 640.93 | 0.99 | 0.11 | 0.28 |
| @ssweber “*” kw-only version | 522.71 | 0.52 | 0.11 | 0.26 |
| @brandtbucher cached dataclasses | 175.64 | 0.49 | 0.22 | 0.24 |
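The benchmark script itself isn’t included in the thread. As a rough, hypothetical stand-in, the sketch below times the same four operations with the standard timeit module; the class shape and iteration counts are arbitrary choices, the lambdas add a little call overhead, and absolute numbers will vary by machine and Python version.

```python
import timeit
from dataclasses import dataclass

def define():
    # Creating a dataclass runs the decorator's code generation each time.
    @dataclass(order=True)
    class C:
        a: int
        b: int
    return C

C = define()
x, y = C(1, 2), C(1, 3)

def per_call_us(stmt, number):
    # Average wall-clock time per call, in microseconds.
    return timeit.timeit(stmt, number=number) / number * 1e6

results = {
    "define": per_call_us(define, 200),
    "init": per_call_us(lambda: C(1, 2), 50_000),
    "equality": per_call_us(lambda: x == y, 50_000),
    "order": per_call_us(lambda: x < y, 50_000),
}
for name, us in results.items():
    print(f"{name}: {us:.2f} μs")
```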