Questions about __post_init__ when unpickling

Hi, I would very much appreciate help to understand this behavior…

Why is __post_init__ called when unpickling Foo but not Bar?
Why is foo.a 0 in __post_init__ during unpickling, when a was already set to 1 before pickling?
Here’s the output of the script that follows.

$ python test_dataclass_pickle.py

foo post_init self.a=0
foo post_init self.a=0
bar post_init self.a=0

from dataclasses import dataclass
import pickle

@dataclass
class Foo(Exception):
    a: int = 0
    def __post_init__(self):
        print(f'foo post_init {self.a=}')
        self.a = 1

@dataclass
class Bar:
    a: int = 0
    def __post_init__(self):
        print(f'bar post_init {self.a=}')
        self.a = 2

foo = pickle.loads(pickle.dumps(Foo()))
bar = pickle.loads(pickle.dumps(Bar()))

Thanks in advance!! I have been using dataclasses extensively, so I would like to keep learning more about how they work. If this is my first Python bug discovery, hooray! More likely, however, it’s explainable, which would be even better.

Thanks,
Pete

This actually isn’t related to dataclasses; it comes from Exception. For some reason, __init__ gets called again when an instance of a BaseException subclass is unpickled. dataclasses just puts a call to __post_init__ into the __init__ it generates. You can verify this with a simpler example:

import pickle

class B(BaseException):
    def __init__(self, *args):
        super().__init__(*args)
        print("B.__init__", repr(self), args)

pickle.loads(pickle.dumps(B(1)))

I don’t know why this behavior exists; I would need to check the docs/source code for that.
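One way to see where the extra call comes from, without reading the source, is to look at what the exception hands to pickle. As far as I can tell, `BaseException.__reduce__` returns the class followed by `self.args` (plus the instance `__dict__` when there is one), and unpickling rebuilds the object by calling the class with those arguments. A minimal sketch that replays this by hand instead of going through pickle; the `calls` list is only there for illustration:

```python
calls = []  # records the arguments of every __init__ call

class B(BaseException):
    def __init__(self, *args):
        super().__init__(*args)
        calls.append(args)

original = B(1)

# What the exception hands to pickle: the class first, then self.args.
reduced = original.__reduce__()
cls, ctor_args = reduced[0], reduced[1]
print(cls is B, ctor_args)

# Unpickling calls cls(*ctor_args), so __init__ runs a second time.
rebuilt = cls(*ctor_args)
print(calls)
```

With a dataclass, the generated `__init__` is what gets called in that second step, and that is where `__post_init__` is triggered again.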


Thank you very much for the reply.
I’ll do more investigation.
It’s not just the call to __post_init__ that’s confusing; the order of things is also surprising. “post_init” seems to run before initialization is complete: during unpickling it sees the surprising value of 0, and the actual value of “a” is set later, presumably by Exception’s own machinery after the dataclass-generated __init__ has run.
It would be helpful to get some guidance on the ins and outs when using dataclass with an Exception subclass. Anyway, for my purposes I think I know enough to handle the behavior in my own code now. Thank you.
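The ordering can be reproduced step by step. This is a sketch under the assumption that unpickling an exception amounts to "call the class with the saved args, then apply the saved state", which matches the output shown in this thread. The `seen` list is a hypothetical helper that records what `__post_init__` observes:

```python
from dataclasses import dataclass

seen = []  # values of self.a observed inside __post_init__

@dataclass
class Foo(Exception):
    a: int = 0
    def __post_init__(self):
        seen.append(self.a)

foo = Foo()
foo.a = 1

# Foo() received no positional arguments, so the saved args are empty
# and the value of a travels only in the state dict.
cls, ctor_args, state = foo.__reduce__()

# Step 1: rebuild from the constructor arguments. The generated __init__
# runs with the default a=0 and calls __post_init__.
rebuilt = cls(*ctor_args)

# Step 2: only now is the saved state applied on top.
rebuilt.__setstate__(state)

print(seen, rebuilt.a)
```

So `__post_init__` runs at the end of step 1, before step 2 has restored `a`, which is why it sees the default rather than the pickled value.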

from dataclasses import dataclass
import pickle

@dataclass
class Foo(Exception):
    a: int = 0
    def __post_init__(self):
        print(f'foo post_init {self.a=}')
foo = Foo()
foo.a = 1
bar = pickle.loads(pickle.dumps(foo))
print(f'{bar.a=}')

output:
$ python test_dataclass_pickle.py
foo post_init self.a=0
foo post_init self.a=0
bar.a=1

Hi again. I added these methods to the dataclass subclass of Exception and now the unpickling proceeds without any surprises. Is there anything wrong, or anything to watch out for, with this approach?
Thank you.

from dataclasses import dataclass
import pickle

@dataclass
class Foo(Exception):
    a: int = 0
    def __post_init__(self):
        print(f'foo post_init {self.a=}')
        self.a = 1

    # Provide reduce and reconstruct to prevent Python's Exception
    # from calling __init__ and __post_init__ again.
    # https://discuss.python.org/t/questions-about-post-init-when-unpickling/76828/3
    def __reduce__(self):
        return self._reconstruct, (self.__dict__,)

    @classmethod
    def _reconstruct(cls, state: dict):
        m = cls.__new__(cls)
        m.__dict__.update(state)
        return m

Maybe. I am not sure what kind of setup Exception/BaseException need to do.

In general, I would suggest just not doing this: don’t pickle exceptions, and don’t make Exception subclasses dataclasses. Both require quite a bit of cooperation from the entire class hierarchy, and you need to know what is going on.

So if you want to make sure that the above code is enough, you probably need to read the relevant CPython source code.
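One difference that can be checked without reading the source: the custom `__reduce__` restores only `__dict__`, so the exception's `args` are not carried over. A small sketch, calling the reduce value by hand (roughly what `pickle.loads` would do) and leaving out `__post_init__` to keep it short:

```python
from dataclasses import dataclass

@dataclass
class Foo(Exception):
    a: int = 0

    def __reduce__(self):
        return self._reconstruct, (self.__dict__,)

    @classmethod
    def _reconstruct(cls, state: dict):
        m = cls.__new__(cls)
        m.__dict__.update(state)
        return m

original = Foo(3)  # positional argument, so original.args == (3,)

func, func_args = original.__reduce__()
restored = func(*func_args)

print(restored.a)                     # the dataclass field survives
print(original.args, restored.args)   # args is empty after the round trip
print(repr(str(original)), repr(str(restored)))
```

Since `str()` of an exception is built from `args`, the restored object prints as an empty message. Whether that matters depends on how the exception is logged or displayed after unpickling.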
