Dataclasses - Make asdict/astuple faster by skipping deepcopy for objects where deepcopy(obj) is obj

Currently, when you call asdict or astuple on a dataclass, anything it contains that isn’t another dataclass, a list, a dict, or a tuple/namedtuple gets passed to deepcopy. Whether this is desirable or not doesn’t really matter here, as changing it now would probably break things and is not my goal.
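To make the current behaviour concrete, here is a small (hypothetical) example showing that a field whose type asdict doesn’t recognise ends up going through copy.deepcopy, so the result holds a new object rather than the original:

```python
from dataclasses import dataclass, asdict

# Payload and Record are illustrative names, not from the proposal.
class Payload:
    def __init__(self, data):
        self.data = data

@dataclass
class Record:
    payload: Payload

r = Record(Payload([1, 2, 3]))
d = asdict(r)
# The Payload instance is not a dataclass/list/dict/tuple, so it was
# handed to copy.deepcopy and a fresh object comes back:
print(d["payload"] is r.payload)  # False
```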

There are a number of basic types for which `deepcopy(obj) is obj` is True. In particular this covers most of the non-container types supported by the JSON encoder by default (though I don’t think it includes enums).

In the copy module these types are dispatched to _deepcopy_atomic, which simply returns the object without copying it. For example this means:

from copy import deepcopy
x = 10**40  # Large enough to avoid interning
y = 10**40
print(x is y)  # False
print(deepcopy(x) is x)  # True

x = "a" * 5000  # Long enough to avoid interning
y = "a" * 5000
print(x is y)  # False
print(deepcopy(x) is x)  # True

To avoid the deepcopy overhead I’m proposing the function first checks whether the object’s type is exactly one of these types and, if so, returns the original object (which is what deepcopy would return anyway).

Adding:

# Everything that uses _deepcopy_atomic directly
# This list could be reduced to the most common types if necessary
_ATOMIC_TYPES = {
    types.NoneType,
    types.EllipsisType,
    types.NotImplementedType,
    int,
    float,
    bool,
    complex,
    bytes,
    str,
    types.CodeType,
    type,
    range,
    types.BuiltinFunctionType,
    types.FunctionType,
    # weakref.ref,  # weakref is not currently imported by dataclasses directly
    property,
}

to the module and

if type(obj) in _ATOMIC_TYPES:
    return obj

to the start of _asdict_inner and _astuple_inner.

Checking that the type of the object is exactly one of those listed, and not a subclass, matches the behaviour of deepcopy, so this shouldn’t change anything about the output of asdict.

Originally I was going to suggest putting the check at the end but it turns out that you get a much more noticeable benefit from checking these types before all of the other conditions (it halved the time taken in the best case).

On a ‘nice’ example where everything the dataclass contains is one of these types this change makes asdict significantly faster than the current implementation. In a mixed example containing various lists/dicts/dataclasses/ints/strs this was also faster but not by as much. As much as I’d love to say it was always faster, in the worst case where everything has to be deepcopied and nothing is a basic type it is very slightly slower due to the extra condition.
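The actual benchmark code is linked below; a minimal timing harness along these lines (the dataclass and iteration count are illustrative, not the real benchmark) shows the shape of the ‘nice’ case, where every field is one of the atomic types:

```python
import timeit
from dataclasses import dataclass, asdict

# Hypothetical best-case shape: all fields are atomic types, so the
# proposed fast path would apply to every one of them.
@dataclass
class Point:
    x: int
    y: float
    label: str

p = Point(1, 2.0, "origin")
elapsed = timeit.timeit(lambda: asdict(p), number=100_000)
print(f"asdict, atomic-only fields: {elapsed:.3f}s")
```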

Code: Modification of dataclasses' `asdict` function to skip copying objects where deepcopy(obj) is obj · GitHub

Results (Python 3.11.2):

Best case asdict:
current=6.34s
new=2.63s
New method takes 41% of the time

Best case astuple:
current=5.67s
new=2.13s
New method takes 38% of the time

Worst case asdict:
current=3.01s
new=3.05s
New method takes 101% of the time

Worst case astuple:
current=2.96s
new=3.01s
New method takes 102% of the time

Mixed case asdict:
current=2.74s
new=1.56s
New method takes 57% of the time

I’d love some real world based benchmarks to use for a comparison but I don’t know of any I could use. (The mixed case example was based on the code orjson uses to claim it is 40-50x as fast for dataclasses.)

Given that JSON serialization appears to have been the intended use case of asdict, and that the set of types left unchanged includes most of the natively serializable types, is this potentially a reasonable performance trade? Do I need more benchmarks first?


Note: I know there are other things that could improve performance to a greater degree but those have other caveats or things to consider. I also know other changes have been discussed before.


Didn’t realise changing the topic to more accurately represent the reason for doing this would move it up to the top.

In case it was missed this is intended to only skip deepcopy where it has no effect.

If `deepcopy(obj) is obj` is not guaranteed to be True, this does not skip deepcopy.

The reason this could be useful is that the set of these types includes str/int/float/bool/None.

I did some more testing and found that it’s noticeably faster to do the type check inside the construction loops for most of the containers, although this does make the code messier. Fork with changes. (While doing this I also noticed that constructing the dictionary directly when dict_factory is dict was faster too.) The overall result was about 3x faster in the best case.

[Edit: Turns out there’s a ‘better’ best case of the only contents being a list of integers where this is closer to 10x faster. List of Decimal, ~3% slower].
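Illustratively, the inlined variant replaces a per-element function call with a membership test inside the comprehension (a sketch, not the code from the fork; deepcopy stands in for the recursive helper):

```python
import copy

_ATOMIC_TYPES = {type(None), bool, int, float, complex, str, bytes}

def convert_per_call(items, inner):
    # Baseline: one function call per element, even for atomic values.
    return [inner(v) for v in items]

def convert_inlined(items, inner):
    # Inlined check: atomic elements skip the call entirely; only
    # non-atomic elements pay for the recursion/deepcopy.
    return [v if type(v) in _ATOMIC_TYPES else inner(v) for v in items]
```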

Could make an issue/PR if this does seem like something that would be worth doing.

@ericvsmith @carljm You’re the currently listed experts on dataclasses. Thoughts?

IMO the speed tradeoff here is worth it. The worst-case regression is very small and the speedups for common cases are large.

As you noted, the speed benefit here is mostly providing a shortcut fast path for some common types for which we know none of the other cases apply (including the last-resort fallback of deepcopying). I’d prefer to frame it that way, and avoid framing it as “copy the entire list of types that use _deepcopy_atomic”, which could lead to future useless PRs trying to make/keep the list an exact match. Thus e.g. I don’t think it’s worth adding an import of weakref. It’s probably fine to include all the other types, though.


Yes, I agree it makes sense for the common types - I just didn’t really have a good reason to not include the others.

Edit: Side note that while not direct, you do currently depend on weakref transitively through copy (so there would be no performance impact from the import) - but I agree on not including it. (Similarly you transitively depend on recursive_repr and reprlib through functools.)

I’ve made an issue for this:


Those are pretty impressive speedups. However, it does feel slightly strange to me to special-case deepcopying atomic types in dataclasses — if this is slowing down dataclasses, it’s surely slowing down other users of the copy module in the same way. If copy.py’s algorithm for deepcopying atomic types is slower than it needs to be, maybe we should just concentrate on optimising copy.py instead of special-casing the use of deepcopy in dataclasses?

Note that we have an open PR for optimising parts of deepcopy in C — it might be interesting to see whether that has an impact on the performance of asdict/astuple: https://github.com/python/cpython/pull/91610. (The PR is however currently stalled, as there is a sizeable number of open bug reports to do with the copy module, and it’s thought to be unwise to reimplement the module in C before those bugs are resolved.)


Although looking at your branch, it looks like your optimisations might be more dataclasses-specific than I realised from your description above (you’re avoiding the recursive call to _asdict_inner as well as the call to deepcopy), in which case my comment above is moot.


Yes, part of it is just skipping the dispatch machinery deepcopy uses, but the other major part is skipping the recursive call and all of the other checks.

The real reason it uses the list from deepcopy is because that’s what currently hits everything, and in these cases it’s possible to skip the call without changing the output.

My original thinking was about skipping things that were interned and hence guaranteed to be the same object. I then noticed that deepcopy returned the same objects even when this wasn’t the case, which led me to the set of types that gets sent to _deepcopy_atomic, which simply returns the object.

There may be some optimisation available for deepcopy itself by getting the dispatcher first and returning early when it’s _deepcopy_atomic, but I haven’t investigated that.
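That idea might look something like the following sketch. It relies on copy’s private internals (_deepcopy_dispatch and _deepcopy_atomic), so it’s purely illustrative:

```python
import copy

def deepcopy_with_shortcut(obj):
    # Look up the dispatch handler first; if it is the identity-style
    # _deepcopy_atomic, return the object without entering deepcopy.
    # Both names are CPython internals, not public API.
    handler = copy._deepcopy_dispatch.get(type(obj))
    if handler is copy._deepcopy_atomic:
        return obj
    return copy.deepcopy(obj)
```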


I had the same initial reaction, and then the same realization :) I think this optimization is really not as much about deepcopy as it is about just providing a fast path for some known common types in _as{dict,tuple}_inner – skipping the deepcopy call is just one part of that.


Maybe put the list of “atomic” types in the copy module, but the logic to use it in dataclasses. I could see this info being useful elsewhere. It doesn’t need to be public for starters.

I suppose it’s possible to construct _ATOMIC_TYPES from copy.

Something like:

_ATOMIC_TYPES = {
    typ
    for typ, func in copy._deepcopy_dispatch.items()
    if func is copy._deepcopy_atomic
}

Either inside the copy module or in dataclasses. I don’t know if the maintainers of copy want to export a list to use directly? (We would probably still need to make a set).

This does get weakref in, although it is a bit more opaque? Constructing the set directly makes the goal a bit clearer although I guess that could just be a comment.

(I did just realise I made a mistake and put complex and bytes under the JSON serializable comment when they aren’t. Will fix that if we keep the set in dataclasses.)

The astuple and asdict methods benefit from the deepcopy improvements in #91610, but the proposal here is still worthwhile. Some numbers (same benchmark as the OP; new is the implementation with the _ATOMIC_TYPES check inlined, simple is the implementation with the _ATOMIC_TYPES check at the top of _asdict_inner):

Best case asdict:

current=1.98s
new=0.68s
simple=1.41s

PR 91610=1.58s
PR 91610 + new=0.69s
PR 91610 + simple=0.95s

Side note: any help from a core dev to make progress on the issues blocking PR 91610 would be appreciated

Yes, for cases where deepcopy had no effect this change made more of a difference than removing deepcopy from the function entirely. Deepcopy improvements would be a nice benefit for more complex cases though.