Dataclasses - Make asdict/astuple faster by skipping deepcopy for objects where deepcopy(obj) is obj

Currently, when you call asdict or astuple on a dataclass, anything it contains that isn’t another dataclass, a list, a dict, or a tuple/namedtuple gets passed to deepcopy. Whether this is desirable or not doesn’t really matter here, as changing it now would probably break things and is not my goal.
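To make the current behaviour concrete, here is a small (hypothetical) example showing that a field whose type asdict doesn’t recognise ends up going through copy.deepcopy, so the result holds a new object rather than the original:

```python
from dataclasses import dataclass, asdict

# Payload and Record are illustrative names, not from the proposal.
class Payload:
    def __init__(self, data):
        self.data = data

@dataclass
class Record:
    payload: Payload

r = Record(Payload([1, 2, 3]))
d = asdict(r)
# The Payload instance is not a dataclass/list/dict/tuple, so it was
# handed to copy.deepcopy and a fresh object comes back:
print(d["payload"] is r.payload)  # False
```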

There are a number of basic types for which `deepcopy(obj) is obj` is True. In particular this covers most of the non-container types supported by the JSON encoder by default (though I don’t think it includes enums).

In the copy module these types are dispatched to _deepcopy_atomic, which simply returns the object without copying it. For example this means:

from copy import deepcopy
x = 10**40  # Large enough to avoid interning
y = 10**40
print(x is y)  # False
print(deepcopy(x) is x)  # True

x = "a" * 5000  # Long enough to avoid interning
y = "a" * 5000
print(x is y)  # False
print(deepcopy(x) is x)  # True

To avoid the deepcopy overhead I’m proposing the function first checks whether the object’s type is exactly one of these types and, if so, returns the original object (which is what deepcopy would return anyway).

Adding:

# Everything that uses _deepcopy_atomic directly
# This list could be reduced to the most common types if necessary
_ATOMIC_TYPES = {
    types.NoneType,
    types.EllipsisType,
    types.NotImplementedType,
    int,
    float,
    bool,
    complex,
    bytes,
    str,
    types.CodeType,
    type,
    range,
    types.BuiltinFunctionType,
    types.FunctionType,
    # weakref.ref,  # weakref is not currently imported by dataclasses directly
    property,
}

to the module and

if type(obj) in _ATOMIC_TYPES:
    return obj

to the start of _asdict_inner and _astuple_inner.

Checking that the type of the object is exactly one of those listed, and not a subclass, matches the behaviour of deepcopy, so this shouldn’t change anything about the output of asdict.

Originally I was going to suggest putting the check at the end but it turns out that you get a much more noticeable benefit from checking these types before all of the other conditions (it halved the time taken in the best case).

On a ‘nice’ example where everything the dataclass contains is one of these types this change makes asdict significantly faster than the current implementation. In a mixed example containing various lists/dicts/dataclasses/ints/strs this was also faster but not by as much. As much as I’d love to say it was always faster, in the worst case where everything has to be deepcopied and nothing is a basic type it is very slightly slower due to the extra condition.
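The actual benchmark code is linked below; a minimal timing harness along these lines (the dataclass and iteration count are illustrative, not the real benchmark) shows the shape of the ‘nice’ case, where every field is one of the atomic types:

```python
import timeit
from dataclasses import dataclass, asdict

# Hypothetical best-case shape: all fields are atomic types, so the
# proposed fast path would apply to every one of them.
@dataclass
class Point:
    x: int
    y: float
    label: str

p = Point(1, 2.0, "origin")
elapsed = timeit.timeit(lambda: asdict(p), number=100_000)
print(f"asdict, atomic-only fields: {elapsed:.3f}s")
```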

Code: Modification of dataclasses' `asdict` function to skip copying objects where deepcopy(obj) is obj · GitHub

Results (Python 3.11.2):

Best case asdict:
current=6.34s
new=2.63s
New method takes 41% of the time

Best case astuple:
current=5.67s
new=2.13s
New method takes 38% of the time

Worst case asdict:
current=3.01s
new=3.05s
New method takes 101% of the time

Worst case astuple:
current=2.96s
new=3.01s
New method takes 102% of the time

Mixed case asdict:
current=2.74s
new=1.56s
New method takes 57% of the time

I’d love some real world based benchmarks to use for a comparison but I don’t know of any I could use. (The mixed case example was based on the code orjson uses to claim it is 40-50x as fast for dataclasses.)

Given that JSON serialization appears to have been the intended use case of asdict, and that the set of types left unchanged includes most of the natively serializable types, is this potentially a reasonable performance trade? Do I need more benchmarks first?


Note: I know there are other things that could improve performance to a greater degree but those have other caveats or things to consider. I also know other changes have been discussed before.


Didn’t realise changing the topic to more accurately represent the reason for doing this would move it up to the top.

In case it was missed this is intended to only skip deepcopy where it has no effect.

If `deepcopy(obj) is obj` is not guaranteed to be True, this does not skip deepcopy.

The reason this could be useful is that the set of these types includes str/int/float/bool/None.

I did some more testing and found that it’s noticeably faster to do the type check inside the construction loops for most of the containers, although this does make the code messier. Fork with changes. (While doing this I also noticed that constructing the dictionary directly when dict_factory is dict was faster too.) The overall result was about 3x faster in the best case.

[Edit: Turns out there’s a ‘better’ best case of the only contents being a list of integers where this is closer to 10x faster. List of Decimal, ~3% slower].
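Illustratively, the inlined variant replaces a per-element function call with a membership test inside the comprehension (a sketch, not the code from the fork; deepcopy stands in for the recursive helper):

```python
import copy

_ATOMIC_TYPES = {type(None), bool, int, float, complex, str, bytes}

def convert_per_call(items, inner):
    # Baseline: one function call per element, even for atomic values.
    return [inner(v) for v in items]

def convert_inlined(items, inner):
    # Inlined check: atomic elements skip the call entirely; only
    # non-atomic elements pay for the recursion/deepcopy.
    return [v if type(v) in _ATOMIC_TYPES else inner(v) for v in items]
```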

Could make an issue/PR if this does seem like something that would be worth doing.

@ericvsmith @carljm You’re the currently listed experts on dataclasses. Thoughts?

IMO the speed tradeoff here is worth it. The worst-case regression is very small and the speedups for common cases are large.

As you noted, the speed benefit here is mostly providing a shortcut fast path for some common types for which we know none of the other cases apply (including the last-resort fallback of deepcopying). I’d prefer to frame it that way, and avoid framing it as “copy the entire list of types that use _deepcopy_atomic”, which could lead to future useless PRs trying to make/keep the list an exact match. Thus e.g. I don’t think it’s worth adding an import of weakref. It’s probably fine to include all the other types, though.


Yes, I agree it makes sense for the common types - I just didn’t really have a good reason to not include the others.

Edit: Side note that while not direct, you do currently depend on weakref transitively through copy (so there would be no performance impact from the import) - but I agree on not including it. (Similarly you transitively depend on recursive_repr and reprlib through functools.)

I’ve made an issue for this:


Those are pretty impressive speedups. However, it does feel slightly strange to me to special-case deepcopying atomic types in dataclasses — if this is slowing down dataclasses, it’s surely slowing down other users of the copy module in the same way. If copy.py’s algorithm for deepcopying atomic types is slower than it needs to be, maybe we should just concentrate on optimising copy.py instead of special-casing the use of deepcopy in dataclasses?

Note that we have an open PR for optimising parts of deepcopy in C — it might be interesting to see whether that has an impact on the performance of asdict/astuple: https://github.com/python/cpython/pull/91610. (The PR is however currently stalled, as there is a sizeable number of open bug reports to do with the copy module, and it’s thought to be unwise to reimplement the module in C before those bugs are resolved.)


Although looking at your branch, it looks like your optimisations might be more dataclasses-specific than I realised from your description above (you’re avoiding the recursive call to _asdict_inner as well as the call to deepcopy), in which case my comment above is moot.


Yes, part of it is just skipping the dispatch machinery deepcopy uses, but the other major part is skipping the recursive call and all of the other checks.

The real reason it uses the list from deepcopy is because that’s what currently hits everything, and in these cases it’s possible to skip the call without changing the output.

My original thinking was about skipping things that were interned and hence guaranteed to be the same object. I then noticed that deepcopy returned the same objects even when this wasn’t the case, which led me to the set of types that gets sent to _deepcopy_atomic, which simply returns the object.

There may be some optimisation available for deepcopy itself by getting the dispatcher first and returning early when it’s _deepcopy_atomic, but I haven’t investigated that.
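That idea might look something like the following sketch. It relies on copy’s private internals (_deepcopy_dispatch and _deepcopy_atomic), so it’s purely illustrative:

```python
import copy

def deepcopy_with_shortcut(obj):
    # Look up the dispatch handler first; if it is the identity-style
    # _deepcopy_atomic, return the object without entering deepcopy.
    # Both names are CPython internals, not public API.
    handler = copy._deepcopy_dispatch.get(type(obj))
    if handler is copy._deepcopy_atomic:
        return obj
    return copy.deepcopy(obj)
```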


I had the same initial reaction, and then the same realization :) I think this optimization is really not as much about deepcopy as it is about just providing a fast path for some known common types in _as{dict,tuple}_inner – skipping the deepcopy call is just one part of that.


Maybe put the list of “atomic” types in the copy module, but the logic to use it in dataclasses. I could see this info being useful elsewhere. It doesn’t need to be public for starters.

I suppose it’s possible to construct _ATOMIC_TYPES from copy.

Something like:

_ATOMIC_TYPES = {
    typ
    for typ, func in copy._deepcopy_dispatch.items()
    if func is copy._deepcopy_atomic
}

Either inside the copy module or in dataclasses. I don’t know if the maintainers of copy want to export a list to use directly? (We would probably still need to make a set).

This does get weakref in, although it is a bit more opaque? Constructing the set directly makes the goal a bit clearer although I guess that could just be a comment.

(I did just realise I made a mistake and put complex and bytes under the JSON serializable comment when they aren’t. Will fix that if we keep the set in dataclasses.)

The astuple and asdict methods benefit from the deepcopy improvements in #91610, but the proposal here is still worthwhile. Some numbers (same benchmark as the OP; new is the implementation with the _ATOMIC_TYPES check inlined, simple is the implementation with the _ATOMIC_TYPES check at the top of _asdict_inner):

Best case asdict:

current=1.98s
new=0.68s
simple=1.41s

PR 91610=1.58s
PR 91610 + new=0.69s
PR 91610 + simple=0.95s

Side note: any help from a core dev to make progress on the issues blocking PR 91610 would be appreciated

Yes, for cases where deepcopy had no effect this change made more of a difference than removing deepcopy from the function entirely. Deepcopy improvements would be a nice benefit for more complex cases though.