Improving dataclasses startup performance

ericvsmith · May 2, 2022, 11:16pm

I’m posting this here because I don’t really want wider feedback yet. Please let me know your thoughts. I don’t have hard numbers yet on how slow dataclasses actually are to create, but I hear people complain about it regularly.

Motivation:

The @dataclass decorator is run every time a dataclass (the class, not an instance) is created. This slows down startup time. It would be “better” (for some definition of “better”) if the dataclass result could be “baked in” (for some definition of “baked in”) to the bytecode.

0. This problem isn’t worth solving: do nothing, status quo wins. This is always the easiest thing to do!

Note: In all of the following options, one big win would be that __slots__ could be added to the class without requiring the @dataclass decorator to create a new class just to add __slots__.

1. Write an external tool that generates the methods. This would operate like Argument Clinic, in that you could re-run it to modify a source file in place whenever the dataclass definition changed.
   Pros: No impact on Python implementations.
   Cons: Clunky to use. Not very discoverable. Requires an extra build step.
2. Use PEP 638 (syntactic macros) to modify the class’s AST to add dataclass-generated methods. Of course, this would require PEP 638 to be accepted and implemented first.
   Pros: dataclasses would not be a special case.
   Cons: Requires PEP 638; might open a can of worms. Has implications for other implementations.
3. Teach the compiler to recognize @dataclass and modify the class’s AST to add the dataclass-generated methods. Cinder does essentially this (although during bytecode generation).
   Pros: Syntax looks the same as it currently does. Other implementations could do something similar, or not.
   Cons: dataclasses becomes a special case.
4. Create new syntax to specify a dataclass. For the sake of argument, the syntax could be:

       dataclass C:

   would be equivalent to:

       @dataclass
       class C:

   and

       dataclass(repr=False) C:

   would be equivalent to:

       @dataclass(repr=False)
       class C:

   Similarly:

       dataclass(repr=False) C(BaseClass, metaclass=MyMetaClass):

   would be equivalent to:

       @dataclass(repr=False)
       class C(BaseClass, metaclass=MyMetaClass):

   And

       @fancy_decorator
       dataclass C:

   would be equivalent to:

       @fancy_decorator
       @dataclass
       class C:

   Pros: dataclasses gets special treatment.
   Cons: dataclasses gets special treatment. New syntax. Need to do something about dataclasses.field and other module-level functions.

brettcannon · May 2, 2022, 11:20pm

Why the inlining of the argument? If you did:

dataclass C(repr=False):

then it feels more like a “special” class via a different keyword versus some completely new thing.

ericvsmith · May 2, 2022, 11:23pm

Just so I don’t mingle dataclass arguments with base classes or with metaclass arguments. But I’m not married to any syntax.

steve.dower · May 2, 2022, 11:30pm

I like the external tool idea. It doesn’t even strictly have to be in the standard library, and could allow people to “backport” their dataclasses to earlier versions if that’s what they want (or even “target” a certain Python version’s implementation of dataclasses).

Might be worth looking at how people have used the ability to get the source code for a given namedtuple (the _source attribute, removed in Python 3.7). If there’s any tweaking at all done after converting it to a concrete class, then a tool that provides editable code as an output is a clear winner.

If it’s purely about performance, then I think option #3 (teaching the compiler to recognize @dataclass) is the best one for us: make the recommended patterns faster without needing code changes.
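One way the external-tool option could work, in the spirit of Argument Clinic: regenerate a delimited block of a source file in place, so that re-running the tool after the field list changes brings the file up to date. This is a minimal sketch. The "# BEGIN GENERATED dataclass fields=..." marker format, the regenerate() function, and the restriction to __init__ and __repr__ without defaults are all assumptions made for illustration. It works on a string so that it is easy to try out.

```python
import re

# Hypothetical marker format: everything between the BEGIN and END lines
# is owned by the tool and rewritten on every run.
_BLOCK = re.compile(
    r"^(?P<indent>[ \t]*)# BEGIN GENERATED dataclass fields=(?P<fields>[\w, ]*)\n"
    r".*?"
    r"^(?P=indent)# END GENERATED$",
    re.MULTILINE | re.DOTALL,
)


def _method_lines(fields, indent):
    """Source lines for __init__ and __repr__ (no defaults, no type checks)."""
    lines = [f"def __init__({', '.join(['self', *fields])}):"]
    lines += [f"    self.{f} = {f}" for f in fields] or ["    pass"]
    shown = ", ".join(f"{f}={{self.{f}!r}}" for f in fields)
    lines += ["", "def __repr__(self):",
              f"    return f'{{type(self).__name__}}({shown})'"]
    return [indent + line if line else "" for line in lines]


def regenerate(source):
    """Rewrite every generated block in source; running it twice is a no-op."""
    def replace(match):
        indent, spec = match["indent"], match["fields"]
        fields = [f.strip() for f in spec.split(",") if f.strip()]
        body = "\n".join(_method_lines(fields, indent))
        return (f"{indent}# BEGIN GENERATED dataclass fields={spec}\n"
                f"{body}\n{indent}# END GENERATED")
    return _BLOCK.sub(replace, source)


SOURCE = """\
class Point:
    # BEGIN GENERATED dataclass fields=x, y
    # END GENERATED
"""

generated = regenerate(SOURCE)
print(generated)
```

Because the generated methods are ordinary source, nothing runs at import time. The price is the extra build step listed under Cons.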

ericvsmith · May 2, 2022, 11:32pm

Yes, I would certainly do this outside of CPython, if only to decouple the releases.

taleinat · May 3, 2022, 6:38am

This seems relevant: there already exists an external tool that does at least some of this, Cython’s “cdef dataclasses”:

Docs: Extension Types — Cython 3.0.0a10 documentation
Example: cython/dataclass.pyx at master · cython/cython · GitHub
Original PR: cdef dataclasses by da-woods · Pull Request #3400 · cython/cython · GitHub

malemburg · May 3, 2022, 7:44am

Would it be possible to rewrite the whole dataclass logic in C?

Compared to other data storage mechanisms in Python (e.g. namedtuples, slots, regular class instances), it’s currently one of the slowest, and that is not even taking into account the added startup time.

ericvsmith · May 3, 2022, 8:02am

I think the runtime aspects are a separate concern that should also be looked at.

And really the whole thing needs to be benchmarked first. Is it the fact that it imports and uses re or inspect or something else that’s the issue?

I’m already working on timing things. I’ll report back on what I find, but it will take a while.
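A minimal starting point for timing the decorator itself. This is a sketch, not the benchmark being worked on here. It creates the same three-field class over and over, with and without @dataclass, and compares the average cost per class. The names make_plain, make_dataclass and per_class_us are made up. The plain class gets a hand-written __init__ so that both variants define one. Only class creation is measured, not the one-time import of dataclasses.

```python
import timeit
from dataclasses import dataclass


def make_plain():
    """Create a fresh ordinary class with a hand-written __init__."""
    class C:
        a: int
        b: int
        c: int

        def __init__(self, a, b, c):
            self.a, self.b, self.c = a, b, c
    return C


def make_dataclass():
    """Create a fresh equivalent class through @dataclass."""
    @dataclass
    class C:
        a: int
        b: int
        c: int
    return C


def per_class_us(factory, number=1000):
    """Average microseconds to create one class via factory()."""
    seconds = timeit.timeit(factory, number=number)
    return seconds / number * 1e6


if __name__ == "__main__":
    for name, factory in [("plain class", make_plain), ("@dataclass", make_dataclass)]:
        print(f"{name:12} {per_class_us(factory):8.1f} us per class")
```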

malemburg · May 3, 2022, 8:16am

We recently had a talk in our user group comparing the performance of various data storage mechanisms, and dataclasses ranked at the slow end. For the mixed use case (create, then access 5 times), it was slower than regular classes, slotted classes and namedtuples.

(The video of the talk is available here, but it’s in German)
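The talk’s benchmark code is not reproduced in this thread, so what follows is only an assumed approximation of its “mixed” case. Each round creates one instance and then reads its attributes five times, for a regular class, a __slots__ class, a namedtuple and a dataclass. The names create_and_access and compare are made up. Absolute numbers will vary by machine and Python version.

```python
import functools
import timeit
from collections import namedtuple
from dataclasses import dataclass


class Plain:
    def __init__(self, a, b):
        self.a = a
        self.b = b


class Slotted:
    __slots__ = ("a", "b")

    def __init__(self, a, b):
        self.a = a
        self.b = b


NT = namedtuple("NT", ["a", "b"])


@dataclass
class DC:
    a: int
    b: int


def create_and_access(cls):
    """One mixed round: create an instance, then read its attributes 5 times."""
    obj = cls(1, 2)
    total = 0
    for _ in range(5):
        total += obj.a + obj.b
    return total


def compare(number=100_000):
    """Total seconds for `number` mixed rounds, per storage mechanism."""
    return {
        cls.__name__: timeit.timeit(functools.partial(create_and_access, cls), number=number)
        for cls in (Plain, Slotted, NT, DC)
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name:8} {seconds:.3f} s")
```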

storchaka · May 3, 2022, 8:25am

There was a similar issue with namedtuple. It has been significantly optimized since then, although there is still one call to the slow eval(). It is waiting for a hero to reimplement it in C.

What are the bottlenecks in dataclass? Could they be optimized without making drastic changes?
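One way to look for the bottlenecks is to profile the creation of many dataclasses with cProfile and sort by cumulative time. This is a sketch. The class shape, its options and the count of 200 are arbitrary choices for illustration.

```python
import cProfile
import io
import pstats
from dataclasses import dataclass, field


def build_classes(n=200):
    """Create n throwaway dataclasses that use a few common options."""
    for _ in range(n):
        @dataclass(order=True)
        class C:
            a: int
            b: str = "x"
            c: list = field(default_factory=list)


with cProfile.Profile() as profiler:  # context-manager form needs Python 3.8+
    build_classes()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(15)
report = out.getvalue()
print(report)
```

The top of the report should show which helpers in dataclasses.py, and which builtins such as exec, account for most of the time.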

stoneleaf · May 3, 2022, 3:32pm

If it imports re then it also imports enum, which is still written in Python and known to have a slow startup time.
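Claims like this one can be checked with CPython’s -X importtime option (Python 3.7+), which prints a per-module import cost in microseconds to stderr. This is a sketch that runs a fresh interpreter and parses that output. import_times is a hypothetical helper. The -S flag skips the site module, so that .pth files and other site-driven imports add less noise.

```python
import subprocess
import sys


def import_times(module):
    """Return {module_name: (self_us, cumulative_us)} from -X importtime."""
    proc = subprocess.run(
        [sys.executable, "-S", "-X", "importtime", "-c", f"import {module}"],
        capture_output=True, text=True, check=True,
    )
    times = {}
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        try:
            self_us, cumulative_us = int(parts[0]), int(parts[1])
        except ValueError:
            continue  # the header row ("self [us] | cumulative | ...")
        times[parts[2].strip()] = (self_us, cumulative_us)
    return times


if __name__ == "__main__":
    times = import_times("dataclasses")
    for name in ("dataclasses", "re", "enum", "inspect"):
        if name in times:
            self_us, cumulative_us = times[name]
            print(f"{name:12} self={self_us:>7} us  cumulative={cumulative_us:>7} us")
```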

nas · May 3, 2022, 3:53pm

When decorators were first being designed, I was hoping that they would work at compile time, taking an AST object as input and producing a modified AST as the result. That would make them more similar to Lisp or Scheme macros. For dataclasses, I thought that rather than generating code to pass to eval(), it would be better to generate an AST and pass it to compile().
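A sketch of the “generate an AST and pass it to compile()” idea for a single __init__. The per-field statements are built as ast nodes rather than as source text. For brevity, the empty function skeleton still comes from ast.parse. make_init is a hypothetical name. This is not how the dataclasses module works today, since it builds source strings and exec()s them.

```python
import ast


def make_init(field_names):
    """Return an __init__ that does self.<name> = <name> for each field."""
    module = ast.parse("def __init__(self): pass")  # skeleton only
    func = module.body[0]
    func.args.args += [ast.arg(arg=name, annotation=None) for name in field_names]
    func.body = [
        ast.Assign(
            targets=[ast.Attribute(value=ast.Name(id="self", ctx=ast.Load()),
                                   attr=name, ctx=ast.Store())],
            value=ast.Name(id=name, ctx=ast.Load()),
        )
        for name in field_names
    ] or [ast.Pass()]
    ast.fix_missing_locations(module)  # the new nodes need line numbers
    namespace = {}
    exec(compile(module, "<dataclass-init>", "exec"), namespace)
    return namespace["__init__"]


class Point:
    __init__ = make_init(["x", "y"])


p = Point(1, 2)
print(p.x, p.y)  # 1 2
```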

I think it would be nice if we optimized dataclasses somehow, but it would be even better if it wasn’t by giving them special treatment. E.g. attrs should ideally be able to use the same enhancements. If we had a way to do evaluation before .pyc generation, re.compile could be another use.

Spit-ball idea: could we introduce a variation of the decorator syntax that works how I suggest (AST->AST), e.g.:

@@dataclass
class A:
   ...

The dataclass function would be passed the AST of the class definition, and the returned AST would be passed to compile(). For re.compile, perhaps something like:

UPPER_PAT = @@re.compile(r'[A-Z]')
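The closest thing available today to an AST->AST “@@dataclass” is an ast.NodeTransformer that runs over a module before compile(). This is a toy sketch with heavy assumptions. It only recognizes a bare @dataclass decorator, and it treats annotated names in the class body as fields, with no defaults and no ClassVar handling. It generates only __init__, then drops the decorator, so nothing runs at class-creation time. A real compile-time pass would also have to be sure that the name dataclass actually refers to dataclasses.dataclass, which the AST alone cannot tell it. dataclass_macro and ExpandDataclassMacros are hypothetical names.

```python
import ast


def dataclass_macro(cls: ast.ClassDef) -> ast.ClassDef:
    """AST -> AST: append a generated __init__ to a class definition."""
    fields = [
        stmt.target.id
        for stmt in cls.body
        if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)
    ]
    params = ", ".join(["self", *fields])
    body = "".join(f"\n    self.{f} = {f}" for f in fields) or "\n    pass"
    cls.body.append(ast.parse(f"def __init__({params}):{body}").body[0])
    return cls


class ExpandDataclassMacros(ast.NodeTransformer):
    """Apply dataclass_macro to classes decorated with a bare @dataclass."""

    def visit_ClassDef(self, node):
        self.generic_visit(node)
        is_macro = [isinstance(d, ast.Name) and d.id == "dataclass"
                    for d in node.decorator_list]
        if not any(is_macro):
            return node
        node.decorator_list = [d for d, m in zip(node.decorator_list, is_macro)
                               if not m]
        return dataclass_macro(node)


SOURCE = """
@dataclass
class Point:
    x: int
    y: int
"""

tree = ExpandDataclassMacros().visit(ast.parse(SOURCE))
ast.fix_missing_locations(tree)
namespace = {}
# No name `dataclass` is defined: the decorator was removed before compiling.
exec(compile(tree, "<expanded>", "exec"), namespace)
p = namespace["Point"](1, 2)
print(p.x, p.y)  # 1 2
```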

An alternative to using different decorator syntax would be to use different import syntax for the decorator function. That might make more sense since the compiler has to do the import and so it’s special. Then you could just use the existing @ syntax for the decorator since the compiler knows it is a “compile time” decorator function.

A special .pyc compiler, like the one I use for Quixote’s PTL, would be a good way to prototype these kinds of things.
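A sketch of how such a compiler can be hooked into the import system today, with an AST pass running before the .pyc is written. It subclasses importlib.machinery.SourceFileLoader and overrides source_to_code(). The AddMarker transform is a stand-in for a real pass such as dataclass expansion. TransformingLoader and load_with_transform are hypothetical names, and this is not a description of how PTL does it. One caveat: the .pyc keeps no record of which transform produced it, so a real tool would need its own cache invalidation.

```python
import ast
import importlib.machinery
import importlib.util
import pathlib
import tempfile


class AddMarker(ast.NodeTransformer):
    """Stand-in for a real compile-time pass (e.g. dataclass expansion)."""

    def visit_Module(self, node):
        # Note: prepending a statement would displace a module docstring.
        node.body[:0] = ast.parse('__compiled_by__ = "custom-loader"').body
        return node


class TransformingLoader(importlib.machinery.SourceFileLoader):
    """Run an AST pass on the source before it is compiled.

    SourceLoader.get_code() calls source_to_code() only when there is no
    up-to-date .pyc, and normally writes the result to __pycache__, so the
    pass runs once per source change rather than on every import.
    """

    def source_to_code(self, data, path, *args, **kwargs):
        tree = AddMarker().visit(ast.parse(data, filename=path))
        ast.fix_missing_locations(tree)
        return compile(tree, path, "exec", dont_inherit=True)


def load_with_transform(name, path):
    loader = TransformingLoader(name, str(path))
    spec = importlib.util.spec_from_file_location(name, str(path), loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp, "demo_mod.py")
    path.write_text("X = 42\n")
    mod = load_with_transform("demo_mod", path)

print(mod.X, mod.__compiled_by__)  # 42 custom-loader
```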

ericvsmith · May 3, 2022, 4:34pm

That sounds basically like PEP 638 – Syntactic Macros (peps.python.org).

nas · May 3, 2022, 6:00pm

Yeah, PEP 638 would do pretty much what I was thinking. It would make macros easier to implement, which might actually be a bad thing, IMHO. I think there’s a real risk of people going nuts with a macro feature and making incomprehensible DSLs. Some people think that style is a good idea, e.g. Forth. I think it’s bad if you are cooperating with others, and it’s not what we want for Python. OTOH, it would be good if we could implement things like f-strings, dataclasses, namedtuples, maybe even match…case, without modifying the grammar and without doing eval() at runtime.

hynek · May 10, 2022, 4:45am

For obvious reasons, we’ve been thinking about this topic over at attrs for a long time.

I think, practically speaking, it’s not worth thinking about micro-optimizations around which modules get imported, etc., because those numbers are comparatively minuscule.

The real problems only start once you have to build many classes within one process and have to eval their code very often, on every startup.

The standard CS solution here would be to cache the resulting classes, like we do with .pyc files, but that’s currently not possible. attrs already does a bit of caching, but it can’t persist it, and the gains are only noticeable if you have some weird use case of creating many identical classes.

It would be nice if we could think about providing hooks for this kind of work first, before trying to transform dataclasses into…not classes. I’m sure there would be a lot of adjacent use cases that people could come up with.

guido · May 10, 2022, 5:29am

We should definitely do some (micro)benchmarking.

In the meantime, maybe David Beazley’s take can inspire? https://github.com/dabeaz/dataklasses

hynek · May 10, 2022, 5:45am

David does basically what attrs does (caching the created methods), but it’s much faster: his code generation depends only on the number of arguments, because he only cares about positional arguments and names them _0, _1, etc.:

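Excerpt from dabeaz/dataklasses, dataklasses.py lines 20-24 (github.com, commit df31f4121dd7938a3933f89008a811dfd0b8520d):

    from functools import lru_cache

    # `func` comes from the enclosing scope in the original file (not shown):
    # a callable that returns method source text for a list of argument names.
    @lru_cache
    def make_func_code(numfields):
        names = [ f'_{n}' for n in range(numfields) ]
        # exec the source into a fresh dict and return the one function it defines
        exec(func(names), globals(), d:={})
        return d.popitem()[1]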

That makes cache hits a lot more frequent than if you have to care about the argument names. That’s clever, but not great in production code.
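A small illustration of why a cache keyed on the number of fields hits so much more often than a cache keyed on the field names. The two cached factories are hypothetical stand-ins, not code from dataklasses or attrs. The count-keyed version stores attributes as _0, _1 rather than under the real field names, which is part of why the trick is awkward in production code.

```python
from functools import lru_cache


def _compile_init(names):
    """exec() the source of an __init__ that stores each argument on self."""
    params = ", ".join(["self", *names])
    body = "".join(f"\n    self.{n} = {n}" for n in names) or "\n    pass"
    namespace = {}
    exec(f"def __init__({params}):{body}", namespace)
    return namespace["__init__"]


@lru_cache
def init_by_count(numfields):
    # Placeholder names _0, _1, ... as in the excerpt: only the count matters.
    return _compile_init([f"_{n}" for n in range(numfields)])


@lru_cache
def init_by_names(names):
    # Real field names: every distinct tuple of names is a separate entry.
    return _compile_init(list(names))


shapes = [("x", "y"), ("width", "height"), ("lat", "lon")]
for names in shapes:
    init_by_count(len(names))
    init_by_names(names)

print(init_by_count.cache_info())  # CacheInfo(hits=2, misses=1, maxsize=128, currsize=1)
print(init_by_names.cache_info())  # CacheInfo(hits=0, misses=3, maxsize=128, currsize=3)
```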

guido · May 10, 2022, 1:59pm

An observation, from Mark Shannon: starting the compiler is slow, so exec() has a high fixed overhead. We could combine all the exec() calls together (per class) and get some of that overhead back.

—Guido
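A sketch of the “one exec() per class” idea: build the source for several methods as one string and compile it with a single exec() call, instead of one call per method. make_methods is a hypothetical helper that covers only __init__, __repr__ and __eq__, with no defaults. The real dataclasses module also has to handle __hash__, ordering, frozen classes and more.

```python
def make_methods(field_names):
    """Generate __init__, __repr__ and __eq__ with a single exec() call."""
    params = ", ".join(["self", *field_names])
    init_body = "".join(f"\n    self.{f} = {f}" for f in field_names) or "\n    pass"
    shown = ", ".join(f"{f}={{self.{f}!r}}" for f in field_names)
    mine = "(" + "".join(f"self.{f}, " for f in field_names) + ")"
    theirs = "(" + "".join(f"other.{f}, " for f in field_names) + ")"
    source = (
        f"def __init__({params}):{init_body}\n"
        "def __repr__(self):\n"
        f"    return f'{{type(self).__name__}}({shown})'\n"
        "def __eq__(self, other):\n"
        "    if other.__class__ is not self.__class__:\n"
        "        return NotImplemented\n"
        f"    return {mine} == {theirs}\n"
    )
    namespace = {}
    exec(source, {}, namespace)  # one compiler start-up for all three methods
    return namespace


class Point:
    pass


for name, func in make_methods(["x", "y"]).items():
    setattr(Point, name, func)

p = Point(1, 2)
print(p, p == Point(1, 2))  # Point(x=1, y=2) True
```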

ericvsmith · May 10, 2022, 3:08pm

That’s a good idea. I’ll look at it and do some timing. It will require a bit of a rewrite on how methods are added.

brandtbucher · May 10, 2022, 3:11pm

I decided to take a look at this yesterday.

I’ve gotten a very rough version of code-caching working for dataclasses (all tests are passing). As a preliminary benchmark, the dataclasses test suite is running about twice as quickly… I haven’t timed @dataclass generation time specifically yet, but I suspect that it’s even better than that (the tests are doing a lot more than just generating dataclasses, and the generated code hasn’t changed at all).

Something like 95% of generated methods don’t actually go through exec, but just use code.replace to patch a few names and constants. I also expect memory use to have improved. I’ll be doing more cleanup and experiments on my branch today.