I’m posting this here because I don’t really want wider feedback yet. Please let me know your thoughts. I don’t have hard numbers yet on how slow dataclasses actually are to create, but I hear people complain about it regularly.
Motivation:
The @dataclass decorator is run every time a dataclass (the class, not an instance) is created. This slows down startup time. It would be “better” (for some definition of “better”) if the dataclass result could be “baked in” (for some definition of “baked in”) to the bytecode.
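Since hard numbers are missing, here is a minimal sketch of how one might measure the class-creation cost being described. The helper names (`make_plain`, `make_dataclass`) and the run count are assumptions for illustration; results will vary by machine and Python version, so no figures are claimed here.

```python
import timeit
from dataclasses import dataclass


def make_plain():
    # Hand-written class: creating it only runs the class body.
    class Plain:
        def __init__(self, x, y):
            self.x = x
            self.y = y
    return Plain


def make_dataclass():
    # Same fields, but @dataclass generates its methods on every class creation.
    @dataclass
    class Data:
        x: int
        y: int
    return Data


runs = 200
plain_s = timeit.timeit(make_plain, number=runs)
data_s = timeit.timeit(make_dataclass, number=runs)
print(f"plain class: {plain_s / runs * 1e6:.1f} us per creation")
print(f"@dataclass:  {data_s / runs * 1e6:.1f} us per creation")
```

This times creation of the class itself, not of instances, which is the cost that would be paid at import time.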
This problem isn’t worth solving: do nothing, status quo wins. This is always the easiest thing to do!
Note: In all of these following options, one big win would be that __slots__ could be added to the class without requiring the @dataclass
decorator to create a new class just to add __slots__.
Write an external tool that generates the methods. This would
operate like Argument Clinic, in that you could re-run it to modify a
source file in place, in case the dataclass definition changed.
Pros: No impact on Python implementations.
Cons: Clunky to use. Not very discoverable. Requires an extra build
step.
Use PEP 638 (syntactic macros) to modify the class’s AST to add
dataclass-generated methods. Of course, this would require PEP 638 to
be accepted and implemented first.
Pros: dataclasses would not be a special case.
Cons: Requires PEP 638, might open a can of worms. Implications for
other implementations.
Teach the compiler to recognize @dataclass and modify the class’s
AST to add the dataclass-generated methods. Cinder does essentially
this (although during bytecode generation).
Pros: Syntax looks the same as it currently does. Other implementations
could do something similar, or not.
Cons: dataclasses becomes a special case.
Create new syntax to specify a dataclass. For the sake of argument,
the syntax could be:
@dataclass(repr=False)
class C(BaseClass, metaclass=MyMetaClass):
And
@fancy_decorator
dataclass C:
would be equivalent to
@fancy_decorator
@dataclass
class C:
Pros: dataclasses gets special treatment.
Cons: dataclasses gets special treatment. New syntax. Need to do
something about dataclasses.field and other module-level
functions.
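To make the external-tool option above a bit more concrete, here is a rough sketch of the core step such a tool might perform: read a class definition and emit method source text that could be written back into the file. The function name `generate_methods` is hypothetical, it only handles plain annotated fields (no defaults, no `field()`), and it needs Python 3.9+ for `ast.unparse`.

```python
import ast
import textwrap


def generate_methods(class_source):
    """Return source text for __init__ and __repr__ of the first class in class_source."""
    cls = ast.parse(textwrap.dedent(class_source)).body[0]
    # Collect (name, annotation-as-text) for every simple annotated field.
    fields = [
        (stmt.target.id, ast.unparse(stmt.annotation))
        for stmt in cls.body
        if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)
    ]
    params = "".join(f", {name}: {ann}" for name, ann in fields)
    lines = [f"    def __init__(self{params}):"]
    lines += [f"        self.{name} = {name}" for name, _ in fields] or ["        pass"]
    shown = ", ".join(f"{name}={{self.{name}!r}}" for name, _ in fields)
    lines += [
        "",
        "    def __repr__(self):",
        f"        return f\"{cls.name}({shown})\"",
    ]
    return "\n".join(lines) + "\n"


source = "class C:\n    x: int\n    y: int\n"
baked_source = source + generate_methods(source)
print(baked_source)
```

A real tool would also need to find and replace previously generated code on re-runs, the way Argument Clinic does with its markers.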
I like the external tool idea. It doesn’t even strictly have to be in the standard library, and could allow people to “backport” their dataclasses to earlier versions if that’s what they want (or even “target” a certain Python version’s implementation of dataclasses).
Might be worth looking at how people have used the ability to get the code for a given namedtuple. If there’s any tweaking at all done after converting it to a concrete class, then a tool that provides editable code as an output is a clear winner.
If it’s purely about performance, then I think option #3 is the best one for us: make the recommended patterns faster without needing code changes.
Would it be possible to rewrite the whole dataclass logic in C?
Compared to other data storage mechanisms in Python (e.g. namedtuples, slots, regular class instances), dataclasses are currently among the slowest, and that is not even taking into account the added startup time.
We had a talk comparing the performance of various data storage mechanisms in our user group recently and dataclasses ranked on the slow end. For the mixed use case (create, access 5 times), they were slower than regular classes, slots and namedtuples.
(The video of the talk is available here, but it’s in German)
There was a similar issue with namedtuple. It has been significantly optimized since, although there is still one slow eval() call left. It is waiting for a hero to re-implement it in C.
Where are the bottlenecks in dataclasses? Could they be optimized without making drastic changes?
When decorators were first being designed, I was hoping that they would work at compile time, taking an AST object as input and producing a modified AST as the result. That would make them more similar to Lisp or Scheme macros. For dataclasses, I thought that rather than generating code to pass to eval(), it might be better to generate an AST and pass it to compile().
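As a small illustration of the "generate an AST and pass it to compile()" idea, the sketch below builds an `__init__` from AST nodes instead of a source string. The name `make_init` is hypothetical and this is not how the dataclasses module works today; it starts from a parsed skeleton so that version-specific node fields are filled in automatically.

```python
import ast


def make_init(field_names):
    """Build __init__(self, <fields>) as an AST and compile it directly."""
    # Parse a skeleton once, then replace its arguments and body with built nodes.
    module = ast.parse("def __init__(self): pass")
    func = module.body[0]
    func.args.args += [ast.arg(arg=name) for name in field_names]
    func.body = [
        ast.Assign(
            targets=[
                ast.Attribute(
                    value=ast.Name(id="self", ctx=ast.Load()),
                    attr=name,
                    ctx=ast.Store(),
                )
            ],
            value=ast.Name(id=name, ctx=ast.Load()),
        )
        for name in field_names
    ] or [ast.Pass()]
    # Hand-built nodes have no line/column info; compile() requires it.
    ast.fix_missing_locations(module)
    namespace = {}
    exec(compile(module, "<generated __init__>", "exec"), namespace)
    return namespace["__init__"]


class Point:
    pass


Point.__init__ = make_init(["x", "y"])
p = Point(1, 2)
print(p.x, p.y)
```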
I think it would be nice if we optimized dataclasses somehow but it would be even better if it wasn’t by giving them special treatment. E.g. attrs should ideally be able to use the same enhancements. If we have a way to do evaluation before .pyc generation, re.compile could be another use.
Spit-ball idea: could we introduce a variation of the decorator syntax that works how I suggest (AST->AST), e.g.:
@@dataclass
class A:
    ...
The dataclass function would get passed the AST tree containing the class definition. The returned AST would be passed to compile(). For re.compile, perhaps something like:
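Today this can only be emulated by hand, but a sketch of what an AST->AST function might look like is below. `bake_dataclass` is a hypothetical name, it only adds an `__init__` for plain annotated fields, and the parse/compile/exec steps at the bottom stand in for what the compiler would do for the proposed `@@` syntax.

```python
import ast


def bake_dataclass(class_node):
    """Hypothetical AST -> AST transform: append an __init__ built from annotated fields."""
    names = [
        stmt.target.id
        for stmt in class_node.body
        if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)
    ]
    params = "".join(f", {n}" for n in names)
    body = "".join(f"\n    self.{n} = {n}" for n in names) or "\n    pass"
    # Parsing a small generated snippet is just a convenient way to get the nodes.
    init = ast.parse(f"def __init__(self{params}):{body}").body[0]
    class_node.body.append(init)
    return class_node


source = """
class A:
    x: int
    y: int
"""
tree = ast.parse(source)
tree.body[0] = bake_dataclass(tree.body[0])
ast.fix_missing_locations(tree)

# The compiled code object is what could be stored in a .pyc file.
namespace = {}
exec(compile(tree, "<baked>", "exec"), namespace)
A = namespace["A"]
print(A(1, 2).x)
```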
UPPER_PAT = @@re.compile(r'[A-Z]')
An alternative to using different decorator syntax would be to use different import syntax for the decorator function. That might make more sense since the compiler has to do the import and so it’s special. Then you could just use the existing @ syntax for the decorator since the compiler knows it is a “compile time” decorator function.
Using a special .pyc compiler, like I do with Quixote’s PTL, would be a good way to prototype these kinds of things.
Yeah PEP 638 would do pretty much what I was thinking. It would make it easier to implement the macros, which might actually be a bad thing, IMHO. I think there’s a real risk of people going nuts with a macro feature and making incomprehensible DSLs. Some people think that style is a good idea, e.g. Forth. I think it’s bad if you are cooperating with others and not what we want for Python. OTOH, it seems good if we could implement things like f-strings, dataclasses, namedtuples, maybe even match…case without modifying the grammar and without doing eval() at runtime.
For obvious reasons, we’ve been thinking about this topic over at attrs for a long time.
I think, practically speaking, it’s not worth thinking about micro-optimizations around which modules get imported etc., because those numbers are comparatively minuscule.
The real problems only start once you have to build many classes within one process and have to eval their code very often, on every startup.
The standard CS solution here would be to have a way of caching the resulting classes like we do in pycs, but that’s currently not possible. attrs even already does a bit of caching, but can’t persist it and the gains are only noticeable if you have some weird use case of creating many identical classes.
It would be nice if we could think about providing hooks for this kind of work first, before trying to transform dataclasses into…not classes. I’m sure there would be a lot of adjacent use cases that people could come up with.
David does basically what attrs is doing (caching created methods), but it’s much faster because his code gen is only dependent on the number of arguments, because he’s only caring about positional arguments and names them f_0, f_1 etc:
That makes cache hits a lot more frequent than if you have to care about the names of the arguments. That’s clever, but not great in production code.
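To illustrate why keying on argument count gives so many cache hits, here is a minimal sketch of the idea as described. It is not David's actual code; `init_for_arity` is a hypothetical helper, and only the `f_0`, `f_1` naming comes from the description above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def init_for_arity(n):
    """Generate (once per n) an __init__ taking n positional fields named f_0..f_{n-1}."""
    params = "".join(f", f_{i}" for i in range(n))
    body = "".join(f"\n    self.f_{i} = f_{i}" for i in range(n)) or "\n    pass"
    namespace = {}
    exec(f"def __init__(self{params}):{body}", namespace)
    return namespace["__init__"]


class Pair:
    __init__ = init_for_arity(2)


class Edge:
    # Different class, same arity: the cached function is reused, no second exec().
    __init__ = init_for_arity(2)


pair = Pair("a", "b")
print(pair.f_0, pair.f_1)
```

With real field names in the generated code, `Pair` and `Edge` would only share a cache entry if their fields happened to be named identically.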
An observation, from Mark Shannon: starting the compiler is slow, so exec() has a high fixed overhead. We could combine all the exec() calls together (per class) and get some of that overhead back.
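A rough sketch of the batching idea: build the source for all of a class's methods and hand it to a single exec() call. `build_methods` is a hypothetical helper and the generated methods are simplified compared to what dataclasses really emits.

```python
def build_methods(class_name, field_names):
    """Compile __init__, __repr__ and __eq__ with one exec() call instead of three."""
    params = "".join(f", {n}" for n in field_names)
    assigns = "".join(f"\n    self.{n} = {n}" for n in field_names) or "\n    pass"
    shown = ", ".join(f"{n}={{self.{n}!r}}" for n in field_names)
    mine = "".join(f"self.{n}, " for n in field_names)
    theirs = "".join(f"other.{n}, " for n in field_names)
    source = (
        f"def __init__(self{params}):{assigns}\n"
        f"def __repr__(self):\n"
        f"    return f'{class_name}({shown})'\n"
        f"def __eq__(self, other):\n"
        f"    if other.__class__ is not self.__class__:\n"
        f"        return NotImplemented\n"
        f"    return ({mine}) == ({theirs})\n"
    )
    namespace = {}
    exec(source, namespace)  # the compiler is started once for all three methods
    return {name: namespace[name] for name in ("__init__", "__repr__", "__eq__")}


class P:
    pass


for name, func in build_methods("P", ["x", "y"]).items():
    setattr(P, name, func)

print(P(1, 2))
```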
I’ve gotten a very rough version of code-caching working for dataclasses (all tests are passing). As a preliminary benchmark, the dataclasses test suite is running about twice as quickly… I haven’t timed @dataclass generation time specifically yet, but I suspect that it’s even better than that (the tests are doing a lot more than just generating dataclasses, and the generated code hasn’t changed at all).
Something like 95% of generated methods don’t actually go through exec, but just use code.replace to patch a few names and constants. I also expect that memory use has improved as well. I’ll be doing more cleanup and experiments on my branch today.