Improving dataclasses startup performance

Just so I don’t mingle dataclass arguments with base classes or with metaclass arguments. But I’m not married to any syntax.

1 Like

I like the external tool idea. It doesn’t even strictly have to be in the standard library, and could allow people to “backport” their dataclasses to earlier versions if that’s what they want (or even “target” a certain Python version’s implementation of dataclasses).

Might be worth looking at how people have used the ability to get the code for a given namedtuple. If there’s any tweaking at all done after converting it to a concrete class, then a tool that provides editable code as an output is a clear winner.

If it’s purely about performance, then I think option #3 is the best one for us: make the recommended patterns faster without needing code changes.

1 Like

Yes, I would certainly do this outside of CPython, if only to decouple the releases.

2 Likes

Seems relevant: there already exists an external tool that does at least some of this, Cython’s “cdef dataclasses”:

Docs: Extension Types — Cython 3.0.0a10 documentation
Example: cython/dataclass.pyx at master · cython/cython · GitHub
Original PR: cdef dataclasses by da-woods · Pull Request #3400 · cython/cython · GitHub

2 Likes

Would it be possible to rewrite the whole dataclass logic in C?

Compared to other data storage mechanisms in Python (e.g. namedtuples, slots, regular class instances), it’s currently one of the slowest - and that is not even taking into account the added startup time.

1 Like

I think the runtime aspects are a separate concern that should also be looked at.

And really, the whole thing needs to be benchmarked first. Is the issue that it imports and uses re or inspect, or is it something else?
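For the import-cost part of that question, CPython’s -X importtime flag (3.7+) gives a quick per-module breakdown. A minimal way to capture it:

import subprocess
import sys

# Run a child interpreter with -X importtime; the per-module timing
# table is written to stderr.
result = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import dataclasses"],
    capture_output=True,
    text=True,
)
print(result.stderr)  # cumulative microseconds per imported module (re, enum, ...)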

I’m already working on timing things. I’ll report back on what I find, but it will take a while.

1 Like

We had a talk comparing the performance of various data storage mechanisms in our user group recently, and dataclasses ranked on the slow end. For the mixed use case (create an instance, then access attributes 5 times), it was slower than regular classes, slots-based classes, and namedtuples.

(The video of the talk is available here, but it’s in German)
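For reference, a rough reconstruction of that benchmark’s shape (create once, access attributes five times). The class definitions here are illustrative, and the numbers will vary by machine and Python version:

import timeit
from collections import namedtuple
from dataclasses import dataclass

class Plain:
    def __init__(self, a, b):
        self.a, self.b = a, b

class Slotted:
    __slots__ = ('a', 'b')
    def __init__(self, a, b):
        self.a, self.b = a, b

NT = namedtuple('NT', ['a', 'b'])

@dataclass
class DC:
    a: int
    b: int

def mixed(cls):
    obj = cls(1, 2)  # create once
    return obj.a + obj.b + obj.a + obj.b + obj.a  # access five times

for cls in (Plain, Slotted, NT, DC):
    print(cls.__name__, timeit.timeit(lambda: mixed(cls), number=100_000))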

1 Like

There was a similar issue with namedtuple. It has since been significantly optimized, although there is still one call to the slow eval(). It awaits a hero to re-implement it in C.
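(That remaining eval() is the one that builds __new__ from a generated source string. A simplified sketch of the pattern, loosely following collections.namedtuple; this is not the stdlib’s exact code:

from operator import itemgetter

def tiny_namedtuple(typename, field_names):
    # The one eval(): compile a lambda source string into __new__.
    arg_list = ', '.join(field_names)
    source = f'lambda _cls, {arg_list}: _tuple_new(_cls, ({arg_list},))'
    namespace = {'_tuple_new': tuple.__new__}
    __new__ = eval(compile(source, f'namedtuple_{typename}', 'eval'), namespace)
    # The field accessors need no eval() at all.
    class_namespace = {'__new__': __new__, '__slots__': ()}
    for index, name in enumerate(field_names):
        class_namespace[name] = property(itemgetter(index))
    return type(typename, (tuple,), class_namespace)

Point = tiny_namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
print(p.x, p.y)  # 1 2
)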

Where are the bottlenecks in dataclass? Could they be optimized without making drastic changes?

1 Like

If it imports re then it also imports enum, which is still written in Python and known to have a slow startup time.

1 Like

When decorators were first being designed, I was hoping that they would work at compile time, taking an AST object as input and producing a modified AST as the result. That would make them more similar to Lisp or Scheme macros. For dataclasses, I thought that rather than generating source code to pass to eval(), it would be better to generate an AST and pass it to compile().

I think it would be nice if we optimized dataclasses somehow, but it would be even better if it wasn’t by giving them special treatment. E.g., attrs should ideally be able to use the same enhancements. If we had a way to do evaluation before .pyc generation, re.compile could be another use.

Spit-ball idea: could we introduce a variation of the decorator syntax that works how I suggest (AST->AST), e.g.:

@@dataclass
class A:
   ...

The dataclass function would get passed the AST tree containing the class definition. The returned AST would be passed to compile(). For re.compile, perhaps something like:

UPPER_PAT = @@re.compile(r'[A-Z]')
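To make the idea concrete, here’s a rough simulation of the AST→AST step using today’s ast module (the @@ syntax doesn’t exist; ast_dataclass and the injected __repr__ are purely illustrative):

import ast

def ast_dataclass(cls_node):
    # Stand-in for a "compile time" decorator: take a ClassDef AST,
    # inject a trivial __repr__, and return the modified AST.
    repr_def = ast.parse(
        "def __repr__(self):\n"
        "    return f'{type(self).__name__}(...)'"
    ).body[0]
    cls_node.body.append(repr_def)
    return cls_node

source = "class A:\n    x: int\n    y: int\n"
tree = ast.parse(source)
tree.body[0] = ast_dataclass(tree.body[0])
ast.fix_missing_locations(tree)

# No source-string eval() at runtime: the transformed AST goes
# straight to compile().
namespace = {}
exec(compile(tree, '<ast-decorator>', 'exec'), namespace)
print(namespace['A']())  # A(...)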

An alternative to using different decorator syntax would be to use different import syntax for the decorator function. That might make more sense since the compiler has to do the import and so it’s special. Then you could just use the existing @ syntax for the decorator since the compiler knows it is a “compile time” decorator function.

A special .pyc compiler, like I use for Quixote’s PTL, would be a good way to prototype these kinds of things.

2 Likes

That sounds basically like PEP 638, Syntactic Macros: PEP 638 – Syntactic Macros | peps.python.org

2 Likes

Yeah, PEP 638 would do pretty much what I was thinking. It would make it easier to implement macros, which might actually be a bad thing, IMHO. I think there’s a real risk of people going nuts with a macro feature and making incomprehensible DSLs. Some people think that style is a good idea, e.g. Forth. I think it’s bad if you’re cooperating with others, and it’s not what we want for Python. OTOH, it seems good if we could implement things like f-strings, dataclasses, namedtuples, maybe even match…case without modifying the grammar and without doing eval() at runtime.

2 Likes

For obvious reasons, we’ve been thinking about this topic over at attrs for a long time.

I think, practically speaking, it’s not worth thinking about micro-optimizations around which modules get imported, etc., because those numbers are comparatively minuscule.

The real problems only start once you have to build many classes within one process and eval their code over and over, on every startup.

The standard CS solution here would be a way of caching the resulting classes, like we do with .pyc files, but that’s currently not possible. attrs already does a bit of caching, but can’t persist it, and the gains are only noticeable if you have some unusual use case of creating many identical classes.

It would be nice if we could think about providing hooks for this kind of work first, before trying to transform dataclasses into… not-classes. I’m sure people could come up with a lot of adjacent use cases.

3 Likes

We should definitely do some (micro)benchmarking.

In the meantime, maybe David Beazley’s take can serve as inspiration: https://github.com/dabeaz/dataklasses

1 Like

David does basically what attrs is doing (caching created methods), but it’s much faster because his code generation depends only on the number of fields: he cares only about positional arguments and names them f_0, f_1, etc.

That makes it much more likely that a generated method hits his cache than if you had to account for the actual argument names. That’s clever, but not great for production code.
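A sketch of the trick (simplified; not dataklasses’ exact code): exec a method template once per field count, then rebind the placeholder names with code.replace() — the same code-object patching that comes up again below:

import types
from functools import lru_cache

@lru_cache(maxsize=None)
def _init_template(num_fields):
    # One exec() per *number* of fields, ever -- the cache key is arity.
    names = [f'f_{i}' for i in range(num_fields)]
    args = ', '.join(names)
    body = '\n'.join(f'    self.{n} = {n}' for n in names)
    ns = {}
    exec(f'def __init__(self, {args}):\n{body}\n', ns)
    return ns['__init__']

def make_init(fields):
    template = _init_template(len(fields))
    mapping = {f'f_{i}': name for i, name in enumerate(fields)}
    code = template.__code__
    # Patch the placeholder names in the cached code object -- no exec().
    code = code.replace(
        co_names=tuple(mapping.get(n, n) for n in code.co_names),
        co_varnames=tuple(mapping.get(n, n) for n in code.co_varnames),
    )
    return types.FunctionType(code, template.__globals__, '__init__')

class Point:
    pass

Point.__init__ = make_init(('x', 'y'))
p = Point(3, 4)
print(p.x, p.y)  # 3 4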

1 Like

An observation, from Mark Shannon: starting the compiler is slow, so exec() has a high fixed overhead. We could combine all the exec() calls together (per class) and get some of that overhead back.

—Guido
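(A minimal way to see that fixed overhead; the method sources and iteration counts here are made up for illustration:

import timeit

# Three small method sources, roughly the shape @dataclass generates.
SOURCES = [
    "def __init__(self, x, y):\n    self.x = x\n    self.y = y",
    "def __repr__(self):\n    return f'P(x={self.x}, y={self.y})'",
    "def __eq__(self, other):\n    return (self.x, self.y) == (other.x, other.y)",
]

def one_exec_per_method():
    ns = {}
    for src in SOURCES:
        exec(src, ns)  # pays the compiler-startup cost three times
    return ns

def one_combined_exec():
    ns = {}
    exec('\n\n'.join(SOURCES), ns)  # pays it once
    return ns

print('separate:', timeit.timeit(one_exec_per_method, number=10_000))
print('combined:', timeit.timeit(one_combined_exec, number=10_000))
)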

That’s a good idea. I’ll look at it and do some timing. It will require a bit of a rewrite on how methods are added.

I decided to take a look at this yesterday.

I’ve gotten a very rough version of code-caching working for dataclasses (all tests are passing). As a preliminary benchmark, the dataclasses test suite is running about twice as quickly… I haven’t timed @dataclass generation time specifically yet, but I suspect that it’s even better than that (the tests are doing a lot more than just generating dataclasses, and the generated code hasn’t changed at all).

Something like 95% of generated methods don’t actually go through exec, but just use code.replace to patch a few names and constants. I expect that memory use has improved as well. I’ll be doing more cleanup and experiments on my branch today.

5 Likes

Here’s my prototype. Reviews welcome:

Implement dataclass code caching by brandtbucher · Pull Request #92650 · python/cpython (github.com)

6 Likes

I’ve tried this out at a micro level for attrs (going all-in requires significant refactoring) and got a speed improvement of 19 µs between compiling the default methods one by one vs. all at once. That’s approximately a 10% speedup: bench_eval.py · GitHub

However, the whole class takes 461 µs, so the 19 µs are only ~4%. Might still be worth it, depending on the amount of complexity the refactoring would cause.

2 Likes