How to measure performance impact when changing CPython Lib (.py files)

While going over the CPython Lib files, I noticed a few places where a simple change to the way the Python code is written can improve performance.
One example is in Lib/urllib/robotparser.py (but I found many more cases):

        ret = []
        for agent in self.useragents:
            ret.append(f"User-agent: {agent}")

That can be changed to:

        ret = [ f"User-agent: {agent}" for agent in self.useragents ]

Based on a quick timeit check, it's roughly a 20% improvement:

>>> import timeit
>>> def list_append():
...     a = range(0, 10000)
...     b = []
...     for x in a:
...         b.append(x)
...
>>> def list_comp():
...     a = range(0, 10000)
...     b = [ x for x in a ]
...
>>> timeit.timeit(list_append, number=10000)
1.5432341250000263
>>> timeit.timeit(list_comp, number=10000)
1.2767687499999738
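
To reduce noise a bit, the same comparison can also be done with timeit.repeat, keeping the fastest of several runs; a minimal sketch (the loop sizes and repeat counts below are arbitrary):

import timeit

def list_append():
    b = []
    for x in range(10_000):
        b.append(x)
    return b

def list_comp():
    return [x for x in range(10_000)]

# Run each measurement 5 times and keep the fastest run to reduce noise.
append_best = min(timeit.repeat(list_append, number=1_000, repeat=5))
comp_best = min(timeit.repeat(list_comp, number=1_000, repeat=5))

print(f"append:        {append_best:.3f}s")
print(f"comprehension: {comp_best:.3f}s")
print(f"ratio:         {append_best / comp_best:.2f}x")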

So I was thinking of changing/fixing a few of the cases that I found.

My question is as follows: how can I check the overall impact (hopefully an improvement :slight_smile: ) of the changes I'm making? Is there a way to measure the performance impact of a commit to CPython?
Is there a way to benchmark CPython as a whole? Or to benchmark a specific library?
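
The best I've come up with so far for a single stdlib function is just timing the affected call path directly with timeit; for the robotparser example, something like this (the robots.txt content is made up):

import timeit
import urllib.robotparser

# Made-up robots.txt content, just enough to exercise the code path.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# str(rp) runs the loop shown above, so timing it before and after the
# change gives a rough per-call comparison.
print(timeit.timeit(lambda: str(rp), number=100_000))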

You might look at the faster-cpython team's bench_runner repo. Note that you'd need to set up your own GitHub Action for this; they don't provide free compute.

That benchmark suite is for testing improvements to the interpreter overall; speeding up one method by a little bit is going to be lost in the noise. But maybe you can adapt a piece of it for your needs.
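
If it helps, for a single change like this a standalone pyperf script (pyperf is what those benchmarks are built on, as far as I know) may be more practical than the full suite: run it once against the unmodified build and once against your patched build, write each result to JSON, and compare the two files. A rough sketch, with illustrative file and build names:

# bench_robotparser.py
# Run with each interpreter build, e.g.:
#   ./python-before bench_robotparser.py -o before.json
#   ./python-after  bench_robotparser.py -o after.json
#   python -m pyperf compare_to before.json after.json
import pyperf

SETUP = """
import urllib.robotparser
rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /private/"])
"""

runner = pyperf.Runner()
runner.timeit(name="robotparser str()", stmt="str(rp)", setup=SETUP)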

I’d definitely open these one at a time, as no one wants to review a PR that spans many different files (and therefore touches code maintained by many people).

There might be some resistance to changing stable code for something like this: 20% seems like a big speed-up, but if it's not in something that gets called very frequently, it'll be inconsequential to users.

Honestly, I don’t think it’s usually worth nitpicking performance in the stdlib like this.

Unless you have actually, in production, run into a situation where you hit one of these slow spots, I don’t see why we should fix them. In many cases it takes considerable care to ensure that a small change like this doesn’t accidentally break some other use case.


Also consider the cost of people's time to review, test, and approve any changes versus the benefit.

You can also read @rgommers' post The cost of an open source contribution.
