How to measure performance impact when changing CPython Lib (.py files)

While going over the CPython Lib files, I noticed a few places where a simple change to the way the Python code is written can improve performance.
One example is in Lib/urllib/robotparser.py (but I found many more cases):

        ret = []
        for agent in self.useragents:
            ret.append(f"User-agent: {agent}")

That can be changed to:

        ret = [ f"User-agent: {agent}" for agent in self.useragents ]

Based on a quick timeit check, it's roughly a 20% improvement:

>>> import timeit
>>> def list_append():
...     a = range(0, 10000)
...     b = []
...     for x in a:
...         b.append(x)
...
>>> def list_comp():
...     a = range(0, 10000)
...     b = [ x for x in a ]
...
>>> timeit.timeit(list_append, number=10000)
1.5432341250000263
>>> timeit.timeit(list_comp, number=10000)
1.2767687499999738
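
To reduce noise a bit, the same comparison can also be done with timeit.repeat, keeping the fastest of several runs; a minimal sketch (the loop sizes and repeat counts below are arbitrary):

import timeit

def list_append():
    b = []
    for x in range(10_000):
        b.append(x)
    return b

def list_comp():
    return [x for x in range(10_000)]

# Run each measurement 5 times and keep the fastest run to reduce noise.
append_best = min(timeit.repeat(list_append, number=1_000, repeat=5))
comp_best = min(timeit.repeat(list_comp, number=1_000, repeat=5))

print(f"append:        {append_best:.3f}s")
print(f"comprehension: {comp_best:.3f}s")
print(f"ratio:         {append_best / comp_best:.2f}x")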

So I was thinking of changing/fixing a few of the cases that I found.

My question is as follows: how can I check the overall impact (hopefully an improvement :slight_smile: ) of the changes I'm making? Is there a way to measure the performance impact of a commit to CPython?
Is there a way to benchmark CPython as a whole? Or to benchmark a specific library?
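
The best I've come up with so far for a single stdlib function is just timing the affected call path directly with timeit; for the robotparser example, something like this (the robots.txt content is made up):

import timeit
import urllib.robotparser

# Made-up robots.txt content, just enough to exercise the code path.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# str(rp) runs the loop shown above, so timing it before and after the
# change gives a rough per-call comparison.
print(timeit.timeit(lambda: str(rp), number=100_000))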

You might look at the faster-cpython team's bench_runner repo. Note that you'd need to set up your own GitHub Action for this; they don't provide free compute.

That benchmark suite is for testing improvements to the interpreter overall; speeding up one method by a little bit is going to be lost in the noise. But maybe you can adapt a piece of it for your needs.
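
If it helps, for a single change like this a standalone pyperf script (pyperf is what those benchmarks are built on, as far as I know) may be more practical than the full suite: run it once against the unmodified build and once against your patched build, write each result to JSON, and compare the two files. A rough sketch, with illustrative file and build names:

# bench_robotparser.py
# Run with each interpreter build, e.g.:
#   ./python-before bench_robotparser.py -o before.json
#   ./python-after  bench_robotparser.py -o after.json
#   python -m pyperf compare_to before.json after.json
import pyperf

SETUP = """
import urllib.robotparser
rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /private/"])
"""

runner = pyperf.Runner()
runner.timeit(name="robotparser str()", stmt="str(rp)", setup=SETUP)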

I’d definitely open these one at a time, as no one wants to review a PR that spans many different files (and therefore touches code maintained by many people).

There might be some resistance to changing stable code for something like this: 20% seems like a big speed-up, but if it's not in something that gets called very frequently, it'll be inconsequential to users.

Honestly, I don’t think it’s usually worth nitpicking performance in the stdlib like this.

Unless you have actually, in production, run into a situation where you hit one of these slow spots, I don’t see why we should fix them. In many cases it takes considerable care to ensure that a small change like this doesn’t accidentally break some other use case.


Also consider the cost of people's time to review, test, and approve any changes versus the benefit.

You can also read @rgommers' post The cost of an open source contribution.
