About speed.python.org and aarch64

Hello all,

Diego from Arm Ltd here. I’m assessing the status of CPython and its ecosystem on aarch64, and during my investigation I bumped into speed.python.org and the pyperformance package.

I really like the integration between pyperformance and codespeed, and I was able to set up an internal instance within Arm with aarch64 results.

With the Arm architecture present on all major cloud providers and Python being a critical programming language for cloud workflows, it would be great if we could integrate aarch64 performance results into speed.python.org.

Needless to say, we are more than happy to help out, both with development and by providing infrastructure (in some form) to achieve this goal.

Cheers,

Diego

7 Likes

Hi Diego,

Happy New Year! I had in the back of my mind to write you about how ARM could help Python, and I got overwhelmed with other things on my TODO list. It’s great that you managed to set up an ARM instance of speed.python.org. Is there a chance you can make that public? It doesn’t have to be on a *.python.org URL for now. I’d like to see if we can learn something by comparing the ARM results to the x64 results.

I’d also like to point you to Mike Droettboom’s efforts, which you can follow here: GitHub - faster-cpython/benchmarking-public: A public mirror of our benchmarking runner repository (and also to some extent in GitHub - faster-cpython/ideas).

2 Likes

Guido, happy new year too!

Apart from performance, is there anything else you have in mind that we could help with? I just want to gather feedback on this.

I will try my best to make it happen. What’s the rationale behind having a separate instance instead of integrating directly into speed.python.org?

That’s useful indeed, thanks for the pointers!

Mostly just that speed.python.org has exactly one maintainer and he’s not got a lot of time for it, so anything that would require coordinating with him would just be extra red tape that I’d like to save you from.

1 Like

Guido, understood, thanks for explaining. I’ll see what I can do and will update here as soon as I have news (it might take some time).

1 Like

Just a quick update. Things are progressing well, but before making anything public we’ve been running some internal CI infrastructure to cross-validate some data.
I will be off for the next three weeks but as soon as I come back, I’ll continue working on this.

3 Likes

Hello everyone, here I am with some exciting news :slight_smile:

First of all I want to apologise for the delay: I’ve been busy with work and EuroPython 2023 (organising it, sponsoring it, and presenting), and I took annual/parental leave along the way. So the whole thing took longer than expected.

I’m happy to announce that I was able to replicate the benchmark infrastructure that shows CPython performance across Arm CPUs: Neoverse-N1, Neoverse-N2, and Neoverse-V1.

The full address is https://speed.python.arm.com/. We are tracking the following branches:

  • 3.11
  • 3.12
  • main

Older branches should be more stable in terms of metrics, so we preferred to focus on newer ones. If you think we need to include older branches, let us know and we will do it.

We’ve had this infra in place for more than a month now (since August 15th) because we needed to solve a few issues with it. Until September 14th, the commits benchmarked could differ from machine to machine: the machines became available at different times (a few hours apart), and each picked the latest commit of the branch at the time its job ran.

Since September 14th you should see more consistent data: we first pick the commit and then schedule jobs on the different machines with that specific commit. Data might be uploaded at different times, but when it arrives it is consistent across machines.
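
To make the mechanism concrete, here is a minimal sketch of the idea (hypothetical glue code, not our actual Jenkins pipeline): the commit SHA is resolved exactly once up front and then handed to every machine.

    import subprocess

    MACHINES = ["neoverse-n1", "neoverse-n2", "neoverse-v1"]

    def latest_commit(branch: str) -> str:
        # Resolve the tip of the branch exactly once, up front.
        out = subprocess.run(
            ["git", "ls-remote", "https://github.com/python/cpython", branch],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.split()[0]  # the commit SHA

    def schedule_job(machine: str, branch: str, sha: str) -> None:
        # Stand-in for our Jenkins trigger; here it just records the pairing.
        print(f"{machine}: {branch} @ {sha}")

    def schedule_benchmarks(branch: str) -> None:
        sha = latest_commit(branch)  # picked once...
        for machine in MACHINES:
            schedule_job(machine, branch, sha)  # ...so all machines build the same commit

    schedule_benchmarks("main")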

The web server and the database are running on an aarch64 machine sponsored by the Works on Arm project, and the benchmarks are running on physical machines behind Arm’s firewall. Jobs are managed by our own Jenkins instance.

I think that’s it for now, have a look and please share your thoughts.

11 Likes

Hi Diego,

Thanks for doing this!

I was glancing at the timelines, and the graph for the coverage benchmark struck me: CPython on aarch64 : Timeline

Between August 19 and August 27 the time this took shot up from 100ms to 800ms. Almost everything else was pretty stable. I didn’t see a similar jump for this benchmark on speed.python.org: Python Speed Center : Timeline

1 Like

@guido, that was a good catch! When selecting "Display all in a grid" it’s impossible to spot.

Anyway, I’ve investigated this further and saw the same behaviour in our internal infrastructure as well. The interesting fact is that the issue happened across all our machines (both aarch64 and x86).

Here it is, for example, happening on an Intel Skylake:

I was able to narrow the window of the change down to August 21st. Around that day we didn’t change anything in our infrastructure, and we have pyperformance pinned to 1.0.8.

When looking at the build logs, though, I noticed a difference in how the coverage package was built. Before the 21st there was something like this:

[2023-08-19T06:38:45.800Z] 2023-08-19 06:38:45,660: Building wheels for collected packages: coverage
[2023-08-19T06:38:45.800Z] 2023-08-19 06:38:45,662:   Building wheel for coverage (setup.py): started
[2023-08-19T06:38:48.283Z] 2023-08-19 06:38:47,779:   Building wheel for coverage (setup.py): finished with status 'done'
[2023-08-19T06:38:48.292Z] 2023-08-19 06:38:47,780:   Created wheel for coverage: filename=coverage-6.4.1-cp313-cp313-linux_aarch64.whl size=216910 sha256=8cd103bb7a9c0017c85fa19b03a3730ef05d45f71383366c1737e08cf91b01e7

After the 21st:

[2023-08-23T00:35:50.189Z] 2023-08-23 00:35:50,087: Building wheels for collected packages: coverage
[2023-08-23T00:35:50.204Z] 2023-08-23 00:35:50,089:   Building wheel for coverage (setup.py): started
[2023-08-23T00:35:51.292Z] 2023-08-23 00:35:51,000:   Building wheel for coverage (setup.py): finished with status 'done'
[2023-08-23T00:35:51.300Z] 2023-08-23 00:35:51,001:   Created wheel for coverage: filename=coverage-6.4.1-py3-none-any.whl size=176625 sha256=81073f3f8773eeb93b90da2d4457bf1303426b5fa73a180fbe055f531c021747

Notice the difference in wheel size and platform tag: one is cp313-cp313-linux_aarch64 and the other is py3-none-any.
The wheel with the cp313-cp313-linux_aarch64 platform tag contains a shared object file (tracer.cpython-310-aarch64-linux-gnu.so) whilst the other one does not. This explains the difference in size and in the platform tag.
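
For anyone who wants to run the same check: a wheel is just a zip archive, so the presence of the compiled tracer can be verified without installing anything. A small sketch (the wheel filename is the one from the logs above):

    import zipfile

    def has_compiled_tracer(wheel_path: str) -> bool:
        # List the wheel's contents and look for any native extension module.
        with zipfile.ZipFile(wheel_path) as wheel:
            return any(name.endswith(".so") for name in wheel.namelist())

    # True for the cp313-cp313-linux_aarch64 wheel, False for py3-none-any.
    print(has_compiled_tracer("coverage-6.4.1-cp313-cp313-linux_aarch64.whl"))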

Coverage provides two implementations of the tracer: the ctracer (faster) and the Python-based one (see the official documentation).
Hence, since August 21st, coverage has been falling back to the Python implementation of the tracer.
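
A quick runtime check of which tracer a given environment will use (this assumes coverage 6.x’s module layout, where the C extension is built as coverage.tracer, so treat it as a sketch):

    # If this import fails, coverage silently falls back to the pure-Python
    # tracer -- the slow path behind the 100 ms -> 800 ms jump above.
    try:
        from coverage.tracer import CTracer
        print("C tracer available:", CTracer)
    except ImportError:
        print("C extension not built; using the pure-Python tracer")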

The next question was to understand why this happens. So I went and built coverage myself against a version of CPython from before and from after August 21st to replicate the issue. The build against the earlier version went through with no problems, whilst the one against the later version raised an error when trying to build the ctracer. Here is the error:

  creating build/temp.linux-aarch64-cpython-313/coverage/ctracer
  gcc -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O3 -Wall -fPIC -I/home/ent-user/ci-scripts/venv/cpython3.13-b51db007c75b-compat-e3aaa88db39d/include -I/home/ent-user/ci-scripts/tmpdir/prefix/include/python3.13 -c coverage/ctracer/datastack.c -o build/temp.linux-aarch64-cpython-313/coverage/ctracer/datastack.o
  In file included from coverage/ctracer/util.h:19,
                   from coverage/ctracer/datastack.c:4:
  /home/ent-user/ci-scripts/tmpdir/prefix/include/python3.13/internal/pycore_frame.h:8:4: error: #error "this header requires Py_BUILD_CORE define"
      8 | #  error "this header requires Py_BUILD_CORE define"
        |    ^~~~~
  In file included from /home/ent-user/ci-scripts/tmpdir/prefix/include/python3.13/internal/pycore_frame.h:13,
                   from coverage/ctracer/util.h:19,
                   from coverage/ctracer/datastack.c:4:
  /home/ent-user/ci-scripts/tmpdir/prefix/include/python3.13/internal/pycore_code.h:8:4: error: #error "this header requires Py_BUILD_CORE define"
      8 | #  error "this header requires Py_BUILD_CORE define"
        |    ^~~~~
  **
  ** Couldn't install with extension module, trying without it...
  ** BuildFailed: command '/usr/bin/gcc' failed with exit code 1
  **
  running bdist_wheel

Then I looked through the list of commits from August 21st and found that the commit gh-108220: Internal header files require Py_BUILD_CORE to be defined … · python/cpython@21c0844 · GitHub by @vstinner is the cause of the failure.

The next thing to understand is: why isn’t speed.python.org behaving the same way? Here I can only speculate on the reason: in our infrastructure (both speed.python.arm.com and our internal one) we blow away the environment for every run of pyperformance, hence every time we reinstall everything.

This is important because we don’t carry the pip cache across experiments, so packages get installed from scratch every time, and this is true for the coverage package as well. Since August 21st it has failed to build the ctracer and reverted to the Python implementation.

If the cache is shared across experiments instead, pip will pick the cached build of coverage (which presumably is coverage-6.4.1-cp313-cp313-linux_aarch64.whl) until we bump its version.
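
A way to reproduce the caching effect locally would be something like the sketch below: with pip’s cache enabled the previously built wheel is reused, while --no-cache-dir forces the source build (and hence the ctracer failure) to happen again, like in our fresh CI environments.

    import subprocess
    import sys

    def install_coverage(use_cache: bool) -> None:
        cmd = [sys.executable, "-m", "pip", "install", "--force-reinstall"]
        if not use_cache:
            cmd.append("--no-cache-dir")  # ignore previously built wheels
        cmd.append("coverage==6.4.1")
        subprocess.run(cmd, check=True)

    install_coverage(use_cache=True)   # reuses the cached cp313 wheel if present
    install_coverage(use_cache=False)  # rebuilds from the sdist, like a fresh CI run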

Can anyone on speed.python.org shed some light on how the experiments are executed?

I’ve tested newer versions of coverage and it seems that 7.3.2 works with 3.13.
Coverage 7.3.1 is broken with 3.13.

What’s the best way forward here? Do we need to bump the coverage version from 6.4.1 to 7.3.2 in pyperformance? Anything else?

Apologies for the long reply, but it was a fun and interesting problem to debug :slight_smile:

2 Likes

Is there a benchmark using coverage? Apparently, on Python 3.13, coverage should define Py_BUILD_CORE if it wants to use the internal C API.

Good investigation!

Here’s a PR to bump coverage to 7.3.2: python/pyperformance#317.

2 Likes

Won’t that still run into the problem that it uses the Python version in 3.13 but the C version in 3.12 and below?

This is how @nedbat has solved it: refactor: don't access frame structs directly · nedbat/coveragepy@1ea3907 · GitHub

This commit is indeed included in coverage==7.3.2, and it works with CPython main (after your commit).

1 Like

So the updated C version works everywhere? Then let’s do that.

The PR on GitHub to bump the coverage version was merged yesterday. I’ve raised an internal PR pinning pyperformance to that specific commit (for now). We should see the coverage benchmarks coming back to their original timing.

Even though the coverage version has been bumped, at the moment it is impossible to test it properly.

In our CI we install pyperformance from the latest commit, but when pyperformance is executed it creates virtual environments for running the benchmarks. In these venvs it installs itself based on the version of the “parent” pyperformance. Even if it is an installation from a commit, the version it reports is still 1.0.9, hence it installs the latest released version.

I’ve seen that pyperformance tries to detect whether it has been installed in editable mode, but currently this mechanism is broken. I’ve raised the issue on GitHub: Editable installation isn't working properly · Issue #319 · python/pyperformance · GitHub
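
To illustrate the mechanism, here is a simplified sketch of the behaviour (not pyperformance’s actual code; the editable-mode flag and checkout path are hypothetical):

    import pyperformance

    def requirement_for_benchmark_venv(is_editable_install: bool) -> str:
        # What the benchmark venv is told to pip-install.
        if is_editable_install:
            # Would install the local checkout, commit and all -- but this
            # detection is currently broken (issue #319).
            return "-e /path/to/pyperformance/checkout"
        # A commit install still reports the last released version, so pip
        # resolves this to the PyPI release rather than the commit we installed.
        return f"pyperformance=={pyperformance.__version__}"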

One of the following should unblock the “coverage” benchmark so it uses the ctracer instead of the Python implementation:

  • release a new version of pyperformance
  • fix pyperformance for the editable installation

Ideally both should be done :slight_smile:

@guido just for completeness, there are a few comments here from @nedbat about coverage and cpython 3.13.

Thanks for the update. Maybe this could be solved by doing another pyperformance release?

I’m also CC’ing @eric.snow who IIRC last overhauled this logic.

A couple of updates around benchmarks.

A new release of pyperformance has been made, and I’m currently testing it in our internal CI. As soon as that finishes, I will merge the change and show that coverage is back to its original values.
There is still one thing to understand though: why didn’t the speed.python.org infra pick up the change?

Last week I had a productive discussion with @ambv and we found a way forward to get aarch64 benchmarks into the official codespeed instance at speed.python.org.
The machine I used for speed.python.arm.com is actually an Arm machine (bare metal Ampere Altra, Neoverse-N1 80 cores, 256GB Ram, 960GB SSD) and we can repurpose it for running pyperformance benchmarks.
We (as Arm) have the green light to use that machine for this purpose, and Łukasz and I still need to tackle a few technical issues (e.g. optimise its usage, run the same commits as the x86 machine to get comparable results, run benchmarks on each commit, etc.) before deploying it.
This means that speed.python.arm.com will cease to exist. I will give notice before taking it down.

Thanks all for your support!

6 Likes

Thanks to @corona10 we have a new pyperformance release. We have updated our infrastructure code to use it, and indeed the coverage results went back to “normal” values. Here is a screenshot of the coverage benchmark:

Also, with this message I’m giving notice that speed.python.arm.com will be shut down in the next few days, as this Friday (10th) @ambv and I will add it as a benchmark machine to the official speed.python.org.

4 Likes

Thanks to @corona10 we have a new pyperformance release.

What can I say, it was an amazing event. :slight_smile:

1 Like