How is CPython tested?

How do people test CPython?

What tools, infrastructure and methods are used?

This page is a good place to start:
https://devguide.python.org/runtests/

Essentially, you’ll find a large number of files under the Lib/test directory that are dedicated to running unit and integration tests for most of CPython. These use Python’s built-in unittest module, and the general convention is that given a module named time, its associated tests live in Lib/test/test_time.py.
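For illustration only (not an actual file from the repository), a test module following that convention might look roughly like this:

```python
# Minimal sketch of the Lib/test convention: tests for the "time" module
# live in a file named test_time.py and are written with the stdlib
# unittest module. This is an illustration, not the real test_time.py.
import time
import unittest


class MonotonicClockTests(unittest.TestCase):
    def test_monotonic_never_goes_backwards(self):
        first = time.monotonic()
        second = time.monotonic()
        self.assertGreaterEqual(second, first)


if __name__ == "__main__":
    unittest.main()
```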

As far as infrastructure, take a look at this part of the devguide: https://devguide.python.org/buildbots/#continuous-integration
For continuous integration, there are about a hundred bots on a wide variety of OS and hardware configurations that build and run the tests. Aside from this, every pull request is run through Travis CI, Azure Pipelines and AppVeyor to ensure tests pass before changes are merged.

The coverage of the test suite is also automatically monitored and can be viewed on codecov: https://codecov.io/gh/python/cpython


Thanks!

Are there any other C tools used? Static analyzers, linters?
There seem to be no C-level unit tests?
Manual or performance tests?
Security testing?

How does testing happen during the different stages of the release process?

Is there a measurement of how useful the different testing methodologies are?

None that are run on the CI as far as I know, but CPython does support building with sanitizers (ASan, MSan, UBSan) and these are occasionally used to track down some errors.

There are some, but they are rare: cpython/Modules/_testcapimodule.c at main · python/cpython · GitHub
You’ll note on the coverage report that most lines of C are covered; the general methodology is to call into the C modules through Python to test them.
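As a rough illustration of that methodology (a sketch, not an actual CPython test file): the array module is implemented in C (Modules/arraymodule.c), yet its tests in Lib/test/test_array.py drive it entirely from Python, along the lines of:

```python
# Sketch of testing C code through the Python API it exposes: "array" is a
# C extension module, but these assertions exercise its C implementation
# purely from Python, which is how most of the C coverage is obtained.
import array
import unittest


class ArraySmokeTests(unittest.TestCase):
    def test_append_and_tolist(self):
        a = array.array("i", [1, 2, 3])
        a.append(4)
        self.assertEqual(a.tolist(), [1, 2, 3, 4])

    def test_typecode_is_enforced(self):
        with self.assertRaises(TypeError):
            array.array("i").append("not an int")


if __name__ == "__main__":
    unittest.main()
```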

None, as far as I know.

GitHub - python/pyperformance: Python Performance Benchmark Suite is the main project for performance testing; results are pushed to https://speed.python.org/
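The real benchmarks are run through the pyperformance harness (built on pyperf), but the underlying idea is timing representative workloads; a stdlib-only illustration of that idea, which is not how the suite itself is implemented, might be:

```python
# Toy illustration of what a performance benchmark measures, using only the
# stdlib timeit module. The actual suite uses the pyperformance/pyperf
# harness, which adds warm-up runs, process isolation, and statistics.
import timeit


def workload():
    # Stand-in for a real benchmark such as JSON parsing or template rendering.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    timings = timeit.repeat(workload, number=100, repeat=5)
    print(f"best of 5: {min(timings) * 1000:.2f} ms per 100 calls")
```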

There is some fuzz testing, but as far as I know, there aren’t any other direct security tests. cpython/Modules/_xxtestfuzz at main · python/cpython · GitHub
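The actual fuzz targets in Modules/_xxtestfuzz are written in C against the libFuzzer interface (also used by OSS-Fuzz), but the basic idea can be sketched in plain Python: feed random bytes to a parser and treat anything other than a clean result or an expected exception as a bug.

```python
# Toy sketch of the fuzzing idea, not the real _xxtestfuzz targets: throw
# random input at a parser and only tolerate its documented failure mode.
import json
import os


def fuzz_json_once(max_len=64):
    length = os.urandom(1)[0] % max_len + 1
    data = os.urandom(length)
    try:
        json.loads(data)
    except ValueError:
        # Expected rejection of malformed input (JSONDecodeError and
        # UnicodeDecodeError are both ValueError subclasses). A crash,
        # hang, or any other exception here would be worth reporting.
        pass


if __name__ == "__main__":
    for _ in range(10_000):
        fuzz_json_once()
```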

I don’t think anyone has looked into this.


Thanks again! 🙂

I noticed that some Python libraries are fuzzed in the Google OSS-Fuzz project. See: https://github.com/google/oss-fuzz/tree/master/projects/python3-libraries It makes use of the ASan and MSan sanitizers you mentioned.

  • Are there any check lists used for PRs or releases?
  • Is the release process documented anywhere?
  • I see people are asked to “test the beta releases”, and to do that with Travis CI, but I can’t find any other test instructions or test plan. Are there testing instructions for people before each release?
  • Is there any usability or accessibility testing done?

Specifically for PRs, when a PR is opened, the currently active CI services (Travis, Azure Pipelines, and AppVeyor) are used as status checks on the PR to test the changes. Effectively, the entire suite of tests contained in Lib/test is run in several different environments. It’s typically a requirement for all of the tests to pass before merging the PR, unless it can be determined that the failure is unrelated to the PR itself (which can happen occasionally).

During the process of merging the PR against one of the main branches, the buildbot fleet is utilized. For more information on the buildbots, there’s a dedicated section in the devguide.

This is somewhat explained in the devguide. However, there may be other resources that I’m not aware of; I haven’t personally been very involved with the release process. Perhaps @nad and/or @ambv would be able to clarify this.

I can imagine it would be difficult to provide exact test instructions, because “test the beta releases” is targeted primarily at library and package maintainers, to see if the beta releases cause any issues for them. The exact steps would vary from project to project.

From a user perspective, it can also be helpful to run the suite of tests locally on their own devices, to ensure there aren’t environment-specific problems that were missed. This process is explained in the devguide.
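For reference, the supported way is the regrtest runner described in the devguide (e.g. ./python -m test, or a single file with ./python -m test test_time). Roughly the same thing can be done for one module with plain unittest, as in this sketch:

```python
# Hedged sketch: load and run one module from the CPython test suite
# programmatically. The supported entry point is "./python -m test", which
# adds resource control, parallelism, and reporting on top of unittest.
import unittest

suite = unittest.defaultTestLoader.loadTestsFromName("test.test_time")
unittest.TextTestRunner(verbosity=2).run(suite)
```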

By usability testing, do you mean having a group of end users test the latest releases? If so, we do not, as far as I’m aware. However, we do encourage feedback to be submitted. Good places to submit feedback are Python Help - Discussions on Python.org or the mailing lists.

If it’s a clearly defined problem, a new issue can be opened for it on bugs.python.org.

Edit: Clarifications made to the PR question regarding the usage of the buildbots.

For the PRs, the buildbot is not used. It’s Travis, Azure and AppVeyor for Linux, macOS and Windows.

We only use Buildbot when a PR is merged into one of the final branches (master, 3.8, …)

Good to know, thanks for the clarification. For some reason, I was under the impression that the CI services were each interfacing with parts of the buildbot fleet, rather than being their own independent units. As a triager, I’ve spent some time helping to debug PRs from the CI logs before they were merged, but I’ve never dealt with the process of merging into a main branch.

I’ll edit my post accordingly to avoid confusion.

No problem.
Also, once per day there is a big build on Buildbot where we try to find leaks (~6h for the tests).
Travis is mainly for Linux and the docs. Azure is for Linux and Windows, and AppVeyor (not sure) for Windows and macOS.
Buildbot can also run the tests on a matrix of operating systems and hardware.
For example, yesterday we merged a PR into master and there was a crash on Solaris but not on Linux. It was detected via a Buildbot worker on the master branch.
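On the leak hunt mentioned above: the devguide documents running the suite with the -R option (e.g. ./python -m test -R 3:3), which reruns each test and compares interpreter-wide counters between runs. Very roughly, and only on a --with-pydebug build, the idea looks like this sketch:

```python
# Illustrative sketch of reference-leak hunting, not the real regrtest -R
# machinery. sys.gettotalrefcount() only exists on debug builds of CPython.
import sys
import unittest


def refcount_delta(test_case, repeats=5):
    """Return the total-refcount growth after running a TestCase repeatedly."""
    if not hasattr(sys, "gettotalrefcount"):
        raise RuntimeError("requires a --with-pydebug build of CPython")
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    runner = unittest.TextTestRunner(verbosity=0)
    runner.run(suite)                 # warm-up run to populate caches
    before = sys.gettotalrefcount()
    for _ in range(repeats):
        runner.run(suite)
    return sys.gettotalrefcount() - before   # steady growth suggests a leak
```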


When a buildbot failure occurs on a builder that is typically stable, is there a general process that is followed to handle the issue? The devguide mentions:

The rule is that all stable builders must be free of persistent failures when the release is cut. It is absolutely vital that core developers fix any issue they introduce on the stable buildbots, as soon as possible.

But it does not explain the actual timeline, or the typical steps that core developers take to rectify a buildbot failure. I seem to recall hearing something about giving the core developer who merged the PR one day to rectify the issue before the commit is reverted, but I was unable to find that in the devguide.

Also, are those who aren’t core developers able to help out with directly fixing buildbot-related issues? I’ve helped before with fixing issues that were causing failures in the tests (most recently on a PR for test_asyncio), but I would also be interested in helping with the buildbots more directly at some point. I recently subscribed to the Python-Buildbots mailing list on Victor’s recommendation.

Note that, although not about the general process itself, there is also a PEP with the release schedule for each Python version. For example, for 3.8 it is PEP 569.

Ah yes, I’m aware of PEP 569. There’s also PEP 602, which is currently being discussed. I was referring more to a resource that would apply universally, but since the release process evolves over time and differs between versions, that may not exist (outside of the existing parts of the devguide that describe it loosely).


The movements of a single release build are described in PEP 101. It’s a hairy PEP but see if it includes what you’re looking for.


@matrixise Cool. What tools are used in the “find the leaks, ~6h” build? Is this for C-level code? I see something in the devguide about leaks (https://devguide.python.org/runtests/?highlight=leaks#running); is this what you were talking about?

I see Coverity analysis is used here: https://devguide.python.org/coverity/#coverity-scan However, I don’t see any recently merged PRs mentioning “coverity”. Is this tool run very often, and is it still useful?

I found this review checklist for triaging: https://devguide.python.org/triaging/?highlight=review#checklist-for-triaging I also found “How to review a Pull Request”: https://devguide.python.org/pullrequest/#how-to-review-a-pull-request Are there any other code review guidelines or checklists?

Thanks, that’s exactly what I was looking for. I wasn’t aware of the existence of PEP 101.