In the dev sprint last week the issue of auto-formatting came up again, even though nobody had planned to discuss or work on the topic. Several participants started a discussion in which we reconsidered it from a fresh perspective. After getting positive feedback on the concept from several core devs, @ammaraskar, @isidentical, @corona10 and I have worked up a draft for a PEP proposing to automatically apply and enforce formatting of the CPython code (see below).
This is still at an early stage; it’s not an actual PEP yet. Before proceeding to write a PEP, we’d like to get feedback on the idea and the high-level approach, and to hear about any additional issues or problems that we may have missed.
With that in mind, please take a look at the following draft!
The CPython codebase consists of three major languages: Python, C and reStructuredText. For each of these languages there is a set of written code style conventions that are followed by contributors and core developers. Learning, applying, and enforcing these conventions are time-consuming tasks which impede our workflow.
In recent years many software projects have adopted workflows where code formatting is entirely automated. This PEP outlines the usage of such tools in the CPython development process.
The aim of this proposal is to completely automate applying and enforcing the existing code style guidelines. In a wider context, this is one of several initiatives aimed at addressing the large backlog of the CPython project (see below). It would do so both directly, by making code reviews more efficient, and indirectly, by making contributing easier and friendlier, thus attracting more developers to the project.
At the time of this PEP’s writing, there are over 1,300 open pull requests (“PRs”) on the CPython GitHub repository, and over 3,000 open bug tracker issues with a patch or a PR. This large “backlog” has been acknowledged as a serious problem, as can be seen, for example, from the Python Steering Council discussing it in November 2019. The most significant reason for this is a lack of core developer time to review and handle these patches and PRs, as noted in the Python Developer’s Guide and many times on the python-dev mailing list and on discuss.python.org.
In recent years many aspects of the CPython development workflow have been improved via automation, such as automatically running tests for PRs and automating the backporting of PRs to maintenance branches. Like testing and backporting, applying and enforcing code formatting styles is a time-consuming process that is natural to automate.
At this point in time (late 2020) the authors believe that this aspect of the workflow is ripe for automation, thanks to the growing popularity and maturity of automatic formatting tools. Some prominent examples of such tools are clang-format, rustfmt, gofmt, prettier and standardjs. In the Python ecosystem, the most popular options are black and yapf. In contrast to earlier tools, which would highlight or print warnings on style violations, this new generation of tools automatically formats code to conform to a specific style. This is the foremost reason we are reconsidering this idea, which was previously discussed and rejected in 2016.
Besides making the PR process more efficient, adopting auto-formatting would “lower the bar” for new contributors. The current workflow requires contributors to read PEP 7, PEP 8 and the documentation style guide, and to understand the subtleties of when to apply them and when to conform to the style of existing code. The status quo also often results in reviews asking PR authors to fix code style issues, which introduces additional delays into the PR review process and sometimes causes frustration [12, 17, 18].
Finally, constantly thinking about code style is a distraction from more significant aspects of code, such as correctness and readability. With automatic formatting, everyone working on a project can almost entirely stop thinking about how to style their code.
Other Notable Projects Using Automatic Formatting
In addition to the evolution of tooling and the benefits outlined above, there is also a wide variety of large open source projects that have adopted some form of automated formatting. Some prominent projects are:
- The Rust programming language uses rustfmt to keep its standard library and compiler, both written in Rust, formatted.
- The Linux kernel uses clang-format to keep its C/C++ code automatically formatted.
- NodeJS uses clang-format to format their C/C++ code. https://github.com/nodejs/node/blob/master/.clang-format
- The LLVM compiler uses clang-format. https://github.com/llvm/llvm-project/blob/master/.clang-format
- The Go programming language uses gofmt. https://github.com/golang/go/wiki/CodeReviewComments#gofmt
- Django formats its code with Black.
- For each language (C, Python and ReST), choose an automated code style checker and formatter. Configure and adapt the chosen tools to our needs, e.g. to match the desired styles and to support re-formatting only new and changed lines.
- Require all new and changed code, from a certain point in time onward, to be checked and formatted using these tools, and enforce this with CI checks.
- Announce well in advance (just after this PEP is accepted) when this will take place.
- Just before “flipping the switch” to make this a requirement, reformat our entire codebase with the chosen auto-formatters. This will be done exactly once, in a single commit, on each of the active branches (e.g. master, 3.9 and 3.8).
- From then on, auto-formatting will be applied only to new and changed lines of code, as reported by git. We’ll need to ensure that all the chosen formatters support this.
- Supply tools to make applying the formatting locally simple and painless, and document how to use them in common workflows and environments.
- Supply tools and instructions to simplify merging patches and PRs created before the reformatting, and likewise for updating downstream patches.
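Formatting only new and changed lines requires knowing exactly which lines those are. As a minimal sketch, assuming the ranges are taken from the output of `git diff -U0` (the function name `changed_lines` is hypothetical, not an existing tool), the new-file side of each hunk header gives the affected range:

```python
from __future__ import annotations

import re

# Unified-diff hunk header, e.g. "@@ -10,0 +11,2 @@ def g():".
# Group 1 is the first line on the new side; group 2 is the line count,
# which git omits when it is 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def changed_lines(diff_text: str) -> dict[str, list[tuple[int, int]]]:
    """Map each file to the 1-based, inclusive line ranges that were added or
    changed in the new version, given the output of `git diff -U0`."""
    ranges: dict[str, list[tuple[int, int]]] = {}
    current = None      # path of the file whose hunks are being read
    in_header = False   # True between "diff --git" and the first hunk
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            in_header, current = True, None
        elif in_header and line.startswith("+++ "):
            target = line[4:]
            if target == "/dev/null":   # deleted file: nothing to format
                current = None
            else:
                current = target[2:] if target.startswith("b/") else target
        elif line.startswith("@@ "):
            in_header = False
            match = HUNK_RE.match(line)
            if match and current is not None:
                start = int(match.group(1))
                count = int(match.group(2) or 1)
                if count > 0:   # a count of 0 means lines were only removed
                    ranges.setdefault(current, []).append((start, start + count - 1))
    return ranges
```

The resulting ranges could then be handed to a formatter that supports partial formatting; clang-format, for example, accepts `--lines=<start>:<end>`. Whether every chosen formatter supports this is the open requirement noted in the plan above.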
Potential Problems and their Solutions
This section outlines some pitfalls that can arise from the usage of auto-formatting tools, and the solutions we propose to avoid or overcome these problems.
Language Syntax Bootstrapping
Problem: When the Python language acquires new syntax, the formatting tool will need to be updated to be able to format Python code using this new syntax. Code using this syntax, such as tests, will need to be added to the codebase before the formatting tool can be updated.
Solution: It will be possible to mark files, blocks and/or lines for exclusion from auto-formatting. For new syntax, these exclusions will be marked as temporary with specific comments, so that they are easy to find and remove once support for the new syntax is added to the formatter.
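The draft does not yet define the marker syntax. As a minimal sketch, assuming a hypothetical TEMPORARY-FMT-OFF tag placed on its own comment line just before an existing formatter directive (such as black’s `# fmt: off` or clang-format’s `// clang-format off`, which formatters may expect verbatim), a short script can list every temporary exclusion once the formatter catches up:

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical tag, e.g. "# TEMPORARY-FMT-OFF: new syntax X" in Python or
# "/* TEMPORARY-FMT-OFF: ... */" in C. Group 1 captures the stated reason.
TEMP_TAG = re.compile(r"(?:#|//|/\*)\s*TEMPORARY-FMT-OFF\b:?\s*(.*?)\s*(?:\*/)?\s*$")


def find_temporary_exclusions(root: Path, suffixes=(".py", ".c", ".h")):
    """Yield (path, line_number, reason) for every temporary exclusion tag."""
    for path in sorted(root.rglob("*")):
        if path.suffix not in suffixes or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = TEMP_TAG.search(line)
            if match:
                yield path, lineno, match.group(1)
```

Keeping the tag separate from the directive means the formatter’s own exclusion mechanism is used unchanged, while the tag only serves as a searchable reminder.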
(Rejected Solution: Keep the formatter’s source code in the codebase, and be forced to always update it alongside any language syntax change.)
Existing Patches and PRs
Problem: We have a large set of patches on bugs.python.org and pending PRs on GitHub made against the old, unformatted code. These will have merge conflicts once the code has been reformatted.
Solution: These can be fixed (semi-)automatically by: (1) merging the patch with the commit just before the codebase-wide-reformatting; (2) applying the new formatting; (3) merging with the head of the relevant branch. We will supply scripts for this purpose.
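The three steps above can be sketched as a sequence of commands. This is a minimal sketch under stated assumptions: the refs are placeholders, black stands in for whichever Python formatter is chosen, the function name is hypothetical, and the extra `-X ours` merge of the reformat commit is our own addition for resolving the remaining conflicts, not part of the draft:

```python
from __future__ import annotations


def update_pr_commands(
    pre_reformat: str,   # commit just before the codebase-wide reformat
    reformat: str,       # the single reformatting commit itself
    upstream: str,       # head of the relevant branch, e.g. "origin/master"
    files: list[str],    # Python files touched by the PR
) -> list[list[str]]:
    """Return the commands that bring a pre-reformat PR branch up to date."""
    return [
        # (1) Merge with the commit just before the reformat, so the branch
        #     contains exactly the code that the formatter rewrote.
        ["git", "merge", "--no-edit", pre_reformat],
        # (2) Apply the new formatting to the files the PR touches.
        ["black", *files],
        ["git", "commit", "--all", "-m", "Apply automatic formatting"],
        # Assumption: merging the reformat commit now conflicts only where the
        # PR changed code; "-X ours" keeps the PR's already-formatted side.
        ["git", "merge", "--no-edit", "-X", "ours", reformat],
        # (3) Merge with the head of the relevant branch.
        ["git", "merge", "--no-edit", upstream],
    ]
```

Each command could be run with `subprocess.run(cmd, check=True)`; note that `git commit` exits non-zero when formatting changed nothing, which a real script would need to tolerate.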
Downstream Patches
Problem: Aside from our own pending patches and pull requests, many downstream maintainers of Python, such as Linux distribution packagers, have their own sets of patches that they apply to CPython.
Solution: These can be updated using the same process and tools as for existing patches and PRs (see above). The tools can be distributed in the CPython repo for downstream maintainers to use.
Backporting
Problem: Once this PEP is implemented, backporting changes will become harder.
Solution: Apply the new formatting to all active branches simultaneously. Manual fixes should then only be needed when backporting to branches that accept only security fixes.
Auto Generated Files and Vendored Files
Problem: Automatically generated code files, such as those generated by Argument Clinic and the codec generators, should not be checked or automatically formatted. Vendored files, such as those for libmpdec, libffi_osx, _sha3 and _blake2, should be ignored as well.
Solution: Maintain a list of excluded paths in a configuration file shared by the formatters.
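As a minimal sketch of how such an exclusion list could be applied, assuming shell-style glob patterns on repository-relative paths: the patterns below are illustrative, their exact locations in the CPython tree are assumptions, and codec generator outputs are left out:

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Illustrative patterns; the real list would live in the shared
# configuration file. Paths are repository-relative, POSIX-style.
EXCLUDED = [
    "*/clinic/*.h",                  # Argument Clinic output
    "Modules/_decimal/libmpdec/*",   # vendored libmpdec (assumed location)
    "Modules/_ctypes/libffi_osx/*",  # vendored libffi_osx (assumed location)
    "Modules/_sha3/kcp/*",           # vendored Keccak code (assumed location)
    "Modules/_blake2/impl/*",        # vendored BLAKE2 code (assumed location)
]


def is_excluded(path: str, patterns=EXCLUDED) -> bool:
    """Return True if the path matches any exclusion pattern.

    In fnmatch, "*" also matches "/", so "dir/*" covers nested files too.
    fnmatchcase is used so matching is case-sensitive on every platform.
    """
    return any(fnmatchcase(path, pattern) for pattern in patterns)
```

Listing vendored sub-directories rather than whole modules keeps CPython’s own wrapper code, such as the module source files next to them, under the formatters.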
Negatively affecting ‘git blame’ and similar features
Problem: Codebase-wide changes touching many lines of code get in the way of inspecting the history of specific lines or blocks of code, such as when using ‘git blame’.
Solution: git (since version 2.23) can ignore specified commits when performing such operations. This is done through the use of a .git-blame-ignore-revs file, which many other large open source projects have adopted [9, 10, 11]. Initially, this solution will not resolve the problem when using the “blame” feature on GitHub; for that, we will need to bring this up with GitHub and hope that they provide a solution in the future.
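The file format is simple: one full commit hash per line, with “#” comments and blank lines ignored. A minimal sketch (the function names are hypothetical) that validates such a file and builds the matching git blame invocation:

```python
from __future__ import annotations

import re

# Full 40-character SHA-1 object name; abbreviated hashes are not accepted.
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def read_ignore_revs(text: str) -> list[str]:
    """Parse .git-blame-ignore-revs content into a list of commit hashes."""
    revs = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        if not FULL_SHA1.fullmatch(line):
            raise ValueError(f"not an unabbreviated commit hash: {line!r}")
        revs.append(line)
    return revs


def blame_command(path: str, ignore_file: str = ".git-blame-ignore-revs") -> list[str]:
    """Build a git blame call that skips the commits listed in ignore_file."""
    return ["git", "blame", "--ignore-revs-file", ignore_file, path]
```

Alternatively, setting the git configuration option blame.ignoreRevsFile once per clone makes a plain `git blame` pick up the file automatically.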
No reStructuredText Auto-Formatter
Problem: reStructuredText is not a widely used document format, and to our knowledge there are no existing auto-formatters for it.
Solution: Write a ReST formatter! Ammar Askar, an author of this PEP, has declared his willingness to do so.
Requiring More Developer Tooling
Problem: Requiring auto-formatting for three different languages will require most contributors to set up three more tools in their local environment and update them occasionally. Worse, for work on maintenance branches, different versions of the Python formatter (and possibly the ReST formatter) may be required.
Solution: A GitHub Action that not only checks formatting but also suggests formatting fixes directly on the PR, which can easily be applied. Also, make local installation and updating simple, with clear instructions for different platforms. Finally, since automerging and backporting are mostly done by Miss Islington, updating her to apply formatting automatically with the correct versions of the tools should mostly eliminate the need for non-core devs to keep different versions installed locally.
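A minimal sketch of the check step that such a GitHub Action (or Miss Islington) might run, assuming black and clang-format as the chosen tools (the draft has not chosen yet) and a hypothetical `rst-format` command standing in for the not-yet-written ReST formatter:

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Check-only invocations; each exits non-zero if a file would be changed.
# "rst-format" is a hypothetical placeholder for a future ReST formatter.
CHECKERS = {
    ".py": ["black", "--check", "--diff"],
    ".c": ["clang-format", "--dry-run", "--Werror"],
    ".h": ["clang-format", "--dry-run", "--Werror"],
    ".rst": ["rst-format", "--check"],
}


def check_commands(changed_files: list[str]) -> list[list[str]]:
    """Group changed files by checker and build one command per checker."""
    groups: dict[tuple[str, ...], list[str]] = {}
    for name in changed_files:
        cmd = CHECKERS.get(PurePosixPath(name).suffix)
        if cmd is not None:   # files in other languages are not checked
            groups.setdefault(tuple(cmd), []).append(name)
    return [list(cmd) + files for cmd, files in groups.items()]
```

A CI job could run every returned command and fail on any non-zero exit. Pinning formatter versions per maintenance branch, as discussed in the problem statement above, would sit on top of this and is not shown.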
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.