PEP proposal: Automatically Formatting the CPython Code

pf_moore · November 7, 2020, 10:18am

Hang on, I was responding to a comment that said that autoformatters helped address the problem where people wrote a PR that got rejected by code format checking CI! I’m so used to pip’s CI that I seem to have misremembered - you’re saying that core CPython doesn’t have CI checks for code style right now?

So in the absence of code format checks (i.e., where CPython’s codebase is right now) the only reason PRs get blocked for code style is because some core developer felt that they wanted to request style changes? Why is this even a problem? Do we have core devs who style check code purely on the basis of a mechanical application of the PEP 8 rules, rather than by judging that something isn’t readable? Then we should be discussing with them whether their review approach is helpful. Do we have PR submitters who refuse to address reasonable readability improvement requests? Then we should discuss with them and educate them, not hit them with the hammer of “the rules say so”. Do we have core developer arguments over what’s best style? If so then we should lighten up, and/or agree a consensus (or check PEP 8 to see if we already have a consensus )

Re-reading the “Motivation” section of the proposal with my new understanding that we don’t currently enforce CI checks for style, I see very little explanation of why we need any automated validation that PEP 7/8 are followed in the first place. It reads as if mechanically checking is self-evidently a good thing, and concentrates on “not making core devs do that mechanical check” as the benefit. But the premise there (that mechanical application of the style rules is required) is never justified, and is, in my view, both wrong and in violation of PEP 8’s comment that human judgement should always override blindly applying rules.

steve.dower · November 7, 2020, 10:55am

There’s a trailing whitespace check, but that’s it. Which is why most of the rationale has focused on how we interact with contributors. A technical solution to a people problem

Now if we extend the problem to include all of us having to think about formatting, it’s possible to justify on other reasons. But I suspect those won’t carry.

aeros · November 7, 2020, 11:00am

In my experience, it usually comes down to limited bandwidth on the side of core developer PR reviews. Common style issues (e.g. I’ve ran into docstring formatting issues quite a few times) add “one more thing” to address which can result in substantially increased review periods. Even more so if the PR has been open for a few weeks or more and there are delays on both sides (from both the reviewer and author). So my understanding is that if we have some form of autoformatting tool and CI check, it can possibly reduce review time spent on minor formatting issues.

On an individual PR, it probably won’t make a huge difference, but I think it could add up significantly when multiplied across all PR reviews if we could avoid having to address some common style issues (resulting in more time to review other PRs or work on more interesting issues than code style feedback).

pf_moore · November 7, 2020, 11:05am

And my view is that we shouldn’t be bothering with minor formatting issues if we have such a big backlog that we are concerned about taking too long over individual reviews.

We may just have to agree to differ.

malemburg · November 7, 2020, 11:36am

The analogy is not quite correct. Making code formatters happy is
not part of creating a working Python patch. It’s merely a style
check.

Sphinx is a tool, just like Python, which runs the input, so if this
doesn’t work, you have a broken patch: the documentation output
cannot be produced.

If you write code which a code formatter complains about, you still
have working code. You’re just not adhering to whatever the author
of that tool believes is well formatted Python code.

I’m -1 on adding automatic code formatting to the CPython code base,
since it creates more noise than anything else. We are a very diverse
set of programmers and all of us have their own particular coding
style. As long as it’s readable, that’s fine. There’s no need to be
pedantic about coding style.

ammaraskar · November 8, 2020, 8:51am

Just for posterity, here’s a sampling of pull requests I came across when researching this proposal. (These aren’t meant to single anyone out or point blame, just some examples of situations that might be completely eliminated with a system like this.)

https://github.com/python/cpython/pull/21685#pullrequestreview-460410937 (Style-only changes requested followed by no re-review.)
https://github.com/python/cpython/pull/21643#pullrequestreview-456177303 (Style-only change, no delay in pull request because author is a core dev)
https://github.com/python/cpython/pull/20796#pullrequestreview-438271505 (Style only changes on a ready-to-merge pull request)
https://github.com/python/cpython/pull/1130#pullrequestreview-32801297 (Style only changes on a ready-to-merge pull request)
https://github.com/python/cpython/pull/2144#pullrequestreview-45527659 (Style suggestion applied by merging core dev)
https://github.com/python/cpython/pull/1732#pullrequestreview-39608379 (One day delay in merge from style only changes)
https://github.com/python/cpython/pull/4879#pullrequestreview-83855889 (Style only changes on a ready-to-merge pull request)
https://github.com/python/cpython/pull/5851#pullrequestreview-102190408 (Style only changes on a ready-to-merge pull request)
https://github.com/python/cpython/pull/6149#pullrequestreview-141488592 (Style change causes a little back-and-forth)
https://github.com/python/cpython/pull/7911#pullrequestreview-135499252 (Style only change on a ready-to-merge pull request)
https://github.com/python/cpython/pull/11600#pullrequestreview-193947675 (Style only change on a ready-to-merge pull request)
https://github.com/python/cpython/pull/6095#discussion_r186260023 (Back and forth over how to fit a line)
https://github.com/python/cpython/pull/8561#pullrequestreview-141421527 (Formatting only review, with some caveats about following existing precedent)
https://github.com/python/cpython/pull/4238#discussion_r155028345 (A little back-and-forth over the application of PEP7)

There are some more muddier examples, such as where style/formatting based comments are interspersed with comments about the actual code which I didn’t include here, this is mostly focused on pull requests that potentially had extra work or discussion added to them over formatting nits.

Ultimately, I’d like to reiterate that this is a balancing act. I completely understand the wariness with how this would look like in the workflow process and giving up control for those situations where the formatter doesn’t make things clearer. I’ve personally come to the conclusion that it would be worth it just to cut out this entire class of comments and delays but I see why others might not see it that way. I would just ask that you keep an open mind even if you feel like you personally don’t run into these situations yourselves.

I do think a more human-oriented solution is also possible here whereby core devs just apply formatting nits they care about themselves, utilize the github suggestions feature as much as possible and try to avoid smaller nits.

steven.daprano · November 8, 2020, 10:31am

Steve Dower said:

“In my opinion, spreading things out more is good for readability
anyway, compared to squeezing even more into a single line.”

I don’t deny that sometimes this is true, and that it is possible for
code to be too terse and compact.

But if it were generally true, we would say that comprehensions hurt
readability and we should go back to spreading it out over multiple
lines with a for loop. We would reject code with enumerate in favour of
spreading the same code over multiple steps and multiple lines.

Do you remember the Decorate-Sort-Undecorate idiom back in 1.5 and
early 2.x days? It was so common that it was (and still is) an actual
FAQ:

I must have written a million DSUs:

# Decorate, sort, undecorate.
decorated = [(employee.paygrade, i, employee) for i, employee in enumerate(employees)]
decorate.sort()
employees = [t[2] for t in decorate]

This was even more verbose and spread out in Python 1.5, before
comprehensions. I’ll leave that as an exercise for the reader. Hands up
anyone who thinks that this spreadout code is an improvement over the
modern idiom?

employees.sort(key=lambda employee: employee.paygrade)

# Alternative version without lambda.
employees.sort(key=operator.attrgetter('paygrade'))

I will argue that, to the degree that we will have to work around the
limitations of the auto style formatter by spreading the code out, that
is evidence that auto code formatting makes code worse, not better.

steven.daprano · November 8, 2020, 10:54am

Thanks for doing the leg work to dig those out. That’s 14 examples out
of how many hundreds or thousands of PRs in the same period?

I expect you have missed many more. Let’s say you found only one in ten
examples, so 140 examples of PRs delayed over style issues, and let’s
assume that every one of those could have been avoided by sticking to an
automatic style checker.

Let’s round it up to 200 examples. What proportion of the total PRs is
that?

If avoidable delays over style happens, say, 1 out of 5 PRs, I think we
would all agree that this is a serious problem that needs attention. If
it’s 1 out of 5000 PRs, I hope we can agree that it is an insignificant
problem that we can ignore.

We can argue about proportions in between those extremes, but if we have
no idea what the proportion is at all, how do we know this is a problem
that needs solving and what level of inconvenience is worth paying in
order to solve it?

steven.daprano · November 8, 2020, 11:10am

Wait, I’m confused. You seem to be saying that we already have these
code formatting checks in place. (Aside from the “no trailing
whitespace” rule, which I trust we all agree is both non-invasive and
useful.)

Gregory:

“Ex: today I can create PRs that then go onto CI actions and fail
various whitespace or formatting checks with high latency after my
attention has shifted away. Doh! That is an antipattern.”

What sort of style checks are done by the CI system?

If we already have these checks in place, then I agree with your
analysis that high latency in the failure is an antipattern.

But if we already have these checks in place, doesn’t that undercut the
argument that reviewers are spending a lot of time manually checking
style?

brettcannon · November 9, 2020, 10:09pm

None beyond what make patchcheck does which makes sure indentation is uniform and there’s no trailing whitespace.

pradyunsg · November 11, 2020, 5:57pm

I have no horses in this race, so I’m gonna skirt around the entire discussion about the pros/cons of automated code formatting.

I do have one suggestion - please adopt pre-commit.com for managing the linters/code formatters, if you decide to use them.

That tool unifies the management of “which files to ignore” by providing a single place to configure ignore rules. It provides enables careful control of the exact linter versions used, a consistent invocation of linters across projects while respecting project-specific configuration, automated environment management for the linters and more. It’s also possible to opt-in (on a per-contributor basis) to a shorter feedback loop, by setting up the optional git hook that’ll run the linters locally when you make a commit, to ensure you never push a misformatted commit (by running pre-commit install once).

Overall, it’s a capable tool that makes managing automation around code formatting easier and consistent.

I’ve been using it on nearly all projects (Python and non-Python) that I’m involved in, and it’s been a genuine workflow improvement. It accommodates for everyone - from folks who want to run it on every save to folks who don’t care as long as things pass on the CI.

rhettinger · December 4, 2020, 6:12am

FWIW, something that has worked for me is running “black -S -diff”, examining the proposed changes, keeping the ones the make sense, and rejecting the ones that impair readability.

Black typically makes some improvements, but it also makes some things worse.

Here are some of the problematic cases:

When there is a block comment before a function or class, Black adds two line spaces between the comment and the object being commented. This creates a visual separation where there was supposed to be a visual connection.

See this block comment in the statistics module. This comment explains the _ss() function that follows, so the two should be visually connected.
Black respaces in-line comments in a way that smushes the comment against the code it is commenting, rendering both of them less readable. Here are three examples, one from descriptor howto guide, another from the source for ChainMap, and another for OrderedDict:
- Comments showing the order that cases are handled.
- In-line comments add parenthetical information
  that don’t belong in the main flow of reading the code.
- Vertically aligned inline comments create a separate parallel column meant to show high level overview.

Black can wreak havoc on some kinds of numeric code. For example, this Padé approximation in the statistics module has been carefully formatted by hand:

  # Hash sum: 55.88319_28806_14901_4439
  num = (((((((2.50908_09287_30122_6727e+3 * r +
               3.34305_75583_58812_8105e+4) * r +
               6.72657_70927_00870_0853e+4) * r +
               4.59219_53931_54987_1457e+4) * r +
               1.37316_93765_50946_1125e+4) * r +
               1.97159_09503_06551_4427e+3) * r +
               1.33141_66789_17843_7745e+2) * r +
               3.38713_28727_96366_6080e+0) * q
  den = (((((((5.22649_52788_52854_5610e+3 * r +
               2.87290_85735_72194_2674e+4) * r +
               3.93078_95800_09271_0610e+4) * r +
               2.12137_94301_58659_5867e+4) * r +
               5.39419_60214_24751_1077e+3) * r +
               6.87187_00749_20579_0830e+2) * r +
               4.23133_30701_60091_1252e+1) * r +
               1.0)

PEP 8 says,

If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies).

We use this technique extensively to improve readability, allowing each term to be visually processed as a single chunk. Here an example from the pdf() method in statistics:
```
exp((x - self._mu)**2.0 / (-2.0*variance)) / sqrt(tau*variance)
```
Black throws away this effort, rendering the result harder to read accurately.

Sometimes Black’s string reformatting is way-off. In this named tuple example, Black is confused by the joining of an f-string with a non-f-string:

-    replace.__doc_ = (f'Return a new {typename} object replacing specified '
-                       'fields with new values')
+    replace.__doc_ = (
+        f'Return a new {typename} object replacing specified ' 'fields with new values'
+    )

In general, I’m a big fan of Black. When code is in shambles, Black makes valiant efforts to restore a semblance of order from chaos. But for mature code, the results are more hit and miss.

So, my recommendation is to look at what Black suggests on a line by line basis. Keep anything that improves readability. Ignore anything that impairs readability. We want the tool to serve us, not the other way around.

pitrou · December 4, 2020, 9:46am

Thanks for the excellent examples @rhettinger !

pganssle · December 16, 2020, 4:12pm

The question is really just what’s the default. Very few people are going to actually do this, and it will either lead to friction for contributors, who would need to temporarily run black on their code, then revert all the changes they don’t like, every time they make changes.

On the other hand, you can easily disable black auto-formatting in-line in places where you want to hand-craft a more readable output. That is a fairly strong case for using black or some other formatter and adding formatter directives in the rare deviations where black does the wrong thing (this also limits the places where you have to check for formatting problems in the same way that Rust’s unsafe hatch tells you where to look for memory safety issues).

That said, I used to be a much more enthusiastic proponent of this sort of thing until a recent unpleasant experience with the C compiler wherein some compilers were issuing overzealous warnings and it was impossible to turn those off without causing other compilers to start issuing warnings about unnecessary warning-disabling code. Many of the arguments for auto-formatting are also valid arguments for adding linters and other automatic fixup tools — we must be very cautious that we don’t get into a situation where satisfying these automated code checking tools doesn’t lead to more difficulties, or the introduction of more fragile code, as it has in the case with the C compiler warnings.

asears · December 31, 2020, 4:59pm

Code style issues are low-hanging fruit when reviewing a pull-request. Remove the fruit to get at the deeper review items, and increase faster PR merges or rejections. Auto-fixing won’t solve this with current state of black or autopep8.

To improve workflow and speed-to-approve-or-reject metric, suggest ‘optional’ black (and/or autopep8) formatting. Scan modified code using a bot to “suggestion” commit changes. Allow formatting packages to address formatter issues and evolve the code.

Would bedevere be a good candidate for this improvement macro?

Some suggestions for PR improvements here:

Opt-in to automated suggestions made in a pull request. Create macros to gloss over the process of formatting code in a semi-supervised fashion. This allows control over the final output.
Allow the approval/rejection and voting on formatting suggestions. This would provide metrics for black, autopep8 and others to use for targeting improvements. AI-based formatting will solve these opinionated formatting problems in future.

The formatter bot and reviewers can use a suggestion keyword in comments to fix code. This would increase the velocity of the contributor’s PR. Give them a fix they can act and learn with, rather than saying “go away and come back with black.”

Provide autoformatted output of the commit and the file.
Provide visibility into complexity & other code metrics.
Provide PEP8, spelling, grammar and formatting suggestions inline of the PR.

Adding black, autopep8 or other auto-formatters automatically to some people’s workflows is probably just going to introduce more cultural issues and problems with code… opt-in seems to be best approach.

ssbarnea · January 2, 2021, 8:23am

Finally someone had the courage to open this sensitive subject. I will be more than happy to see an auto-formatter applied, even if is not perfect.

As others noted; that is about removing the subjectivity from code reviews, so reviewers can focus on what really matters and avoid the bitter low hanging fruits.

My personal experience enabling black on few dozen projects was more than happy: after the initial noise I forgot about having to care about formatting in reviews.

Think again; that is not about that you personally “like”, this is about letting this go and trusting a tool to do it. It may be counterintuitive for some, but that is putting the project benefits before your own subjectivity. I am all for it!

steven.daprano · January 2, 2021, 9:48am

Sorin:

“I will be more than happy to see an auto-formatter applied”

You are welcome to use one. Nobody is stopping you. Please do, if you

aren’t already. But this is proposal is not about YOU, it is about

requiring everyone else to use one.

“removing the subjectivity from code reviews”

There are a bazillion subjective judgements about a code review,

starting with the most fundamental of all: "do we want this patch at

all?". At best, auto-formatting will reduce a few minor disagreements

about formatting, many of which we would ignore anyway.

It certainly won’t eliminate subjective judgement on whether features

should exist at all, the best name for variables, choices of wording of

documentation, implementation details of algorithms, choice of LBYL

versus EAPF, etc.

For what it’s worth, I’m always suspicious of claims that code

formatting doesn’t matter, so we should just delegate formatting

decisions to a bot. If formatting doesn’t matter, why do we need

formatting standards?

I think that, say, 99% of the time we don’t have to think about

formatting, but that last 1% is critical, and that’s where having a bot

make those decisions is problematic.

Others have made the point that reviews sometimes have to send a

perfectly serviceable PR back because the formatting is poor, and this

adds friction to the process and can be demoralising to new developers.

But we shouldn’t need to give up all control over formatting to avoid

this. Reviewers should be able to say to a PR author "Your code is great

but the formatting doesn’t meet our standards, here is a reformatted

version of your code. Are you happy for us to use that, or would you

prefer to reformat it yourself or argue for the existing version?"

We should be able to use code formatters as a tool without having to

give up all control.

steve.dower · January 2, 2021, 10:06am

Any formatting tool would be 100x more pedantic than any CPython code review I’ve ever been involved in when it comes to formatting.

I’m sorry some people have apparently had poor experiences with core devs blocking their PRs for pointless formatting issues. Feel free to direct whoever is doing that to the section of PEP 8 that gives them permission to ignore its “rules” (they’re more like hints) when providing reviews. That isn’t the culture of review that we want here.

I’m pretty sure we also don’t want new contributors to have to install yet another tool, or run yet another tool, or be told what to do by yet another bot, before getting to interact with a person. What we need is a better experts list and better PR notifications to get eyes on them, which probably means paying triagers and taking the time to figure out who is able to handle which bits of the codebase.

In other words, we need to use people to solve an inherent people-problem, not more technology.

pitrou · January 2, 2021, 10:50am

I find it a bit weird to suggest that subjectivity only applies to code style and not other judgement factors in a code review. My experience is that I’m judgemental in other areas as well

pganssle · January 13, 2021, 3:34pm

Yes, but it will fix the problem for you, so there’s no additional latency involved, and no friction for the core dev merging.

Maybe some people are having an obviously bad experience from this, but I think that even if this solved zero problems with new contributor friction, an auto-formatter would still be a net positive. I don’t think I’ve ever been blocked on formatting issues, but I personally like to keep to a project’s style when I provide a new contribution, and it does add friction for me to figure out the right style in every new project I contribute to, even if no one blocks on it. Also, you may not even realize that your PR got blocked on style issues, because instead of a core dev saying, “Oh, LGTM, time to press the merge button”, it becomes, “Oh, there are minor style issues and I don’t want to bother the person with it, when I have a minute I’ll pull their branch, push style changes directly, then merge”, which adds friction, which ends up silently blocking PRs.

I’ll note that I have an attachment to a clean git history in a way that a lot of people here seem attached to having human-curated code formatting. For years in dateutil, I’ve been re-writing the histories of non-trivial commits because I want a nice history for git blame and git bisect, but it is way more work, and it’s caused me to fall way behind in a way that squash merges don’t. I think we’ve adopted squash merges for similar reasons — yes, you can have a cleaner commit history, but in practice you have a messier one if you don’t do it that way.

I don’t think this attitude is inconsistent with the idea of using an auto-formatter; it really depends on how it’s implemented. We have many options:

Work to do some unification of tools, e.g. using tox or pre-commit, so that the end user installs one tool which manages virtual environments that handle things like auto-formatting, etc.
Have a bot apply the formatting for you. The end user doesn’t need to do anything and nothing will complain to them (though maybe there will need to be a red X to prevent merges — not sure if it’s possible to indicate clearly that the PR is otherwise ready to merge). We have squash merges anyway, so it’s simple to push an additional commit fixing all style issues at the end of a PR.

This is not one of those “people problems” that technology can’t solve. Having a well-formatted codebase is a good thing. It is not difficult to automate large swathes of this work. The problem is not that people feel turned off by having to do this work, it’s that we’re wasting people’s time (our own, our contributors’) on something that a computer will do.