currently CPython development does not have mandatory code reviews. Any core developer is free to push a commit without review by another core dev. Reviews by domain experts (CODEOWNERS) are not enforce as well.
I’m considering to write a PEP to make code reviews mandatory even for core developers. It’s my strong believe that reviews increase code quality and act as knowledge transfer mechanism to reduce the infamous bus factor.
The gist of my PEP idea is:
PR must be reviewed and approved by another core developer.
If a file has one or more experts (code owners), than one expert must approve the PR.
If no expert responds in reasonable time (days to weeks depending on urgency), then either two core devs or a release manager can approve. In general this should only be done for bug fixes and not for new features.
What do you think?
PS: @Mariatta has reminded me that the automergebot requires review by another core devs. My proposals boils down to: always use the bot and make good use of experts.
FWIW the bots already respects this. PRs by core dev now defaults to “awaiting core review” stage, and it cannot be automerged unless it received review from another core dev. (to change the state from “awaiting core review” to “awaiting merge”)
The CODEOWNERS thing is buggy. For example, the line:
**/*sha* @python/crypto-team @tiran
matches the Lib/multiprocessing/shared_memory.py file.
Also, I’m strongly opposed to the idea of code owners. It breeds the wrong idea of core development. There are experts whom you can contact to get an opinion (and ideally a proper review). But they should not be gatekeepers.
I like codeowners as a way to automatically request review from certain group but I agree that they should not be gatekeepers. If another core dev felt confident with the change even if they are not one of the codeowners, then they should be allowed to merge it.
I don’t like the term “code owner” either. That’s the term GH uses for automation (as @Mariatta already explained). Let’s stick to our term “expert”.
The CPython interpreter and stdlib is complex. Thankfully we have experts for most topics, areas, and modules. I believe that we should make good use of our experts and rely on their expertise. Even small and apparently trivial changes can have unforeseen side effects. Therefore I propose that domain experts should be given the chance and time to review a PR.
Do you have any numbers on the fraction of code that is committed without reviews or approval by other core devs?
I definitely think that in the ideal world, every commit would be reviewed. I certainly always try to get mine reviewed and as far as I can tell I have always succeeded in this. That said, it seems like we are already very bottlenecked on code reviews, and if we’re going to double or triple the number of reviews necessary to get things merged, that could pretty dramatically slow our velocity (granted — if we can’t get adequate reviews for all the code we’re changing, that’s a problem all its own).
On dateutil there aren’t too many people to do reviews and I do often find that I need to self-merge unreviewed code just to get things done. We have a decent number of self-merged commits on setuptools as well. It is sometimes a necessary evil in resource-constrained projects.
That said, I’m not really opposed per se, I just think it would be worth looking at the numbers to see how big a change this is. It might make it easier to ask for reviews once it’s a requirement, and it might shift the culture a little bit to encourage people to review other core devs’ code more readily even if the person is a subject matter expert.
Would make sense to allow approval by a triager to be considered enough stamp of approval (at least as an intermediate step between “review not required” and “reviews mandatory”)? I think most triagers are on the “core dev” track anyway, and I personally think doing code reviews is much stronger signal of how good of a core dev a person will be than submitted code anyway.
Might be hard to organize this, but it might be good for an “expert” to be able to explicitly delegate authority, or at least say, “Conceptually this is sound, I don’t have time to review this now but I’m OK with another core dev doing the review”. I feel like that’s an option I would probably avail myself of (especially as I feel myself being stretched thinner and thinner and don’t want to end up the blocker on a bunch of trivial datetime PRs).
Any change should involve two people that are familiar with CPython.
Experts should get a change to get involves, either on the BPO or PR.
The second person doesn’t have to be a core dev. It could be a trusted reviewer, an aspirant, or an established expert from our community. On some occasions I have asked engineers for a review on SSL related stuff that don’t even use Python.
The expert doesn’t have to do the entire review. “Conceptual review” is the term I was looking for, thanks!
There’s some noise on the “merged without approval” metric that I think is biased against approvals, in that it includes things that are submitted by one core dev and merged by another without formally clicking “approve” or where another core dev comments “LGTM” or something, but doesn’t use the “Approval” button. Some are also manual backports. Still, ~50% is an unfortunately high number, and my spot checks make me think that the number is at least on the order of 20-50%.
Thanks for the numbers @Mariatta. Would be good to hear from people for whom merge without approval is a critical part of their workflow. I’d definitely be unreservedly on board if the general consensus among those folks is, “I self-merge because no one ever asked me to do anything else.” Otherwise we may want to consider strategies to mitigate the impact on productivity (or at least explicitly consider the trade-offs between productivity and correctness).
about exponential execution time in fnmatch.fnmatch(). If I had to wait for a review, it would still be open .
Who would review it? Anyone could dispute, e.g., the choice of variable names, or whether to use single or double quotes … but arguing about style issues is mostly a waste of time. A technical review would require deep knowledge of regular expression implementation limitations and practical workarounds, and that’s rare.
As is, my unreviewed patch triggered a regression: it turned out that coverage.py died when run on the urllib3 project, due to pasting together regular expression strings returned by fnmatch.translate(). Which the docs don’t say can be done, but happened to work before, and which I didn’t anticipate. A reviewer may have anticipated that, although I think it unlikely (even if they knew at least one project relied on this undocumented behavior, would they also have guessed the regexp transformation the patch implemented could break it?).
I didn’t wait for a review on the repair for that either, and so the bug report opened against the coveragepy project by the urrlib3 project (using an alpha 3.9 Python) was closed in less than a day after it was reported.
Timely resolution is also of real value, for all 3 projects involved in this one.
I have no problem with encouraging reviews, and would have been happy to get one for either of the patches above. But, at least for bug fixes, it’s not unusual for a core dev to know at least as much as anyone else on Earth about what’s at issue. If they think it’s dicey enough that a review could be a real help, fine, they can ask for one. Else I trust them.
This is a fine aspiration. Mandatory review definitely works well when all contributing parties are being paid. I share the hesitation of Paul and Tim, though. CPython is like 100 dateutil projects smashed together. Many CPython components don’t even have one expert much less an expert who is currently motivated to do the demanding volunteer work of code review. For better or worse, CPython is a doacracy, where those who have the time can decide what the project does.
I am personally in favor of the theme of the main points, but would change must to should. As has been stated above, it’s presently impractical for some areas of CPython to have changes gated on an approval from another core developer or any qualified reviewer for that matter.
Instead, I would like to suggest an alternative: without an approval, any non-critical code change PRs (not including reverts) should be open for at least a week prior to merging if there are no approving reviews from a core dev or triager. In the case of more complex changes that would take a significant amount of time to review, this period could be extended. Although that decision would have to be made on a case-by-case basis, using best judgement.
I think this would at least somewhat help to give others time to provide a review before the changes are applied, but would avoid the problem of changes being indefinitely gated on an approving review.
I didn’t know about this term “doacracy” until now. An excellent word to define the process of CPython development.
On the topic of reviews, not making it a “must”, but stating it as “preferable” and leaving it to the judgment of the core-developer might be beneficial (basically status quo). I didn’t realize that this post had a poll. Just cast my vote in this democratic doacracy.
I have similar reservations to others. The key problem we have is timely handling of PRs, and adding mandatory code reviews increases the number of blockers to things getting merged. It is certainly preferable, but judgement is needed and that’s why we trust our core developers.
Reviews to help knowledge transfer are also a fine idea, but again, I’m not convinced that the benefit is worth the cost here.
In the past (not sure whether this happens anymore), we used to have voluntary post-merge reviews via the checkins ML, ie. other core devs would see the patches and comment on them, perhaps asking for changes. That was before PRs and websites such Github which enable interactive reviews.
Making reviews mandatory is really only reasonably possible in paid environments, IMO. They tend to add too much delay in non-paid environments, can cause frustration and are not always effective in making things better (e.g. waiting 5 days to get a LGTM doesn’t really say anything about the quality of the review). For time critical changes, they may even result in security issues staying open for too long or preventing tasks further down the line.
That said, voluntary code reviews are certainly a good idea. For patches which are not time critical or where the submitter is not 100% sure, asking for a review should be recommended.
Heh, mandatory reviews slow things down even in paid environments But generally only by hours, in my experience, rather than weeks.
As someone else said above, we trust our core devs. Everyone else needs a mandatory review. Not sure this needs fixing, but if it does, is it just the trust level in core devs not breaking things? Or not changing designs without warning?