Workflow change coming: mandatory reviews on new features

The Steering Council asked me to implement a change to our core workflow on python/cpython specifically that will prevent us landing PRs that are new features without a review from another core developer.

Implementation

I have the following approach in mind:

  • the “check labels / DO-NOT-MERGE / unresolved review” check on PRs will now require at least one approving core review (with a green checkmark) for any PR with a type-feature label.

To ensure this label gets applied, Bedevere will be updated to do the following:

  • if we open a PR,
  • and that PR touches code (i.e. not only docs / tests / build / CI configuration),
  • there is an issue linked to the PR,
  • and that issue is not marked with a type-bug, type-crash, or type-security label,
  • then Bedevere will mark this PR with the type-feature label.

Rollout

The rollout will be in three stages.

First, mid next week the “check labels” check on PRs will be adapted to require reviews when the label is applied. We can then test whether everything works fine, very few PRs actually have this label applied. We can apply the label manually and see what happens.

Two weeks from now, Bedevere will start labeling PRs with type-feature if the issue is explicitly labeled as type-feature. Again, we will be observing how the PRs behave and whether this isn’t disruptive.

Finally, three weeks from now, Bedevere will apply the label according to the process described in “Implementation” above.

Considerations

The goal here is to introduce code review for new functionality not only to catch potential bugs, but more importantly to signal consensus in the new functionality’s direction. Most new features are not PEP-worthy but it doesn’t mean they should be designed, authored, and pushed by a single person. We want to avoid having to revert changes after some deficiency was discovered much later, sometimes by curious users.

We understand certain parts of the project have a single maintainer at best. The Steering Council suggests onboarding another person in this case. You can also ask specific people for review in the PR GUI on GitHub, or shop for reviews on Discord. If all of that is impossible, you can reach out to the Steering Council to discuss exceptions or other options.

We also understand the number of open PRs is growing and any changes in PR processing that slow down merging might be undesirable in this environment. If we’re talking about slowing down addition of new features that aren’t understood by other core team members, then that’s actually by design. This process change will not affect other types of changes.

The process isn’t meant to be airtight. If a core developer changes the classification of an issue and therefore the PR, landing without a review will be possible. Manual removal of the label from a PR will be respected by Bedevere as well. This stems from the general “consenting adults” rule, but is also meant to allow unblocking workflows in case of any edge cases. We hope this safety hatch won’t be abused.

Finally, the rollout is meant to be gradual also to allow all core developers to get familiar with it and raise any concerns they might be having. So if there’s anything we should know, now is the time to speak up.

35 Likes

As much as I don’t like mandatory reviews, this looks like the best possible approach and I’m happy with it. Let’s hope it goes smoothly!

13 Likes

I don’t quite understand what you mean by onboarding another person? Is the single maintainer responsible for onboarding the person?

I believe before reaching out to steering council, core devs can ask dev in residence to help out in reviews?

1 Like

I’m happy in general to help out with reviews for under-resourced modules; if you’re the only person working on some module and need a review, feel free to ping me and if I have time and I can understand what’s going on, I’ll try to provide a review.

10 Likes
  1. This requires a sharper classification than we have now. How is a PR adding new tests classified for this purpose? They are sometimes treated as new features not to be backported, but could be a bugfix for this purpose since everything should be tested. What about performance improvements, which are often not backported?

  2. What counts as a 2nd review? There is the specification and the implementation. Many people can usefully comment on a new IDLE feature, and the general python code, but not the tkinter part of the implementation.

1 Like

Note that triagers also get green checkmarks. Is a review from a triager considered sufficient?

I don’t think new tests should be considered as features. I feel this change is about making sure that the interface we present to users of the language is reviewed. Internal-only changes such as tests don’t fall under that.

6 Likes

Good question! The main point here is that if the single maintainer wants to make sure they aren’t the sole maintainer, they need to transfer their knowledge to more people. Those people don’t have to be strangers, it might be as easy as asking a fellow core developer (including yours truly) and spending some time with them and the functionality in question.

Tests aren’t treated as features, there’s less scrutiny there in terms of design and implementation. Documentation is similar, because we can adjust it rather liberally after Beta 1 for a given Python release, while we cannot make significant changes to new features after the beta freeze. But both test changes and documentation changes can be extensive enough that we decide not to backport them to a previous bugfix version.

So the question of backports is orthogonal to the question of mandatory reviews.

It would be perfect if a single reviewer could understand everything and catch every problem. That’s not the expectation here. Rather, asking for “just” a single review is being pragmatic about the review bandwidth we can provide.

If it’s an informed review, I don’t see why not. If this becomes a problem, we can discuss further, but I would be inclined to trust triagers on this one. There’s a reason they have the power of the green checkmark.

:100:

7 Likes

This is helpful. For IDLE, a reviewer should then open IDLE and look at or invoke the proposed new look or behavior. I would very much like to have at least 1 other person looking at the visual effect of every patch. For this purpose, looking at the code would be secondary.

4 Likes

I think that it would be helpful to also have a “new feature” is merged notification as a Discord channel. The GitHub channel for all PRs is noisy so a “new feature” merge notification would be useful.

Alternatively, a weekly run of @nedbat’s dinghy project for new feature label activity would be useful.

On another note, I’m assuming that the Steering Council is trying to surface awareness of new features as much as trying to maintain quality with these workflow changes.

4 Likes

AFAIK in all or most cases when the feature was a surprise for some core developers, it was reviewed and approved by other core developer. So this will not completely fix the issue.

4 Likes

According to these criteria, an internal refactor of code is marked as a feature.

3 Likes

There’s an escape hatch for this sort of thing:

The process isn’t meant to be airtight. If a core developer changes the classification of an issue and therefore the PR, landing without a review will be possible. Manual removal of the label from a PR will be respected by Bedevere as well. This stems from the general “consenting adults” rule, but is also meant to allow unblocking workflows in case of any edge cases. We hope this safety hatch won’t be abused.

But a review of internal refactors is also a good idea.

4 Likes

Sure (a review of anything is a good idea), but the feature label needs to be removed. Removing that label is not going to be a rare event.

For those PRs you probably already edit labels to put “skip news” on there, right?

In general, if we find the need for a type-refactor label, we’ll add that in the future to avoid “false positives”.

5 Likes

This is now done. See the results below. Note that I split the DO-NOT-MERGE check into a separate check for communication clarity on the list of green checkmarks on a PR. Now there’s a separate “Unresolved review” check, which should make it obvious what is requested.

Behavior without a type-feature label

The “warning” above is how the check for the type-feature label works. If the label is missing, then the next step (“Check for complete review”) is skipped. But the entire “Unresolved review” job is successful, which you can see in the above green checkmark.

That means there’s no workflow change for PRs without the label.

Behavior with a type-feature label but no review

Here the type-feature label was found, but the review was missing. The check is red. A core developer needs to approve this PR.

Behavior with a type-feature label and an accepting core review

Here the complete check passed: there’s a type-feature label and somebody left an accepting core review, so Bedevere assigned the PR the awaiting merge label. All is good, all is green, can merge.

Behavior with a type-feature label and a core review requesting changes

If a core developer leaves a review that requires changes, Bedevere sets the awaiting changes label. In this case the first part of the check fails. We don’t even have to check for the type-feature label: this part of the process was already like this. It’s not changing.

5 Likes

If there’s a PR that was opened before this change that is approved, but won’t let you merge:

Then click the “Update branch” button and it should be mergeable.

4 Likes