Automatically close all enhancement PRs for code lacking an expert?

brettcannon · November 20, 2019, 5:57pm

The problem

As I write this we have 1,008 open PRs. Back in September at the core dev sprints we managed to get it below 1000. Unfortunately since then it has climbed back up into quadruple digits.

I think this is bad for two reasons. One, it simply makes it hard to manage the PR queue when it reaches this size. There’s just so much it’s hard to keep things organized.

Two, it’s a bit demotivating for both PR authors and those trying to work on PRs. For PR authors it gives a feeling that their PR will never be looked at (which is honestly quite possible). And for those working PRs it gives a feeling of dread and hopelessness that one could keep up with the PR queue.

The idea

So to try and get the PR queue down to a more manageable size, I have a somewhat radical idea: automatically close all enhancement PRs that only touches code that has no experts. Basically if a PR’s changed files does not have a single match in CODEOWNERS and the PR is deemed an enhancement then Bedevere automatically closes the PR.

I have a few reasons for thinking this isn’t a bad idea. One is that if code lacks an expert then there isn’t someone who necessarily has an API design in their head or how the module is typically used to make it easy to judge whether an enhancement is good or not. Two is that if there isn’t an expert then there isn’t a core dev who necessarily cares enough to review the code, and so the PR is just going to sit there anyway, so we might as well close it. Three is that enhancements typically bring more code, not less, so expanding something that no one is choosing to focus on is not going to ease up the maintenance either. Four is the longer we leave PRs open the more drift there will be and thus the greater the chance the PR won’t be mergeable due to merge conflicts anyway.

Now this idea has nothing to do with enhancement ideas nor bugs in any way. The PRs will also not be deleted and will have a label delineating why they were closed so if someone ever wants to become an expert for a module later and go back through old PRs they can do so.

The suggested plan

If people don’t think this is a crazy idea, I think a general plan of implementing it would be:

Update bugs.python.org to pull its experts data from CODEOWNERS (probably via some comment metadata, e.g. # [importlib] and all the names on the following line are considered experts for that area, then use the GH -> b.p.o mapping to translate on the bugs.python.org side)
Ask core developers to update their expertise in CODEOWNERS
Delete the now-stale experts list from the devguide (and update docs as appropriate)
Create a new no experts label
Update Bedevere to label PRs that don’t touch a single file with an expert with no experts
Update the devguide to point out that enhancement PRs for code with no experts will be closed (and explain how to figure that out along with saying “just ask” as a fallback)
Update the PR template to point this all out as well (is there anywhere else to state this?)
Update Bedevere to close PRs with a message when a PR has both no experts and type-enhancement labels
Start going through our backlog of PRs and applying type labels as appropriate, letting Bedevere automatically close the PRs

What do people think?

I’m starting the discussion here to keep it initially small and to make sure triagers participate in this discussion as they will be part of the front line in marking PRs as type-enhancement. If people are generally on board I can then take it to python-dev.

brandtbucher · November 20, 2019, 6:18pm

Thanks @brettcannon, I think this is a great idea. It also has the side effect of encouraging more devs to register themselves as experts.

Two thoughts I have:

It would now be more important than ever to generate/sync the Experts Index in the devguide and the CODEOWNERS file with a single source of truth (something I’ve been wanting for a while). This is easier now that you’ve linked GH/BPO/IRL names for the core devs, and seems to me to be the next logical step.
If somebody newly registers themselves as an expert, should we have some mechanism/bot to reopen all of these deferred PRs? It could be nice in some cases, but suddenly open the floodgates in others.

corona10 · November 20, 2019, 6:31pm

First of all, Thanks for the great planning @brettcannon

So to try and get the PR queue down to a more manageable size

I agree with this opinion.
Before the start the plan, I 'd like to recommend that we should update the CODEOWNERS within a specific deadline.

I can not see the names of some enthusiastic core developers on the list.
Thanos’ s finger-snapping is an effective operation,
but hopefully, it should close as little as possible.

tiran · November 20, 2019, 7:05pm

I have an additional suggestion to compliment your proposal. It is quite common that contributors put a lot of effort into a PR without getting consensus about the feature or bug first. IMHO any feature or non-obvious and trivial bug fix should go through a discussion first. For new features there should be at least an agreement that the feature makes sense at all.

How do you feel about auto-closing of PRs when there is no agreed solution yet? We could use the “needs patch” stage.

asvetlov · November 20, 2019, 7:57pm

I recall the recent https://github.com/python/cpython/pull/17236 pull request from @brandtbucher

The PR had no assignment. @thomas merged it after the review, while Thomas is not the code owner.

Does it mean that the PR will be closed automatically if the proposed strategy will be applied?

For me, being the code owner means a sort of responsibility for the owned part of Python. It doesn’t mean that I cannot work on other subsystems, it means that I don’t want to be the last resort for things where I don’t care too much.
fcntl is a good example. Even threading and multiprocessing, unittest and mock have no owners. Does it mean that the PR for threading improvement should be closed automatically?

brandtbucher · November 20, 2019, 8:02pm

Does it mean that the PR will be closed automatically if the proposed strategy will be applied?

I don’t think so, since that specific PR was tagged as type-bugfix, not type-enhancement.

Even threading and multiprocessing , unittest and mock have no owners.

They do have experts though! I think this ties into my earlier point about syncing these two resources.

As an aside, perhaps PRs without code owners could be tagged and left open for some period of time (a week?), then closed automatically after that. That way they still have a chance, but don’t grow stale and become a triage/review burden.

brettcannon · November 20, 2019, 8:24pm

Yes, and I assumed that truth would be CODEOWNERS since we are looking to move issue tracking to GitHub. We can add metadata in comments as necessary and then update bugs.python.org to read CODEOWNERS instead of the devguide (and honestly ditch the experts list in the devguide).

I’ll update the proposed plan.

No, as chances are the PR will have gone stale.

Yes, I was assuming that bit. Sorry for not writing that down as a step initially.

Maybe, but that’s orthogonal and should be its own thread.

No. As @brandtbucher pointed out that was a bugfix, not an enhancement.

Exactly. And I’m not saying you can’t work in other areas on occasion (especially in helping with bugfixes). But what I am saying if there isn’t a single core dev who wants to be that person of last resort when it comes to features then we just shouldn’t lead people on by letting their PRs languish for years (and I do mean years: the oldest open PR right now is 2 years, 9 months old).

I think so because no one is choosing to care enough to be that person of last resort. In that instance no one is keeping the design of threading in their head or is worrying about API coherency. I don’t think drive-by enhancement updates are necessarily a good thing, especially for widely used modules where a misstep in design has a much broader impact.

I will also point out that I very specifically called out “type-enhancement”. This doesn’t apply to e.g. “type-performance” or “type-tests” or anything else that is much easier to judge in isolation.

Chance of what, though? As I said above, I don’t think drive-by enhancements necessarily help things so I don’t think an extra week for a PR to languish helps matters.

brandtbucher · November 20, 2019, 8:48pm

Chance of what, though? As I said above, I don’t think drive-by enhancements necessarily help things so I don’t think an extra week for a PR to languish helps matters.

Well, let’s use another one of my recent PRs as an example:

These files have no code owners, and the only experts would be from the overly-broad “extension modules” category. While this PR is (rightly) tagged as an enhancement, the linked issue is a bugfix omnibus that would be greatly simplified with this change. It’s straightforward, easy-to-understand, and addresses an issue that we’ve known to be a bug magnet for years.

While I don’t think it should be open for review for more than a few weeks (at most), I also don’t think this should be shut down immediately… do you?

asvetlov · November 20, 2019, 9:40pm

Thanks for clarification.
type-enhancement only is the point that I’ve missed.

The proposal sounds good to me now, at least I’d like to try and see how things go.

pitrou · November 20, 2019, 10:38pm

Some “enhancements” can be very small or obvious additions that don’t require an expert to review, just a motivated core developer.

Generally, if a project has some automatic closing procedure that triggers when I submit a PR (I actually never saw such a thing before), I will find that quite demotivating. So, to me, automatically closing PRs is certainly not a solution to the demotivation problem, but an aggravation.

nad · November 20, 2019, 11:35pm

I think we do need to address the problem of languishing PRs and I’m sympathetic to your proposal. But I agree with @pitrou that this seems like overkill and has a very strong potential to demotivate contributors.

One problem with this is that I think, in practice, there are going to be PRs that defy automatic classification. There will continue to be reasons why some files don’t have a CODEOWNER other than lack of core developer interest. And, no matter how nicely the PR template and closing messages from the bot are worded, an immediate auto-closing is going to be seen by many as off-putting and demotivating. I think allowing an extra week or so does matter. If nothing else, it gives some time for an initial discussion and perhaps making other potential stakeholders aware of the proposal. It doesn’t matter for the goal here (to reduce the number of languishing PRs) if a PR is kept open for a limited period of time as long as expectations are set properly.

I’m OK with your suggested criteria for identifying PRs that are likely to languish. What I would suggest instead is that, when a PR is opened that meets those criteria, the bot insteads posts a message to the PR noting that, because there is no code owner identified, the PR will be archived (or some other term) if no core developer accepts ownership (by filling in the Assignees field of the PR) within 14(?) days. That gives some time for a discussion and for interested parties, including the submitter, to advocate for the change and try to find an interested core developer.

Along with that, I think we should introduce a weekly report of PRs similar to the long-standing one for bugs.python.org issues. Besides potentially reporting on PR status change statistics, it should include a list of these current open ownerless enhancements PRs pending archiving. I expect that many core developers do not have the time to regularly look at pending PRs and would welcome such a service. (The report could also highlight backport PRs awaiting merging, another continuing source of open PRs.)

In fact, the more I think about it, I wonder if we should just treat all enhancement PRs this way, regardless of code ownerships. If so, perhaps the review period should be a bit longer but basically it would force core developers to acknowledge open enhancement PRs by assigning PRs to themselves with the documented understanding that doing so does not make a committment about when or ultimately whether the PR will be merged. And, as necessary, release managers or steering council members could continue to manually assign a PR to someone else.

And, sure, under those conditions, we could still end up with a large backlog of languishing PRs that do have assignees rather than being ownerless. But that’s a different kind of problem with different solutions.

brandtbucher · November 21, 2019, 1:34am

For the benefit of the discussion: of the 1,007 PRs currently open, 711 are untagged, and 102 are tagged as type-enhancement.

Of both these groups combined, 25 PRs are less than a week old. I see no way to filter searches based on CODEOWNER hits, though.

corona10 · November 21, 2019, 4:28am

While many PRs will be closed with Thanos finger snaps, I think that the PR queue can grow again.

It’s a bit off-topic, but how about considering to add the feature that can help this issue will not re-happen?

For example notify the message about old PR. (e.g: 30 days over notification)
or adding label for PR size (e.g: XS, S, M, L, XL) to help that the reviewer can sort the PR review order.
These features can help to reduce this happen.

gpshead · November 21, 2019, 8:33am

This will probably not encourage me to register my name in more places so long as being listed as one generates more demands on my time from Github automation. Being responsive to PRs isn’t my volunteer job and bugfixes get higher priority than enhancements regardless.

Prototype desired. If we’re going to do this, putting something up that shows a live-ish list of which PRs are going to be mass-closed if done using our current config so we can peruse those and see if there are areas anyone wants to adopt before we actually start auto-rejecting people’s work.

“Closed” feels like a rejection. No matter what bot does it for what reason. Even though it is merely a state flag being proposed to be used to reduce noise and doesn’t prevent reopening. From a BPO UX side of things “Closed” showing up there means I’m less likely to bother opening the GH PR link. Because it doesn’t convey enough. Was it rejected for reasons discussed somewhere? Was it tossed aside as wrong? Was it auto-closed by a bot that assumed nobody cared? Those are all different states. But they aren’t captured in the UI listing PRs on an issue so the lazy strategy that works in my favor is to assume the worst of Closed and ignore anything marked as such. GH doesn’t give us all of the states we need.

I assume you’re proposing Closed because the GH default UIs make it easy to see Open vs Closed things. Rather than people needing to know they should search for issues without the “help-nobody-claims-ownership-of-the-files-this-pr-enhances–how-long-can-github-tagnames-be-anyways-and-is-it-a-byte-limit-or-rune-limit-and-do-s-count?” tag on it. (meta: off to bed with me!)

pitrou · November 21, 2019, 9:42am

As an aside: if too many open PRs threatens the usability of the PR list UI, since Github issues have roughly the same UI as PRs, does it bode well for migrating our issue tracker to Github? Or is Github gonna develop the usability improvements that would be required?

Rhodri · November 21, 2019, 2:32pm

Do we need a bit of a push for owners of unowned areas of code?

pf_moore · November 21, 2019, 2:50pm

Having “owners” is a bit of a two-edged sword (for smaller areas, at least). On the one hand it means that there’s someone who has agreed to take an interest in that area, but on the other hand, it might discourage others from making decisions on PRs etc. Smaller code areas don’t necessarily need a strong “design philosophy”, so it’s relatively OK for any core dev who has an interest to review/approve PRs and enhancements.

I think we need a better understanding of the problem here. @brettcannon quoted “1008 open PRs”. Are those all enhancements? If not, how many enhancements are there in that? And how do those enhancements break down across the various code areas? It’s difficult to evaluate the proposal without knowing the numbers in more detail.

Although I agree with @pitrou that auto-closing PRs just because “we have no-one interested in reviewing this” is a very demotivating message to send to interested contributors. Who might, after all, be motivated to become the expert in the area they are contributing to…

Rhodri · November 21, 2019, 4:40pm

Well, yes. I have an interest to declare: Ethan Furman and I agreed to take on cgi/cgitb, though I haven’t had time to work on anything since the end of August and Ethan’s been busy too. This (theoretically) involves me going through b.p.o. and putting up PRs for things I think do need work. Getting them auto-closed because no one officially “owns” cgi would be very demotivating.

stoneleaf · November 21, 2019, 6:10pm

I added us to the experts file back in August. How, and if, that affects the CODEOWNER file I do not know.

brettcannon · November 21, 2019, 10:05pm

Honestly, if there’s no one willing to own modsupport.c, then sure. Would you feel better if it sat there for a week and then we closed it because no one got to it? How about a month? I personally would not feel better about having a PR automatically closed due to lack of availability/engagement after some time because no one chose to look at it compared to closing it upfront because I didn’t read some docs that I should checked upfront.

But so is leaving PRs to languish. Do you find it better to have a PR sitting open for a month or 3 months or a year on a project, wondering if anyone is going to get to it while it drifts farther from what’s in master? For me personally I would rather read the contributing docs and be told, “check if someone will be available to review this, else we just don’t have the capacity for it”. Compare that to “we will leave it open for a week in hopes someone gets to it, otherwise we will close it”, which seems to give hope where there quite possibly won’t be any.

The bot was always going to leave a comment, there was never a worry of a PR being closed with no message as to why.

But what is “archived” even supposed to mean? And what state would that be: opened or closed? That affects how tools interact with the PR queue.

Please start separate topics for those other ideas.

I never said it was because GitHub’s UI couldn’t handle it, I said a queue of 100 is hard to manage, period. And I don’t think bugs.python.org is in any way better equipped to manage this (and I honestly think it’s worse since there isn’t a clean separation unless the ‘patch’ label is applied and we don’t have the same querying capabilities against patches as we do with GitHub and PRs).

You would have registered your interest immediately and been done.

They are totally separate things due to the current split of GitHub and bugs.python.org. You should probably go into CODEOWNERS and add Rhodri and your details for any files involving cgi and cgitb.