Triaging/reviewing/fixing issues and PRs

ezio-melotti · April 29, 2022, 1:47am

After a few discussions on Discord and at the language summit, I wanted to summarize my thoughts in this post, breaking down the problems, listing (some of) them, and suggesting possible solutions (some of which are already known).

Issue discoverability

One of the problems is making sure that people with the right set of skills become aware of specific issues. Without these people – who are mostly core developers – issues/PRs cannot be moved forward and risk becoming stale.

There are many possible solutions that vary depending on the person:

triager @mentions
label subscriptions
custom email filters
review requests (for PRs)
CODEOWNERS (for PRs)
related issues/GitHub projects
reading new-bugs-announce
reading activity summaries
reading activity on Discord/IRC
search queries on the repo
active triaging
etc.

Some of these require people to take the initiative (e.g. searching through issues), while others don’t (being @mentioned).

One of the goals is trying to filter the signal (interesting issues that the dev can help move forward) from the noise (issues that the dev can’t contribute on).

Suggestion: since the definition of what qualifies as signal and noise is highly subjective, we should provide a wide variety of options so that everyone can pick the one(s) that work best for them. These should be listed and documented clearly, explaining how to use them more effectively. We should also add new options, possibly including:

the weekly summary report (being worked on)
custom/personalized dashboards
GitHub projects that group related issues
GitHub teams to easily notify experts

Reducing the number of open issues/PRs

Bonus post - Noisy Monitors - by Sam Schillace was brought up, and while it has some valid points (e.g. signal vs noise, addressed above), I’m not sure the others apply to us. In particular, in open source projects, the priority is generally dictated by the whim of the developers, rather than by specific goals set by a project manager who wants to ship certain features by a certain deadline and needs to have a clear view of who will work on what.

In open source, anyone can decide to fix any issue at any time, regardless of how old or obscure the issue is.

I’m not convinced that having a lot of open issues is actually a big problem by itself – it’s mostly a symptom. The real problem is our inability to keep up with the ever-growing number of issues reported and PRs created.

Even if we had an order of magnitude fewer open issues (700 instead of 7000), we would still have over 30 pages of issues on GitHub. This might be a problem while searching/filtering, but it can usually be solved by a more specific search query.

The number of PRs is less of an issue by itself, because their discoverability is generally tied to the discoverability of the issues that link to them.

In addition, we should avoid falling victim to Goodhart’s law (“When a measure becomes a target, it ceases to be a good measure.”) and ending up closing issues/PRs just to keep the numbers low.

Suggestion: we should try to focus on the problems, rather than symptoms (the number of issues).

Closing old issues

Issues should be closed if they are invalid, and generally kept open indefinitely otherwise, even if they are old.

Note that issues might become invalid if the issue got fixed elsewhere, if a better solution is now available, if the module has been deprecated, etc. Feature requests that are not well-specified or require more discussions might also be considered invalid (see “Determining the validity of a feature request” below).

If an old issue is closed, users that run into the same problem and only check among the open issues might risk re-reporting it, creating a duplicate and losing all the previous discussions and linked PRs.

Suggestion: leave old issues open, unless they have become invalid in the meantime. Possibly revisit issues at regular intervals to make sure they are still valid (see “Dealing with stale issues” below).

Closing issues automatically

When it’s not obvious if an issue is valid or not, a triager/dev can request more information or the opinion of someone else in order to determine the validity of the issue.

Suggestion: in this situation the issue could be set as pending by a human and automatically closed after a certain amount of time if the information/opinion is not provided. In all other situations, issues should not be closed automatically (see “Dealing with stale issues” below).
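The rule suggested above could be sketched roughly as follows. This is only an illustration: the `pending` label name and the 30-day grace period are assumptions, not agreed values, and `should_auto_close` is a hypothetical helper rather than part of any existing bot.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical values for illustration; the thread does not fix either one.
PENDING_LABEL = "pending"
GRACE_PERIOD = timedelta(days=30)


def should_auto_close(labels, last_activity, now=None):
    """Return True only for pending issues with no activity in the grace period."""
    now = now or datetime.now(timezone.utc)
    if PENDING_LABEL not in labels:
        # Issues that no human marked as pending are never closed automatically.
        return False
    return now - last_activity > GRACE_PERIOD
```

The important design point is the early return: the timer only starts to matter once a human has explicitly set the label.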

Closing PRs automatically

If the contributor didn’t sign the CLA, their PR can’t be merged, so it’s safe to close it.

Suggestion: in this situation the PR could be automatically set as pending and automatically closed after a certain amount of time. The same could be done if the author doesn’t reply to review requests.

Since PRs are linked to issues, as long as the issue is kept open it’s not too difficult to find related PRs, even though they got closed/rejected. This means we might decide to automatically close PRs if the OP didn’t reply to a review request or in other situations.

Noticing new issues/PRs

Sometimes new issues go unnoticed, and if they aren’t triaged they are even less likely to be noticed.

It was discussed and proposed to add an untriaged (or new) label, but eventually we didn’t add it. One reason was that contributors can’t set labels, so if an issue has no labels, it’s probably untriaged. Now that we added issue templates, some labels are applied automatically, making this distinction impossible.

Suggestion: mark new issues with an untriaged label. If the issue needs more triaging, the label can be left or reapplied. The label should be added automatically on new issues.

New PRs might mention the issue they address and this creates a link in the issue timeline, but no email notification.

Suggestion: automatically add a message to the issue whenever a new PR is created and/or @mention author/commenters in the PR.

Replying to new issues/PRs

Even if they get noticed and triaged, sometimes issues and PRs get no replies. The weekly summary report used to list new issues with no replies to prevent this.

Suggestion: create a report with a list of issues/PRs that haven’t been noticed/triaged/replied to. This can be done in the new weekly summary, and also as a documented search query (possibly with a direct link).
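As a rough sketch of the "documented search query with a direct link" idea, the snippet below builds a link to open issues that have received no comments. It relies on GitHub's `comments:0` search qualifier; the helper name `unanswered_issues_url` is made up for this example.

```python
from urllib.parse import urlencode

REPO = "python/cpython"


def unanswered_issues_url(repo=REPO):
    """Build a link to open issues that have no comments, oldest first."""
    query = "is:issue is:open comments:0 sort:created-asc"
    return f"https://github.com/{repo}/issues?" + urlencode({"q": query})


print(unanswered_issues_url())
```

Replacing `is:issue` with `is:pr` in the query string would give the equivalent list for PRs.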

Dealing with stale issues/PRs

Sometimes issues/PRs just “fall through the cracks” and are forgotten, even after some initial activity. The fact that the default issue view on GitHub is sorted by creation date doesn’t help. Having a mechanism to “ping” issues might be useful, but might also end up being too noisy and being ignored. There is no solution that works for everyone.

Suggestions: use the stale and pending labels as described in this core-workflow issue. Document useful search queries (e.g. issues where you commented or PRs you reviewed that have been inactive for over a month).
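The two example queries mentioned above could be generated like this. It is a sketch only: the 30-day cutoff is an assumption standing in for "over a month", and `stale_queries` is a hypothetical helper. The `commenter:`, `reviewed-by:`, and `updated:` qualifiers, as well as the `@me` shortcut, are part of GitHub's search syntax.

```python
from datetime import date, timedelta


def stale_queries(today=None, days=30):
    """Return search strings for items you touched that have since gone quiet."""
    today = today or date.today()
    cutoff = (today - timedelta(days=days)).isoformat()
    return {
        "issues I commented on": f"is:issue is:open commenter:@me updated:<{cutoff}",
        "PRs I reviewed": f"is:pr is:open reviewed-by:@me updated:<{cutoff}",
    }


for name, query in stale_queries().items():
    print(f"{name}: {query}")
```

Each string can be pasted directly into the search box of the repository's issue list.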

Determining the validity of a feature request

Unlike bugs, feature requests are more tricky to evaluate:

Unrealistic requests (e.g. adding braces to the language) can be rejected directly.
Major requests (e.g. adding a new module) should generally be proposed as a PEP and/or discussed on python-ideas/dev first.
Minor requests can live in the issue tracker if they seem reasonable.

The line between these is not very well defined, so the triager will have to take a somewhat subjective decision.

Suggestion: clearly document the guidelines so that both reporters and triagers know what is likely to be accepted. Triagers can also link to the guidelines while rejecting feature requests. Changing the Python Language - Python Developer's Guide is somewhat related.

Explaining decisions

There are common reasons to reject issues and PRs. Currently the triager has to spend time writing and sometimes arguing with the author. Having a list of explanations will make this simpler.

Suggestion: create a FAQ-like page with explanations for common cases (invalid bug, invalid feature-request, third-party issue, CLA not signed, can’t reproduce issue/not enough info, etc.). This could be done for commonly reported issues/feature requests (e.g. Programming FAQ — Python 3.10.4 documentation).

See also Proposal: canned explanations for issue/PR closure decisions (Language Summit follow up)

Limiting invalid reports

Both “invalid” feature requests and unactionable bug reports waste triagers’ time, since they have to either close the issue and explain why, or request more information. Fewer invalid reports mean less work for the triagers.

Suggestion: Improve issue templates with clearer instructions, and possibly forms that are easy to fill. This is supported by GitHub, and might also help us with automatic labelling.

Getting/retaining triagers and devs

This is a bigger issue that was investigated in the past, but I don’t remember anything conclusive.

If we can identify:

what drives people to contribute to CPython
what problems they met before becoming part of the team
what problems they are meeting now (while triaging/reviewing/fixing/etc.)
what might drive/have driven them away from the project

we can then address these friction points and help getting and retaining triagers and devs.

Suggestion: ask the team (through a poll or in a thread), identify the problems, and discuss/address them. The same could be applied separately to outside contributors too.

Retaining devs after a migration

Every time we change our workflow or introduce/replace tools, we lose a number of contributors who either are not keen to learn them or are simply left behind because they don’t know how to update their setup or didn’t bother to.

GitHub also provides some features that might replace some of the previous tools and workflows, such as label subscriptions, CODEOWNERS, teams, projects, milestones, etc., but not many people are aware of them or know how to use them.

Suggestion: We should clearly document migrations and explain how to use new tools, especially when they replace one that was used previously. We should also encourage devs to subscribe to labels and update the CODEOWNERS file, and possibly look into other features.

Increasing the number of available people/hours

One way to solve this is by streamlining and automating the workflow so that devs can use their time more efficiently and contributors can learn faster and then join the team. Some of the previous items include more specific suggestions related to this.

Some devs might also be willing to contribute more time if their feedback is requested explicitly and if obstacles are removed, but they might not want to spend time actively looking for issues that they might be able to fix.

Suggestion: invest more resources into improving the infrastructure and the workflow, including bots, tools, GitHub actions, CI, documentation, etc. Hiring more core devs from the team might be an option too.

pf_moore · April 29, 2022, 9:54am

Hi - I haven’t had the time yet to read through this whole post, but one thing I have noticed is that before the migration, I used to get automatically added to the nosy list of Windows issues. While this wasn’t ideal (there are a lot of Windows issues I don’t know much about) it did give me some visibility of what was going on with issues.

Since the migration, I seem to be getting essentially no issue emails (unless they are being caught by my “github PRs” filter[1]). Did the auto-nosy feature get lost in the transition (which would be auto-subscribing in Github terms, I guess)? And if so, is there any plan to get something like that back? I note that the “Noticing new issues/PRs” section only really suggests approaches that require people to actively log onto github, or only apply to PRs.

I’m not aware of any way for me to subscribe to “types” of issues in Github - I can either manually subscribe to individual issues (which is fine for following things I have found out about, but no use for finding out about issues in the first place) or I can subscribe to the repo (which is unworkable, because the traffic is way too high, and also offers me no way to distinguish in the emails I receive between “stuff I just want to be aware of” and “stuff I’m actively interested in”). If I could subscribe to individual labels (ideally with the label being identifiable from the email, so that I can filter on it) that would be ideal, but I don’t think that’s possible. Equally, a feed of just the first post on each issue (but not follow-up comments) would be useful, but again I don’t think github supports this.

It’s not a huge deal - in the worst case, I’ll simply accept that I can no longer get emails for issues unless I manually subscribe to them - but given that this means I’ll end up being less active, I thought it would be worth mentioning the issues I face (in the spirit of “Retaining devs after a migration”). I’m a relatively active user of github, but my entry point on other projects is almost entirely email notifications - I don’t actively scan the website for issues on any project I participate in.

[1]: Is there a way to separate issue emails from PR emails in a filter? Does Github mark the two differently?

hugovk · April 29, 2022, 11:49am

There should be a way to watch issues with certain labels, like this. But I’ve had this since just before the BPO migration and not been pinged for any docs issues yet.

Lots of options at Configuring notifications - GitHub Docs but strangely no issue/PR distinction.

Checking some emails, the message ID or subject may be useful fields:

Message-ID: <python/cpython/pull/91450/push/9608516986@github.com>
Subject: Re: [python/cpython] gh-72346: Added isdst deprecation warning to email.utils.localtime (PR #91450)

Message-ID: <python/cpython/issues/92033@github.com>
Subject: [python/cpython] Usage of first-person pronouns in documentation (Issue #92033)
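A filter based on these headers could look roughly like the sketch below. The pattern is inferred only from the two sample Message-IDs above, so other notification types may not match it; `classify_notification` is a hypothetical helper name.

```python
import re

# Pattern inferred from the two sample headers above; other notification
# types may use different shapes, so unmatched IDs are reported as "other".
_MESSAGE_ID = re.compile(
    r"^<(?P<repo>[^/]+/[^/]+)/(?P<kind>pull|issues)/(?P<number>\d+)[/@]"
)


def classify_notification(message_id):
    """Return ("pr" | "issue" | "other", number or None) for a Message-ID header."""
    match = _MESSAGE_ID.match(message_id.strip())
    if match is None:
        return ("other", None)
    kind = "pr" if match["kind"] == "pull" else "issue"
    return (kind, int(match["number"]))
```

The same distinction (`/pull/` versus `/issues/` in the Message-ID) can usually be expressed as a header rule in a mail client instead of in code.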

See also:

Improved notification email titles for issues and PRs - The GitHub Blog

erlendaasland · April 29, 2022, 1:00pm

It is possible. The feature is still in beta, but it works for me.

storchaka · April 29, 2022, 2:04pm

Excellent! I wish I could add a reaction to every section, but I cannot, so I add only one.

I think the weekly report of new issues plus a custom filter (open issues with 0 comments) would solve most of the problem.

Is it possible to create a personalized list of issues? For example, I would like to create lists related to zipfile or json. It is not worth adding a global label for every module/topic, and a search with specific terms could return unrelated issues or miss some related ones, so I would like to manage the lists manually and work on them when I have the time and am in an appropriate mood. It would also be helpful to distinguish issues that I want to follow (but am not going to resolve myself) from issues that I am going to resolve myself (some time in the future), from issues on which I am currently working, and from issues in which I just left a comment.

erlendaasland · April 29, 2022, 2:07pm

You can create a GitHub Project on your account (not on the python repo) and add the issues to that project. It is possible to add issues from another repo to your projects on your own repo.

ezio-melotti · April 29, 2022, 3:26pm

On top of the label subscriptions @hugovk mentioned (and we have an OS-windows label), there is a python/windows-team that can be explicitly mentioned and/or added as a reviewer. For example, I asked for a review from the windows-team on bpo-36329: Replace 'make serve' with 'make htmlview' by hugovk · Pull Request #826 · python/devguide · GitHub and it looks like you are part of the team, but I’m not sure if you got a notification.

Note that some users reported that some mails triggered by label subscriptions might end up in the spam folder (we informed GitHub of the problem already).

Like the new-bugs-announce mailing list?

You are right. The untriaged label would be useful in case someone replies to the issue while the issue is still untriaged, but if the triagers manage to go through all the new issues regardless of the number of replies, then it might not be needed.

Like @erlendaasland mentioned, you can use projects, both personal and org-level ones. I wrote some notes and instructions about projects in this post: Using GitHub (beta) projects in our workflow (including how to create personal projects).

If you think the project might be useful for other devs, you can also create it on the Python org: this way it will be visible in the sidebar and will help you and others discover related issues. I also recommend using the new beta projects (you can ping me if you need help setting them up).

pf_moore · April 29, 2022, 4:49pm

Ah, cool. It’s possible I did get a notification, but I generally ignore “windows team” PR review requests, because I typically don’t have much useful to add (I think the CODEOWNERS triggers review requests on a big bunch of stuff that’s only marginally Windows related). As I said, I’ve yet to work out how to split issue and PR mails (or better still, set things up so that I get issue notifications, but don’t get PR notifications unless I’m explicitly @-mentioned by name).

The remaining thing missing from Roundup for me is therefore getting notified on issues that are tagged as “Windows”. If I understand correctly, users can’t set a “Component” on new issues the way they could on Roundup, so that basically comes back to label subscriptions combined with a reliance on triagers to add the label. So that’s on me to do some work to set things up how I want.

lol, I’d forgotten that existed, because the auto-nosy was sufficient for me on Roundup (and if I recall, the number of emails if I subscribe to all new bugs was too high). I’ll take a look at that again.

By the way, I should say thanks for all the work that’s gone into this. I probably sound rather negative, but honestly I’m mostly just trying to understand all the new options I have (as I say, I use github a lot, but only on projects much smaller than CPython, where “subscribe to everything and handle the flow in my email client” is sufficient, so I’m having to learn how to handle larger scale problems this time). So I appreciate your patience helping us all get familiar with the new setup.

guido · April 29, 2022, 5:06pm

As has been clear in the responses (which seem to have focused on the mechanics of the migration rather than concerns about issue/PR triage/review/etc.), this thread is too unfocused to be useful. Irit’s idea was the better one: post threads focused on one particular idea, and keep the threads apart in time, to keep the discussion focused. For example, let’s start here.

ezio-melotti · April 29, 2022, 5:18pm

I didn’t mention this here (only on Discord), but I wanted to use this as a meta issue to gather some initial feedback on the specific suggestions and then turn actionable items into issues on the core-workflow, bedevere, cpython, and other repos after some consensus is reached. Having other threads for specific problems is fine too if they need more discussion (in fact, I linked to @irit’s thread from one of the problems above).

ezio-melotti · April 29, 2022, 5:30pm

Currently it triggers for everything in PC/, PCbuild/, Tools/msi, and Tools/nuget (see the CODEOWNERS file)
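To check quickly whether a set of changed files would fall under those directories, something like the following could be used. This is a simplified sketch: the real CODEOWNERS file uses gitignore-style patterns, which this plain prefix check does not implement, and `triggers_windows_review` is a hypothetical helper.

```python
# Directory prefixes quoted in the post above; the real CODEOWNERS file uses
# gitignore-style patterns, which this simplified prefix check does not implement.
WINDOWS_TEAM_PREFIXES = ("PC/", "PCbuild/", "Tools/msi/", "Tools/nuget/")


def triggers_windows_review(changed_files):
    """Return the changed paths that fall under one of the listed directories."""
    return [path for path in changed_files if path.startswith(WINDOWS_TEAM_PREFIXES)]
```

An empty result means that, under this simplified rule, none of the changed paths would request a review from the team.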

This is correct. Relying on reporters is tricky, since they might classify issues incorrectly. For example, we could create templates with a drop down where the user can select their operating system, and while the information is useful, that doesn’t necessarily mean that the issue is Windows-specific.

Understanding the problems that people are facing so that we can address them is one of the things I wanted to accomplish, and negative feedback is much more useful for that than positive feedback.

AlexWaygood · April 30, 2022, 5:45am

I like a lot of these ideas, but I’m not sure about this one:

I’m not 100% clear on what removing an untriaged label would signify. As a triager, I read quite a lot of newly opened issues. I’ll close obviously spammy issues, as well as issues that obviously need a lot of discussion on this site or python-ideas before an issue should be opened. I’ll also do my best to add relevant labels.

But at the end of the day, I’m not an expert on the vast majority of the code in the CPython repository. For many (most?) issues, I don’t feel confident to say definitively, “Yes, this is definitely a legitimate bug report that deserves a core developer’s time”. So I don’t think I, personally, would be removing the untriaged label very often at all, even though I do a fair bit of triage work.

ezio-melotti · April 30, 2022, 6:28am

If you feel that the issue needs more triaging, then leaving the label after a partial triaging is fine.

If we decide to add this label, we should also agree on what we consider “triaged”: is it just closing clearly invalid issues and adding the relevant labels to the valid ones, or should it also include approving the issue and/or replying to the author?

erlendaasland · April 30, 2022, 1:42pm

Isn’t this kinda the same as what was discussed at the summit: adding an acknowledged label on bug reports that were confirmed (reproducible and a bug in CPython, not in the environment) and feature requests that were considered reasonable? Adding an ack label or removing an untriaged label has, in practice, the same effect.

I would prefer an ack’ed label.

ezio-melotti · April 30, 2022, 5:43pm

They share some similarities but the goal and target are different:

the purpose of untriaged is to indicate to other triagers that they should look at the issue and triage it, so that it doesn’t go unnoticed.
the purpose of acknowledged (or the initially proposed accepted) is to inform the author that their issue has been seen and/or accepted.

For example if I see an untriaged issue about asyncio and I can reproduce the behavior on 3.10/main, I might add the type-bug, stdlib, expert-asyncio, 3.10, 3.11 labels, and remove the untriaged label. This will notify the asyncio expert and they (hopefully) will take it from there, either by fixing the issue or by closing it if it turns out the behavior is expected/intentional.

acknowledged/accepted might also create expectations in the author, and has the disadvantage that most open issues will end up with an extra label. Searching for -label:acknowledged might also bring up old/triaged issues that are waiting for more information, an expert opinion, or are still being discussed, making it more difficult to find untriaged issues.

This idea of new/untriaged was initially proposed in Map bpo issue metadata to GitHub fields/labels · Issue #5 · psf/gh-migration · GitHub (as an awaiting triaging stage), and briefly in GitHub Issues Migration: label mapping - #16 by blink1073 (as needs triage label), but it didn’t get much traction. I brought it up again because templates now add some labels, making it more difficult to notice untriaged issues, however I’m not sure how many issues actually go unnoticed.

As @storchaka said, it might be enough to use the new proposed weekly report (bpo-to-github-migration replace roundup summary script by Harry-Lees · Pull Request #91738 · python/cpython · GitHub) and include a list of new issues with 0 comments (or 0 comments from triagers/core-devs).

AlexWaygood · April 30, 2022, 11:25pm

I don’t feel like I have a clear idea of what it means for an issue to be “fully triaged”, and even if we come up with a common definition, it’s going to be a fairly broad definition. That means I’d end up having to study an issue in detail to work out what stage of triage it is at – which is basically what I do already, so I’m not sure having the label would change much.

I prefer the idea of having more informative labels, like reproduced (for bugs) or accepted (for features), which give some information about what specific triaging steps have been taken. triaged/untriaged feels too vague to be useful (from my perspective).

ezio-melotti · May 1, 2022, 12:22am

What I mean is pretty much to classify the issue by adding all the applicable labels and possibly @mentioning relevant people, or close the issue if it’s clearly invalid. Maybe unclassified is better?

This can generally be done rather quickly, even without fully understanding the problem. After the issue has been classified, the untriaged label can be removed and an “expert” can pick it up from there. You may also decide to spend some more time investigating the issue yourself, looking at the code, trying to come up with solutions, and possibly even proposing a PR and fixing it.

The goal of this label is just to provide a convenient way for triagers to know which issues they should look at without having to actively chase new issues, in order to make sure that every issue gets at least seen and classified. In general we would only have a few recent issues with the untriaged label. This is assuming we need a label to ensure that issues don’t go unnoticed in the first place.

This was pretty much the role of the stage field on bpo, but it was decided to remove it.

gramster · May 3, 2022, 6:16pm

Interesting timing on this discussion. I just posted this blog post over the weekend, about a system I set up for our Python tooling repos at Microsoft. I’d be happy to take feedback and enhancement requests here if you think this might be useful to you too.

hugovk · May 4, 2022, 6:16am

I’m getting “This site can’t be reached. www.grahamwheeler.com’s server IP address could not be found.
ERR_NAME_NOT_RESOLVED” for your website.

In the meantime, here’s an archive:

brettcannon · May 4, 2022, 7:35pm

I can hit it fine, so may have been transient.