After a few discussions on Discord and at the language summit, I wanted to summarize my thoughts in this post by breaking down the problems, listing (some of) them, and suggesting possible solutions (some of which are already known).
One of the problems is making sure that people with the right set of skills become aware of specific issues. Without these people – who are mostly core developers – issues/PRs cannot be moved forward and risk becoming stale.
There are many possible solutions that vary depending on the person:
- triager @mentions
- label subscriptions
- custom email filters
- review requests (for PRs)
- related issues/GitHub projects
- reading activity summaries
- reading activity on Discord/IRC
- search queries on the repo
- active triaging
Some of these require people to take the initiative (e.g. searching through issues), while others don’t (being @mentioned).
One of the goals is trying to filter the signal (interesting issues that the dev can help move forward) from the noise (issues that the dev can’t contribute to).
Suggestion: since the definition of what qualifies as signal and noise is highly subjective, we should provide a wide variety of options so that everyone can pick the one(s) that work best for them. These should be listed and documented clearly, explaining how to use them more effectively. We should also add new options, possibly including:
- the weekly summary report (being worked on)
- custom/personalized dashboards
- GitHub projects that group related issues
- GitHub teams to easily notify experts
Bonus post - Noisy Monitors - by Sam Schillace was brought up, and while it has some valid points (e.g. signal vs noise, addressed above), I’m not sure the others apply to us. In particular, in open source projects, the priority is generally dictated by the whim of the developers, rather than by specific goals set by a project manager who wants to ship certain features by a certain deadline and needs to have a clear view of who will work on what.
In open source, anyone can decide to fix any issue at any time, regardless of how old or obscure the issue is.
I’m not convinced that having a lot of open issues is actually a big problem by itself – it’s mostly a symptom. The real problem is our inability to keep up with the ever-growing number of issues reported and PRs created.
Even if we had an order of magnitude fewer open issues (700 instead of 7000), we would still have over 30 pages of issues on GitHub. This might be a problem while searching/filtering, but it can usually be solved by a more specific search query.
The number of PRs is less of an issue by itself, because their discoverability is generally tied to the discoverability of the issues that link to them.
In addition, we should avoid falling victim to Goodhart’s law (“When a measure becomes a target, it ceases to be a good measure.”) and ending up closing issues/PRs just to keep the numbers low.
Suggestion: we should try to focus on the problems, rather than symptoms (the number of issues).
Issues should be closed if they are invalid, and generally kept open indefinitely otherwise, even if they are old.
Note that issues might become invalid if the issue got fixed elsewhere, if a better solution is now available, if the module has been deprecated, etc. Feature requests that are not well-specified or require more discussions might also be considered invalid (see “Determining the validity of a feature request” below).
If an old issue is closed, users that run into the same problem and only check among the open issues might risk re-reporting it, creating a duplicate and losing all the previous discussions and linked PRs.
Suggestion: leave old issues open, unless they have become invalid in the meanwhile. Possibly revisit issues at regular intervals to make sure they are still valid (see “Dealing with stale issues” below).
When it’s not obvious if an issue is valid or not, a triager/dev can request more information or the opinion of someone else in order to determine the validity of the issue.
Suggestion: in this situation the issue could be set as pending by a human and automatically closed after a certain amount of time if the information/opinion is not provided. In all other situations, issues should not be closed automatically (see “Dealing with stale issues” below).
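The rule above (close automatically only when a human has set the issue as pending and the requested information never arrived) could be sketched roughly as follows. This is a minimal illustration, not an existing bot: the function name `should_auto_close` and the 30-day grace period are assumptions, since the post does not specify a timeout.

```python
from datetime import datetime, timedelta, timezone

PENDING_LABEL = "pending"          # label name taken from the suggestion above
GRACE_PERIOD = timedelta(days=30)  # assumed value; the post does not specify one


def should_auto_close(labels, last_activity, now=None):
    """Return True only for issues marked as pending that have seen
    no activity for longer than the grace period."""
    now = now or datetime.now(timezone.utc)
    if PENDING_LABEL not in labels:
        # Issues without the label are never closed automatically.
        return False
    return now - last_activity > GRACE_PERIOD


if __name__ == "__main__":
    now = datetime(2022, 6, 1, tzinfo=timezone.utc)
    # Pending and silent for 45 days: eligible for closing.
    print(should_auto_close({"pending"}, now - timedelta(days=45), now))
    # Old but never marked as pending: left open.
    print(should_auto_close({"type-bug"}, now - timedelta(days=45), now))
```

Keeping the decision in a pure function like this makes the policy easy to review and test separately from whatever tool ends up applying it.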
If the contributor didn’t sign the CLA, their PR can’t be merged, so it’s safe to close it.
Suggestion: in this situation the PR could be automatically set as pending and automatically closed after a certain amount of time. The same could be done if the author doesn’t reply to review requests.
Since PRs are linked to issues, as long as the issue is kept open it’s not too difficult to find related PRs, even if they were closed/rejected. This means we might decide to automatically close PRs if the OP didn’t reply to a review request or in other situations.
Sometimes new issues go unnoticed, and if they aren’t triaged they are even less likely to be noticed.
It was discussed and proposed to add an untriaged (or new) label, but eventually we didn’t add it. One reason was that contributors can’t set labels, so if an issue has no labels, it’s probably untriaged. Now that we have added issue templates, some labels are applied automatically, making this distinction impossible.
Suggestion: mark new issues with an untriaged label. If the issue needs more triaging, the label can be left or reapplied. The label should be added automatically on new issues.
New PRs might mention the issue they address and this creates a link in the issue timeline, but no email notification.
Suggestion: automatically add a message to the issue whenever a new PR is created and/or @mention author/commenters in the PR.
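As a rough sketch of what such automation might do, the snippet below extracts issue references from a PR and builds the text of a notification comment. It assumes references are written in the `gh-NNNNN` form; the function names and the comment wording are hypothetical, and actually posting the comment is left out.

```python
import re

# Assumption: issues are referenced as "gh-NNNNN" in the PR title or body.
# This is a simple pattern, not a complete parser of GitHub references.
ISSUE_REF = re.compile(r"\bgh-(\d+)\b", re.IGNORECASE)


def referenced_issues(pr_title, pr_body=""):
    """Return the sorted, de-duplicated issue numbers mentioned in a PR."""
    text = f"{pr_title}\n{pr_body}"
    return sorted({int(num) for num in ISSUE_REF.findall(text)})


def notification_comment(pr_number, pr_author, subscribers=()):
    """Build the text of a comment to post on each referenced issue."""
    message = f"PR #{pr_number} by @{pr_author} references this issue."
    if subscribers:
        # @mention the issue author/commenters so they get an email.
        mentions = " ".join(f"@{name}" for name in sorted(subscribers))
        message += f" cc {mentions}"
    return message


if __name__ == "__main__":
    issues = referenced_issues("gh-12345: Fix crash", "See also GH-678.")
    print(issues)
    print(notification_comment(101, "alice", {"bob", "carol"}))
```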
Even if they get noticed and triaged, sometimes issues and PRs get no replies. The weekly summary report used to list new issues with no replies to prevent this.
Suggestion: create a report with a list of issues/PRs that haven’t been noticed/triaged/replied to. This can be done in the new weekly summary, and also as a documented search query (possibly with a direct link).
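A documented query with a direct link could look something like the sketch below. The `comments:0` and `review:none` qualifiers are standard GitHub search syntax; the helper name `search_url` and the exact choice of queries are only illustrative.

```python
from urllib.parse import quote_plus

REPO = "python/cpython"

# Open issues that nobody has replied to yet.
ISSUES_NO_REPLIES = "is:issue is:open comments:0"
# Open PRs with no comments and no reviews.
PRS_NO_REPLIES = "is:pr is:open comments:0 review:none"


def search_url(query, kind="issues", repo=REPO):
    """Turn a search query into a link that can be shared or bookmarked.

    `kind` is "issues" or "pulls", matching the two GitHub list pages.
    """
    return f"https://github.com/{repo}/{kind}?q={quote_plus(query)}"


if __name__ == "__main__":
    print(search_url(ISSUES_NO_REPLIES))
    print(search_url(PRS_NO_REPLIES, kind="pulls"))
```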
Sometimes issues/PRs just “fall through the cracks” and are forgotten, even after some initial activity. The fact that the default issue view on GitHub is sorted by creation date doesn’t help. Having a mechanism to “ping” issues might be useful, but might also end up being too noisy and being ignored. There is no solution that works for everyone.
Suggestions: use the pending labels as described in this core-workflow issue. Document useful search queries (e.g. issues where you commented or PRs you reviewed that have been inactive for over a month).
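The two example queries could be generated along these lines. The `commenter:`, `reviewed-by:` and `updated:<DATE` qualifiers are standard GitHub search syntax; the function name, the dictionary keys, and the 30-day default are assumptions made for this sketch.

```python
from datetime import date, timedelta


def inactive_queries(username, today=None, days=30):
    """Build search queries for open items `username` took part in
    that have not been updated in the last `days` days."""
    today = today or date.today()
    cutoff = (today - timedelta(days=days)).isoformat()
    return {
        "issues_commented": (
            f"is:issue is:open commenter:{username} updated:<{cutoff}"
        ),
        "prs_reviewed": (
            f"is:pr is:open reviewed-by:{username} updated:<{cutoff}"
        ),
    }


if __name__ == "__main__":
    # "octocat" is a placeholder username.
    for name, query in inactive_queries("octocat").items():
        print(f"{name}: {query}")
```

The resulting strings can be pasted into the search box on the repository's issue or PR list.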
Unlike bugs, feature requests are more tricky to evaluate:
- Unrealistic requests (e.g. adding braces to the language) can be rejected directly.
- Major requests (e.g. adding a new module) should generally be proposed as a PEP and/or discussed on python-ideas/dev first.
- Minor requests can live in the issue tracker if they seem reasonable.
The line between these is not very well defined, so the triager will have to make a somewhat subjective decision.
Suggestion: clearly document the guidelines so that both reporters and triagers know what is likely to be accepted. Triagers can also link to the guidelines while rejecting feature requests. Changing the Python Language - Python Developer's Guide is somewhat related.
There are common reasons to reject issues and PRs. Currently the triager has to spend time writing and sometimes arguing with the author. Having a list of explanations will make this simpler.
Suggestion: create a FAQ-like page with explanations for common cases (invalid bug, invalid feature request, third-party issue, CLA not signed, can’t reproduce issue/not enough info, etc.). This could be done for commonly reported issues/feature requests (e.g. Programming FAQ — Python 3.10.4 documentation).
Both “invalid” feature requests and unactionable bug reports waste triagers’ time, since they have to either close the issue and explain why, or request more information. Fewer invalid reports mean less work for the triagers.
Suggestion: Improve issue templates with clearer instructions, and possibly forms that are easy to fill. This is supported by GitHub, and might also help us with automatic labelling.
This is a bigger issue that was investigated in the past, but I don’t remember anything conclusive.
If we can identify:
- what drives people to contribute to CPython
- what problems they met before becoming part of the team
- what problems they are meeting now (while triaging/reviewing/fixing/etc.)
- what might drive/have driven them away from the project
we can then address these friction points and help attract and retain triagers and devs.
Suggestion: ask the team (through a poll or in a thread), identify the problems, and discuss/address them. The same could be applied separately to outside contributors too.
Every time we change our workflow or introduce/replace tools, we lose a number of contributors who either are not keen to learn them or are simply left behind because they don’t know how to update their setup or didn’t bother doing so.
GitHub also provides some features that might replace some of the previous tools and workflows, such as label subscriptions, CODEOWNERS, teams, projects, milestones, etc., but not many people are aware of them or know how to use them.
Suggestion: We should clearly document migrations and explain how to use new tools, especially when they replace one that was used previously. We should also encourage devs to subscribe to labels and update the CODEOWNERS file, and possibly look into other features.
One way to solve this is by streamlining and automating the workflow so that devs can use their time more efficiently and contributors can learn faster and then join the team. Some of the previous items include more specific suggestions related to this.
Some devs might also be willing to contribute more time if their feedback is requested explicitly and if obstacles are removed, but they might not want to spend time actively looking for issues that they might be able to fix.
Suggestion: invest more resources into improving the infrastructure and the workflow, including bots, tools, GitHub actions, CI, documentation, etc. Hiring more core devs from the team might be an option too.