In a case like this, it would be good to indicate on the issue that you intend to work on it. This would make it one of the exceptions for closing (that is, it would be clear that there is interest in the issue, and who is interested in it). At the very least you could subscribe to issues you are interested in so that you get a heads up and can respond when someone decides it will never happen.
If you don’t agree that a large pile of low value issues is a problem, then we are unlikely to come to an agreement here. If there is no cost to leaving issues open, then of course we should leave them open on the off-chance that someone will do something with them. But I don’t believe it is true that there is no cost, so I think we need some kind of balance.
When I began inheriting the Flask project and other related projects, the six repositories had well over a thousand open issues and PRs between them. It was paralyzing. It was hard to find duplicate discussions, or to know if closing one issue could close others. It was hard to know if I could merge a PR, because there were years of other PRs that might have conflicts in their implementation or purpose. It was hard to have a sense of what I should close, because why should I close this issue when all these others remained open?
At my first PyCon, I ran a sprint, and I struggled to find things I was confident in having people work on. It was just as likely they would work on something only to find that it didn’t make sense, had already been done, or was no longer valid.
I’ve worked for six years to cut down that backlog. There are now between 60 and 100 open issues across six repositories. It has been the greatest thing I’ve ever done to help myself as a maintainer. I’m confident that what is open are things I really want for the projects. I’m confident in assigning tasks to new contributors at sprints. I’m confident in coming up with new tasks and knowing where I can take the projects next, since there aren’t 10 years of history piled up to think about first.
Yeah, it’s going to take a long time, and anyone starting now will take a long time to see a visible dent when there’s 7.1k open issues. But if you don’t work at it, it won’t change. If you do start, you’ll see a change eventually, and things will progressively become easier to track. As a maintainer, I highly recommend closing your backlog one way or another.
Personally I prefer to keep old valid issues or requests open, regardless of activity, simply because of having the “bucket” of things to look through to do.
I have found it also helps avoid duplicates, as many reporters do search open issues before posting a new one (despite the fact that some post without searching - nothing we can do about them).
However, in this case, I believe GitHub Issues is woefully under-featured for this many issues, and we should close as many as possible. A decent issue tracker lets you manage more issues than can fit in your head, but GitHub is not (currently) one of them. So as David says, closing as many as we can in order to get it down to the ~100 issues that GitHub can reasonably represent is the best way forward (okay, we’re a big project, we’ll probably only get to ~500).
We can argue about the best way to do it (personally I think getting every issue under a given category with an assigned expert who can make the call is the best way), but let’s at least agree that it’s not going to work to have a bottomless list in our current tool.
This might have the opposite effect though. For example, “Generator-based HTMLParser” (python/cpython issue #61612) is a 9-year-old feature request. The proposal is reasonable, but it’s low on my priority list (even though it came up again just the other day). If I assign it to myself, people might avoid working on it, since they might interpret that as “someone is already going to work on it”. I left it unassigned to indicate that anyone can pick it up and create a PR (which I might then assign to myself for review and merge).
For the problem at hand, this might be useful as a way for core-devs to indicate some “high-priority” issues that contributors could pick up. It might also be useful for RMs and core-devs themselves for tracking issues/PRs, but otherwise it might just add overhead. For most issues, if they miss a release they will just be included in the next.
They can have additional custom labels (that can indicate how difficult they are, whether they are available, how long they will take to fix, and other useful things).
On bpo we had both priority and severity. I’m not sure if severity was ever used (it’s in Roundup by default and at some point it was removed), but 95.7% of the open issues had “normal” priority. Because of this, only release blocker and deferred blocker were kept (and also added to a project). In an open source project, the importance of an issue doesn’t matter too much if there is no one willing to work on it, and we can’t force people to work on issues.
I don’t think that’s achievable unless you start closing valid issues too just to keep the number low. After a certain threshold the list is big enough that you are never going to look at it as a whole, so IMHO it doesn’t matter too much if you have 500 or 5000.
Just yesterday I noticed that the list of PRs of the devguide repo was small enough to fit in a single page, and at that point I decided to go through them: I closed a couple and triaged a few others (one of which was then merged by Zach).
For cpython I never go through the whole list, and I’m either looking at a specific subset (narrowed down by a search query), or hopping from issue to issue following links. For example, the list of all open HTMLParser issues fits nicely in a single page and I can easily wrap my head around it while ignoring the other 500 or 5000 issues.
Maybe one of the problems is that contributors just start wading through an endless list of issues because they don’t know what to look for. Rather than solving this by reducing the total number of issues, we solve it by creating more manageable subsets (through labels, projects, suggested search queries, etc.).
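As a rough sketch of the "suggested search queries" idea, the helper below builds a GitHub issue-list URL for a narrowed subset. The function name `subset_url` is made up for illustration; the `is:issue`, `is:open` and `label:` qualifiers are ordinary GitHub search syntax, and the ‘docs’ label is the one mentioned in this thread (the exact label name in the repository may differ).

```python
from urllib.parse import urlencode


def subset_url(repo, keywords=(), labels=()):
    """Build an issue-list URL for open issues matching keywords/labels.

    `repo` is an "owner/name" string. This helper is hypothetical: it only
    assembles a search URL and does not contact GitHub.
    """
    terms = ["is:issue", "is:open"]
    # Quote label names so labels containing spaces still work.
    terms += [f'label:"{name}"' for name in labels]
    terms += list(keywords)
    return f"https://github.com/{repo}/issues?" + urlencode({"q": " ".join(terms)})


# A keyword subset, like the HTMLParser list described above.
print(subset_url("python/cpython", keywords=["HTMLParser"]))
# A label subset; the label name here is an assumption.
print(subset_url("python/cpython", labels=["docs"]))
```

Links like these could be collected in a contributor guide so that newcomers start from a one-page subset instead of the full list.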
My projects have separate issue trackers, but they’re all interrelated, just like the parts of cpython. You can evaluate groups of issues rather than considering the whole tracker at once; that’s a good way to start.
Don’t fall into the trap of trying to decide if every issue is “valid” or “reasonable” because almost every issue is reasonable to some degree to someone. You can expend endless energy trying to justify keeping something open, or struggling to write a “satisfying” reason for the close. That slowed me down a lot in the beginning, until I became more sure of my goals. If something is truly important, a new issue can be opened to have a fresh and active discussion.
Related to that, the next best thing I did was to lock closed issues after 2 weeks. This seriously cut down on notifications by about half. If a user really needs to report something new about something old, they can open a new issue referencing the old one. I can evaluate the new issue based on current standards for issue reporting, know that someone was actively interested in the outcome, and can potentially get them to contribute the fix. “Happy to review a PR” has been a surprisingly successful comment on new issues.
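For anyone wanting to try something similar, the selection step could look roughly like the sketch below. It is not the tooling actually used on those projects: `issues_to_lock` and the `(number, closed_at)` input shape are assumptions, and the locking itself (through the GitHub UI or API) is left out.

```python
from datetime import datetime, timedelta, timezone

# The two-week grace period described above.
LOCK_AFTER = timedelta(weeks=2)


def issues_to_lock(closed_issues, now=None):
    """Return numbers of issues that have been closed for two weeks or more.

    `closed_issues` is an iterable of (number, closed_at) pairs, where
    closed_at is a timezone-aware datetime. Hypothetical input shape.
    """
    now = now or datetime.now(timezone.utc)
    return [
        number
        for number, closed_at in closed_issues
        if now - closed_at >= LOCK_AFTER
    ]


now = datetime(2022, 6, 1, tzinfo=timezone.utc)
closed = [
    (101, datetime(2022, 5, 1, tzinfo=timezone.utc)),   # closed a month ago
    (102, datetime(2022, 5, 25, tzinfo=timezone.utc)),  # closed a week ago
]
print(issues_to_lock(closed, now=now))
```

Run on a schedule, a check like this keeps recently closed issues open for follow-up comments while older ones are queued for locking.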
I think this is a very interesting observation. Looking at the expert-labels, each of those subprojects has some hundreds of issues attached. In isolation, those numbers are not very frightening. For sqlite3, there are ~100 issues (IIRC); that’s a couple of pages to navigate through.
Perhaps what is needed most is categorising issues (and I’m also very keen on using GitHub Projects to manage sub-projects, as Ezio has been talking about).
Oh, another observation: a large piece of the issue cake is doc-labelled issues. It would be interesting to see if we could make that particular subset more manageable by using GitHub Projects.
I tried looking into the nature of issues labeled ‘docs’ a bit. I didn’t have time to get to the bottom of it, but I noticed that sometimes a legitimate code issue is also given the ‘docs’ label in order to remind the (eventual) implementer[s] that docs are also needed. This would inflate the number of docs issues, and reduce the value of the label.
Maybe we shouldn’t add a docs label until there is at least a PR under review for the code? And maybe the docs label could be split into several different sub-labels, e.g. docs-tooling, docs-bugs (actually broken docs), docs-incomplete (e.g. a function is documented but not all its arguments are, or ditto class/method, or module/class), docs-missing (an undocumented module or perhaps class), docs-structure (e.g. a library reference section contains excessive tutorial text), docs-reminder or docs-needed (for the aforementioned reminder to update the docs), perhaps I’ve missed some or I’m going overboard?
(We might be getting into new thread territory re dealing with docs issues. Regardless…)
For PRs, I’d suggest at the very least docs-only (PR only touches .rst files in Doc/, can be applied automatically by a bot) and docs-typo (intentionally chosen over ‘minor’, this would apply to literal typos or very small grammar fixes - this would be applied manually by triagers or core developers on a judgement basis).
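To make the docs-only rule concrete, here is a minimal sketch of the check a bot might run. The function name `is_docs_only` and the idea of passing in a list of changed file paths are assumptions; how a real bot obtains that list is not covered here.

```python
from pathlib import PurePosixPath


def is_docs_only(changed_paths):
    """True if every changed file is an .rst file under the top-level Doc/.

    `changed_paths` is a list of repository-relative paths using "/"
    separators. An empty change list is not treated as docs-only.
    """
    paths = [PurePosixPath(p) for p in changed_paths]
    return bool(paths) and all(
        p.parts[:1] == ("Doc",) and p.suffix == ".rst" for p in paths
    )


print(is_docs_only(["Doc/library/html.parser.rst"]))
print(is_docs_only(["Doc/library/html.parser.rst", "Lib/html/parser.py"]))
```

The first call reports a docs-only change; the second does not, because the PR also touches code under Lib/.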
For issues, Guido’s list seems reasonable, although I don’t know which is the better strategy of starting with a smaller collection of labels and expanding, or starting with a larger set — both in terms of going forwards and recategorising currently open issues.
@admins @moderators Could you please execute a thread split starting with @AA-Turner’s comment #27, including all following comments except @EpicWink’s comment 31 immediately above this one and my comment following this one (if it’s posted by then)? Suggested title: something like “Labeling of documentation-related issues” (we can change it if needed), in the #documentation category.
On other projects, I’ve found release-based milestones to be extremely helpful, particularly as a way to tell at a glance which issues/PRs have already landed or will land in each release without digging through each changelog, to quickly list all of them, and to make searching and browsing much faster and more effective. Projects could be used to categorize issues by other cross-cutting themes and further improve discoverability.
It does add some overhead, and a degree of at least semi-automation would be needed, though projects could help with that. Considering the number of other tags and metadata that CPython issues (BPO and GH) already have, the cost seems small relative to the potential long-term benefit. But there may be reasons it doesn’t make sense for CPython.
One potential distinction here is that CPython doesn’t have a single all-powerful maintainer with a single monolithic set of goals; it has over a hundred of them, all with their own unique perspectives, priorities, vision and values, which are on display in this very thread. Sure, there’s the SC, but they typically only step in in relatively extraordinary cases, and certainly are not going to be making a decision on each of 5k+ open issues. Even back when Python had a BDFL, the de facto process was, as I understand it, still mostly decentralized and consensus-based, with a defined process for making major decisions.
Considering that even proposals to do something like this after 6 months or a year have not been uncontroversial, a timeline anywhere near this aggressive seems very unlikely, but something much longer could potentially be acceptable. However, having issues be locked after closing them and forcing users/devs to open new ones to continue the discussion adds more friction and implications to closing an issue, more work to keep everything linked together, creates more issues overall and increases the friction of actually solving the issue, so it seems to be at cross purposes to the original motivating problem here, of too many issues being open.
Hi, I’m not part of the team but I was following the conversation out of interest.
I was expecting this topic to be fairly uncontroversial, and only focus on the fine details. After a private conversation, I suspect there might be some misunderstandings at play that are making an agreement harder to reach. To summarize:
The proposal is not about closing issues more aggressively than before
It’s acknowledged that old feature requests are not the only source of noise
The goal is to document a shared understanding of what is obviously noise to everyone
This shared understanding serves a pragmatic purpose, reducing friction and emotional work, so it ought to be translatable into action
Specifically, the goal is not to reduce the scope of what is a valid issue, but to let the existing agreement emerge
The value of the proposal lies not only in what is immediately actionable, but also in the principles that could emerge in the discussion, because they can unlock improvements in the issue tracker workflows
The details of the improvements are probably out of scope, but some folks are already working on this
Happy to rectify in case I misrepresented anyone’s thoughts.
I’m leaving only the tl;dr here, but I had originally written some excessively verbose comments that can be skipped because I don’t want to hijack the conversation.
I’d suggest adding a comment to the ticket saying exactly what you expect. It’s better to be explicit about such expectations rather than implicit (as usual in Python).
The approach to try to put everything into tags or other mechanisms helps with discoverability, but it doesn’t help with improving the communication between people submitting PRs and core devs who can actually merge those PRs.
The latter can only be improved by actually putting comments on tickets and showing that real people are communicating, rather than bots.
Agreed. Tags often don’t communicate very well, especially not to those new to the community (we’ve seen even Triagers confused by tags). Write a sentence! Use GitHub’s “canned responses” feature if you tend to write the same over and over.