Questioning necessity of having both a GitHub issue and PR

The CPython contribution guide at Lifecycle of a Pull Request currently says you should create both a GitHub Issue and a PR for any change.

As a very casual contributor to CPython, I find this workflow/requirement uncommon as far as GitHub contribution workflows go and somewhat annoying due to the overhead it inflicts. It frustrates me enough that it discourages me from contributing to CPython.

Many of my open source contributions start and end with a code change. I write detailed, self-explanatory commit messages / PRs. Most bugs/issues are going to require PRs to fix eventually. So why not just skip the project/issue tracking overhead and jump straight to the self-describing commit/diff/PR? At least that’s what goes through my mind when contributing to most open source projects. And many other projects seem to have no trouble practicing this workflow - especially those on GitHub.

Perusing through recent CPython PRs, it seems many PRs are associated with an issue filed mere moments before the PR. See e.g. Tkinter: test_getint incompatible with Tcl 9.0 · Issue #104411 · python/cpython · GitHub and gh-104411: Update test_getint for Tcl 9.0 by chrstphrchvz · Pull Request #104412 · python/cpython · GitHub (randomly selected recent example). This just-in-time behavior seemingly implies a lot of PR authors are creating issues just to fulfill the process requirement that an issue exist.

You may have noticed that on GitHub, issue and PR numbers share the same monotonically increasing integer namespace. URLs for the wrong item type redirect to the other. Using our example above, https://github.com/python/cpython/issues/104411 and https://github.com/python/cpython/pull/104412 are canonical but https://github.com/python/cpython/pull/104411 and https://github.com/python/cpython/issues/104412 redirect to them just fine.

Back when bugs.python.org was a thing and code and issues were discussed in different places, there was stronger justification for mandatory annotations to keep everything in sync. But with issues and code changes under one roof in GitHub, this justification seems weaker now.

What is the current justification for having an issue for every PR?

Is there room to allow PRs that don’t have a corresponding GitHub issue?

If separate issues (even just-in-time issues) are providing a benefit and there must be an issue for every PR, perhaps the workflow could be improved so contributors aren’t burdened with filing an issue? e.g. perhaps some bot can recognize when a PR is merged without an issue and create the issue post facto. This way we have the separate issue without imposing overhead on contributors. Seems like a win-win?

I want to underscore that this feedback is shared out of a mutual desire to improve the state of the [C]Python ecosystem. As I’ve learned from working professionally in developer productivity for 10+ years, every extra development hurdle / barrier matters and discourages contributions. In corporate settings it inhibits cross-team contributions. In open source it filters out potential contributors and undermines community diversity because of various survivorship biases (e.g. parents with limited time may contribute to a project that doesn’t impose workflow overheads).

Is it time to reconsider the necessity of GitHub issues for [C]Python contributions?

3 Likes

We do have a skip issue label that committers can apply to PRs when they feel it is appropriate, same as for skip news. It seems entirely reasonable to explain why you don’t believe an issue is necessary in the PR opening message and be willing to accept disagreement from committers and open one if asked. I usually only apply those labels for trivia or documentation fixes myself.

Yes that process means we get some trivial issues opened that may not have strictly been necessary. But I don’t consider that a bad thing.

This may be all around the tone of our automated messaging in the CI and bot workflows on a PR where we say the Issue must be filed.

5 Likes

added background:

CPython requires the freedom for our project to move to other infrastructure when Github inevitably stops meeting our needs (yes, really!) - as all past infrastructure has. Issues are something that’d be migrated to an equivalent new destination (issue trackers have stood the test of time). Issues are where you tie things together when someone proposes a PR that isn’t appropriate and gets rejected in favor of another approach, or requires a series of changes. It isn’t knowable up front for many things if that will happen. Issues are also where you record decisions and reasoning.

PRs are not intended to be used for that - they’re single-owner fork branch driven hope as a strategy change proposals rather than broader discussion centralization points - when anything goes beyond what makes sense in a PR it can make a giant mess to attempt to untangle and follow coherently. An issue clearly needs opening by that time. Unfortunately the information you’d want in said issue has already been scattered around PRs at that point given no centralizing issue existed. people working on it will thus be :confused:.

The conflation of PRs and Issues is one of many of github’s UX problems/features. Sometimes you want separate, sometimes you want both merged into one. As soon as you stop using issues, people need to learn to change how they search for bugs as they now have a lot more noise to trawl through than just the issue tracker results.

I’m not averse to the idea of more PR-only things. But I don’t want us to just blindly remove the issue requirement. We want them when they’re useful to track important changes. It’d be a huge loss to gravitate towards PR-only becoming dominant.

10 Likes

I’m aware of skip issue and skip news. My limited experience has taught me that reviewers seem reluctant to allow their [ab]use and both these are de facto required / enforced at both the machine and human level.

I’m not sure if a git log | grep is sufficient to find all occurrences of skip issue and skip news. But if it is, these features are so rarely used that they might as well not exist.

You can find “skip issue” if you look in git log for commits without a “gh-” prefix. There are two just from the last few days, 1be80ed107deb15b926f2794b8e6a7a527b8b84c and 2f7b5e458e9189fa1ffd44339848aa1e52add3fa (both small documentation fixes).

I hear where you are coming from and I can’t really take issue with much of it. I was saying much the same things when I was a maintainer of the Firefox build system and supporting VCS at Mozilla!

I think we violently agree on the importance of having a forensic log of changes and sovereignty over that critical data and decision log.

My take on this is that if this is your end goal, the best place to capture this context is in the [Git] commit message. Having a distributed VCS, that text will be automatically backed up on thousands of machines. I view the commit messages as the primary data store for important context because it is the most resilient data store. The code review / issue tracker activity is just a fallback in case the commit message is insufficient.

Unfortunately, I’m hard pressed to name projects outside the Linux kernel that come close to achieving this. Humans are busy/lazy. Many of us aren’t good writers. There will inevitably be important details not captured in commit messages. Capturing additional context after code review starts requires amending commits, which Git and GitHub are both notoriously bad at. What I’m trying to say is while striving for capturing as much context in commit messages is a noble goal, you absolutely have to assume people won’t do it and you need to fall back to code review, issue trackers, etc to supplement missing data. So we’re back to agreeing :slight_smile:

We already have a NEWS requirement. And the merged commit message links back to the PR / code review, plus whatever other references have occurred on GitHub. So I’m seeing 2 other mechanisms for tracking important changes. What’s so special about issues to justify their existence as a 3rd mechanism for ~all PRs? Put another way, what exactly are you losing if a bunch of just-in-time issues stop ceasing to exist?

I totally grok and advocate that some issues need to exist. And I agree GitHub’s current conflation of issues and PRs is as confusing as it is a feature. But I also feel that designing a workflow around a future world that we have no indication of coming to fruition is premature optimization. Yes, hedge against the risk of data loss if you change platforms. But optimize your workflows for the tools you use now. And right now, with GitHub issues and PRs being very similar, I’m just not seeing justification to de facto requiring 2 for every change.

I’m pretty sure this came up briefly when we migrated to GitHub. I don’t think anyone had a strong argument in favour of changing the workflow.

If the issue requirement is causing a significant enough burden to make someone stop contributing, their contributions must be tiny. We don’t actively try to encourage more of those.

Having issues and PRs be separate means that you can close a PR, without losing the record of the issue that the PR attempted to fix. In projects that let you conflate PRs and Issues, its common for issues to get lost in the shuffle when PRs get closed for a variety of reasons that aren’t “the issue has been solved”.

6 Likes

Personally, as the newest core dev, I also questioned this duality when I started contributing myself not too long ago, right around the time of the BPO → GH transition. However, as a contributor I quickly grew to not mind it, and then appreciate it, as there was little to no duplication of effort once understood and executed as intended—the issue was for explaining and discussing the motivating need for a change (bug, enhancement, feature, task, etc) and the overall abstract design of the change to best address this, while the PR is for describing and reviewing the details of a specific implementation. This has helped me better think through a change and ensure I’m describing both of these aspects to the extent necessary, even if (as I very often do) submit both at the same time.

Core devs certainly vary in how much detail they put into issue vs. PR descriptions, with some (like me) including a lot of detail in both, vs. some keeping the PR description cursory or even non-existent and letting the implementation and commit messages speak for itself. However, I’m not sure eliminating the issue requirement would really improve anything at either end of the spectrum. And like you mention, for commit messages it’s even more hit or miss among core devs, and unlike on issues or PRs its a lot harder to maintain a consistent standard there, as it only gets finalized at merge time with GitHub’s workflow.

One final important note is that one issue can be linked to many PRs. This means one issue can easily serve as the central proposal and discussion point for numerous small individual changes, that would otherwise not easily be connected and discussed as a unit, and also helps avoid many to most cases where creating an issue for each PR would be more a nuisance than a benefit. See, for example,

Just to note, triagers also can, and in clear cut/well established cases should (as I did for such, at least for other people’s PRs), apply this label too. As mentioned in the devguide, this should be used:

for trivial changes (such as typo fixes, comment changes, and section rephrases) that don’t require a corresponding issue.

This is most common for small to trivial docs fixes.

Or you can simply check the list of PRs with the label.

The NEWS requirement depends upon the issue requirement, as the changelog tooling (blurb, the Sphinx extension, etc) all requires an issue number for reference and to link back to.

3 Likes

As a triager, I only apply “skip news” labels to PRs with nonfunctional doc/test changes.

The issue tells others what you want and why it is desired. In the meanwhile, PR tells others how you are going to do it.

I think they tell things in 2 respects. Also, there can be multiple PRs to solve one issue.

This “problem-solution” abstraction is very natural. Every solution needs a problem, just like every PR needs an issue.

1 Like

The issue also ties together backport PRs, and any follow-up PRs in case an additional flaw is found after merging.

3 Likes

One other factor here is that we use squash commits. While it’s certainly true that in an ideal world we would ensure that the commit message of the squash commit was a well written combination of the individual commit messages of the individual commits in the merge, in reality it’s likely that people either let the bots do it (so the result is presumably just the individual messages tacked together, including admin commits like bot comments) or create a simple summary of the change, losing the narrative. Furthermore, commit messages written per-commit don’t always make sense when part of a squashed commit.

Having the narrative in an issue avoids that problem.

I couldn’t find any guidance in the devguide on what to do with the commit message when merging.

1 Like

Not sure how closely these are followed, but there’s guidance here on making good commit messages in the first place:

On commit messages when merging PRs:

And commit messages for reverting PRs:

3 Likes

I don’t follow core CPython development closely to comment on whether the requirement is appropriate for CPython specifically, but another project I contribute to (Apache Airflow) does allow conflating issues and PRs. While this is indeed convenient a lot of the time, the downside is exactly what Donald mentioned; history gets lost if you are not careful, everyone gets sloppy some of the time, and for a long-running project with a lot of contributions (i.o.w. it’s not possible for any single core maintainer to look at every PRs) some things eventually get lost.

Another thing I want to comment on. A significant amount of contributors we get actually feel oblighed to open an issue to accompany their PR, that I don’t think this practice is uncommon as far as GitHub contribution workflows go as described in the top post.

1 Like

Thanks. I found the first and last sections, but missed the one for merging. In particular:

In either case, adjust and clean up the commit message.
…
The bad example contains bullet points that are a direct effect of the PR life cycle, while being irrelevant to the final change.

That’s the problem I was trying to express, if there’s “detailed, self-explanatory commit messages”, rewriting those to avoid them having a flow that’s based on the PR life cycle is a lot of work, and not something that I imagine most people will do when merging a PR with extensive commit messages. It’s certainly not something I’d normally consider doing, I’d prefer to replace them with a message that was a summary of what was implemented, rather than how.

But this is a secondary point, TBH. The main issue, I agree, is when a PR gets closed unmerged, or replaced with an alternative implementation - the discussion on the original PR gets lost.

1 Like

It’s actually human merges that create unhelpful overlong messages, because github pre-fills the commit message with all concatenated commit messages, which is often overlooked and not edited.
Browser extensions like Refined Github help mitigate this.

miss-islington used to take the PR first post for commit message, which could be edited when needed at any time before the merge. I am not sure what the built-in automerge does.

The builtin automerge lets you write your own commit message. For commit messages, the interface is identical to the interface you get when doing a regular merge

2 Likes

Ah, sorry - I had it backwards. I typically just delete the concatenated message and replace it with a summary of what the change does (often just the title of the PR), precisely because editing the concatenation into something meaningful is way too hard. I haven’t looked closely at auto-merge yet, so I’m not sure what I’d do in that situation. Probably something similar, if not exactly the same.

1 Like

Yup, IMO committers should be taking the (relatively modest) time to edit commit messages in whatever fashion suits the PR and their workflow—people certainly have different preferences, but honestly any of the approaches mentioned here are strictly superior to not modifying any of the default messages at all.

Right, but that could also contain a ton of information (including reviewer notes, graphics, examples, questions, checklists, pre-filled templates, etc.) that is not relevant or is too detailed for the to the final merge, and leads to a huge wall of text in the commit log, if similar care is not taken—particularly given the guidance in this thread to avoid duplication between issues and PRs.

So more potential benefit than by default in most cases if committers don’t edit them, but also more potential harm, and much easier (especially with GHR) to copy/paste from the PR description when desired than try to reconstruct the commit message list if useful and authors have actually made well-organized, atomic commits with meaningful descriptive messages.

For others’ reference, the most recent discussion on this is Change the default commit message to "Pull request title and description." · Issue #501 · python/core-workflow · GitHub .

If you prefer a clean slate (to either copy/paste from the PR description or write your own), as @merwok mentioned, GitHub Refined has a feature (enabled by default) that does exactly this, clearing out the default message and only leaving a single copy of the important trailers like Co-authored-by and similar.