Confused about multiple independent PRs

smontanaro · June 28, 2023, 6:38pm

On June 26, I filed a bug report and opened a PR about it (issue: 106123 & associated PR: 106124). It appears to have been merged, closed, etc by @hauntsaninja. So far, so good.

Today, I discovered what I believe to be a minor bug in the check-clean-src Makefile target (backslash in a path where a forward slash should be). I opened an issue (106208), fixed it in my fork, pushed, synced with upstream, then opened a new PR (106215).

Looking at the changed files, it shows the file changes from the previous PR are part of the new PR. If @hauntsaninja already merged my previous change and I synced with GitHub - python/cpython: The Python programming language, why do the changes from that earlier PR show as part of the current PR?

jeanas · June 28, 2023, 6:46pm

This is how it appears for me:

I don’t see what’s wrong.

hauntsaninja · June 28, 2023, 7:13pm

Your PR also appears to currently be in a good state to me.

It looks like you committed both sets of changes to main branch in your fork, so if you opened a PR without having first fetching latest upstream’s main and merging it in, you’d see the diff on Github. I see you merged upstream’s main in after opening the PR and this probably fixed things.

The diff on Github is between your branch and the merge base with the target branch, so it might not exactly show what would be applied to the target branch in cases where the target branch has parallel sets of changes.

One workflow that can avoid this kind of issue is to just always use a new branch for a new PR, instead of committing to main on your fork.

smontanaro · June 28, 2023, 7:16pm

Looking here:

I see four commits:

The first two relate to the previous PR. That’s what’s confusing me.

CAM-Gerlach · June 28, 2023, 7:25pm

The key bit here is you also committed your previous commits on your previous PR to your fork’s main branch as well, instead of a feature branch, and then made more commits on top of those commits in your current PR. Those commits don’t exist on the “real” upstream main, because the CPython repo uses squash-merging rather than “true” merging, meaning your changes were actually committed to upstream main as a single squashed commit rather than your commits being merged with it.

It’s essentially the same problem as if you’d created a feature branch for that previous PR (as you should have), that got merged, and then you made more commits to that same previous feature branch and made a new PR off that, rather than creating a new clean feature branch off upstream main first.

If you’d like, I can explain in more detail step by step, but I figured I’d hold off the full accident report unless you needed it, haha.

It’s actually not a big problem for actually merging your PR, as the squashed commit will only contain the 3-way merge diff listed in the Files tab, though it is potentially confusing to reviewers (and the old commit messages could potentially end up in the squashed commit description, if the committer isn’t being appropriately careful about it, which unfortunately many are not).

The main takeway, though, is before starting to work on a new topic (==issue/PR), always do (assuming the remote upstream is the upstream python/cpython repo):

git fetch upstream
git switch -c TOPIC-BRANCH-NAME upstream/main

which will create a new feature branch off the latest clean upstream main.

And be sure to avoid ever committing to main, or to an old feature branch used for a different PR.

smontanaro · June 28, 2023, 7:37pm

For tiny changes like these two, this at least doubles the amount of work necessary. You might argue that it doubles only a small amount of work, but still… If I was to make a big change, I agree, a feature branch makes sense. For a one-line change to a Makefile or configure script as these were, the extra effort seems counterproductive.

CAM-Gerlach · June 28, 2023, 7:38pm

Once this PR is merged, you’ll also probably want to fix your main branch for the future. To do that (again assuming origin is your fork and upstream is python/cpython, per standard convention):

git fetch --all
git switch main
git reset --hard upstream/main
git push -f origin main

You can also fix the proximate issue of the old commits immediately (and the merge commit), though you’ll still need to do the above once the PR is merged (since your current PR branch commits will still be on main, and you can’t change a PR’s source branch on GitHub):

git fetch --all
git switch main
git reset --hard upstream/main
git cherry-pick 0e8f8ac0a789947add62c19469930ca8644a2b4a
git push -f origin main

This just cherry-picks the one commit you want onto your branch; you could also do this with rebase but this solution is conceptually simpler also does away with that pesky merge commit (which might cause problems with the rebase).

CAM-Gerlach · June 28, 2023, 7:55pm

I don’t follow, sorry. Of the options that don’t cause this problem (which if not dealt will continue to get progressively worse, with more old commits and eventually merge conflicts if not dealt with), creating a new branch involves the least effort, and not doing so costs you far more effort and trouble than it saves. You just need to run one line when starting a new PR:

git switch -c BRANCH-NAME

By contrast, if you don’t run the above to create a branch and instead commit directly to your main for a PR, you’ll need to run these 3-4 lines after merging that PR and before creating any new PR based on main:

git fetch upstream
git switch main
git reset --hard upstream/main
git push --force origin main

And furthermore, if your main is contaminated, if you do want to create any new branch (as you necessarily must if you have more than 1 PR in flight at a time), you’ll need to make sure you always do

git fetch upstream
git switch -c BRANCH-NAME upstream/main

jeanas · June 28, 2023, 8:02pm

Not sure what your level of experience with Git is, but my experience is that it is mostly people who have used other VCS systems before who tend to think of creating a branch as “A Big Deal ™”. In Git, it’s really not. Creating a new branch for each PR is a no-brainer.

smontanaro · June 28, 2023, 8:09pm

OKay, I’ll take your word for it. Git will always be a mystery to me beyond the simple forms of pull/commit/stash/push sorts of commands.

CAM-Gerlach · June 28, 2023, 8:52pm

Here is the basic intro to git branching, aimed at users of other non-branch-based VCSes (I’m assuming you’re probably most familiar with CVS/SVN, based on when you were a CPython core dev).

To make this easier, you can add a git alias. I’ve created and tested one for you that will do a bunch of useful stuff for you to remove all the overhead of creating and pushing a new branch correctly. It syncs the latest remote changes, create a new branch off the latest clean upstream, switches to it and sets it up to automatically pull to and pull from a branch of the same name on your fork.

Here’s the command to add it (assuming you’re using some sort of bash-like shell, including the default Git Bash on Windows, as I use):

git config --global alias.new '!git fetch --all && git switch --no-track -c $1 upstream/main && git push -u origin'

You can call it with:

git new BRANCH-NAME

You can name it anything you want by changing alias.new in the above to, e.g. alias.n, which would be called by git n BRANCH-NAME.

Using, say, the issue number as the BRANCH-NAME (though I suggest making it more descriptive), for a one-commit change your workflow is now basically as easy as committing directly to main and pushing that (much more so, if you count pulling/merging the current upstream main first to make your changes off).

git n gh-106215
# Make your changes
git -a -m "Your commit message"
git push

stoneleaf · June 28, 2023, 9:08pm

Wow, thanks!

Any chance you can update the PR alias in the docs? It’s great for getting the PR on my machine, but it doesn’t set up the return path and hurts everytime I have to figure it out.

jeanas · June 28, 2023, 9:19pm

Do you mean this?

I would recommend just using gh pr checkout anyway.

stoneleaf · June 28, 2023, 9:53pm

I don’t use the GitHub CLI nor hub tools.

smontanaro · June 28, 2023, 10:38pm

I actually use git a bunch, but nearly always for personal, one-person projects (sometimes stuff that’s never find its way to a central server). I get branching, committing, pushing, etc, but it’s rare that I need to coordinate with other people, let alone on projects as vast as CPython.

CAM-Gerlach · June 29, 2023, 1:52am

I actually have a git pr alias much like that one that I used to use years ago, but later moved to using hub basically for that one feature plus hub sync, which replaces another alias I had cooked up of the same name, which essentially remains the case today (I haven’t switched to gh, nor do really use it for much else).

Unfortunately, as far as I’m aware its not possible with a git pr NNN alias, nor is something like it implemented in any of the snippets I’d seen, since what these all do is pull down the appropriate ref from the upstream under refs/pulls/$number. To push back to the source branch, AFAIK you’d need to know the source fork and branch, for which you’d need to do a call to the GitHub API (which is what hub and gh do, presumably).

The best I can cook up with is a command to check out a branch from a GitHub fork given the fork username and source branch name, as specified in the PR header:

git config --global alias.try '!REPONAME=$(basename "$(git rev-parse --show-toplevel)") && git fetch https://github.com/$1/$REPONAME.git $2 && git switch -c $2 FETCH_HEAD && git config branch.$2.remote https://github.com/$1/$REPONAME.git && git config branch.$2.pushRemote git@github.com:$1/$REPONAME.git && git config branch.$2.merge refs/heads/$2 && :'

which is used as follows (assuming the PR was stated to be from CAM-Gerlach:test-branch-4):

git try CAM-Gerlach test-branch-4

You can then both pull and push with the normal git pull and git push, and this will work equally well with branches that are PR branches, or those that are not (though you may not have push access, of course). Above I set things up to pull via HTTPS (so it always works for public repos) and push via SSH, but you could potentially configure it to use HTTPS to push with some tweaking.