Policy to revert commits on buildbot failure


(Pablo Galindo Salgado) #1

I know it was decided some time ago that commits should be reverted when the buildbots fail to build the commit and there is not a quick fix but I am not sure how much time should we wait to see if the original author or another person fixes the issue. As an example, today this commit broke almost all the Windows buildbots:

https://buildbot.python.org/all/#/builders/32/builds/1707
https://buildbot.python.org/all/#/builders/113/builds/733
https://buildbot.python.org/all/#/builders/40/builds/1137
https://buildbot.python.org/all/#/builders/130/builds/420
https://buildbot.python.org/all/#/builders/3/builds/1634

I am not sure if I can fix this today, so my question is: how much time should I wait until we need to revert that commit?


(Brett Cannon) #2

I say as soon as possible, else new PRs and any that get updated will also start failing and there ends up being a cascading effect.


(Pablo Galindo Salgado) #3

Ok, I will dedicate one hour to try to reproduce in my Windows VM and if I fail I will revert.


(Victor Stinner) #4

I proposed a policy last May:
https://pythondev.readthedocs.io/ci.html#revert-on-fail

“if a change introduces a regression and I’m unable to fix it quickly (say in less than 2 hours and the author isn’t available), I will simply revert the change.”


(Pablo Galindo Salgado) #5

That was my first link in the original comment :wink:


(Victor Stinner) #6

(Oops, sorry :slight_smile: So the answer to your question is in your first line, 2 hours, no? :-))

When I proposed 2 hours, I had no idea if it’s a good value. Maybe 1 day would be safer since it’s uncommon that core developers are available all the day?

I say as soon as possible, else new PRs and any that get updated will also start failing and there ends up being a cascading effect.

Usually, it’s rare to get two regressions the same day. But if a regression broke the CI used for pre-commit, I would suggest to immediately revert the change. For buildbots, we can give more time, since they are not blocking the workflow.

Since you added notifications to GitHub, maybe we should make the rule stricter, to not spam PRs with unrelated regressions.

I don’t know, it’s an hard topic :slight_smile: You should feel safe to revert a change if it breaks the CI. Deciding a rule announced to everybody is advance should make your safe to revert.


(Eric Snow) #7

Ideally, committers would watch the buildbots after merging their PR. So their availability isn’t so much an issue, no? If they don’t have time to watch or to address a failure then they can leave a note on the issue. They would probably also leave a note if they have a reason why a failure-causing merge shouldn’t be reverted. Likewise if they are working on a fix (or planning on a revert). So 2 hours otherwise sounds reasonable.

-eric


(Victor Stinner) #8

Usually, a revert is advertised, like “because of failure xxx on xxx, the commit xxx will be reverted next hour if nobody comes in the meanwhile with a fix” (or something like that). I mean, we don’t revert and then talk. Usually, there is a short talk with comments on the PR and/or issue to notify of the regression.