Auto-verifying bugs in new release

We’ve had the topic of auto-closing bugs come up a couple times now.

Would it be (technically) possible to have a test for a bug, and have that test run when a new version is released, and:

  • if the bug is still present, add the new version’s label
  • if the bug is absent:
    • if no active version labels remain for the bug, close it

?
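
Roughly, the decision being asked about might look like this (a minimal sketch only; the bug object and its helper methods are hypothetical placeholders, not any existing bpo or GitHub API):

```python
# Hypothetical per-release check for a single tracked bug.
# `bug` and its methods are placeholders for whatever tracker API is used.

def check_bug_on_release(bug, new_version, reproduces):
    """Re-run a bug's reproduction test against a freshly released version."""
    if reproduces(new_version):
        # The attached test still fails: record that the bug is present here too.
        bug.add_version_label(new_version)
    else:
        # Only close once no supported ("active") version still carries the bug.
        if not any(label.is_active() for label in bug.version_labels()):
            bug.close()
```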

Writing a test is work, and writing a good test can be more work than writing a fix. Without knowing the cause of the bug, it is difficult to write the right test.

Some bugs are difficult to reproduce automatically – they require third-party packages, human interaction, enormous amounts of computer resources (memory or CPU time), or unknown conditions.


I would have thought so. Of course, the infrastructure would need to be written for bpo and then ported to GitHub as part of the migration, so that’s in effect two implementations that would be needed (and it’s far more likely that there would be difficulties automating it for GitHub, where we don’t control the tracker APIs).

But in principle I like the idea (although as @storchaka points out, it may only be applicable to a small subset of reported issues).

True. Yet we’ll need a test to add to the regression suite to ensure it doesn’t happen again.

Ideally, the bug submitter would attach a test; otherwise, a triager could add one, or core-devs might choose to spend time on it.

I think what you’re after is making us promote merging failing tests even before there is a fix, and somehow have that linked to the issue that is tracking the failing test.

That’s one way to do it, but not the way I would choose.

If we had a test associated with a bug/PR, then we could have a tool that would go through and run that test when a new version was being released, and take appropriate action depending on the success/failure of the test and which version(s) the bug was targeting.

I would be in favor of automatically adding and changing labels, but I don’t think the issue itself should be automatically closed if the test passes, as platform-specific and intermittent test failures are a significant concern. To close the issue, I think it should require some degree of verification from a core dev or triager rather than relying on an automated process.
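
For what it’s worth, if the issues were hosted on GitHub, the label-only half could be done with plain REST calls. A rough sketch, where the repository path, token handling, and label names are all assumptions, and closing is deliberately left to a human:

```python
# Sketch: update version labels on a GitHub issue without closing it.
# Assumes the issues live on GitHub and a token with repo scope is available;
# the repository name, label names, and issue numbers are placeholders.
import os
import requests

API = "https://api.github.com"
REPO = "python/cpython"  # placeholder
HEADERS = {
    "Authorization": f"token {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

def add_version_label(issue_number, version):
    """Add a version label (e.g. "3.11") to an issue."""
    url = f"{API}/repos/{REPO}/issues/{issue_number}/labels"
    requests.post(url, json={"labels": [version]}, headers=HEADERS).raise_for_status()

def remove_version_label(issue_number, version):
    """Remove a version label; a 404 just means the label was not set."""
    url = f"{API}/repos/{REPO}/issues/{issue_number}/labels/{version}"
    resp = requests.delete(url, headers=HEADERS)
    if resp.status_code not in (200, 404):
        resp.raise_for_status()
```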

So for such a tool you would need to:

  1. Have a PR that only contains tests and can be applied cleanly
  2. Get a checkout
  3. Patch it
  4. Build Python
  5. Run the test suite

At which point you’re halfway to solving the problem and you’re just missing the fix. :wink: All of that to me would suggest just committing the test and admitting to ourselves we have a bug, or really prioritizing fixing bugs so things don’t sit around. Alternatively, we could write a script that acts like a mini-tox and does this locally on one’s machine. But not committing the change seems like a lot of work to put into a tool for code that isn’t getting checked in.
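
Such a local, mini-tox-style helper might look roughly like this (a sketch only, not an existing tool; the repository URL is real, but the PR number, test name, and Unix-style build steps are assumptions):

```python
# Rough sketch of a local helper: clone CPython, merge a tests-only PR,
# build, and run just that test. The PR number and test name are placeholders.
import subprocess

def run(*cmd, cwd=None):
    subprocess.run(cmd, cwd=cwd, check=True)

def bug_test_passes(workdir, pr_number, test_name):
    """Return True if the PR's test passes against the current main branch."""
    run("git", "clone", "https://github.com/python/cpython", workdir)
    # Fetch the PR that carries only the failing test and merge it in.
    run("git", "fetch", "origin", f"pull/{pr_number}/head", cwd=workdir)
    run("git", "merge", "--no-edit", "FETCH_HEAD", cwd=workdir)
    # Build and run the single test (on some platforms the binary name differs).
    run("./configure", "--quiet", cwd=workdir)
    run("make", "-s", "-j", cwd=workdir)
    result = subprocess.run(["./python", "-m", "test", test_name], cwd=workdir)
    return result.returncode == 0
```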

My experience with CI is next to nil, but what I was thinking was more along the lines of:

  1. Build Python
  2. Run the test suite (verifies everything that should pass still does)
  3. For each open PR
    a. extract a specially-named file from that PR (e.g. testthis.py); skip the PR if the file is not found (no patch is applied, since we are only extracting a file; see the sketch after this list)
    b. run the test and take action based on the results
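
Extracting the file without applying a patch can be done with plain git; a sketch, assuming the testthis.py convention above (everything else is a placeholder):

```python
# Sketch: read a specially-named test file straight out of an open PR's head
# commit and run it against the already-built interpreter, without applying
# the PR itself. The file name follows the testthis.py convention above.
import subprocess

def extract_and_run(pr_number, python_binary="./python"):
    """Return True/False for pass/fail, or None if the PR has no testthis.py."""
    subprocess.run(["git", "fetch", "origin", f"pull/{pr_number}/head"], check=True)
    # `git show` lets us read the file from the fetched commit without a checkout.
    shown = subprocess.run(
        ["git", "show", "FETCH_HEAD:testthis.py"],
        capture_output=True, text=True,
    )
    if shown.returncode != 0:
        return None  # no testthis.py in this PR: skip it
    with open("testthis.py", "w") as f:
        f.write(shown.stdout)
    result = subprocess.run([python_binary, "testthis.py"])
    return result.returncode == 0
```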

If that’s not feasible, then going with the approach you outlined:

  1. Add a new support decorator, open_bug(xxx), which does the label updates (maybe closing the PR in some cases), but whose results are not reported as failures to CI (a sketch of such a decorator follows this list).
  2. Perhaps only run these types of tests with a command-line switch, scheduled for once a day or week.
  3. Merge these tests into the main repo.
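
A possible shape for such a decorator, assuming a hypothetical open_bug helper in test.support (the name, the environment switch, and the reporting hook are all placeholders; nothing like this exists today):

```python
# Hypothetical sketch of an open_bug decorator for test.support.
import functools
import os
import unittest

def record_bug_status(issue_number, still_present):
    """Placeholder reporting hook: in reality this would update tracker labels."""
    print(f"bug #{issue_number}: {'still present' if still_present else 'not reproduced'}")

def open_bug(issue_number):
    """Mark a test as reproducing a known open bug.

    The test only runs when open-bug checks are explicitly requested, and its
    result feeds label updates instead of being reported as a CI failure.
    """
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(self, *args, **kwargs):
            if not os.environ.get("RUN_OPEN_BUG_TESTS"):
                raise unittest.SkipTest(f"open bug #{issue_number}: run on demand only")
            try:
                test_func(self, *args, **kwargs)
            except Exception:
                record_bug_status(issue_number, still_present=True)
            else:
                record_bug_status(issue_number, still_present=False)
        return wrapper
    return decorator
```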

So, two questions:

  • Is the approach I outlined possible? (Mostly I want to make sure my understanding is not flawed somewhere with these tools.)

  • Is the open_bug refinement possible?

Of course, the final question is whether this is all worth it, but having an idea of the possible solutions would help with that answer as well.

Possible? Sure, although we are starting to push up against CI time limits with how this is outlined. You’re also expecting users to name a file appropriately and then potentially clean that file up later when the PR is ready to be merged. Based on my experience that is a tall order. :wink:

I believe so. And automatic closing would be possible if it were done as a GitHub Action and the issues were on GitHub, but how are you going to handle the fact that an open_bug() test no longer represents a bug? Start throwing an exception? Automatically leave a comment on the issue (if everything was managed on GitHub)?

FWIW, you may be interested in https://github.com/rust-lang/glacier, which is the closest thing I’m aware of to a prior art on this concept.

Kind of feels like an automated version of https://github.com/python/cpython/tree/master/Lib/test/crashers.