As part of my work to publish C-level coverage of the Python test suite (see Issue #94759), I’m finding some interesting holes in coverage that we should probably work to fix. These range from things that are good first bugs to things that probably require a bytecode interpreter specialist to resolve. I’d like to start filing bugs, but since I anticipate there being dozens of them, I thought it best to work out a plan here first since I don’t want to place an undue burden on triagers. To reassure those worried this will just create more work, I expect the Faster CPython folks will take on resolving a lot of these bugs, since improving this coverage is pretty important to confidently making the kinds of changes we’d like to make – we are motivated to make this better.
My plan (all open to discussion) is to:
File bugs only for large coverage holes with “interesting logic”. Chasing down every bit of error handling, especially memory-exhaustion related, is a fool’s errand, IMHO.
Create a metabug with a checklist, where each entry is a C source file. This will be used to keep track of which files have been examined and had bugs reported. This allows me to share some of the bug-filing work with others.
Each block that needs coverage improvement would get its own bug. These should probably have the [test] label; many will also have the [easy] label as well as an [expert-*] label. One could imagine adding a [coverage] label, but I understand the desire not to proliferate the number of labels. That could also be dealt with by having a special token in the subject line (e.g. [coverage]) and/or by linking back to the metabug.
Post this plan (once finalized) somewhere prominent so people can understand and help with it.
This is fine for most areas, but IMO ceval.c is an exception where every single branch, even very unlikely error handling, should be tested. See #93252 as an example where the error handling was wrong: it didn’t pop the frame from the thread state when an exception occurred while creating a frame.
Return on investment would be fewer bugs to fix after beta releases; at the very least the coverage report will help discover potential sources of bugs. The specializing adaptive interpreter only optimizes code after IIRC 8 or so executions, so most test code runs the “usual” or “unspecialized” form of bytecode. That leads to complex bugs to debug, since specialized instructions are not tested as much as regular instructions. This isn’t hypothetical but has happened in practice: specialized PRECALL opcodes don't check types · Issue #92063 · python/cpython · GitHub.
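To make this concrete, a test that wants to exercise the specialized instructions has to warm the code up first. A minimal sketch of the idea (the exact warm-up threshold is an internal implementation detail and may change between versions, so the loop count here is just a generous guess):

```python
def add(a, b):
    # The BINARY_OP bytecode here is a candidate for specialization
    # (e.g. to an int-specialized form) once the function has run
    # enough times with int arguments.
    return a + b

# Warm up: the adaptive interpreter specializes after a small number
# of executions (on the order of 8), so loop well past that.
for _ in range(64):
    assert add(2, 3) == 5

# Subsequent calls now run the specialized bytecode (on interpreters
# that specialize); on CPython 3.11+ you can inspect it with
# dis.dis(add, adaptive=True).
assert add(10, 20) == 30
```

A coverage-oriented test along these lines would assert behavior both before and after warm-up, so both the unspecialized and specialized paths get executed.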
@kumaraditya303: I think you and I agree – deciding what to cover requires some judgement and you’ve made a good argument that ceval.c is one place to pay extra attention to. The point of my original statement was just to make it clear that 100% coverage isn’t the goal (and I was thinking mostly of places where PyMem_Malloc fails etc. which probably aren’t worthy of coverage).
As for the issue of making sure we are testing optimizations – it seems we may need some special approaches specifically for that. The issue “direct unit tests for compiler optimisations” is somewhat related (though it is more about testing the optimizer passes than about running the optimized bytecode). If you have any thoughts about how to make sure we are running optimized code more often in the test suite, that seems worth making an issue for, if there isn’t one already.
How to do this can be a bit of “make the process up as you go along”… we know important things like the ceval loop deserve at least a tracking issue, if not some sub-issues for particular parts, while other areas are less urgent and could be a checklist of files with gaps that someone found noteworthy in a rollup issue.
I really do like that GitHub seems to make using markdown checklists in bugs easy, so I’d probably start with that approach and see what works. If a sub-issue is filed for something, the bullet item in the list can be updated to link to it.
This also might be an opportunity for a GitHub Project (as it seems to be what GitHub projects are designed to organize), though that does add some initial complexity and there are some usability limitations for triagers, at least until python/core-workflow#460 is resolved (I plan to open a new dedicated #core-dev thread on it to gather any additional feedback on the proposed workaround before proposing it to the SC for consideration).
Eventually we settled on using that issue as meta-issue and a checklist for all the files. Each file might then have sub-items that briefly describe a problem or link to other issues/PRs. This should allow keeping track of all the problems while avoiding a flood of preemptively created issues, and @mdroettboom seemed happy with this solution.
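As a sketch of what such a meta-issue checklist could look like (the file names are real CPython sources, but the sub-items and issue numbers here are purely illustrative):

```markdown
- [x] Modules/_abc.c
- [ ] Objects/dictobject.c
  - [ ] error path in dict insertion not covered (sub-issue #NNNNN)
- [ ] Python/ceval.c
  - [ ] tracking issue for error-handling branches (#NNNNN)
```

Checking off a file means it has been examined, whether or not it needed new tests; sub-items either describe the gap inline or link out to a filed issue.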
The same could also be accomplished with a project, and it was suggested in the issue, but if we want to group the issues by file we would need to create a custom field with 100+ values (one per file), and then manually add all (draft) issues to the project and classify them as they are created.
A milestone could be used in addition to the checklist/project, but AFAIK it only tracks actual issues and not draft issues, so I don’t think it will be particularly useful.