Updating sys.monitoring: how branch events are disabled

nedbat · July 30, 2024, 10:55am

Some time ago, I raised a concern that sys.monitoring’s branch events couldn’t be disabled in a way that provided the information needed for coverage measurement: PEP 669: Low Impact Monitoring for CPython - #47 by nedbat

Disabling a branch event after its first firing will make it useless. If I only get low impact monitoring by disabling the event, then I won’t have low-impact branch coverage. Is this OK?

In the time since, I’ve explored two ways to use branch events as they are currently implemented:

Add my own bookkeeping to disable the event once both destinations have been recorded. This adds overhead and requires keeping the event enabled longer than we want. The result is slow.
Use ideas from SlipCover to add synthetic lines during compilation so that branch coverage can be measured using only line events. This is complex and limits the ways coverage measurement can be collected.

Neither of these feel like good engineering. They are work-arounds that either miss the advantage of low-impact monitoring, or shift the complexity burden to users of the API.

I would like to update sys.monitoring to provide branch events that can be disabled in a more useful way: when a branch event is reported with a source and destination, disabling the event will disable only that source/destination pair. Events will still be reported for the same source but with a different destination. For compatibility, this would likely need to be a new event rather than changing the behavior of the existing event, but perhaps not.

I can help with drafting and/or implementation if needed.

jpizzorno · July 30, 2024, 2:53pm

+1 for branch events to be disabled on a per (source, destination) basis, leaving the branch active for the other destination (or destinations, if that were the case); that would be ideal.

On a different, but very much related issue, we also need a way to map from bytecode back to the source code construct that originated the branch… using code.co_positions() yields inconsistent results. I had some examples in my PyCon’24 talk, but would also be happy to share here if there is interest.

… Juan (SlipCover author)

gpshead · July 30, 2024, 3:39pm

For reference, to save others the search, I believe @jpizzorno 's PyCon 2024 talk is Near Zero-Overhead Python Code Coverage (youtube).

markshannon · July 31, 2024, 10:43am

It should be possible to disable taken/not-taken branch events separately.
I can see two ways of doing it:

Add NOPs to the not-taken paths and instrument those to produce “branch not taken” events. This would slow down the interpreter a bit, but we should be able to rely on the JIT eliminating those NOPs in hot code.
Add quite a lot of extra machinery to track events for taken and not-taken branches. This would add a lot of complexity (and probably bugs) to the implementation. It would also slow down branches where one direction was disabled, as we couldn’t remove the instrumentation until both directions were disabled.

@jpizzorno can you make a CPython issue for any problems you are having with co_positions()?

I am curious about the approach of inserting fake AST nodes.
That seems like an approach that would work for 3.12 and 3.13, unlike adding new events.
What are the issues with that approach?

markshannon · July 31, 2024, 4:02pm

I’ve run a quick experiment to see what the performance impact of adding the extra NOPs is:
On the plain interpreter, I see a slowdown of something like 0.5% to 1% slowdown.
This is larger than I think worthwhile for this feature, but I might be able to reduce this a bit.

mdroettboom · July 31, 2024, 9:36pm

Would it be possible to do this by recompiling the code objects retroactively only when branch events are enabled? Then this should have no impact unless used…

nedbat · July 31, 2024, 11:06pm

Inserting fake AST nodes works: the line events give accurate branch traversals and can be disabled for low overhead. But inserting the nodes requires an import hook that compiles the code. Import hooks introduce problems:

That import hook clashes with the pytest assert-rewriting import hook.
The import hook can only work if coverage is started and registers the hook before any measured code is imported. Many people use the pytest plugin pytest-cov to run coverage, which will often import product or test code before starting coverage.

markshannon · August 1, 2024, 8:37am

That’s not really feasible for a few reasons.

The bytecodes are embedded in the code object, so cannot be modified beyond the per-instruction modifications that the interpreter and monitoring do
In order to fully restore performance once all monitors were disabled, we would need to track the number of instruments and when it hit zero, recompile again.
It’s quite complex machinery on a rarely use path, and that makes it really hard to find the inevitable bugs.

markshannon · August 1, 2024, 8:57am

Performance of a tweaked version of the naive approach of adding one NOP per jump seems to have a low-enough performance impact (in the noise, but probably < 0.5%) Branch
Since we expect the JIT to execute >80% of the code, and to eliminate the NOP, that’s an expected overhead of <0.1%.

markshannon · August 1, 2024, 9:12am

CPython issue

nedbat · August 1, 2024, 10:27am

Thanks for pushing this forward! I’ll watch the issue. Let me know how I can help

markshannon · August 1, 2024, 3:37pm

Here’s a draft PR to experiment with: GH-122548

jpizzorno · August 1, 2024, 3:39pm

I’m a little bogged down right now, but will do. Thanks, Mark!

jpizzorno · August 6, 2024, 10:11pm

Here it is: Better mapping of `sys.monitoring` branches back to source code · Issue #122762 · python/cpython · GitHub