PEP 669: Low Impact Monitoring for CPython

Hi, looks like I’m late to the party. :slight_smile:

The PEP was accepted without any mention of a C-API and without a way to let users send events to listeners. This is unfortunate because it means that software that wants to mimic Python’s execution events (like Cython or other tools that execute code and want to support profiling or tracing) cannot do this. In addition, it seems that the implementation is not backwards compatible with the previous thread state behaviour, so events sent the old way no longer reach new-style listeners.

It’s probably too late to revive profiling and tracing support in Python 3.12, but I would like to see it re-enabled at least in 3.13.

For that, we need a C-API that allows injecting events into the system efficiently. It looks like this would require exposing the _PyInterpreterFrame, which is now hidden in pycore_frame.h. Code that wants to generate CPython-compatible events probably needs this. Overall, the event interface seems very much tied into CPython’s execution internals, which is really unfortunate since it’s part of the design. I see a couple of references to an “instruction offset” in the event arguments, which seems meaningless without bytecode. Finally, supporting branching events (e.g. for coverage analysis) would also have been nice, but again, a “destination offset” is probably not easily provided.

Overall, it seems that this is yet another incarnation of the problem that CPython’s own C-API is not good enough to implement its own features.

How can we get the event sending side back to a usable state?

I looked some more into the details.

  1. We probably don’t need frames. That’s great, because it’ll remove a lot of ugly complexity from Cython’s tracing/profiling code (in 5-10 years, when we drop Python 3.11 support).

  2. We need a way to signal events. Since events have more than one signature, we might end up needing more than one C-API function for this, but we’ll see.

  3. We need a way to map 3D source code positions (file name, line, character) to 1D integer offsets. Code objects help quite a bit, but branches might cross source file boundaries, so that’s more tricky. For most use cases, a mapping between (line, character) positions and an integer offset would probably suffice.
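To make point 3 a bit more concrete, here is a minimal Python sketch of one way such a mapping could look. It is purely illustrative: `PositionTable` and its methods are hypothetical names, not part of CPython or Cython, and it assumes all positions of a single source file are known up front, so that the index into a sorted table can serve as the integer offset.

```python
from bisect import bisect_left


class PositionTable:
    """Hypothetical helper: maps (line, column) positions of one source
    file to dense integer offsets and back."""

    def __init__(self, positions):
        # Sort and deduplicate so each position gets a stable index.
        self._positions = sorted(set(positions))

    def offset(self, line, column):
        """Return the integer offset for a known (line, column) position."""
        key = (line, column)
        i = bisect_left(self._positions, key)
        if i == len(self._positions) or self._positions[i] != key:
            raise KeyError(f"unknown position {key}")
        return i

    def position(self, offset):
        """Return the (line, column) position for an integer offset."""
        return self._positions[offset]


table = PositionTable([(2, 8), (1, 0), (1, 4), (2, 0)])
print(table.offset(2, 0))   # 2
print(table.position(1))    # (1, 4)
```

Branches that cross source file boundaries would need something more than this, for example one table per file.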

I created issue #111997, “C-API for signalling monitoring events” (python/cpython on GitHub), to discuss the implementation.

I am almost done updating coverage.py to use sys.monitoring for line measurement (branches are still in the future). One thing I noticed now in the API that seemed odd to me:

sys.monitoring.restart_events() → None

Enable all the events that were disabled by sys.monitoring.DISABLE for all tools.

Why does this affect all tools? Everything else is scoped to a particular tool id. It seems like a big hammer for me to restart events for everyone when I need to restart events for me.

Yes, sys.monitoring.restart_events() is quite a large hammer.
It is designed for attaching a debugger and the like, where a clean start is needed.

OOI, what are you using it for?

In coverage.py for statement coverage, I am disabling line events once they have fired. But I also can stop coverage and re-start it, so I call restart_events when coverage is started to make sure I will get the correct events in the second coverage measurement.

Soon, I will be supporting context measurement with sys.monitoring also. This lets you determine (for example) what tests covered what parts of the code. To do that, I’ll need to restart events when the test changes, so many times in a single process.

Perhaps it is fine, but it gave me pause when I saw it. I don’t know what else is using sys.monitoring, and my need to call restart_events is necessarily affecting those other tools. Is it hard to have restart_events scoped to a single tool, just as the rest of the API is?

One particularly tricky aspect of coverage.py is when it measures itself running its own test suite. I haven’t yet gotten to ensuring that works correctly with sys.monitoring, but I suspect there are some entanglements there as well.

I’ve been working with the sys.monitoring framework for the past few weeks, great stuff.

I’ve had issues using it with dynamically compiled expressions though (using the compile builtin) which are later run using eval or exec. Is monitoring expected to work on code objects created through compile?

@zoranuri Could you show a simple example of something that doesn’t work as you expected?

Is there a fundamental reason why no API was offered for setting INSTRUMENTED_LINE only on specific lines within a code object? My understanding is that, when the LINE event is enabled, the bytecode would be rewritten with an INSTRUMENTED_LINE taking the place of the original code unit every time the line number changes. If there are no issues in theory with offering such an API, I would like to propose it. Debugging would be a use-case for this whereby one would be able to set a breakpoint/probe on a specific line, without paying the cost on other lines.

Does the PEP not answer that? Maybe its author knows why.

I don’t think the PEP does, unfortunately. I’ve started tinkering with it and I’ve got a simple PoC going in the branch P403n1x87:feat/instrument-single-lines (compared against python:main in python/cpython on GitHub). It does the expected thing:

import sys, dis


def instrument_me():
    a = 0
    for i in range(10):
        a += i
    return a


def line_callback(code, line):
    print(f"Instrumentation on line {line}, in {code.co_qualname}. Local variables:")
    for n, v in sys._getframe(1).f_locals.items():
        print(f"  {n}: {v}")


def line_to_offset(code, line):
    for offset, l in dis.findlinestarts(code):
        if l == line:
            return offset
    raise ValueError(f"Line {line} not found in code object")


sys.monitoring.use_tool_id(sys.monitoring.DEBUGGER_ID, "debugger-test")

sys.monitoring.set_local_events(
    sys.monitoring.DEBUGGER_ID,
    instrument_me.__code__,
    sys.monitoring.events.LINE,
    {"offsets": {line_to_offset(instrument_me.__code__, 7)}},
)

sys.monitoring.register_callback(
    sys.monitoring.DEBUGGER_ID, sys.monitoring.events.LINE, line_callback
)


instrument_me()

Instrumentation on line 7, in instrument_me. Local variables:
  a: 0
  i: 0
Instrumentation on line 7, in instrument_me. Local variables:
  a: 0
  i: 1
Instrumentation on line 7, in instrument_me. Local variables:
  a: 1
  i: 2
Instrumentation on line 7, in instrument_me. Local variables:
  a: 3
  i: 3
Instrumentation on line 7, in instrument_me. Local variables:
  a: 6
  i: 4
Instrumentation on line 7, in instrument_me. Local variables:
  a: 10
  i: 5
Instrumentation on line 7, in instrument_me. Local variables:
  a: 15
  i: 6
Instrumentation on line 7, in instrument_me. Local variables:
  a: 21
  i: 7
Instrumentation on line 7, in instrument_me. Local variables:
  a: 28
  i: 8
Instrumentation on line 7, in instrument_me. Local variables:
  a: 36
  i: 9

Why can’t you specify which line numbers?

The original version of the PEP did offer the ability to set probes at specific points in the code, but it was more complex to implement and to use than the simpler approach of instrumenting all the lines and then disabling the ones you don’t want.

How do you monitor a few lines efficiently?

The docs are here: sys.monitoring — Execution event monitoring — Python 3.13.1 documentation
Specifically here

Thanks @markshannon.

“but it was more complex to implement”

Having put my hands into it, I can see where some of the complexity comes from.

“the simpler approach of instrumenting all the lines and then disabling the ones you don’t want”

So the DISABLE return value will undo the instrumentation at just the offset where the callback returned it, rather than disabling the whole event entirely for the code object. I can see how this is essentially equivalent to eventually enabling instrumentation on certain lines. However, when implementing tools for production environments, this approach is not ideal (but at this stage probably still better than any other alternative).

What would be the “attitude” towards implementing what I’m proposing? I’m asking just to see whether I should continue investing more into implementing it, or whether I should look for alternatives instead :pray:

“However, when implementing tools for production environments, this approach is not ideal”

OOI, why not?

In the worst case, a user might want to add instrumentation to the end of a pretty long function. When that runs, the callback will be hit for every other line that is not of interest. Whilst the amortised cost is eventually negligible, I feel this is still not ideal in a production environment. That is, “not ideal” as opposed to the alternative where we can straight-away instrument the line(s) of interest.

“Amortized cost is negligible” sounds close to ideal to me :slightly_smiling_face:

It only takes in the order of a microsecond to call a trace function and have it return DISABLE immediately, so the overhead should be low even for large functions.
Complicating the process of instrumenting the code also has a cost, so the cost advantage is not at all clear.

It also complicates the semantics. What happens if all line events are enabled, some are DISABLEd, and then we want to turn on events for some lines? Does that re-enable all events, just some of them, or what?

I would argue that a microsecond is quite “a lot” when some simple operations (e.g. addition) take O(10ns). Unless I’m doing something terribly wrong, with this example I get a factor of 2 improvement with the code in my change above:

import sys, dis, os, time

sys.monitoring.use_tool_id(sys.monitoring.DEBUGGER_ID, "debugger-test")


def instrument_me():
    a = 0
    for i in range(10):
        a += i
    return a


def line_callback(code, line):
    ...
    # print(f"Instrumentation on line {line}, in {code.co_qualname}. Local variables:")
    # for n, v in sys._getframe(1).f_locals.items():
    #     print(f"  {n}: {v}")


def lines_to_offsets(code, lines):
    for offset, l in dis.findlinestarts(code):
        if l in lines:
            yield offset


def instrument_lines(code, lines):
    sys.monitoring.register_callback(
        sys.monitoring.DEBUGGER_ID, sys.monitoring.events.LINE, line_callback
    )
    sys.monitoring.set_local_events(sys.monitoring.DEBUGGER_ID, code, 0)
    if not lines:
        return
    sys.monitoring.set_local_events(
        sys.monitoring.DEBUGGER_ID,
        code,
        sys.monitoring.events.LINE,
        {"offsets": set(lines_to_offsets(code, lines))},
    )


def instrument_lines_old(code, lines):
    def callback_closure(code, line):
        if line not in lines:
            return sys.monitoring.DISABLE
        ...


    sys.monitoring.register_callback(
        sys.monitoring.DEBUGGER_ID, sys.monitoring.events.LINE, callback_closure
    )
    sys.monitoring.set_local_events(sys.monitoring.DEBUGGER_ID, code, 0)
    if not lines:
        return
    sys.monitoring.set_local_events(
        sys.monitoring.DEBUGGER_ID, code, sys.monitoring.events.LINE
    )


(instrument_lines_old if os.getenv("OLD", False) else instrument_lines)(
    instrument_me.__code__, {6, 8}
)

start = time.monotonic_ns()
c = 0
while time.monotonic_ns() - start < 1_000_000_000:
    instrument_me()
    c += 1

print(c)
❯ ./python.exe -m test_monitoring  # with the single-lines change
2355242

❯ OLD=1 ./python.exe -m test_monitoring  # with the current implementation
1228398

For reference, the code with no instrumentation at all produces this many iterations:

❯ ./python.exe -m test_monitoring       
2951693

I believe the large difference comes from having to perform the disablement check every time the callback is called. This is not needed when one does not have to disable code locations.

If you return DISABLE then the callback won’t be called again, so it has no overhead.
In your example, you aren’t disabling the hot lines in the code, as a debugger would do.
Try a more realistic example, and see what the overhead is.

But that’s what I’m doing in

    def callback_closure(code, line):
        if line not in lines:
            return sys.monitoring.DISABLE
        ...

no?

What is the ...?
In a debugger that would do a lot of work, totally dominating the overhead of instrumentation.

“What is the ...?”

That’s where the actual debugger logic would be, same as in the callback that is used with the proposed change (where we don’t need to perform the line check at the beginning):

def line_callback(code, line):
    ...

And of course ... would dominate the instrumentation cost.