PEP 669: Low Impact Monitoring for CPython

Thanks for the PEP!

Tools may see events after returning DISABLE, in which case, they will not see those events until sys.monitoring.restart_events() is called. Note that sys.monitoring.restart_events() is not specific to one tool, so tools must be prepared to receive events that they have chosen to DISABLE.

This looks like an editing mistake; should it be something like “Tools may return DISABLE from an event callback, in which case, they will not see that event…”?

Thus, if PEP 523 is in use, then calling sys.monitoring.set_events() or sys.monitoring.set_local_events() will raise an exception.

I first read that as “after any function specified in PEP 523 has been called”, but I don’t see how _PyCode_SetExtra would interfere with monitoring. Did you mean “if a custom frame evaluation function is set”?


It seems that JUMP and BRANCH events are each generated by a specific set of bytecodes, which change as new optimizations come in. Would it make sense to expose the sets in dis, so coverage tools can more easily detect blocks? Or is dis.hasjrel+dis.hasjabs already guaranteed to contain all jump opcodes?

I first read that as “after any function specified in PEP 523 has been called”, but I don’t see how _PyCode_SetExtra would interfere with monitoring. Did you mean “if a custom frame evaluation function is set”?

What use does _PyCode_SetExtra have without setting a frame evaluation function?
It seems simpler to ban any combination of PEP 523 and PEP 669 than to attempt to reason about what might be OK.

Or is dis.hasjrel+dis.hasjabs already guaranteed to contain all jump opcodes?

opcode.hasjrel is what you want.

Simpler to write, sure, but at least in the implementation/review you’ll need a precise definition to reason about.
It’s unclear (= hard to reason about) what “using” a document means:

  • Accessing interp->eval_frame probably doesn’t qualify as “use of PEP 523”.
  • Setting it probably does, but nothing can check at that point whether sys.monitoring.set_events() has been called (is the check delayed in that case?).
  • Calling _PyInterpreterState_SetEvalFrameFunc probably does qualify, though the function isn’t even mentioned in PEP 523.
  • Calling _PyEval_EvalFrameDefault obviously doesn’t qualify, even though it was introduced in PEP 523.

Please be precise when writing a PEP, so as a reader I can be sure I know what you mean.

I would be surprised if no one found another use for marking code objects.
Generally it seems weird to group API based on the proposal that introduced it. Does co_extra actually interfere with monitoring?

Ok, you’ve convinced me. PEP 669: Clarify and restrict interaction with PEP 523. by markshannon · Pull Request #2760 · python/peps · GitHub

Thank you!

And thank you again for the PEP. It is a great feature, and I have no doubt in your ability to make it self-consistent.
I can also see how these “edges” where the feature meets the rest of the system can be annoying – they’re not part of the actual improvement, and they can be under-documented and used in surprising ways. But that’s also why I think a PEP should specify them as carefully as the feature itself, if not more.
So please bear with me. (Or maybe delegate this so you can focus on the meat of the change?)


With the current wording, “using PEP 523” as it is written – that is, setting PyInterpreterState.eval_frame directly – will avoid the exception.
I suggest adding the following:

To avoid bypassing _PyInterpreterState_SetEvalFrameFunc() by
setting PyInterpreterState.eval_frame directly (as specified in
:pep:`523`), the field will be renamed to _eval_frame and documentation
will be updated to avoid references to :pep:`523`.

I’m happy to help with that documentation update, but since I’m not an expert in this area and don’t know who’ll be affected, I’d go through the PEP process on the decision.


Another place where I’m not sure how this PEP interacts with the rest of CPython is Quickening. I’m still unhappy about PEP 659 being referred to while it’s still a draft.
Was PEP 659 implemented as written, or are there any notable changes?


For better introspection in Tool identifiers, may I suggest an API like:

sys.monitoring.use_tool_id(id, name=None) -> None
sys.monitoring.free_tool_id(id) -> None
sys.monitoring.get_used_tools() -> dict[int, str|None]  # unclaimed IDs are not included
# str(name) should help a human identify the tool, name has no requirements beyond that

so use_tool_id can raise ValueError: monitoring ID 3 is already used by Cinder, and tools can get the name for their own error messages, like warnings on the profiler+debugger case mentioned in Events in callback functions.
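
For illustration, claiming an ID under this suggested API could look roughly like the sketch below (the function names, signatures, and error text are just the suggestion above, nothing final):

import sys

PREFERRED_ID = 3   # hypothetical: the ID this tool would like to claim

def claim_tool_id(my_name="my-profiler"):
    try:
        sys.monitoring.use_tool_id(PREFERRED_ID, my_name)   # suggested API, not final
    except ValueError as exc:
        # Some other tool already holds the ID; use its name in our own message.
        holder = sys.monitoring.get_used_tools().get(PREFERRED_ID)
        raise RuntimeError(
            f"{my_name}: monitoring ID {PREFERRED_ID} is already used by {holder}"
        ) from exc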


I sent PR 2767 with some smaller suggestions.

Thanks for reminding me that we need to sort out the status of PEP 659.
PEP 659 is up to date and accurate.

I like the idea of using names. Clashes should be very rare, but it will be a lot easier for a user to sort out if tools have names.
If we are going to name tools, we might as well make the name compulsory.

sys.monitoring.use_tool_id(id, name:str) -> None
sys.monitoring.free_tool_id(id) -> None
sys.monitoring.get_tool(id) ->  str | None

To reduce magic numbers (and seamlessly allow for increasing the number of tools in the future), perhaps it’d be a good idea to add a sys.monitoring.VALID_IDS = range(6) constant? Or just an integer MAX_ID, if using a range here would be problematic.

Why not add a call to allocate an ID for a tool?
If that call fails, the caller can report that too many tools are using the feature.

OK, I submitted it to the SC.

IMO Python & the tools should treat names purely as human-oriented “flavor text”, so there’s no need to worry about clashes.
The get_used_tools() -> dict was meant to give you all registered tools at once, which might be useful for introspection. I’d like to see that dict in automated error reports, for example.

I didn’t mean clashes in names, but in IDs.

I would expect that if a tool’s preferred ID is in use, it would fail.
No one wants two different debuggers running at the same time (unless they are debugging a debugger).

If you really need get_used_tools(), it can be implemented as

import sys

def get_used_tools():
    res = {}
    for id in range(6):
        name = sys.monitoring.get_tool(id)
        if name is not None:
            res[id] = name
    return res

Hi @markshannon, one thing I don’t see in the PEP is how it interacts with exceptions raised from monitoring callbacks.

Is it possible to add some info on what’s expected in such cases?

In particular the following points:

  • What happens when an exception originates from a monitoring callback?

i.e.: One of the hard things with sys.settrace is that if an exception originates in the tracing callback, tracing is disabled (so, if for instance the user is paused at a breakpoint and presses Ctrl+C, the debugger will no longer work, which may not be what the user expects). Ideally this wouldn’t happen (or at least KeyboardInterrupt would get special treatment).

  • What happens in a RecursionError?

A RecursionError also disables tracing right now (when this happens it can be reasonably hard for users to know why, especially if they silenced the error accidentally). The problem is that even with the new monitoring structure, by the time a stack overflow is raised there’d probably be no stack left to handle it in the monitoring callback, so it’d be interesting to have some alternative here (or at least make users aware of the issue somehow before disabling it).

What happens when an exception originates from a monitoring callback?

It propagates like any other exception.

What happens in a RecursionError?

It is treated like any other exception.

If the RecursionError is from hitting the recursion limit, then temporarily raising the recursion limit in the callback should allow it to operate normally.
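
For example, a LINE callback might guard itself like this (a minimal sketch; record_line is a hypothetical tool-side helper):

import sys

def on_line(code, line_number):
    # Temporarily raise the limit so the callback's own Python-level work
    # doesn't immediately re-trigger the RecursionError it is handling.
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(old_limit + 50)
    try:
        record_line(code, line_number)   # hypothetical bookkeeping
    finally:
        sys.setrecursionlimit(old_limit)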

If the RecursionError is from C stack exhaustion, you might find that the VM gives a fatal error if you consume much stack in the callback. Not much we can do there, we can’t magic up extra stack.

This is a really interesting PEP. I apologize for taking so long to get to it. I have a number of comments/questions…

  1. In the list of events, what is the logic for when they are named PY_* vs C_* vs no prefix? Why isn’t LINE called PY_LINE? Why isn’t PY_START called PY_CALL? I’m assuming there’s a reason, but it seems asymmetric at a first reading.

  2. It took me a while to understand sys.monitoring.use_tool_id(id, name:str) -> None. Perhaps we can have fleshed out docstrings for these functions. IIUC, use_tool_id means I want to claim an id, and I am associating name with it. I’m not sure what use will be made of name though?

  3. The pre-defined IDs make some presumptions about the composability of tools. For example, they assume that I can’t coverage-measure a coverage tool. It is difficult, and coverage.py uses some tricks to accomplish it, but it’s valuable. Since there’s no enforcement of the idea that only one tool of each kind can be running at once, I suppose everything is fine, but I wonder if this idea will appear in other places with real consequences?

  4. I should know what this means, but I don’t:

    You won’t be able to register a C function pointer, but you can implement the PEP 590 vectorcall interface on the callable, for performance close to that of a raw function pointer.

    Coverage.py uses C-implemented trace functions now. Will that still be possible?

  5. This sentence could use some clarification:

    If a callback function returns DISABLE, then that function will no longer be called for that (code, instruction_offset) until sys.monitoring.restart_events() is called.

    5a. LINE takes (code, line_number) rather than (code, instruction_offset); I assume DISABLE will apply to those arguments also.

    5b. The BRANCH event is called with (code, instruction_offset, destination_offset). If I return DISABLE, does that disable all (code, instruction_offset) events, or only those with the same three arguments (code, instruction_offset, destination_offset)? It won’t be useful to disable unless it’s the latter.

  6. In the Coverage Tools section:

    Coverage tools need to track which parts of the control graph have been executed. To do this, they need to register for the PY_ events, plus JUMP and BRANCH.

    I don’t understand why I would need the JUMP event? Maybe I’m not understanding the full implications of the events. Coverage.py watches line numbers being executed, and today tracks branches by remembering the previous line when a line is executed, and tracking the (previous, current) pairs of line numbers.

    Perhaps JUMP and BRANCH make sense for instruction-offset-based tracing rather than line-based tracing? I’m happy to talk more about the way coverage.py currently works, and how it might work in the future.

Thanks for the feedback.

  1. Most of the design and discussion has focused on semantics, not syntax. So the names might not be the best. Suggestions for improvement are very welcome.
    PY_START occurs within the callee, so the call has already happened, whereas C_CALL happens before the call.
    Would you prefer it if C_CALL were changed to CALL and included Python functions?

  2. The name is just a name. The VM doesn’t care what it is. It should help debug id clashes.

  3. The pre-defined IDs are just suggestions. They are there to help the common case where you don’t want to debug a debugger, or do coverage on a coverage tool. For those unusual cases, you are free to choose any ID you want, and it is then your problem to avoid clashes and provide sensible error messages.

  4. No. Callbacks must be callable Python objects. You can implement those in C, or C++ or Rust, provided the resulting object is callable. Using the vectorcall protocol will give you near C function-pointer performance.
    For a coverage tool, Python will be plenty fast enough. The trick is to return DISABLE and only get called once per location (see the sketch after this list).

  5. a. Since the line number is fixed for any (code, instruction_offset), the two are equivalent.
    b. All branches from that point. I can see the advantage of tracking each direction independently, but it would be a special case and would impact performance and memory consumption.

  6. Using JUMP and BRANCH is more efficient than line based tracing. Using line numbers will continue to work, though.
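
To make the DISABLE trick concrete, here is a rough line-coverage sketch against the API proposed in the PEP (the tool ID, event names, and registration calls follow the PEP’s proposal and should be treated as illustrative):

import sys
from types import CodeType

COVERAGE_ID = 2            # one of the suggested pre-defined tool IDs
executed = set()           # (filename, line) pairs seen so far

def line_callback(code: CodeType, line_number: int):
    executed.add((code.co_filename, line_number))
    # This location is now recorded; no need to be called for it again
    # until sys.monitoring.restart_events() is called.
    return sys.monitoring.DISABLE

sys.monitoring.use_tool_id(COVERAGE_ID, "sketch-coverage")
sys.monitoring.register_callback(COVERAGE_ID, sys.monitoring.events.LINE, line_callback)
sys.monitoring.set_events(COVERAGE_ID, sys.monitoring.events.LINE)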

Most of the design and discussion has focused on semantics, not syntax. So the names might not be the best. Suggestions for improvement are very welcome. PY_START occurs within the callee, so the call has already happened, whereas C_CALL happens before the call.

I was trying to infer a pattern from the names, but if there isn’t one, that’s OK too.

Would you prefer it if C_CALL were changed to CALL and included Python functions?

I wouldn’t want C and Python functions mixed together.

No. Callbacks must be callable Python objects. You can implement those in C, or C++ or Rust, provided the resulting object is callable. Using the vectorcall protocol will give you near C function-pointer performance.

I don’t know what “vectorcall protocol” means, but I will figure it out when the time comes.

For a coverage tool, Python will be plenty fast enough. The trick is to return DISABLE and only get called once per location.

I have a sense overall that you have a specific idea about how a coverage tool will work, and that coverage.py doesn’t work quite that way. In particular, there are a number of reasons why getting called just once per location wouldn’t always be sufficient. There are options in coverage.py that require collecting more data than that: branch coverage and contexts are two options that would mean I can’t disable an event after it is fired.

All branches from that point. I can see the advantage of tracking each direction independently, but it would be a special case and would impact performance and memory consumption.

You’ve placed great emphasis on the idea of disabling a callback after it has fired. In order to measure branch coverage, I need to know all of the branches that have been taken. If I disable the branch event once I’ve received it, and that disables all branches from that point, then I don’t have the information I need. That’s why I said it would be useless to disable the branch event if it didn’t take the destination into account.

Using JUMP and BRANCH is more efficient than line based tracing. Using line numbers will continue to work, though.

I haven’t worked this all through, so I might not be understanding your idea completely. JUMP and BRANCH would give me data about bytecode offsets. If I track that information, then I need to map that back to line numbers to produce a report for the user. Is that right? Can you say more about why JUMP and BRANCH are more efficient than line-based?

I definitely don’t want to have to understand the specifics of individual bytecode operations, but I don’t think you are saying that.


Hi,

I am writing this message on behalf of the Python Steering Council.

We are quite happy with PEP 669 and we think this will be a great addition to Python. Before we are ready to accept the PEP we would like to discuss some aspects of it:

  • The PEP does not include anything regarding threads. The following questions are pertinent:

    • How does one activate/deactivate the new functions on new threads?
    • How does one activate/deactivate the new functions on existing threads?
  • The PEP mentions the following:

    and sys.setprofile() can be made a lot faster by using the API provided by this PEP.

    Could you please add backing information on how this can be true and by how much? We don’t see how changing bytecode will make sys.setprofile “a lot faster”. This should be quantified.

  • The PEP puts a lot of emphasis on debuggers (PEP 669 – Low Impact Monitoring for CPython | peps.python.org) but there are some questions regarding the APIs provided.

    • The provided APIs receive code objects as their argument. Debuggers support breakpoints based on function names or filenames + lines. How are debuggers supposed to translate that into the provided APIs that receive code objects in a performant way? The PEP mentions that

    Debuggers can use the PY_CALL, etc. events to be informed when a code object is first encountered so that any necessary breakpoints can be inserted.

    but this would only work for function names, not for breaking on arbitrary lines. Forcing debuggers to receive full LINE events basically returns them to tracing every line, so the PEP’s claim that it “will make debuggers much faster” at the very least requires some clarification.

  • The PEP mentions the following:

    This makes sys.settrace and sys.setprofile incompatible with PEP 523

    We don’t believe that is true. Many users leverage PEP 523 as a trampoline function that then calls the default evaluation function. In that case there is nothing wrong with using sys.settrace and sys.setprofile, so we think this PEP should
    not formalize anything along these lines or make sys.settrace and sys.setprofile incompatible with PEP 523, as this would technically not be backwards compatible.

  • Could you also add a section outlining how new events can be added in the future if necessary?

  • Although we can more or less understand it from the PEP, it is unclear how a profile function can request granular results. For example, let’s say a profile function doesn’t want the line number and uses PY_START events; how can the API ensure that this information is not calculated if the callback doesn’t need it?

  • The questions that @nedbat asks in PEP 669: Low Impact Monitoring for CPython - #35 by nedbat should also be answered to ensure that the API makes sense and that it can be leveraged as much as it can by coverage tools.

  • In general the PEP lacks time benchmarks for some common usages like simple coverage, profile or tracing functions. Having time benchmark information is important so we can make an informed decision.


I don’t know what “vectorcall protocol” means

PEP 590

If I track that information, then I need to map that back to line numbers to produce a report for the user. Is that right?

Yes, you’ll need to do that. code.co_lines() has the offset to line information.
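
For illustration, a tool could build an offset-to-line table once per code object (a minimal sketch; co_lines() yields (start, end, line) triples, where line may be None for synthetic instructions):

def offset_to_line_table(code):
    # Map each instruction offset in the code object to its source line.
    table = {}
    for start, end, line in code.co_lines():
        for offset in range(start, end, 2):   # CPython code units are 2 bytes
            table[offset] = line
    return table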

Can you say more about why JUMP and BRANCH are more efficient than line-based?

Two reasons.

  1. There are fewer JUMP and BRANCH events than line events
  2. JUMP and BRANCH events map to specific VM instructions, so can be instrumented more efficiently.

The PEP does not include anything regarding threads.

The PEP makes no mention of threads, because they are not relevant.
Instrumentation is per-interpreter not per-thread. I’ve added a line to the PEP to make this a bit clearer.

“sys.setprofile() can be made a lot faster by using the API provided by this PEP”

The full sentence has a typo in it, which doesn’t help. I fixed it in the PEP. The full sentence should have read:

However, tools relying on sys.settrace() and sys.setprofile() can be made a lot faster by using the API provided by this PEP.

How is this true? Not because the proposed approach is amazingly fast, but because sys.settrace() and sys.setprofile() are really slow.

How are debuggers supposed to translate that into the provided APIs that receive code objects in a performant way?

I don’t know what “receive code objects in a performant way” means, but if you are asking how one should implement a breakpoint in a way that minimizes performance impact, here is one way:

  • When the debugger is attached, create an empty map of filenames to code objects and an empty map of filenames to uninstrumented breakpoints.
  • When receiving a PY_CALL event:
    • For all breakpoints in the uninstrumented map, if they lie within the code object, insert them. Finding the breakpoint is O(log n) where n is the number of uninstrumented breakpoints per file.
    • Add the code object to the code object map, then return DISABLE.
  • To add a breakpoint:
    • If the code object containing the breakpoint is in the map, use insert_marker() to set the breakpoint.
    • If not in the map, then add the breakpoint to the set of uninstrumented breakpoints
    • Finding the code object is O(log m) where m is the number of code objects per filename.

Feel free to design your own scheme, but the above scheme is fast enough to implement in Python without noticeable overhead.
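
A rough sketch of that bookkeeping is below, using simple linear scans rather than the O(log n) lookups mentioned above; the insert_marker() call and the contains_line() helper are illustrative, not the PEP’s exact signatures:

import sys
from collections import defaultdict

code_objects = defaultdict(list)   # filename -> code objects seen so far
pending = defaultdict(set)         # filename -> breakpoint lines not yet instrumented

def contains_line(code, line):
    # Hypothetical helper: does this code object contain the given source line?
    return any(l == line for _, _, l in code.co_lines())

def on_py_call(code, instruction_offset):
    filename = code.co_filename
    for line in list(pending[filename]):
        if contains_line(code, line):
            sys.monitoring.insert_marker(code, line)   # marker API as named in the PEP
            pending[filename].discard(line)
    code_objects[filename].append(code)
    return sys.monitoring.DISABLE      # no need to hear about this code object again

def add_breakpoint(filename, line):
    for code in code_objects[filename]:
        if contains_line(code, line):
            sys.monitoring.insert_marker(code, line)
            return
    pending[filename].add(line)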

We don’t believe that is True [that sys.settrace is incompatible with PEP 523]

Are you claiming that all tools using PEP 523 support sys.settrace and sys.setprofile perfectly? Cinder doesn’t. I doubt that any of the debuggers using PEP 523 work flawlessly with pdb. It isn’t even clear what is debugging what.

Rather than hoping for the best, I think it better to just say: “This doesn’t work”.

Could you also add a section outlining how new events can be added in the future if necessary?

I don’t think that makes sense in the PEP.
Future events are likely to come from future language changes, and I have no way to predict how those would be implemented.

Also, I’m not sure whether you are referring to the social or the technical process.
If you mean the social process: would a new PEP be needed, or just an issue?
Or do you mean how the CPython source would be changed to support additional events?
If the latter, then no different from any other code change, I guess. Make a PR with the changes.

Although we can more or less understand it from the PEP, it is unclear how a profile function can request granular results

I don’t understand what you mean by “granular results”

a profile function doesn’t want the line number and uses PY_START events

The callback for PY_START events is func(code: CodeType, instruction_offset: int). No line number.

how can the API ensure that this information is not calculated if the callback doesn’t need it?

You can’t. Although I am puzzled why any user of the API would worry about the VM doing pointless calculations.

In general the PEP lacks time benchmarks for some common usages like simple coverage, profile or tracing functions. Having time benchmark information is important so we can make an informed decision.

I’m afraid there will be no benchmarks until it is approved, as I’m not willing to implement it until at least conditionally approved.

You could make approval conditional on the performance being good enough. That way I’m not wasting my time implementing this for you to reject it, and you are not accepting it without performance being satisfactory.

Regarding coverage, take a look at Slipcover which uses instrumentation
and is faster with coverage on 3.11 than no coverage at all on 3.10. The instrumentation is a bit fragile as there is no VM support. With VM support performance would be even better.

For debuggers, the scheme I described above costs one call into the debugger for each code object (not per call) plus the overhead of the actual breakpoints, and no other overhead.

For profilers, instrumentation will be quicker than sys.setprofile(), but if you care about performance use a statistical profiler :slight_smile:


I hope that clarifies things.

Cinder doesn’t use PEP 523 either, so probably not a relevant example here. (It is true that the Cinder JiT doesn’t support sys.settrace or sys.setprofile at all.)

IIUC, Cinder replaces the entirety of _PyEval_EvalFrameDefault(). So while it may not use PEP 523, it does the equivalent.
I think the same argument also applies to Pyston.

My point is that replacing _PyEval_EvalFrameDefault() with anything but the most trivial wrapper and correctly supporting sys.settrace() is sufficiently difficult that we might as well just declare it impossible.
