PEP 669: Low Impact Monitoring for CPython

Given the absolute silence when I announced this PEP on Python-dev, I thought I’d try again here.

Any comments or thoughts?

Feel free to comment here, or on python-dev.

This looks promising. I have a dedicated library for in-production runtime profiling, but the goal of that library is more high-level than this PEP: I suspect that monitoring every function call, then checking whether the called function is in the set of functions I want to monitor, could still lead to a significant performance penalty.
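To make the concern concrete, here is a minimal sketch of the "check every call against a set" pattern. The handler signature `(code, offset)` and the names `watch`, `on_call` and `watched_codes` are assumptions made for illustration, not something the PEP specifies here, and the loop at the end only simulates event delivery.

```python
# Hypothetical registry of code objects the tool cares about.
watched_codes = set()
call_counts = {}

def watch(func):
    """Mark a function as monitored by remembering its code object."""
    watched_codes.add(func.__code__)
    return func

def on_call(code, offset):
    """Assumed handler shape: runs for every call, filters by membership."""
    if code not in watched_codes:
        return  # the cost of getting this far is paid on every single call
    call_counts[code.co_name] = call_counts.get(code.co_name, 0) + 1

@watch
def checkout():
    return "ok"

def unrelated():
    return "ignored"

# Simulate the interpreter delivering one call event per function call.
for func in (checkout, unrelated, checkout):
    on_call(func.__code__, 0)
```

The set lookup itself is cheap, but the handler is still entered for every call, which is where the penalty would come from.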

I usually know callbacks which are invoked when events occur as “event handlers”.

Do you expect tools to update their event handlers for newer versions of Python, or are you guaranteeing that the required event handler signature stays the same in newer versions (or uses some dynamic argument-passing via inspection)? If not, would you consider some struct / named-tuple / slotted-dataclass as the only argument (or code, struct as two arguments) for all handlers going forward?

You say that “the callee’s frame will be on the stack”, suggesting that the event handler will have access to that frame (and the rest of the stack). Does that mean the event handler’s (invocation’s) frame will be on the same stack (as the next frame)? That seems to be the case judging by the event handler’s signature. This would mean exceptions raised in the handler could bubble up to the user, and look like the user is partly responsible (as their code is part of the traceback). Alternatively, I suggest running the event handler on its own new stack, and passing the stack/frame in to the handler (or you could just emit a warning for exceptions raised in handlers).

I read PEP 669 and PEP 659. I found them to be quite amazing and really useful. On the point of PEP 669, I think this is very useful and much more efficient.
I have always felt a great need for profiling code that is running in production instead of profiling code over simulated traffic. However, there’s always some subtle overhead when using sys.setprofile() and sys.settrace(). So if this PEP proposes a more powerful way to implement profiling of production code, I’d gladly take it up.

PEP 659 is also something I’m excited about.

The API will be fixed. Event handlers that work for one version should continue to work indefinitely.
We might add new events, but the old ones would still work.

The event handler will be on the same stack, and any errors in the event handler would propagate up to the user.
Tool writers might need to take care not to raise exceptions, and to report errors by other means.
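One way a tool could follow that advice is to wrap each handler so that failures are logged rather than raised. This is only a sketch: the decorator `never_raises`, the logger name, and the handler signature `(code, offset)` are hypothetical and chosen for illustration.

```python
import functools
import logging

# Hypothetical logger name; a real tool would pick its own reporting channel.
logger = logging.getLogger("monitoring_tool")

def never_raises(handler):
    """Wrap an event handler so its exceptions are logged, not propagated."""
    @functools.wraps(handler)
    def wrapper(*args):
        try:
            return handler(*args)
        except Exception:
            # Report through logging so the user's traceback stays clean.
            logger.exception("event handler %r failed", handler.__name__)
            return None
    return wrapper

@never_raises
def on_call(code, offset):
    # Stand-in for a bug inside the tool itself.
    raise RuntimeError("bug in the tool")

result = on_call(None, 0)  # the error is logged; nothing reaches the caller
```

Catching `Exception` rather than `BaseException` leaves `KeyboardInterrupt` and `SystemExit` alone, which is usually what a tool wants.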

Very interesting! I maintain pyinstrument, a CPython profiler. This could improve performance quite a bit I think and provide a nicer API, too :)

A few questions on this:

  1. The PEP says

To register a callable for events call:

sys.monitoring.register_callback(event, func)

Functions can be unregistered by calling sys.monitoring.register_callback(event, None).

So does this mean that multiple sys.monitoring users can operate at the same time? E.g. a profiler and a debugger, or multiple profilers at the same time? That would be good. I’ve had cases where users accidentally start multiple profilers at the same time and with the sys.setprofile API, they just silently overwrite each other, not good!

  2. Should the PEP include a C API like PyEval_SetProfile? I have found that C API to be more performant, presumably because a C function call is a lot cheaper than a Python one.

  3. Why is it necessary to call both sys.monitoring.set_events and register_callback? Could the set of active events instead be computed from the callbacks that are registered?

  1. Only one callback object can be registered at once.

It is the responsibility of the tools not to trample on each other. You could make a wrapper to pass events to multiple callbacks.
I’ll update the API so that register_callback returns the previously registered callback, so tools will at least know if they need to cooperate.
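Such a wrapper could look roughly like the following. It is a sketch only: the class name `Fanout` and the two stand-in handlers are hypothetical, and the event arguments passed at the end are placeholders rather than the PEP's actual handler signature.

```python
class Fanout:
    """Callable that forwards each event to every handler it holds."""

    def __init__(self, *handlers):
        self._handlers = list(handlers)

    def add(self, handler):
        self._handlers.append(handler)

    def remove(self, handler):
        self._handlers.remove(handler)

    def __call__(self, *args):
        # Iterate over a snapshot so a handler may add or remove
        # handlers while an event is being delivered.
        for handler in tuple(self._handlers):
            handler(*args)

seen = []

def profiler_handler(*args):
    seen.append(("profiler", args))

def debugger_handler(*args):
    seen.append(("debugger", args))

fanout = Fanout(profiler_handler, debugger_handler)
fanout("code", 12)  # placeholder event arguments
```

If register_callback does return the previously registered callback, a tool could register a `Fanout` and, when the returned value is not None, `add` that previous callback so both tools keep receiving events.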

  2. Registering callbacks is an infrequent operation. Its performance is unimportant. It is the performance of the callback objects that matters. You can implement the callback object in any language you like, as long as it is callable.

  3. There is no reason to couple registering callback objects and turning events on and off. It would reduce flexibility for no obvious benefit. Specifically, we want to be able to turn events on and off for individual code objects.