The PEP does not include anything regarding threads.
The PEP makes no mention of threads, because they are not relevant.
Instrumentation is per-interpreter, not per-thread. I’ve added a line to the PEP to make this a bit clearer.
“sys.setprofile() can be made a lot faster by using the API provided by this PEP”
The full sentence has a typo in it, which doesn’t help. I fixed it in the PEP. It should have read:
However, tools relying on sys.settrace() and sys.setprofile() can be made a lot faster by using the API provided by this PEP.
How is this true? Not because the proposed approach is amazingly fast, but because sys.settrace() and sys.setprofile() are really slow.
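To make the cost concrete, here is a rough timing sketch, not a rigorous benchmark (the names workload and noop_profiler are mine): even a profile function that does nothing measurably slows a call-heavy loop, because the interpreter must call back into Python for every call, return and C call.

```python
# Rough illustration of sys.setprofile() overhead: even a no-op Python
# profile function is invoked on every call, return and C call.
import sys
import timeit

def workload():
    total = 0
    for i in range(1000):          # each iteration makes two C calls
        total += len(str(i))
    return total

def noop_profiler(frame, event, arg):
    pass                           # does no work at all

plain = timeit.timeit(workload, number=50)
sys.setprofile(noop_profiler)
try:
    profiled = timeit.timeit(workload, number=50)
finally:
    sys.setprofile(None)

print(f"plain: {plain:.4f}s  with no-op profiler: {profiled:.4f}s")
```

The exact ratio depends on the machine and workload, but the profiled run is consistently slower even though the profile function does nothing at all.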
How are debuggers supposed to translate that into the provided APIs that receive code objects in a performant way?
I don’t know what “receive code objects in a performant way” means, but if you are asking how one should implement a breakpoint in a way that minimizes performance impact, here is one way:
- When the debugger is attached, create an empty map of filenames to code objects and an empty map of filenames to uninstrumented breakpoints.
- When receiving a PY_CALL event:
  - For all breakpoints in the uninstrumented map, if they lie within the code object, insert them. Finding a breakpoint is O(log n), where n is the number of uninstrumented breakpoints per file.
  - Add the code object to the code object map, then return DISABLE.
- To add a breakpoint:
  - If the code object containing the breakpoint is in the map, use insert_marker() to set the breakpoint.
  - If it is not in the map, add the breakpoint to the set of uninstrumented breakpoints. Finding the code object is O(log m), where m is the number of code objects per filename.
Feel free to design your own scheme, but the above scheme is fast enough to implement in Python without noticeable overhead.
We don’t believe that is True [that sys.settrace is incompatible with PEP 523]
Are you claiming that all tools using PEP 523 support sys.settrace and sys.setprofile perfectly? Cinder doesn’t. I doubt that any of the debuggers using PEP 523 work flawlessly with pdb. It isn’t even clear what is debugging what.
Rather than hoping for the best, I think it better to just say: “This doesn’t work”.
Could you also add a section outlining how new events can be added in the future if necessary?
I don’t think that makes sense in the PEP.
Future events are likely to come from future language changes, and I have no way to predict how those would be implemented.
Also I’m not sure whether you are referring to the social or technical process.
The social process; a new PEP or just an issue?
Or do you mean how would the CPython source be changed to support additional events?
If the latter, then no different from any other code change, I guess. Make a PR with the changes.
Although we can more or less understand it from the PEP, it is unclear how a profile function can request granular results
I don’t understand what you mean by “granular results”
a profile function doesn’t want the line number and uses PY_START events
The callback for PY_START events is func(code: CodeType, instruction_offset: int). No line number.
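For what it’s worth, a call-counting profiler needs nothing beyond that signature. The sketch below is mine, not from the PEP: it wires the callback up via sys.monitoring (the form this API took in the Python 3.12 stdlib) when available, and otherwise just invokes the callback by hand to show its shape; call_counts and demo are illustrative names.

```python
# Sketch of a call-counting profiler built on PY_START; the callback
# receives only the code object and an instruction offset - no line number.
import sys
from types import CodeType

call_counts: dict = {}

def py_start_callback(code: CodeType, instruction_offset: int):
    # Count calls per function; no line number is ever computed.
    key = (code.co_filename, code.co_name, code.co_firstlineno)
    call_counts[key] = call_counts.get(key, 0) + 1

def demo():
    return 42

if sys.version_info >= (3, 12):
    # sys.monitoring is the stdlib form this API took in Python 3.12.
    mon = sys.monitoring
    TOOL = mon.PROFILER_ID
    mon.use_tool_id(TOOL, "sketch-profiler")
    mon.register_callback(TOOL, mon.events.PY_START, py_start_callback)
    mon.set_events(TOOL, mon.events.PY_START)
    demo()
    mon.set_events(TOOL, 0)
    mon.free_tool_id(TOOL)
else:
    # Pre-3.12: invoke the callback directly, just to show its shape.
    py_start_callback(demo.__code__, 0)
```

Note that the callback does only dictionary work; nothing forces the VM to compute line numbers it was never asked for.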
how can the API ensure that this information is not calculated if the callback doesn’t need it?
You can’t. Although I am puzzled why any user of the API would worry about the VM doing pointless calculations.
In general the PEP lacks time benchmarks for some common usages like simple coverage, profile or tracing functions. Having time benchmark information is important so we can make an informed decision.
I’m afraid there will be no benchmarks until it is approved, as I’m not willing to implement it until at least conditionally approved.
You could make approval conditional on the performance being good enough. That way I’m not wasting my time implementing this for you to reject it, and you are not accepting it without performance being satisfactory.
Regarding coverage, take a look at Slipcover, which uses instrumentation and is faster with coverage on 3.11 than no coverage at all on 3.10. The instrumentation is a bit fragile, as there is no VM support; with VM support, performance would be even better.
For debuggers, the scheme I described above costs one call into the debugger for each code object (not per call) plus the overhead of the actual breakpoints, and no other overhead.
For profilers, instrumentation will be quicker than sys.setprofile(), but if you care about performance, use a statistical profiler.
I hope that clarifies things.