I gave a talk yesterday at the Language Summit about native profiling, discussing some ideas for adding native profiling features to the language based on signal delivery.
Summary:
Attribute signal delivery delay to native time
Account for child threads by monkey patching, child thread enumeration, stack inspection, and bytecode disassembly
Similar ideas can also report memory usage, split between Python and native code
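To make the stack-inspection and bytecode-disassembly idea more concrete, here is a minimal sketch that classifies every other thread as running Python or native code. It is a heuristic in the spirit of Scalene, not its actual code. It assumes that a thread whose topmost Python frame is parked on a `CALL*` instruction is inside a C function. `classify_threads` and `_current_opname` are hypothetical names, `sys._current_frames()` is a private CPython API, and the monkey-patching part (e.g. of `threading`) is left out.

```python
import dis
import sys
import threading


def _current_opname(frame):
    """Return the name of the instruction this frame is currently executing."""
    lasti = frame.f_lasti  # read once: byte offset of the current instruction
    for instr in dis.get_instructions(frame.f_code):
        if instr.offset == lasti:
            return instr.opname
    return None


def classify_threads():
    """Map each other thread's ident to "native" or "python".

    Hypothetical heuristic: if a thread's topmost Python frame is parked on a
    CALL* instruction (CALL, CALL_FUNCTION_EX, CALL_KW, ...), assume the callee
    is a C function and the thread is running native code.
    """
    me = threading.get_ident()
    kinds = {}
    for ident, frame in sys._current_frames().items():
        if ident == me:
            continue  # the sampling thread is itself inside a call; skip it
        opname = _current_opname(frame)
        in_native = opname is not None and opname.startswith("CALL")
        kinds[ident] = "native" if in_native else "python"
    return kinds
```

For example, a thread blocked in `time.sleep()` reports `"native"`, while a thread spinning in a pure-Python loop reports `"python"`. The frames returned by `sys._current_frames()` are live, so the snapshot is only meaningful while the sampling thread holds the GIL.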
Most of these techniques are already used by the community in the Scalene library here:
I read the paper and could be convinced that Scalene is currently the best CPython profiler. I am not quite clear on what your goal is. Is it to incorporate part of Scalene's core into CPython, or to reproduce some of its core functions?
Yes, I am proposing to provide some core part of Scalene; I am most interested in bringing out the Python vs. native time abstraction. There are some things we could improve, and a good starting point would be the CPU time part. The memory part is trickier, with the shim allocator replacing the Python allocator, but I think it's still feasible.