Improve the accuracy of output from inspect.getsource for expression-based code objects

iritkatriel · March 21, 2024, 6:52pm

The question is can you get that information from the code object (or from the function object containing it).

Why is that the question?

We have the code object and the source code. Why should we not use the source code to get this information?

MegaIng · March 21, 2024, 7:06pm

While what you are describing is possible, I believe this is quite complex and requires careful consideration. It would be way simpler for the consumer if code objects had an easy way to communicate their start and end columns. If you have an idea for how to implement this easily with the already available tools, I think it would be best if you posted a quick prototype.
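To make the difficulty concrete, here is a minimal sketch (the sample source string and the `<example>` filename are made up for illustration) showing that two lambdas written on the same line produce code objects with the same `co_firstlineno`, so the line number alone cannot tell a consumer which expression a code object came from:

```python
import types

# Two lambdas on one physical line, compiled from an in-memory string.
source = "pair = (lambda: 1, lambda: 2)\n"
module_code = compile(source, "<example>", "exec")

# Nested code objects are stored among the enclosing code object's constants.
lambda_codes = [
    const
    for const in module_code.co_consts
    if isinstance(const, types.CodeType) and const.co_name == "<lambda>"
]

# Both code objects report the same first line.
first_lines = [code.co_firstlineno for code in lambda_codes]
print(first_lines)
```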

iritkatriel · March 21, 2024, 7:42pm

Changes to the code object API require a PEP. The PEP would need to justify why the change is needed.

BrenBarn · March 22, 2024, 3:44am

Well, I wasn’t the one who posted the original question, but later he said:

And I think that makes sense. What you pass to inspect.getsource isn’t source code, it’s a Python object. It seems quite reasonable to me to say that we would like people to be able to use inspect.getsource in cases where it doesn’t currently work (examples of which were shown in the thread).

You might not have the source code because the code using inspect.getsource might be, for instance, a debugging tool, an IDE, or some sort of plugin that gets passed some Python object from who knows where and wants to be able to display its source code. In other words, all the places where code currently uses inspect.getsource are cases where the calling code might not know where the source code is; that’s why it calls inspect.getsource to get it.

That’s fine, and I’d imagine the justification could be similar to PEP 626, which added co_lines:

To assist tools, a new co_lines attribute will be added that describes the mapping from bytecode to source.

blhsing · March 22, 2024, 8:38am

I’ve rewritten my original post in the format of a PEP, with more coherent reasoning and justification.

blhsing · March 22, 2024, 9:26am

Note that my original idea when first creating this topic was to add co_end_lineno, co_col_offset, and co_end_col_offset as new attributes to the code object, and I think that remains a viable option.

It would marginally increase the size of a code object, but it avoids having to refactor existing code analysis and coverage tools that presume no-op bytecodes have zero width.

iritkatriel · March 22, 2024, 9:47am

RESUME is an implementation detail. The PEP would need to add it to the language specification. It would also need to explain why the alternative of using the AST was rejected (are you claiming that the performance of inspect.getsource is important?)

You’d also need a core dev to sponsor the PEP. Hopefully one will turn up.

blhsing · March 22, 2024, 10:16am

I see. In that case I would be in favor of switching back to my original idea of adding 3 new attributes to the code object, so as to keep the proposal compatible with Python implementations that do not use CPython bytecodes.

Sure. Will add that section. I do think the performance of inspect.getsource is rather important for keeping the overhead of a runtime profiling tool minimal, so that the tool stays unobtrusive and gives more accurate profiling measurements.

Would you kindly sponsor a PEP? I know you contributed a lot to CPython’s compiler so I can certainly respect that. Thanks.

iritkatriel · March 22, 2024, 10:43am

I’m not convinced that this justifies a change to code object, so I am not the right person to help push your idea forward.

iritkatriel · March 22, 2024, 10:50am

Another point you need to consider is that comprehensions are sometimes inlined, and then there is no code object for them.
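A quick way to observe this, as a sketch with a made-up sample function: list the code objects nested inside a function that contains a list comprehension. On CPython 3.12 and later (PEP 709) the comprehension is expected to be inlined, so no nested `<listcomp>` code object appears; on earlier versions one does.

```python
import types

source = "def squares(n):\n    return [i * i for i in range(n)]\n"
module_code = compile(source, "<example>", "exec")

# The function's code object is stored among the module code's constants.
func_code = next(
    const for const in module_code.co_consts if isinstance(const, types.CodeType)
)

# Names of code objects nested inside the function, if any.
nested_names = [
    const.co_name
    for const in func_code.co_consts
    if isinstance(const, types.CodeType)
]
print(nested_names)
```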

blhsing · March 22, 2024, 1:04pm

I see. Sure. Do you have a specific concern though? Is it because of the size of the added 3 integers even though tens if not hundreds of bytes have been added to the line table to support accurate debugging? Or is it because the accuracy of inspect.getsource is unimportant to you?

I just want to be able to handle code objects in a generic fashion. If there’s no code object generated from a construct, there’s nothing to handle to begin with. So it’s fine if something is made inline.

iritkatriel · March 22, 2024, 2:07pm

Do you have a specific concern though?

My main concern is that you are not considering alternatives to changing the code object API. Changing such a fundamental API is a big deal; we don’t do that on a whim.

You would need a lot more research in order to convince me that it’s necessary. At a minimum: (1) you implement the alternative with AST as well (or show why that’s not possible). (2) you measure the performance gain you get by changing code object API compared to the AST solution. (3) you demonstrate (experimentally) that there are important, real world applications that would benefit from the performance improvement you achieved.

Or is it because the accuracy of inspect.getsource is unimportant to you?

This is a cheap shot. I proposed an alternative solution that I believe will achieve the same accuracy.
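As a rough illustration of what an AST-based approach might look like (a sketch only, not anyone's actual implementation; `EXPRESSION_NODES` and `candidate_segments` are hypothetical names), one could parse the source and pick the nodes of the matching type that start on the code object's first line:

```python
import ast
import types

# Hypothetical mapping from a code object's co_name to the AST node type
# that produces it.
EXPRESSION_NODES = {
    "<lambda>": ast.Lambda,
    "<genexpr>": ast.GeneratorExp,
    "<listcomp>": ast.ListComp,
    "<setcomp>": ast.SetComp,
    "<dictcomp>": ast.DictComp,
}


def candidate_segments(source, code):
    """Return source segments of AST nodes that could have produced `code`."""
    node_type = EXPRESSION_NODES.get(code.co_name)
    if node_type is None:
        return []
    tree = ast.parse(source)
    return [
        ast.get_source_segment(source, node)
        for node in ast.walk(tree)
        if isinstance(node, node_type) and node.lineno == code.co_firstlineno
    ]


source = "f = lambda x: x + 1\n"
module_code = compile(source, "<example>", "exec")
lambda_code = next(
    const for const in module_code.co_consts if isinstance(const, types.CodeType)
)
segments = candidate_segments(source, lambda_code)
print(segments)
```

Matching on line number and node type alone returns more than one candidate when several such expressions share a line, so a complete solution would need a further disambiguation step.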

blhsing · March 22, 2024, 2:24pm

Thanks. I get your points now.

My need for a generic runtime profiler is real, and yes there are always workarounds. Just thought this would be a relatively small change since the information needed is readily available and just needs to be exposed.

Will attempt an AST-based solution for a measured comparison.