Improve backtraces in REPL context with ability to interpret a #line directive or equivalent

As a beginner, I frequently hit runtime errors in the Python code I sketch in an editor and submit to the REPL.

In editors that let me select text from a Python file and submit it to the REPL (say VSCode, using Alt+Enter), I get confused by the backtraces, because they don't show the correct line numbers in that context.

When I do the same kind of work in F#, the editor keeps track of where each submission to the REPL came from (file name and line in the file) and emits a preprocessor directive (#line) before the code, so any backtraces point at the right place.

Prior art:

I also believe the same technique is used so that generated code can be debugged from another representation of it (macro expansion, etc.), but I'm mainly interested in the basic REPL use case.

Would there be interest in the Python interpreter supporting such preprocessor directives, or maybe calls to an interpreter session object, that editors could hook into?
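For comparison, a crude approximation is already possible without any interpreter support: an editor integration could pad the selected text with blank lines before compiling it under the real file name, so the line numbers in tracebacks line up with the file. This is only a sketch; it assumes the editor knows the selection's file name and starting line, and `run_selection` is a hypothetical helper, not an existing VSCode or Python API.

```python
import textwrap


def run_selection(snippet, filename, first_line, namespace=None):
    """Execute `snippet` as if it started at `first_line` of `filename`.

    Hypothetical editor helper: prepending first_line - 1 newlines shifts
    every line number, so tracebacks point into the original file.
    """
    # Selections taken from inside a block are usually indented; dedent so
    # the snippet compiles as top-level code (line numbers are unaffected).
    body = textwrap.dedent(snippet)
    padded = "\n" * (first_line - 1) + body
    code = compile(padded, filename, "exec")
    if namespace is None:
        namespace = {"__name__": "__main__"}
    exec(code, namespace)
    return namespace
```

With this, `run_selection("x = 1\ny = x / 0\n", "example.py", 40)` raises a ZeroDivisionError reported at line 41 of example.py. The limitation is that it can only shift the whole selection by one offset and can never change the file name partway through, which is exactly what a real #line directive would add.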

Hmmmmmmm, very interesting.

(Other prior art, in case anyone’s wondering if this is a Microsoft-specific feature: Line Control (The C Preprocessor) I’m pretty sure it predates both the C# and F# languages.)

There’s no fundamental reason why this CAN’T be done. In fact, for an initial proof of concept, it could even be done with actual #line statements, as it would thus be completely backward compatible - this would be similar to type comments. At least, the basic usage of #line 123 would work that way; not sure if a quick hack would allow changing the file name, but certainly the line number.

Let’s see. To make this work, you’d need to parse to AST, then go through all the nodes and change their line numbers, and finally compile it the rest of the way. How hard can it be?

Caution: This code is EXTREMELY simplistic and nothing more than a VERY basic proof of concept. Still, it works, both for warnings and for exceptions. Tested on Python 3.4 and 3.13, although for 3.4 I had to swap out str.removeprefix (which only exists from Python 3.9 onward).

code = """# imagine this came from an external file
import warnings

def okay(n):
	print(n, n / 2)

def iffy(n):
	if not n:
		warnings.warn("Dividing by zero is deprecated, "
			"future universes will treat this as an error")
		n = 1
	print(n, 1 / n)

#line 1000
def bad(n):
	print(n, n + 1)

okay(42)
iffy(0)
bad(None)
"""

import ast
# Compile the module to AST
module = ast.parse(code, "linesdemo.py")
# Scan for #line directives (VERY simplistic)
xfrm, pos = {}, 1
for idx, line in enumerate(code.split("\n"), 1):
	print("%2d %4d %s" % (idx, pos, line))
	xfrm[idx] = pos
	if line.startswith("#line "):
		pos = int(line.removeprefix("#line "))
	else:
		pos += 1
# Walk the module and replace all line numbers
for node in ast.walk(module):
	try:
		node.lineno = xfrm[node.lineno]
		node.end_lineno = xfrm[node.end_lineno]
	except AttributeError: pass # Not all nodes have line number information
module = compile(module, "linesdemo.py", "exec")
exec(module)

Move the #line directive around to test things out.
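One weakness of the line-by-line `startswith` scan above is that it would also fire on a `#line` sitting inside a triple-quoted string. A slightly sturdier sketch, still assuming the `#line N` syntax used in this thread (a convention, not something Python defines), uses the standard `tokenize` module: because a directive is just a comment, it arrives as a `COMMENT` token, and string contents are never mistaken for one. `line_map` is a hypothetical name; it returns a dictionary of the same shape as `xfrm` in the demo.

```python
import io
import re
import tokenize

# Assumed directive syntax: a comment of the form "#line N" at column 0.
LINE_DIRECTIVE = re.compile(r"#line\s+(\d+)\s*$")


def line_map(source):
    """Map each physical line number to the line number it should report."""
    directives = {}  # physical line of directive -> number for the next line
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        if tok.type == tokenize.COMMENT:
            match = LINE_DIRECTIVE.match(tok.string)
            # Ignore trailing comments such as "x = 1  #line 5".
            if match and tok.start[1] == 0:
                directives[tok.start[0]] = int(match.group(1))
    mapping, pos = {}, 1
    for idx in range(1, source.count("\n") + 2):
        mapping[idx] = pos
        # A directive line keeps its own number; the next line gets N.
        pos = directives.get(idx, pos + 1)
    return mapping
```

In the demo, `xfrm = line_map(code)` could replace the hand-written loop; the AST walk stays the same.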

Yes, and IMO that’s the main use case - the original #line directive in C is mainly so that a preprocessor can do things like #include and #define, but still allow the main compiler to reconstruct the original file and line information. In Python, this is far less significant, but there’ll still be places where it’s useful. I can imagine, for instance, that this could be used by learning platforms that might need to insert some code at the top of a file; they can insert something like this:

#line 1000000
import test_harness
test_harness.initialize()
blah, blah, blah
#line 1

and all your tracebacks would look correct.

Note that the AST module already has some tools that can achieve a similar job to this (eg ast — Abstract Syntax Trees — Python 3.12.0 documentation), so this could well end up becoming a feature of the AST module, rather than strictly being part of the language specification.
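For the plain REPL case, where a whole selection only needs shifting to its starting line, the existing `ast.increment_lineno(node, n)` helper already covers it: it adds `n` to the line number (and end line number) of every node in the tree. A sketch, where `compile_at` is a hypothetical name and the starting line is assumed to come from the editor:

```python
import ast


def compile_at(source, filename, first_line):
    """Compile `source` so its first line is reported as `first_line`."""
    tree = ast.parse(source, filename)
    # Shift every node's lineno/end_lineno by a constant offset.
    ast.increment_lineno(tree, first_line - 1)
    return compile(tree, filename, "exec")
```

For example, `exec(compile_at("a = 1\nb = a / 0\n", "demo.py", 120), {})` reports the ZeroDivisionError at line 121 of demo.py. Unlike the #line demo, this cannot express different offsets for different parts of the same submission.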

Other prior art,

As this is my first post on this forum, I couldn't post more than two links, but indeed, going back to the roots, the C preprocessor has had this for a long time :slight_smile:

the original #line directive in C is mainly so that a preprocessor can do things like #include and #define, but still allow the main compiler to reconstruct the original file and line information.

This is also useful for things like step debugging something that was generated from a source file which is not the actual code, like https://github.com/fsprojects/FsLexYacc/blob/67bddf6bd8215ae9fffb67b462c5bd2b58c832d4/src/FsYacc.Core/fsyaccpars.fs#L300-L316 which is generated from a grammar.

it could even be done with actual #line statements

To be fully compatible, I'd imagine it should be state shared between the interpreter host and the interpreter; otherwise it could mess up the backtraces for anyone who already has a #line comment in their code, in the more general sense.
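One way to keep that state out of band, so that existing `#line` comments in user code are never reinterpreted, is for the host to compile each submission under a synthetic file name, remember where it came from, and translate tracebacks afterwards. This is only a sketch of the idea: `Session`, `submit()`, `locate()` and the `<submission N>` naming are all hypothetical, not an existing Python or editor API.

```python
import traceback


class Session:
    """Hypothetical REPL host state: maps submissions back to their source."""

    def __init__(self):
        self.namespace = {"__name__": "__main__"}
        self.origins = {}  # synthetic filename -> (real filename, first line)

    def submit(self, source, filename, first_line):
        """Run `source`, remembering that it came from filename:first_line."""
        key = "<submission %d>" % (len(self.origins) + 1)
        self.origins[key] = (filename, first_line)
        exec(compile(source, key, "exec"), self.namespace)

    def locate(self, exc):
        """Return (filename, lineno) for each traceback frame, remapped."""
        result = []
        for frame in traceback.extract_tb(exc.__traceback__):
            if frame.filename in self.origins:
                real, first = self.origins[frame.filename]
                result.append((real, first + frame.lineno - 1))
            else:
                result.append((frame.filename, frame.lineno))
        return result
```

If `def f(n): return n + 1` is submitted from script.py line 30 and `f(None)` from line 50, the last two entries of `locate()` for the resulting TypeError are script.py lines 50 and 31, even though the two submissions were compiled separately.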

Note that the AST module already has some tools …

Interesting. I'm wondering what the right approach is for the REPL to expose the few hooks necessary, given that the AST already has the infrastructure to tag code with a given location.

What concerns me is that the API you linked seems to take a relative line offset; I assume that means the consumer has to keep track of more state in many of the use cases. Maybe a set_lineno member would be the practical thing to have in the AST API?
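Until something like that exists, a small wrapper can turn the relative API into an absolute one. `set_lineno` below is hypothetical (it is not part of `ast`); it finds the smallest line number in the tree and shifts everything so that line lands on the requested number, delegating the actual renumbering to `ast.increment_lineno`.

```python
import ast


def set_lineno(tree, first_line):
    """Hypothetical helper: renumber `tree` so it starts at `first_line`.

    Unlike ast.increment_lineno(), which takes a relative offset, this
    takes the absolute line number the earliest node should report.
    """
    linenos = [node.lineno for node in ast.walk(tree)
               if getattr(node, "lineno", None) is not None]
    if linenos:
        # A negative offset is fine: increment_lineno just adds it.
        ast.increment_lineno(tree, first_line - min(linenos))
    return tree
```

Usage would be along the lines of `compile(set_lineno(ast.parse(src), 42), "file.py", "exec")`; calling it again on an already shifted tree simply moves it to the new start.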

I’m afraid I don’t understand the use case for this. If I’m copying out code to a REPL for testing/debugging, it should ordinarily only be a few lines; and I will have just done the copy and paste when I run it - so the window for the code file should already be open to the point whence I copied the code, and it should be immediately clear where it came from. No?

Yeah. The one I linked to isn’t really what you want here, which is why I handrolled a thing that updates all the line numbers manually. But this could absolutely be done in a less ad-hoc way if it’s wanted.

Cool thing is, though, even if this DOESN’T become part of the language or stdlib, it can still be done manually.

Sometimes it’s not just a few lines. When I run py-execute-region on a code selection in Emacs, it can be 100 lines just as easily as 10. A meaningful line number in the error message would be very useful, and #line could provide that simply.
