Custom string literal tinkering?

Rosuav · July 16, 2025, 11:05pm

Calling on people who have more experience writing import hooks than I do!

Suppose I wanted to invent a new type of string escape. There have been various proposals; to avoid tying this to any one of them, I’m going to invent a new one for the sake of argument, since the specific escape isn’t the point here. Is it possible to set up an import hook that can affect how string literals get parsed?

I’m thinking of something like this:

```python
# app.py
import setup_import_hook
import main_program
```

```python
# main_program.py
print("Adorned letters: \[a']\[e`]\[c,]") # áèç
print("It's now 11\[oo]C") # 11°C
print("We accept these currencies: \[Y=]$\[L=]") # ¥$£
```

The hook would need access to the unparsed string literal. Assume that the end of the string is determined by the standard rules (so you can’t turn "this has " a quote" into a single string literal). It would be REALLY awesome if this could invent new string prefixes too, e.g. z"abc", but if not, so be it.

This would make it much easier to tinker with proposals and then discuss them. I had a go at it, but the only approach I found was to redo the source code parsing entirely by hand, without reusing any of the existing machinery.

(EDIT: Turns out “S=” is not a compose key sequence for “$”; the others work though.)

kailando · July 17, 2025, 2:41pm

In short, no, you can’t, at least not directly.
String literals are pre-parsed: they get simplified (their escapes decoded) before runtime, and before they reach your hook.
However, you could write a preprocessor that rewrites the desired strings in main_program.py and then run the output file in place of main_program. That would produce the desired result.
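A minimal sketch of that preprocessing step, assuming a simple text-level substitution. The `COMPOSE` table and the `preprocess` / `preprocess_and_run` helpers are hypothetical names for illustration; the substitution is naive and would also rewrite matching text inside comments or raw strings.

```python
import pathlib
import runpy

# Hypothetical escape table: "\[xy]" in the source becomes the composed character.
COMPOSE = {"a'": "á", "e`": "è", "c,": "ç", "oo": "°", "Y=": "¥", "L=": "£"}


def preprocess(source: str) -> str:
    """Naively replace every "\\[xy]" sequence anywhere in the source text."""
    for seq, char in COMPOSE.items():
        source = source.replace("\\[" + seq + "]", char)
    return source


def preprocess_and_run(src_path: str, out_path: str) -> dict:
    """Write a rewritten copy of src_path to out_path, then run the copy as __main__."""
    text = pathlib.Path(src_path).read_text(encoding="utf-8")
    pathlib.Path(out_path).write_text(preprocess(text), encoding="utf-8")
    # run_path returns the globals of the executed script.
    return runpy.run_path(out_path, run_name="__main__")
```

Tracebacks will point at the generated file, but since this substitution never adds or removes newlines, the line numbers still match the original.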

alwaysmpe · July 17, 2025, 5:08pm

I’ve never done it myself, but I think you could set up your own importer that parses the source into an AST, traverses it and modifies any Constant nodes containing a str, and finally compiles the modified AST into a code object to run. The builtin ast module is slightly limited; depending on what I’m doing, I sometimes use astroid instead.
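A rough sketch of that idea using only the standard library’s `ast` module, assuming the hypothetical `\[xy]` escapes from the first post. It only works here because Python keeps unrecognised escapes such as `\[` verbatim (with a warning). The names `COMPOSE`, `ComposeEscapes` and `compile_with_escapes` are illustrative, not from any library.

```python
import ast
import warnings

# Hypothetical escape table, for illustration only.
COMPOSE = {"a'": "á", "e`": "è", "c,": "ç", "oo": "°", "Y=": "¥", "L=": "£"}


class ComposeEscapes(ast.NodeTransformer):
    """Rewrite the text "\\[xy]" inside str constants, after Python has decoded them."""

    def visit_Constant(self, node):
        if isinstance(node.value, str):
            value = node.value
            for seq, char in COMPOSE.items():
                value = value.replace("\\[" + seq + "]", char)
            return ast.copy_location(ast.Constant(value), node)
        return node


def compile_with_escapes(source, filename="<demo>"):
    with warnings.catch_warnings():
        # "\[" is an invalid escape: the parser warns but keeps the backslash.
        warnings.simplefilter("ignore", SyntaxWarning)
        warnings.simplefilter("ignore", DeprecationWarning)
        tree = ast.parse(source, filename)
    tree = ComposeEscapes().visit(tree)
    ast.fix_missing_locations(tree)
    return compile(tree, filename, "exec")
```

One catch: after parsing, `"\[a']"` and `"\\[a']"` produce the same string value, so this transformer rewrites both to "á". At the AST level there is no way to tell them apart.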

To implement the importer, you can probably rely on the builtin finder with a customized loader, then register it in sys as shown in the page linked above. The advice I can give is limited, but hopefully it points you in the right direction. What you want isn’t trivial.
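A minimal sketch of that arrangement, assuming the standard `importlib` machinery: a meta-path finder that delegates the search to the builtin `PathFinder` but swaps in a `SourceFileLoader` subclass that rewrites the source text before compiling. The `transform` function is a hypothetical, naive text substitution, and `TransformingLoader`, `TransformingFinder` and `install` are illustrative names.

```python
import importlib.abc
import importlib.machinery
import importlib.util
import sys

# Hypothetical escape table, for illustration only.
COMPOSE = {"a'": "á", "e`": "è", "c,": "ç", "oo": "°", "Y=": "¥", "L=": "£"}


def transform(source: str) -> str:
    """Hypothetical source rewrite: a naive text substitution of "\\[xy]" escapes."""
    for seq, char in COMPOSE.items():
        source = source.replace("\\[" + seq + "]", char)
    return source


class TransformingLoader(importlib.machinery.SourceFileLoader):
    """Source loader that rewrites the text before compiling it."""

    def get_code(self, fullname):
        path = self.get_filename(fullname)
        source = importlib.util.decode_source(self.get_data(path))
        # Bypass the .pyc cache so stale, untransformed bytecode is never used.
        return compile(transform(source), path, "exec", dont_inherit=True)


class TransformingFinder(importlib.abc.MetaPathFinder):
    """Reuse the normal path-based search, but use our loader for chosen modules."""

    def __init__(self, names):
        self.names = set(names)

    def find_spec(self, fullname, path, target=None):
        if fullname not in self.names:
            return None  # defer to the regular import machinery
        spec = importlib.machinery.PathFinder.find_spec(fullname, path)
        if spec is None or not isinstance(spec.loader, importlib.machinery.SourceFileLoader):
            return None
        loader = TransformingLoader(fullname, spec.origin)
        return importlib.util.spec_from_file_location(fullname, spec.origin, loader=loader)


def install(names):
    """Register the finder ahead of the default ones."""
    sys.meta_path.insert(0, TransformingFinder(names))
```

With this, the `setup_import_hook` module from the first post could call something like `install(["main_program"])`. Restricting the finder to named modules keeps the standard library out of the rewrite.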

Rosuav · July 17, 2025, 6:19pm

By the time they’ve become AST nodes, it’s too late for this. That’s why I’m hoping to hook into an earlier stage of parsing.

Hmm, interesting. I think astroid has the same limitation, though: string literals have already become parsed string nodes by the time you see them.

So far, my best guess is to do this with tokenize and then reconstitute the source afterwards, but I’m not certain that this will keep line numbers etc. correct. I’d really like to do it at some intermediate level (maybe the concrete syntax tree?) if anyone has experience doing that.
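A sketch of the tokenize route, assuming the same hypothetical `\[xy]` escapes. Each STRING token’s raw text is rewritten, and `tokenize.untokenize` is given full 5-tuples with the original positions, so every line keeps its number (columns after a rewritten literal do shift). Raw and bytes literals are skipped, and an escaped backslash such as `"\\[a']"` is left alone because the raw source is still visible at this stage. On Python 3.12+, f-strings are split into separate FSTRING_* tokens and are not handled here. All names are illustrative.

```python
import io
import re
import tokenize

# Hypothetical escape table: "\[xy]" -> character, modelled on Compose key sequences.
COMPOSE = {"a'": "á", "e`": "è", "c,": "ç", "oo": "°", "Y=": "¥", "L=": "£"}

# A backslash followed by either "[...]" or any single character. Consuming
# ordinary escapes too means "\\[a']" is read as "\\" followed by plain text.
ESCAPE = re.compile(r"\\(?:\[([^\]]*)\]|.)", re.DOTALL)


def _rewrite_literal(text):
    # Split the prefix (e.g. "u", "r", "rb") from the quoted body.
    quote_at = min(i for i in (text.find('"'), text.find("'")) if i >= 0)
    prefix = text[:quote_at]
    if "r" in prefix.lower() or "b" in prefix.lower():
        return text  # raw strings have no escapes; bytes must stay ASCII

    def repl(m):
        seq = m.group(1)
        if seq is not None and seq in COMPOSE:
            return COMPOSE[seq]
        return m.group(0)  # leave every other escape untouched

    return prefix + ESCAPE.sub(repl, text[quote_at:])


def rewrite_source(source):
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            # Keep the original start/end so untokenize preserves the layout.
            tok = tok._replace(string=_rewrite_literal(tok.string))
        tokens.append(tok)
    return tokenize.untokenize(tokens)
```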

alwaysmpe · July 17, 2025, 6:36pm

IIRC astroid nodes keep the associated source line accessible, so you could reparse just that line without needing to parse the whole file.

alwaysmpe · July 17, 2025, 6:41pm

Yes, there’s an “as_string” method in astroid which gets the source. Might get you what you want.
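For comparison, the standard library’s `ast` can recover the original text of a literal too: `ast.get_source_segment` slices the source using a node’s position info. A minimal sketch (the `string_literals` name is illustrative). The source must still be valid Python, so this only helps for escapes that already parse, and f-string pieces may report odd segments on older Python versions.

```python
import ast
import warnings


def string_literals(source):
    """Yield (lineno, raw_text, value) for each str constant in source."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence invalid-escape warnings
        tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            raw = ast.get_source_segment(source, node)
            yield node.lineno, raw, node.value
```

Unlike the parsed value, the raw text distinguishes `"\[a']"` from `"\\[a']"`, even though both literals evaluate to the same string.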

Stefan2 · July 17, 2025, 8:44pm

Maybe Python’s Preprocessor could be of help.

Rosuav · July 18, 2025, 12:09pm

Cool, thanks! I just looked into this, and while it does look interesting, it unfortunately can only parse valid Python code, so it can’t be used to extend the syntax. So it’s back to tokenizing.

Here’s what I’ve come up with. The goal is that anyone with a syntax proposal can take a copy of this and test out their changes by actually running the tweaked Python, without writing any C code or modifying the interpreter itself.

The included example run.py will use Compose key sequences as character escapes, per the example above.

BrenBarn · July 18, 2025, 7:51pm

You might be able to do it with a custom encoding declaration. I seem to recall some people used this to do wacky syntax experiments in the past. It requires more setup than just an import though. Here’s a recent blog post that I found when looking it up: Change Python's syntax with the "# coding:" trick
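A minimal sketch of the "# coding:" approach, assuming the same hypothetical `\[xy]` escapes. It registers a codec (the name `compose_demo` is made up) whose decoder reads UTF-8 and then rewrites the text, so a file whose first line is `# coding: compose_demo` is transformed before the tokenizer sees it. The registration must run before such a file is compiled, which is the extra setup mentioned above; a `.pth` file in site-packages is one way to run it at interpreter startup.

```python
import codecs

# Hypothetical escape table and codec name, for illustration only.
COMPOSE = {"a'": "á", "e`": "è", "c,": "ç", "oo": "°", "Y=": "¥", "L=": "£"}
CODEC_NAME = "compose_demo"


def _transform(text):
    # Naive text-level rewrite: it also touches comments and raw strings.
    for seq, char in COMPOSE.items():
        text = text.replace("\\[" + seq + "]", char)
    return text


def decode(data, errors="strict"):
    # Decode as UTF-8 first, then rewrite; report all input bytes as consumed.
    text, consumed = codecs.utf_8_decode(data, errors, True)
    return _transform(text), consumed


class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
    # Buffer everything until the final chunk so no escape straddles two chunks.
    def _buffer_decode(self, data, errors, final):
        if final:
            return decode(data, errors)
        return "", 0


def _search(name):
    if name != CODEC_NAME:
        return None
    utf8 = codecs.lookup("utf-8")
    return codecs.CodecInfo(
        name=CODEC_NAME,
        encode=utf8.encode,
        decode=decode,
        incrementalencoder=utf8.incrementalencoder,
        incrementaldecoder=IncrementalDecoder,
    )


codecs.register(_search)
```

Since the rewrite here is plain text substitution, a more careful version would tokenize the text inside `decode` rather than replacing blindly.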