Custom string tag syntax

I gather from PEP 750: Tag Strings For Writing Domain-Specific Languages - #313 by dg-pb that PEP 750 has abandoned custom tag syntax.

So I am wondering whether it would be useful to introduce it independently of string formatting.

E.g. In JupyterLite it was made to behave as follows:

def greet(*args):
    print(args)

greet"Hello ?? {a:1}"    # (DecodedConcrete('Hello ?? '), InterpolationConcrete(...))

What about making it generic?

def greet(string):
    print(string)

greet"Hello ?? {a:1}"    # "Hello ?? {a:1}"

Why would this be useful?

Essentially, this would give users a playground for implementing various conveniences. Being able to do this might motivate experimentation with ideas that would otherwise never happen for lack of attractive off-the-shelf syntax.

Two use-cases come to mind:

  1. String formatting

Now, instead of receiving a preprocessed interpolation, the function can do the processing itself (and the user can implement whatever they like):

def greet(string):
    args = prepare_interpolation(string)
    salutation = args[0].upper()
    return f"{salutation}"

greet"Hello"    # "HELLO"

  2. Conveniences for string-code.
    E.g.:
    a) subinterpreters
    b) Backquotes for deferred expression - #114 by Paddy3118

For a simple eval:

def e(string):
    return eval(string)

e"1 + 1"

Also, it could be used to implement conveniences for deferred evaluation without needing to make any changes to Python until it is clear that something specialised and hard coded/more performant is required.

E.g.

def d(string):
    # maybe sys._getframe
    return DeferedExpr(eval(f'lambda: {string}'))

a = 1
deferred = d"a + 1"

So this would potentially cover a certain chunk of various syntax change proposals for the sake of convenience.

Of course, the API could be more robust than a plain function, so that the tag would not take up a name in the namespace,
in the same way that f"" does not shadow a variable named f.

E.g. by some sort of more sophisticated registry process.
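
To make the registry idea a bit more concrete, here is a minimal sketch. Everything in it is hypothetical: `register_tag`, `apply_tag` and the `_TAGS` dict are names invented for illustration, and `apply_tag("greet", ...)` only stands in for what `greet"..."` might desugar to. Nothing like this exists in Python today.

```python
# Hypothetical registry: tag names live here, not in the module namespace.
_TAGS = {}


def register_tag(name):
    """Decorator that registers a function under a tag name (illustrative only)."""
    def decorator(func):
        _TAGS[name] = func
        return func
    return decorator


def apply_tag(name, string):
    """Stand-in for what the tag syntax might desugar to."""
    return _TAGS[name](string)


@register_tag("greet")
def _greet_tag(string):
    return string.upper()


greet = 42  # an ordinary variable with the same name as the tag
tagged = apply_tag("greet", "Hello")  # still finds the registered function
```

Because the lookup goes through the registry rather than the local or global namespace, rebinding `greet` does not affect the tag, much like rebinding `f` does not affect f"".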

Given that PEP 750 explored this, I thought it might be worth suggesting while the concept is still fresh.

Does this mean f needs to become a one character builtin function that accepts a single string?

If f"" is handled specially, what if you actually have a function f that you want to use for string formatting?

The PEP mentions f-strings, but there are 8 prefixes currently, I believe: b, f, r, u, fr, rf, br, rb.

What if you want to have a keyword argument called from? Maybe the answer to that is just “no”.

Well, the point of this syntax is to save you a pair of parentheses, which means you would really want to give your prefix function the shortest possible name.

With that said, you’ve already listed a bunch of hardcoded prefixes that cannot be overridden. Plus some really common characters people use as a habit (e.g. i, j, k for iteration, x, y, z for numerical calculation). I wonder how much namespace is left for you to safely use…

d = lambda s: f"hello {s}" 

d"123" # ok

for d in range(10):
  ...

d"456" # Boom!

OP already said something about this. Maybe this is doable.

Another challenge: how do you make IDEs and linters aware of what’s been registered? Can they jump to the correct function?

None of these could be done using a string prefix function. (Also, side note, you can use uppercase letters; rB"\x" is a perfectly valid two-byte literal.)

I’m strongly against using syntax for something that could just as easily be a function. F-strings can’t be, because they need to be able to embed executable code. R-strings change the way that the source code for the string is converted into the string object (r"\u" is valid, but "\u" isn’t, since there’s no Unicode codepoint after it; in theory you could write a function that handles all non-raw string literals, but they don’t have prefixes, so that’d be a nightmare). B-strings result in a completely different data type. The only one that could be implemented as a function is U, and only because it doesn’t do anything.
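
A quick sketch of those differences, using only built-in behaviour (the variable name `non_raw_error` is just for this example):

```python
# b-strings produce a different type from plain string literals.
assert type(b"abc") is bytes
assert type("abc") is str

# r-strings change how the source text becomes a string: the backslash stays literal.
assert r"\u" == "\\u" and len(r"\u") == 2
assert len(rB"\x") == 2  # raw bytes: a backslash followed by "x"

# The non-raw spelling "\u" is rejected when the source is compiled.
try:
    compile(r'"\u"', "<example>", "eval")
except SyntaxError as exc:
    non_raw_error = exc
else:
    non_raw_error = None

# u-strings are plain str; the prefix changes nothing.
assert u"abc" == "abc"
```

A function only ever sees the finished string object, so it has no way to recover the raw source text or to turn a syntax error back into a value.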

So what’s the advantage of these prefixes? With PEP 750, the advantage is that it’s a single new type of string that gives an unevaluated template and interpolatable values. (PEP 501 had a similar goal, and was thus withdrawn in favour of 750.) In theory, you could define F-strings as “T-strings with a join()”, but there wasn’t a lot of support for T-strings back in 2015 (note that PEP 498 and PEP 501 were created a mere week apart, and only one of them got traction at the time).

Let’s go through these use-cases.

Trivial example, not really a use-case, wasn’t meant to be one. Moving on.

I’ve no idea what’s going on here, but greet(t"Hello") should be able to achieve that. There’s nothing interpolated though so I’m at a loss as to what prepare_interpolation is supposed to be doing.

Leaving aside what a terrible idea it is to have syntax for eval, this… isn’t buying you anything above just e("1 + 1"). Why not just pass a string to a function? What’s so hard about writing a couple of parentheses?

This goes the opposite direction to PEP 750, instead looking for the raw text of the embedded expression. It’s again unable to do anything that d("a + 1") can’t do. You have to eval to create a lambda function. If you did those last two lines inside another scope, it would not work. (No, sys._getframe won’t save you here and I can prove that if you give me the code that you think would actually work.)
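
As a rough illustration of the scope problem, here is a sketch that uses a plain function call in place of the proposed syntax and leaves out the `DeferedExpr` wrapper. The names `other_scope`, `module_level_result` and `local_scope_error` are made up for the example, and it assumes the code is run as a module or script.

```python
def d(string):
    # The lambda is compiled inside d, so the names in its body are looked up
    # in this module's globals (and builtins), not in the caller's locals.
    return eval(f"lambda: {string}")


a = 1
deferred = d("a + 1")
module_level_result = deferred()  # works: `a` is a module-level global


def other_scope():
    b = 1  # local to other_scope, invisible to the lambda built inside d
    return d("b + 1")


try:
    other_scope()()
except NameError as exc:
    local_scope_error = exc
else:
    local_scope_error = None
```

The module-level case only works because `a` happens to be a global; as soon as the name is a local of some other function, the deferred expression can no longer find it.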

The existing string prefixes, and the one proposed in PEP 750, actually provide benefits that you can’t get any other way [1]. Trying to go too generic makes the proposal less useful and, in fact, almost certainly not beneficial.


  1. other than u"…" but that’s for Py2/3 compat


As I said, it is purely for convenience. Everything that is suggested here can be achieved using func("string").

However, I don’t see how this is so different from, say, f-strings.

E.g. using this functionality, one could theoretically implement f-strings:

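import sys

def f(string):
    # Look at the caller's frame (CPython-specific).
    frame = sys._getframe(1)
    var_map = {**frame.f_globals, **frame.f_locals}
    # process_f_string is a placeholder for the actual formatting logic.
    return process_f_string(string, var_map)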

Of course, I am sure this is not a good way to do it.
It has many issues that the current implementation does not have (for example, one would need to iterate through the whole stack of frames rather than just merging globals with locals). And I am sure it leaves out a lot of stuff.

But this is just for the sake of the argument that f/t-strings do not necessarily need special syntax either. I would guess it could be hacked into this at some significant cost?

So again, there is no need to see this as more than it is: a convenience that saves 2 characters, plus the possibility of configuring one’s own IDE for custom highlighting.

Unless there are ideas that would make this something more than that…

But if there aren’t, I am negative on this myself now.

Counterpoint:

def outer():
    x = 1
    def inner():
        return f"{x}"
    return inner

outer()() # returns "1"

class Demo:
    x = 2
    attr = f"{x}"

Demo.attr # is "2"

Try implementing these with your function.

F-strings are more than this, since they actually evaluate in the scope they’re in. What you’re proposing really is just a convenience for saving two characters.
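
For the closure case, the difference can be sketched like this. `frame_f` is a hypothetical frame-based stand-in that only handles `{x}`, written as an ordinary function call since the tag syntax does not exist; it relies on the CPython-specific `sys._getframe`.

```python
import sys


def frame_f(string):
    # Frame-based stand-in for f"{x}" (CPython-specific, handles only {x}).
    frame = sys._getframe(1)
    var_map = {**frame.f_globals, **frame.f_locals}
    return string.replace("{x}", str(var_map["x"]))


def outer_builtin():
    x = 1
    def inner():
        return f"{x}"  # the compiler sees x here and creates a closure
    return inner


def outer_frame_based():
    x = 1
    def inner():
        return frame_f("{x}")  # x appears only inside a string: no closure
    return inner


builtin_result = outer_builtin()()  # "1"

try:
    outer_frame_based()()
except KeyError as exc:
    frame_error = exc
else:
    frame_error = None
```

In the second version `inner` never mentions `x` in its code, so nothing is captured, and by the time it runs there is no frame left in which `x` could be found.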

You may wish to read up some of the discussion around f-strings.

Fair enough, first example can’t be done.

However, I don’t see any issues with Demo class. Works for me.

But anyways, point taken - can’t be done. Thanks.

Interesting. Show me your code?

I just used the function as I have written above. Am I missing something here?

import sys


def f(string):
    frame = sys._getframe(1)
    var_map = {**frame.f_globals, **frame.f_locals}
    return string.replace('{x}', str(var_map['x']))


class Demo:
    x = 2
    attr = f("{x}")


print(Demo.attr)    # 2

Ah, my bad, it needs another layer. Which ends up being broadly the same as the other example.

>>> def spam():
...     x = 3
...     class Demo:
...         print(f"{x=}")
...         print(f("{x}"))
...         
>>> spam()
x=3
Traceback (most recent call last):
  File "<python-input-25>", line 1, in <module>
    spam()
    ~~~~^^
  File "<python-input-24>", line 3, in spam
    class Demo:
        print(f"{x=}")
        print(f("{x}"))
  File "<python-input-24>", line 5, in Demo
    print(f("{x}"))
          ~^^^^^^^
  File "<python-input-21>", line 4, in f
    return string.replace('{x}', str(var_map['x']))
                                     ~~~~~~~^^^^^
KeyError: 'x'
>>> 

In any case, reaching into stack frames is not just CPython-specific, it’s also incomplete. You can’t cheat with that and expect to get all the names available to you. It simply doesn’t work.

The question I am compelled to ask here is:

What functionality would what you’re proposing allow, that can’t be done with PEP 750 tag-strings?

How so?

import sys
from collections import ChainMap


def f(string):
    frame = tmp = sys._getframe(1)
    stack = [frame.f_locals]
    while (tmp := tmp.f_back):
        stack.append(tmp.f_locals)
    stack.append(frame.f_globals)
    stack.append(frame.f_builtins)
    ns = ChainMap(*stack)
    return string.replace('{x}', str(ns['x']))


def spam():
    x = 3
    class Demo:
        print(f("{x}"))

spam()    # 3