PEP 750: Tag Strings For Writing Domain-Specific Languages

barry · August 13, 2024, 5:42pm

From the flufl.i18n vaults, and referring back to the _.defer_translation() context manager mentioned above, Mailman has to lazily translate some messages in several cases. We still need to wrap the source string in _() so the off-line extractor will discover it, but in some cases, the actual translation can’t be done at the point of definition.

My memory is a little bit hazy at this point, but let’s see if I can reboot enough context. One useful code example is here.

This is a rule which attempts to determine whether a message sent to a mailing list is “administrivia”, i.e. an email command erroneously sent to the list. If detected, the rule sets up some data structures so that the “moderation reason” will say “Message contains administrivia” and that string is marked with _() for extraction and translation.

However, we can’t actually translate the string at this call site because we don’t know what language to translate it in, and in fact, it may be translated to several different languages. Imagine a list moderated by three people who prefer to get notifications in their non-English native language. When Mailman sends the notification message to each of these people, it will translate the source string at the point the notification emails are composed. So it will dynamically match each moderator with their preferred language, compose separate emails to each in their native language, and do the source string translation at that point.

Edit: I should mention that interpolation can still also be done lazily, but this isn’t done automatically. When needed, enough of the variable scope and context is captured so that the interpolation can happen correctly at the point of the dynamic translation.

mdrissi · August 13, 2024, 5:42pm

One concrete example of a library that does do lambda: style for laziness is tensorflow. The main apis that commonly use it are 1 and 2. At same tensorflow 2 added autograph mode that would do source to source rewrite of code before executing it and convert more natural eager style looking code to lazy lambda style. Specifically tensorflow when it sees something like (inside tf.function),

if x:
  return foo(x)
else:
  return bar(x)

it gets converted by autograph to something more like,

tf.cond(x, lambda: foo(x), lambda: bar(x))

So I guess it’s more TF1, the second style was normal, but in TF2 they try to hide laziness from the user.

Part of laziness here is that tensorflow wants to construct a “execution graph” for later usage. You can very roughly think of it like making prepared sql statement where input values are fed later repeatedly.

I dug through my internal codebase and lambda: usage for laziness was pretty rare outside of few tensorflow cases like this.

pf_moore · August 13, 2024, 6:10pm

Yes, I recall that discussion now. The problem (in my view) was that internal details of the callee (“do I want this value right now, or do I want to save it to be calculated later?”) were leaked into the API design (you need to pass a value or lambda: value), meaning that the choice has to be made up front, and changing it later is a breaking change.

Lazy evaluation avoids this by keeping the caller interface the same while allowing the callee to change the implementation later.

… at least that’s my perception of the reasons why lambda doesn’t work as a workaround for the lack of lazy evaluation.

MegaIng · August 13, 2024, 6:59pm

I guess this is the central difference in our positions: My answer to this is
“Ofcourse it’s possible”. Modify the function object to see different variables by overwriting the closure and/or globals. And since this PEP is, according to it’s title, designed to allow DSL, this is IMO a perfectly acceptable strategy.
I am not viewing the scoping behaviour of python as a bound, I am viewing it as a jumping off point to implement whatever behaviour I actually want.

But I guess if we don’t want to encourage such modifications, we could instead provided a more complex desugaring that is better for templating, and also does early binding:

The expression in tag"{a + b}" could be turned into lambda *, a=a, b=b: a + b, turning all variables used into arguments.

This would

allow easy template instantiation while overwriting (some) variables to be picked based on where it has to be used.
It would make [t"{x}" for x in range(10)] capture the value of x in each, potentially being closer to what beginners would expect.

Yes, this would be completly new behavior and “scoping” not matched by anything else in python. IMO, this is a benefit: It actually allows something new instead of just being minor syntactic sugar for tag("a", b, "c").

Or with other words: If this PEP switches over to eager evaluation, it’s IMO so pointless as to not be worth it, and it would take away syntactic potential for future proposals, so then I would be strongly opposed to it.

steve.dower · August 13, 2024, 7:07pm

This feels like a pretty drastic position to me. The PEP is incredibly useful even if expressions are evaluated eagerly.^[1]

I’d love to hear more about your intended use cases that rely on lazy evaluation but don’t benefit from DSL variable interpolation. It seems that’s a gap in the discussion so far.

I do like this transform - at one point I (weakly) proposed a shorthand syntax for lambdas that basically did this (IIRC, it was something like $a + b becoming the lambda you showed.)

I’m not opposed at all to someone proposing a lazy evaluation feature on its own, which would then be usable here. But I also think the potential for DSL parameter substitutions in PEP 750 is so strong that we shouldn’t get hung up on the lazy evaluation aspect (and clearly I think that sneaking lazy evaluation in via a tagged strings PEP is not going to be good for the language, its implementations, and its users).

Okay, I’m done with this topic for a while. Everyone else, go for it

With exceptions deferred, so a NameError isn’t raised until the tag function tries to use the real value rather than just the expression. ↩︎

brettcannon · August 13, 2024, 7:10pm

You can use JupyterLite to test out the implementation to get an answer to your question.

jimbaker · August 14, 2024, 12:08am

The full, original source string can be recovered, with the exception of some corner cases, which would seem unlikely to be used. (But it’s possible to support all of these corner cases by changing this part of the PEP specification to not be rejected.)

The two cases for Interpolation are to show how this aspect of matching can be generally useful when processing tag function args, although not actually needed here:

from typing import Decoded, Interpolation

def original(*args: Decoded | Interpolation):
    result = []
    for arg in args:
        match arg:
            case Interpolation(format_spec=None, conv=None) as i:
                result.append(f'{{{i.expr}}}')
            case Interpolation() as i:
                result.append(f'{{{i.expr}')
                if i.format_spec:
                    result.append(f':{i.format_spec}')
                if i.conv:
                    result.append('!{i.conv}')
                result.append('}}')
            case Decoded() as d:
                result.append(d.raw)
    return ''.join(result)

x = original"It's possible to recover {simple} expressions, {more * complicated:02.d!r} expressions, but not fully {debug=} expressions, and named Unicode \\N{{GRINNING FACE}}"
print(x)

jimbaker · August 14, 2024, 12:30am

Most use cases would be supported by eager evaluation. The most important of these is being able to support a tag function’s ability to parse the tag string (generally into an AST), and then apply the context from that AST to producing correctly escaped (or passed through) interpolations for the target DSL. I would generally descibe this process as 1) substitute appropriate placeholders into the tag string; 2) use an off the shelf parser to an AST; 3) walk/compile the AST to produce desired output (such as a DOM). Whether eager or deferred doesn’t change this process, nor opportunities for memoization.

Deferred evaluation can support at least the following:

Globally-scoped names being bound. This could be useful when these globally-scoped names are actually bound by a tool like numexpr, formulaic, Pandas, etc to an application specific context like a column in a Pandas dataframe. The counterargument is globally scoped names.
Straightforward deferred evaluation - everything is wrapped by a lambda, so traversing the expression tree is just a matter of using getvalue() and doing one “unwrap”. Obviously, sophisticated libraries can readily implement this sort of support to get the equivalent of changing the order of evaluation to match when it is actually needed, such as when writing an HTTP response. But then how do these libraries compose? getvalue provides a convenient protocol.
Lazy usage, such as the struct_log tag function I mentioned, without requiring every expression to be wrapped by the developer (not likely in logging).
Very interesting metaprogramming, as I showed in the rewrite example. These are fun experiments, but I’m not convinced of their utility.

ncoghlan · August 14, 2024, 1:11am

Steve Dower:

The fully worked example with the [lazily evaluated] semantics that wouldn’t work under [eagerly evaluated] semantics is this:
stmt = sql"select field + {offset} from tbl"
for offset in list_of_offsets:
    db.exec(stmt)
Anything involving passing offset explicitly is still possible.

Considering the case of eager evaluation by default, I could easily see custom templates adopting the following convention: lazily evaluated fields put “()” immediately after the substitution field so the template function knows to call the value rather than using it as-is.

stmt = sql"select field + {lambda: offset}() from tbl"
for offset in list_of_offsets:
    db.exec(stmt)

(such a convention would also be possible with PEP 501)

No implicit thunking, otherwise it would be difficult to pass in existing callables for lazy execution.

This level of flexibility in representation is one of the major reasons I prefer passing a first class structured object to template rendering functions.

However, I think a further nice way to bring the PEPs closer together would be to give template iteration in PEP 501 similar semantics to the argument generation in PEP 750. That way the authors of template rendering functions could freely choose between implementing them using PEP 501’s callback style (when each field is processed in isolation) or PEP 750’s pattern matching style (when more context information is needed during field evaluation).

zuo · August 14, 2024, 1:47am

Maybe a silly idea, but what about making deferred evaluation an opt-in with an explicit syntax, e.g. by prepending $ to the replacement field’s opening brace?

# Here `top` is evaluated eagerly, but evaluation
# of `middle` and `bottom` is deferred.
markup"... {top} ... ${middle} ... ${bottom} ..."

Desugaring would be lambda-like, except that:

obviously, annotation scope mentioned in the PEP would be used;
early binding as suggested above by @MegaIng (lambda bottom=bottom: ...-like) could also be used.

Why the $ character? I just believe that $ as a marker is visible enough. Also, the use of $ (and ${...}) for templating-like purposes has a long tradition.

[PS] Of course, if $ was used that way, there would be a need for some escape sequences…

Perhaps:

$${{ → literal ${
$${ → literal $ followed by a replacement field
1 or more $ if not followed by { → just those literal $
${{ → syntax error

Alternatively:

just $$ → literal $

barry · August 14, 2024, 3:02am

Can it not be captured and preserved, rather than having to recover it?

ncoghlan · August 14, 2024, 3:54am

I’ve started accumulating the ideas this thread is giving me for PEP 501 in an issue on the PEPs repo: PEP 501: improvements inspired by PEP 750's tagged strings · Issue #3904 · python/peps · GitHub

Some of the ideas there are also potentially applicable to PEP 750:

rather than passing the exploded list of template components as the template function arguments, instead pass a single iterable TemplateLiteral object (so for arg in args: in rendering functions instead becomes for segment in template:). This also makes it straightforward to handle Barry’s request to provide access to the entire input string for i18n catalog lookup.
TemplateText as a potential alternative name for the Decoded protocol
TemplateField as a potential alternative name for the Interpolation protocol
switch to eager evaluation by default, but allow {-> expr} to indicate lazy fields (with the same meaning as {lambda: expr})
note that template renderers can accept () as a field specifier (as in {expr:()}) to indicate that the template field is a callable that should be called at rendering time
note that passing strings as template field values provides a way to “template templates”, with the field values naming parameters to be used for later dynamic substitution via a method call (akin to str.format and str.format_map, but without having to reparse the formatting template on every invocation)

Edit: I realised that {expr:()} would be a much better convention than {expr}() for calling functions at rendering time (since it keeps the info as part of the TemplateField object rather than relegating it to the following TemplateText object)

pauleveritt · August 14, 2024, 10:47am

We could likely show a system which captured original parts and made them joinable.

devdanzin · August 14, 2024, 10:55am

Using the idea from @zware:

Could the following be a simpler way to get deferred and eager fields? The trick being a call to __str__ (which could be a render method that can return arbitrary objects instead). Of course, the markers for deferred and keepexpr are only placeholders to demonstrate the idea. (I’m probably missing something or made some mistake, if so sorry about the noise)

class DeferredEagerRender:
    """Handle deferred, literal and eager fields in tag strings."""

    def __init__(self, *args):
        """Process eager and literal values at tag call."""
        _args = []
        for arg in args:
            if hasattr(arg, "conv") and arg.conv == "KEEPEXPR":
                _args.append(arg.expr)
            elif hasattr(arg, "getvalue") and arg.conv != "DEFERRED":
                _args.append(arg.getvalue())
            else:
                _args.append(arg)
        self.args = _args
        self._str = ""

    def __str__(self):
        """Process deferred values at render time."""
        output = []
        for arg in self.args:
            if hasattr(arg, "getvalue") and arg.conv == "DEFERRED":
                value = arg.getvalue()
            else:
                value = arg
            output.append(str(value))
        return "".join(output)


defeager = DeferredEagerRender

eager = "How are you right now?"  # This should be eagerly evaluated.
greeting = defeager"Hello {deferred_name:DEFERRED}! {eager}"
deferred_name = "Mr. Render Me Later"

eager = "Welcome!"  # This shouldn't be included.
print(str(greeting))
# Hello Mr. Render Me Later! How are you right now?

assert str(greeting) == "Hello Mr. Render Me Later! How are you right now?"

print(str(defeager"""1 + 2 == {1 + 2}; 1 + 2 == {1 + 2:KEEPEXPR}"""))
# 1 + 2 == 3; 1 + 2 == 1 + 2

print(str(defeager"""{eager} {eager + '!!!':KEEPEXPR} == {eager + '!!!'}"""))
# Welcome! eager + '!!!' == Welcome!!!!

It would also work for the SQL example:

list_of_offsets = range(5)
stmt = defeager"select field + {offset:DEFERRED} from tbl"
for offset in list_of_offsets:
    print(str(stmt))

# select field + 0 from tbl
# select field + 1 from tbl
# select field + 2 from tbl
# select field + 3 from tbl
# select field + 4 from tbl

Is there any value in this?
Edit: simplified, corrected code.

Nietanod · August 14, 2024, 4:15pm

Actualy, i think that tagged strings should be used like this:

@tag("translate") # This decorator register the object as a tag
class TranslateString:

    def __init__(self, string, *args):
        self.value = string.format(*args)
        self.text = string
        self.args = args

    def __str__(self): # For display on print
        # This is an example of what kind of processing can be done
        # `translate` is not actualy a real function (yet)
        value = translate(self.value, "english", "french") 

        return self.value
    
    def __repr__(self): # For raw value
        return self.text + str(self.args)

# Usage of the tag
name = "John"
greet = translate"Hello {name}"

print(greet) # This output "Bonjour John"
repr(greet) # This output "Hello{} ('John')"

It is (in my opinion) one of the purpose of tagged strings.

I don’t realy like the idea of tags created implicitly, and a way to register them is essential (I realy like decorators, they are so elegants).

ronaldoussoren · August 14, 2024, 5:18pm

The optional lazy evaluation of interpolations is a bit worrisome because it smells a bit like lambda expressions using free variables (Programming FAQ — Python 3.12.5 documentation). As such it should be noted in the “how to teach this” section as a gotcha that needs to be explained (and documented by tag functions).

E.g. the following code might have a surprising result depending on whether or not the tag function does lazy evaluation of interpolations:


items = []
for value in some_sequence:
    items.append(tag"value is {value}")
print(items)

Nietanod · August 14, 2024, 5:57pm

I think that lazy evaluation is dangerous, or can lead to some misunderstanding, especialy when working on a loop.

The best way is alway the explicit, so the best (in my opinion) is to create copy of all the elements that can change.
Also, strings are not mutables, so why tagged strings should be mutables ?
The only purpose of tagged strings must be only a special display or a special creation.

ronaldoussoren · August 14, 2024, 7:33pm

There are good use cases for having lazy evaluation though, for example in logging where calculating the value can cause measurable overhead that’s wasteful when the value is not use (e.g. log.debug(f"big number is {2 ** 100 * 100}")).

Donatien Vachette:

ronaldoussoren:
items = []
for value in some_sequence:
    items.append(tag"value is {value}")
print(items)
The best way is alway the explicit, so the best (in my opinion) is to create copy of all the elements that can change.
Also, strings are not mutables, so why tagged strings should be mutables ?
The only purpose of tagged strings must be only a special display or a special creation.

Copying can be expensive, and would give different semantics than in the rest of the language. It is also far from clear what should be copied in the first place when the interpolation expression is more complex.

It is ~~turtles~~ tradeoffs all the way down . I haven’t formed an opinion yet on whether or not having lazy evaluation here is a good thing when all things are considered.

mikeshardmind · August 14, 2024, 8:28pm

From largest to least important detraction (subjective personal opinion):

I don’t see a good reason for laziness presented here and I think it would be a stronger proposal if laziness weren’t included in any way or if a lazy tag string was syntactically different. Without a kind of obvious separation, any time a developer sees a tag string, they need to go read the documentation for it to know if it’s going to behave how they expect or not. This will be true not just of the author but of reviewers and future maintainers as well.
I definitely don’t like the current part of the proposal where any function can act as a tag either, this is going to lead to local namespace pollution.
I don’t particularly like the syntax, I’d prefer something like t:[function name]"..."
The examples here look worse to me than existing solutions for the problems posed. In terms of DSLs specifically, I think PEP638 made a better case and proposed a better toolset.

jimbaker · August 14, 2024, 9:49pm

Thanks for the comments on our proposal. I’m going to respond specifically to comments by @erictraut , @godlygeek , @barry , and @ncoghlan , although I’m sure I’m missing some of this discussion, and see how we can improve this proposed PEP:

Tag functions are called with a TemplateConcrete object, which is builtin, with a runtime-checkable protocol Template provided from typing. Template provides args (as now) and source (original string passed into the tag string) as attributes. One nice thing about the source attribute is that it can be used as a memoization key, and of course requires no additional computation. Hopefully it can also be used for i18n purposes.
Tag functions can be decorated with @tag_evaluation('eager' | 'choice' | 'deferred'); use Literal for these choices; this decorator is available from typing. This decorator sets a dunder method, __tag_evaluation__. eager is the usual evaluation order, and the tag function guarantees it does this upon entry. 'choice' means that the tag function chooses which interpolations to evaluate, if any, in any order, before returning, but it does not retain the interpolation after it returns. deferred allows for the interpolations to be seen outside the scope of the tag function. Hopefully this decorator can support typing needs.
No short prefix names, however if dotted names are added, there’s no such restriction for a dotted name. So lazy.f is a valid tag name.
It might be desirable to have a t prefix, which simply returns a Template. I assume this will have eager evaluation semantics. (There are additional aspects discussed, but these seem to require additional enhancements to the interpolation syntax; that can be discussed separately.)