PEP 750: Tag Strings For Writing Domain-Specific Languages

pauleveritt · October 17, 2024, 9:33pm

Major thanks to @dkp who was the primary re-writer for the PEP update and joined @jimbaker at the core dev sprint to hash things out.

steve.dower · October 17, 2024, 11:01pm

Thanks for the update! I’m writing comments as I read, so ~~if I haven’t changed this text it’s because I forgot (or I hit “Reply” early) and there’ll be a summary at the end.~~ here’s my summary: I’m happy with basically everything, have some comments about where types should go, and am pretty sure there’s a better way to handle the interleaved args but I’m not totally convinced which way to go with it.

The Template Type

I expect this type will be internal to the interpreter (in CPython’s case, a native type), and so putting an isinstance-able version in types is fine. Don’t define it in terms of @dataclass though, even with the caveat. We can’t implement it in terms of that, so better to specify it directly.

Of course, if Template is going to be directly instantiable, we’re (a) going to have to be okay with the added overhead and (b) it probably doesn’t live in types anymore, just because we shouldn’t have to import types (even implicitly) in order to use a t-string. But if type(t"") is not Template is acceptable (assuming isinstance(t"", Template) is still True), then I guess the types definition can be a duck-type equivalent.

I much prefer always using Interpolation and not alternating with str. Interested what others think about that, but as a consumer I’d prefer to not have to type check - either I can assume that .value is meaningful, or str() is some reasonable default behaviour.

Alternatively, they could always come in (literal, interpolation) pairs, where literal may be an empty string and interpolation may be None/falsey. Then I can write for s, v in t.args: rather than type checking or coming up with some kind of alternating iteration.

(Side thought: if Template.__str__ applies normal formatting rules to each interpolation, then in many cases t-strings and f-strings could be interchangeable…)

Concatenation

Why not? It’s just concatenating .args, isn’t it?

The debug specifier (=)

I’m not convinced about this. Perhaps we should just forbid it here? Or maybe it needs to move into the grammar in a way that lets us pass it through as a conversion flag or its own flag?

Interleaving …

Ah, structural pattern matching gets a mention. If this had come earlier, I’d have been less concerned about the interleaved approach. It feels a bit clunky? I wonder if there’s a design more geared towards match that would feel smoother?

In short, my feeling here is that designing specifically for “you should use match” is fine, and so is “you should not use match”, and we’re in a weird kind of middle ground right now where neither approach feels great. (But maybe others will come in and say the match approach does feel great, and it’s just me, which is totally likely! In this case, please put it as the first example rather than the last one.)

Jelle · October 17, 2024, 11:14pm

I’m not sure I see the issue here. The types module could expose the type, and the type could set its __module__ to types, even if “really” it is an interpreter-internal type. That’s similar to how typing.TypeAliasType now works.

effigies · October 18, 2024, 3:37am

This looks really great! Only thoughts:

The interleaving feels unsatisfying. If you want to provide easy access to the static portions of args, I would do it via a property. Right now you can use an alternating sequence, but you could also use index sequences:
```
from __future__ import annotations  # Interpolation is the PEP's type


class Template:
    args: list[str | Interpolation]
    _static_indices: tuple[int, ...]
    _dynamic_indices: tuple[int, ...]

    @property
    def static(self) -> tuple[str, ...]:
        return tuple(self.args[i] for i in self._static_indices)

    @property
    def dynamic(self) -> tuple[Interpolation, ...]:
        return tuple(self.args[i] for i in self._dynamic_indices)
```
The point isn’t to bikeshed this API or implementation detail, but that it feels like interleaving ought to be left as an implementation detail. By baking it into the PEP, you’ll make it very difficult ever to make a different choice. If this use case is important enough to structure the type around, then it seems worth making an API and not a trick of ordering (regardless of whether that trick is used internally).
Concatenation doesn’t seem so difficult that it’s better to make this type of string unlike all the other types. It might take a bit of care, but it seems worth it to make template strings behave as you’d expect. I’d expect them to be viral, so a template string added to any other string ends up as a template string.
I haven’t thought through a full use case, but I don’t see why bytes strings couldn’t be constructed in this way, e.g., FileWriter(tb"{magic}{header:\x00<{padding}}{blob}"). Because you’re not relying on obj.__format__(), the PEP 498 reasoning no longer applies. A Template[bytes] could have args: Sequence[bytes | Interpolation]. That said, I understand that you’ve got a thing that you’re trying to do, and that may be a step too far. IDK if it’s worth mentioning it as out-of-scope.

ncoghlan · October 18, 2024, 4:55am

As my own feedback:

I really like this iteration of the proposal (and I expect we’ll be withdrawing PEP 501 in favour of this, since the remaining differences have solid reasons behind them that weigh in PEP 750’s favour)
I’d like to see a discussion in the Rejected Ideas section about eager evaluation of conversion specifiers (the topic was explicitly considered and I think the conclusion to keep the lazy evaluation is reasonable, it just didn’t get added to the PEP itself)
I’d like to see Interpolation offer a couple of formatting helper methods to improve the ergonomics of allowing lazy conversion when most template processing won’t need to customise it:
- f.convert_value(): apply the conversion specifier (if any) to the field value
- f.format_value(): equivalent to `format(f.convert_value(), f.format_spec)`

Considering other feedback:

We should be able to have a common implementation-level “unsafe” template constructor API, called by both the eval loop and the types.Template.__new__ Python API, that relies on the caller to ensure the input sequence is correctly normalised.

The eval loop would assume that the compiler hasn’t messed up the arg sequence, while types.Template.__new__ would actually do the required normalisation pass to merge adjacent string segments and insert additional empty strings as required.
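The normalisation pass described here can be sketched in a few lines. `normalize_args` and the stand-in `Interpolation` class are hypothetical names; the sketch only shows the two rules (merge adjacent strings, pad with empty strings) that yield an odd-length sequence with strings at every even index:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpolation:
    # Minimal stand-in for the PEP's Interpolation type.
    value: object
    expr: str = ""


def normalize_args(args):
    """Return args as str, Interpolation, str, ..., str (odd length)."""
    normalized = [""]  # always start with a (possibly empty) string
    for arg in args:
        if isinstance(arg, str):
            normalized[-1] += arg  # merge adjacent string segments
        elif isinstance(arg, Interpolation):
            normalized.append(arg)
            normalized.append("")  # every interpolation is followed by a string
        else:
            raise TypeError(f"unexpected template argument: {arg!r}")
    return normalized
```

Passing `["Hello ", field, "!", " Some more text"]` yields `["Hello ", field, "! Some more text"]`, which is also the shape implicit concatenation with a plain string literal would need to produce.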

That was my initial reaction too (either having a prefix string field on interpolations or having 2-tuples), but I found the cache_key = template.args[::2] example genuinely compelling, as none of the other options offer that same ability to easily say “give me just the string parts”, and the 2-tuple variant also doesn’t even allow you to easily say “give me just the interpolation fields”.
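To illustrate why `args[::2]` works as a cache key, here is a hypothetical processor that caches its (stand-in) compilation step on the static text only. Note that if `args` is a list, the slice is a list too and must be converted to a tuple before it can be hashed:

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Interpolation:
    # Minimal stand-in for the PEP's Interpolation type.
    value: object
    expr: str = ""


@lru_cache(maxsize=None)
def compile_static_parts(strings):
    # Stand-in for expensive parsing: it depends only on the literal
    # text, so results can be reused across calls with different values.
    return "?".join(strings)


def process(args):
    cache_key = tuple(args[::2])  # just the string parts, made hashable
    query = compile_static_parts(cache_key)
    values = [arg.value for arg in args[1::2]]  # just the interpolations
    return query, values
```

Two calls that differ only in interpolated values share one cache entry, since the key never includes the values.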

In addition to the memoization example in the PEP, the interleaving approach makes things like switching to a different placeholder relatively straightforward:

```
def prepare_query(template, *, placeholder="?"):
    query_text = template.args[::2]  # string parts only
    if any(placeholder in text for text in query_text):
        msg = f"Cannot use {placeholder!r} in query template text"
        raise ValueError(msg)
    prepared_query = placeholder.join(query_text)
    template_values = [f.value for f in template.args[1::2]]  # fields only
    return prepared_query, template_values
```

And if we do want the pairwise variation, itertools.zip_longest can provide it:

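```
from itertools import zip_longest

segments = iter(template.args)
# Passing the same iterator twice yields (prefix, field) pairs;
# the final text prefix is paired with None.
for prefix, field in zip_longest(segments, segments):
    ...  # Do something with the text prefix
    if field is not None:
        ...  # Do something with interpolation field
```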

So yeah, I found the interleaving idea to be superficially off-putting, but it ended up feeling genuinely elegant once I started playing with the possibilities it offers.

nhumrich · October 18, 2024, 5:20am

I love the new PEP, but feel like the lack of concatenation is a mistake, especially the explicit concatenation.
It means that you can’t effectively break a single t-string across multiple lines.
You could always switch to a multi-line string, but that’s not the same: it includes newline characters, and it breaks how people are used to working with strings.

What was the thought behind why templates can’t support concatenation?

cdce8p · October 18, 2024, 2:29pm

I’d also appreciate it if t-strings supported concatenation. It’s quite common to split a simple string over multiple lines without using triple quotes, e.g.:

```
s = (
    "This is some long "
    "comment"
)
```

The PEP explicitly mentions that [...] empty strings are added to the sequence when the template begins or ends with an interpolation, [...]. With that in mind, wouldn’t the implicit concatenation of t-strings just concatenate normal strings?

```
template = t"Hello " "World"
assert template.args == ["Hello World"]

# --
name = "World"
template2 = t"Hello {name}!" " Some more text"
assert template2.args == ["Hello ", Interpolation(value="World"), "! Some more text"]
```

pf_moore · October 18, 2024, 2:55pm

Looks good! Like others, I’m not entirely happy with the prohibition on concatenation. I agree with the logic for prohibiting + (the values are Template objects, so there should be no expectation that they support addition). However, I would like t"..." "..." to be supported, as a special case. This would be particularly useful in its multi-line form:

```
some_var = t"..." \
            "..."
```

or

```
some_var = (t"..."
             "..."
)
```

While it’s true that triple-quoted strings are available as an alternative, the fact that they can’t be easily indented to line up with the surrounding code (and the common solution of using dedent doesn’t work for t-strings) means that implicit concatenation does have its place.

I’d limit it explicitly to only allowing "..." (with no prefix) to be concatenated with a t-string. Yes, that’s a special case rule, but so is the “no concatenation” rule.

ncoghlan · October 18, 2024, 2:58pm

One thing to note about concatenation: even if it is initially left out, adding it later is now straightforward.

That wasn’t the case with the previous version of PEP 750.

Separately from that discussion, @nhumrich and I have also agreed that given the updates to PEP 750 there’s no longer any differences we feel strongly enough about to champion an alternative, so we’ll be withdrawing PEP 501 in favour of PEP 750.

dkp · October 18, 2024, 3:38pm

Thank you! Will revisit your broader comments soon but, to address one specific point: we removed the use of @dataclass (and the corresponding caveat) from the PEP.

oscarbenjamin · October 18, 2024, 6:03pm

I don’t see what the difficulty is in concatenating templates like:

t"A{B}C" + t"D{E}F"   -->   t"A{B}CD{E}F"

The PEP says that the template always starts and ends with a string part. Why can’t the string parts just be concatenated?
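Under the invariant that args always start and end with a string, template-plus-template concatenation reduces to merging the two boundary strings. A minimal sketch on stand-in classes (this `__add__` is an assumption for illustration, not something the PEP specifies), which deliberately refuses plain `str` operands:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpolation:
    # Minimal stand-in for the PEP's Interpolation type.
    value: object
    expr: str = ""


class Template:
    """Minimal stand-in: args alternate str, Interpolation, ..., str."""

    def __init__(self, *args):
        self.args = tuple(args)

    def __add__(self, other):
        if not isinstance(other, Template):
            return NotImplemented  # plain strings are not accepted
        # lhs ends with a string and rhs starts with one: merge them.
        merged = self.args[-1] + other.args[0]
        return Template(*self.args[:-1], merged, *other.args[1:])
```

So the sketch's equivalent of `t"A{B}C" + t"D{E}F"` has args `("A", B, "CD", E, "F")`, the same shape as `t"A{B}CD{E}F"`.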

Is that restriction a hangover from previous versions of the PEP where the template was not returned directly because the tag function would process it first?

(P.S. Hi @dkp and many thanks for your excellent Go website that I have used many times over the years!)

jankatins · October 18, 2024, 8:00pm

One additional interesting use case would be output switching for CLI apps, like what libxo provides (libxo: The Easy Way to Generate text, XML, JSON, and HTML output):

xo_emit("Connecting to {:host}.{:domain}...\n", host, domain);

Depending on an output switch, that gets rendered as:

TEXT:
      Connecting to my-box.example.com...
XML:
      <host>my-box</host>
      <domain>example.com</domain>
JSON:
      "host": "my-box",
      "domain": "example.com"

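One way a t-string version of this might look: render the same args as text, XML or JSON depending on a style switch. This sketch assumes a stand-in `Interpolation` whose `expr` (the source text of the expression) doubles as the field name, whereas libxo uses explicit `{:host}` names instead; `render` is a hypothetical helper:

```python
import json
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class Interpolation:
    # Stand-in type; expr is assumed to hold the expression's source text.
    value: object
    expr: str = ""


def render(args, style):
    # Map field names (taken from expr) to their interpolated values.
    fields = {arg.expr: arg.value for arg in args[1::2]}
    if style == "text":
        return "".join(
            str(a.value) if isinstance(a, Interpolation) else a for a in args
        )
    if style == "xml":
        return "".join(f"<{n}>{escape(str(v))}</{n}>" for n, v in fields.items())
    if style == "json":
        return json.dumps(fields)
    raise ValueError(f"unknown style: {style!r}")
```

With `host` and `domain` interpolated, the `"text"` style gives `Connecting to my-box.example.com...`, while `"json"` gives `{"host": "my-box", "domain": "example.com"}`.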
jankatins · October 18, 2024, 8:13pm

One counterpoint: I always make my linter forbid this kind of concatenation and only allow ("..." + "...") across two lines, because I was once bitten by a stray comma in (" ...", "..."). Explicit is better than implicit, at least for me, so I would love to see + being allowed.

ilotoki0804 · October 19, 2024, 8:23am

How are Template types handled in terms of hashability and mutability?

- Immutable and always hashable (like a string)
- Immutable and sometimes hashable (like a tuple)
- Immutable but not hashable
- Mutable but hashable (like a regular class)
- Mutable and not hashable (like a list)

ncoghlan · October 19, 2024, 8:45am

As far as concatenation goes, the concern (at least from my PoV) isn’t with concatenating template instances with each other, or with concatenating regular strings, since those are both well-defined in a substitution-safe way:

t"some template" + t"other template" → Template(*lhs.args, *rhs.args)
"some string" + t"some template" → Template(lhs, *rhs.args)
t"some template" + "some string" → Template(*lhs.args, rhs)

The problem I see with allowing the latter two cases is when f-strings (and other forms of string formatting) get involved, since they look like the latter two arguably safe cases at runtime, but they’re actually bypassing the substitution safety features that the use of templates is supposed to be providing.

That concern only applies to string concatenation, though. If template concatenation were allowed (and I’m struggling to see any cases where it would be dangerous, since everything remains correctly escaped), then the two otherwise risky cases could be safely written as:

t"{"some string"}" + t"some template"
t"some template" + t"{"some string"}"

There are certainly cases where sequence concatenation will produce nonsense (such as combining multiple HTML body sections), but there are also plenty of cases where it will be valid (such as combining HTML paragraph sections, list sections, table sections).
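A small sketch of the escaping argument, using a hypothetical `render_html` processor over stand-in args: literal parts are treated as trusted markup and interpolated values are escaped, so a string only gets escaped if it arrives as an interpolation (as in `t"{"some string"}"`) rather than being merged into a literal part:

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpolation:
    # Minimal stand-in for the PEP's Interpolation type.
    value: object
    expr: str = ""


def render_html(args):
    # Even indices are literal text (trusted markup); odd indices are
    # interpolations whose values get HTML-escaped.
    out = []
    for i, arg in enumerate(args):
        if i % 2:
            out.append(html.escape(str(arg.value)))
        else:
            out.append(arg)
    return "".join(out)
```

The string `"<b>"` comes out unescaped when it sits in a literal part, but as `&lt;b&gt;` when wrapped in an interpolation.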

ncoghlan · October 19, 2024, 3:04pm

PEP 501 has been officially withdrawn, referring readers to PEP 750 instead (see “PEP 501 – General purpose template literal strings” on peps.python.org).

(The PEP 750 update that produced the preview link is still going through its final prepublication review pass, so don’t be concerned about the fact that the live PEP index still has the previous iteration of the proposal up)

pitrou · October 19, 2024, 4:28pm

Perhaps even add a __str__ method for that?

UltimateLobster · October 20, 2024, 4:48am

I kinda worry about what this means for future compatibility. Any future addition of a new string prefix to the language risks backwards incompatibility.

It’s not like string prefixes are added every day, but it would be a shame if this discouraged the addition of a helpful builtin prefix to the language.

Is there a place in the PEP that addresses this?

Nineteendo · October 20, 2024, 7:30am

The PEP has been updated and only proposes a t-prefix now (no arbitrary prefixes), but this would definitely be worth considering when a future PEP proposes this addition.

Dutcho · October 20, 2024, 8:53am

Does that mean Template.__init__() “standardizes” its *args before assigning it to self.args, i.e. enforces the interleaving and odd length?

I didn’t see that logic in the examples’ __init__.py file and couldn’t find class Template in the reference implementation’s types.py module. But I think it would make a lot of sense.