> That `None` check would be annoying and expensive

It may well be annoying, but `is None` checks are really cheap.
Considering the two main edge cases that make either of the pairwise iteration options awkward (assuming the “there are always leading and trailing string segments defined, even if one or both of them are empty” principle is retained):
"just a string"
("just a string", None)
(None, "just a string")
"{"just an interpolation"}"
("", Interpolation(...))
, ("", None)
(None, "")
, (Interpolation(...), "")
Given that there isn’t much of a difference in the level of awkwardness there, I find Guido’s point about the advantages of consumers being able to get the special case out of the way before starting the main iteration loop compelling.
The (interpolation, string) pair order does make it tempting to special case the leading string, though. Something like:
```python
for field, string in tmpl:
    if field is not None:
        process_field(field)
    process_string(string)
```

```python
process_string(tmpl.leading_string)
for field, string in zip(tmpl.interpolations, tmpl.trailing_strings):
    process_field(field)
    process_string(string)
```
And then define the main iterator as skipping that first `(None, "")` pair when there’s no leading text:
```python
def __iter__(self):
    if self.leading_string:
        yield None, self.leading_string
    yield from zip(self.interpolations, self.trailing_strings)
```
Hm, I prefer one right way rather than several different ways. I prefer the fewest number of fields, with no redundancy.
FWIW, we have some kind of precedent in `re.split` (which I was just using today to solve a similar kind of design problem).
```python
>>> re.split(r"(\{.*?\})", "a string {with}{interp}olations{.}")
['a string ', '{with}', '', '{interp}', 'olations', '{.}', '']
```
There’s probably a rule defined somewhere, but just from observation, it guarantees a string at either end (since it’s splitting, this makes sense), and also in between adjacent matches.
Again, it’s not quite the same thing as what we’re doing here. But I’d feel pretty comfortable with saying “we’re just doing it this way because `re.split` does and we couldn’t agree on something better”.
Incidentally, I also wrote an alternating loop to do the processing in my case, and didn’t particularly have any trouble getting it right (my first thought was to do the zip/`[::2]` trick, but I wasn’t confident in my head so I just did the simple loop):
```python
is_sep = False
for i in split_list:
    if is_sep:
        ...  # do one thing
    else:
        ...  # do the other
    is_sep = not is_sep
```
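For readers wondering what the zip/`[::2]` trick would look like, here is a hedged sketch built on the `re.split` output shown above; `handle_text` and `handle_sep` are placeholder names, not anything proposed in this thread:

```python
# Sketch: with a capturing re.split, even indexes are plain text and odd
# indexes are the captured separators, and there is always one more text
# segment than separator.
texts = split_list[::2]
seps = split_list[1::2]
for text, sep in zip(texts, seps):   # zip stops at the shorter sequence
    handle_text(text)
    handle_sep(sep)
handle_text(texts[-1])               # handle the trailing text segment
```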
I like this – an approach that forces developers to confront the oddball case up front!
Putting it all together, that leaves us with `Template` defined roughly as:
```python
from collections.abc import Iterable

class Template:
    @property
    def strings(self) -> tuple[str, ...]:
        """Return N+1 strings, where N is the number of interpolations."""
        ...

    @property
    def interpolations(self) -> tuple[Interpolation, ...]:
        """Return N interpolations, N >= 0."""
        ...

    @property
    def pairs(self) -> Iterable[tuple[Interpolation | None, str]]:
        """Return interpolation/string pairs; the first pair has no interpolation."""
        ...

    # `args` is no longer defined
```
Would we like to define an `__iter__()` directly on `Template`, in addition to (or rather than) `pairs()`?
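If we did want both, one possibility (just a sketch, not something the PEP currently specifies) is for `__iter__()` to delegate to `pairs`:

```python
# Sketch only: iterating the template directly yields the same pairs as the
# `pairs` property above, so `for interp, string in tmpl:` just works.
def __iter__(self):
    return iter(self.pairs)
```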
FWIW I do think that pretty much any approach to walking a Template’s contents is going to be awkward somehow. I suppose the silver lining is that writing code to process Templates is probably the uncommon case; most developers will instead use a pre-existing template processing library and won’t need to confront the awkwardness directly.
This definition means that the type checker will complain that the interpolation is potentially None, unless you use `cast(Interpolation, interpolation)` or `assert isinstance(interpolation, Interpolation)`.

Edit: you can of course use a `# type: ignore` comment.
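To make the friction concrete, here is a hedged sketch of the “lift the special case out of the loop” pattern that trips the checker (`process_string` and `process_interp` are placeholder handlers like the ones used elsewhere in this thread):

```python
# The programmer knows every remaining pair has a real interpolation once the
# first one is consumed, but the declared element type is still
# `Interpolation | None`, so the checker needs an explicit narrowing step.
it = iter(template.pairs)
_none, leading = next(it)        # first pair is (None, leading_string)
process_string(leading)
for interp, string in it:
    assert isinstance(interp, Interpolation)  # or cast(), or a type: ignore
    process_interp(interp)
    process_string(string)
```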
Thank you for the deep dive, Mark!
Yes! It seems like we’re heading that way.
Fixed in the draft PEP; thanks.
Removed. (And: whoops! Thanks.)
Earlier versions of the PEP proposed `types`, `collections`, and even `collections.abc` (back when we were thinking the new types might be abstract or even protocols). Various feedback, both in person at the Bellevue core dev sprint and in previous discussion threads, got us to avoid these.

This said, I think we’re happy with pretty much any choice here – `types` seems totally fine so long as the community can come to agreement.
(Hrm… about the only affirmative case I can make for `templatelib` is that there are possibly related methods (like a new `convert(value: object, conv: str) -> str` method being discussed earlier in this thread, or something like our current `from_format()` example) that might find a natural home there. Down the road, if we ever consider shipping template processing batteries with future versions of Python, maybe `templatelib.shell`, `templatelib.sql`, etc. would make some kind of sense. I don’t know how much I believe this affirmative case, though!)
I’ll update the PEP to use `TBD` for now. Perhaps this should be a question for the steering council?
We should probably reference an example. The `from_format()` method in our examples codebase shows a good use case: taking a string originally destined for use in `str.format()` and instead constructing a `Template` directly.
While I don’t expect it to be common, it probably makes sense to leave open the possibility of writing code that consumes a Template, transforms it in some generic way, and returns a new Template. (This sort of chaining is not unheard of in Javascript tagged template-processing libraries, for instance.)
For me, it starts from thinking of t-strings as an extension to f-strings. This assert holds:
```python
cheese = "emmental"
assert f"Try some " + f"{cheese}" == f"Try some {cheese}"
```
I think most devs will intuitively expect this to also hold:
```python
assert t"Try some " + t"{cheese}" == t"Try some {cheese}"
```
So we land on a fairly straightforward definition of `__eq__()`, at which point we’re also basically cornered on our choice of `__hash__()`.
As for `__hash__()` when interpolation values don’t support it: I don’t see this as much different than `__hash__()` for tuples when their contents don’t support it. A `Template` is hashable if and only if its interpolations and their values are; otherwise, not.
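To make that concrete, here is a hedged sketch of the kind of definition this implies (my illustration, not the PEP’s implementation; it assumes `+` coalesces adjacent string segments so the concatenation assert above holds, and the `_key()` helper is hypothetical):

```python
class Template:
    ...

    def _key(self):
        # Hypothetical helper: the strings plus the observable parts of each
        # interpolation.
        return (
            self.strings,
            tuple((i.value, i.conv, i.format_spec) for i in self.interpolations),
        )

    def __eq__(self, other):
        if not isinstance(other, Template):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Raises TypeError if any interpolation value is unhashable, exactly
        # like hashing a tuple with unhashable contents.
        return hash(self._key())
```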
Working with Lysandros’ prototype implementation and building our examples repo, this has all felt pretty good in practice.
Given that we already have a `format()` built-in that only takes the `format_spec`, I feel like we probably want to keep these separate in the `Interpolation` type. (As an aside: there’s no `convert()` equivalent to `format()` in Python today; the PEP provides an example implementation. It’s tiny, but I could still see wanting to ship it with Python directly, as discussed elsewhere in this thread.)
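For readers who haven’t seen it, the helper in question is roughly this kind of thing (a sketch of the idea; the PEP’s own example may differ in details such as its exact signature):

```python
def convert(value: object, conv: str | None) -> object:
    """Apply an f-string-style conversion (!s, !r, !a) to a value."""
    if conv == "s":
        return str(value)
    if conv == "r":
        return repr(value)
    if conv == "a":
        return ascii(value)
    return value  # no conversion requested
```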
Yes, devs will need to do some kind of check – `is None` seems simplest – or otherwise circumvent checking.
Yes, although for that sort of caching, my hunch is that `Template.strings` will be the more commonly used key.
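To illustrate the kind of caching I mean (a hedged sketch; `compile_query` stands in for some hypothetical, expensive per-template-shape work):

```python
# Every template with the same literal text shares one cache entry, regardless
# of the interpolated values, because `strings` ignores the values entirely.
_cache: dict[tuple[str, ...], object] = {}

def compiled_form(template: Template):
    key = template.strings
    if key not in _cache:
        _cache[key] = compile_query(template)  # hypothetical expensive step
    return _cache[key]
```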
As a follow-up, we created a branch of the examples repo that implements `pairs()` and updates the examples to use it rather than walking `template.args`.
It seems fine!
For comparison, here’s the `f()` example from the PEP, walking `args`:
```python
def f(template: Template) -> str:
    parts = []
    for arg in template.args:
        if isinstance(arg, str):
            parts.append(arg)
        else:
            value = convert(arg.value, arg.conv)
            value = format(value, arg.format_spec)
            parts.append(value)
    return "".join(parts)
```
and here’s `f()` using `pairs()`:
```python
def f(template: Template) -> str:
    parts = []
    for i, s in template.pairs:
        if i is not None:
            value = convert(i.value, i.conv)
            value = format(value, i.format_spec)
            parts.append(value)
        parts.append(s)
    return "".join(parts)
```
and – why not? – here’s `f()` using `pairs_s_i()`, which puts the exceptional case at the end instead:
```python
def f(template: Template) -> str:
    parts = []
    for s, i in template.pairs_s_i:
        parts.append(s)
        if i is not None:
            value = convert(i.value, i.conv)
            value = format(value, i.format_spec)
            parts.append(value)
    return "".join(parts)
```
I understood Mark’s question here as being “Since template instances are immutable, why isn’t all of their initialisation being handled in `__new__`?”

Edit: To be clear, I have the same question, I just hadn’t noticed that the API summary in PEP 750 wasn’t already using `__new__`.
> I understood Mark’s question here as being “Since template instances are immutable, why isn’t all of their initialisation being handled in `__new__`?”

Oh, sheesh. And of course it is `__new__`; the PEP has it wrong. Thanks! Will fix.
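(For readers following along: the usual reason immutable types do their setup in `__new__` is that by the time `__init__` runs the instance already exists, and an immutable type has typically disabled attribute assignment. A generic sketch of that pattern, unrelated to the PEP’s actual constructor signature:)

```python
class Frozen:
    __slots__ = ("value",)

    def __new__(cls, value):
        self = super().__new__(cls)
        # Bypass the blocked __setattr__ during construction only.
        object.__setattr__(self, "value", value)
        return self

    def __setattr__(self, name, attr_value):
        raise AttributeError(f"{type(self).__name__} instances are immutable")
```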
I’ve been trying to work out why the “interpolation first” pairwise iteration idea inspires a feeling of “That’s really clever, I don’t like it”. Putting the edge case first rather than last has a genuine practical benefit in making it easy to lift the `None` check out of the loop (although typecheckers might complain about attempts to do that without appropriately typed properties), so if I’m going to object, I’d like my objection to have more behind it than just “it feels weird”.
I think it’s just the fact that we have plenty of precedent for padding with `None` values to make sequence lengths align, but little or no precedent for injecting values at the start of one of the sequences to make their lengths match.
From the point of view of forcing the exceptional case to be handled, the key there is to always emit it: even if a template ends with an interpolation field, the final `("", None)` pair would trip up a naive loop just as effectively as a leading `(None, "")` pair when using the other order.
For the “avoid the inline `None` check” iteration pattern, I don’t see a straightforward way to cleanly enable it without dedicated, appropriately typed properties anyway, and that works with either pairwise order (a `tmpl.complete_pairs` property together with either `tmpl.leading_string` or `tmpl.trailing_string`, depending on the chosen pairwise order).
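Spelled out as a sketch (the property names are the ones from the paragraph above, not anything in the PEP; this version assumes the (interpolation, string) order):

```python
# Hypothetical, fully typed property with no Optional in the pair type:
#     complete_pairs: Iterable[tuple[Interpolation, str]]
#     (e.g. implemented as zip(self.interpolations, self.trailing_strings))
#
# Consumer code: the leading string is handled once, before the loop, and no
# inline None check is needed inside it.
process_string(tmpl.leading_string)
for field, string in tmpl.complete_pairs:
    process_field(field)
    process_string(string)
```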
Maybe I am unique in this, but I really hate any of these suggestions to solve the iteration with zipped-up tuples or a known interleaving pattern. IMO the only two clean suggestions I have seen in this thread are the ones using `args` directly and differentiating between literals and values with `isinstance` or with pattern matching. A few reasons why I don’t like the pairings:

- Why is there a `None` edge case I have to handle?

IMO, the best solution is for the PEP to just say “args is a list of strings and values” - no promises about interleaving, no promises about being non-empty, no promise about containing or not containing the empty string. The constructor also shouldn’t adjust this in any way, and any code that relies on the interleaving behavior [1] should be considered to rely on implementation-defined behavior.
If you want to get just all strings, spell it out: `tuple(s for s in tmpl.args if isinstance(s, str))`. Yes, this is technically slower, but we shouldn’t compromise good API design for better performance.
This will lead to more readable, cleaner code where what it does is obvious to anyone reading it - IMO one of the core goals of “pythonic” code.
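For comparison with the `f()` variants earlier in the thread, the `args`-plus-pattern-matching style I mean looks roughly like this (a sketch; `render` is just an illustrative name):

```python
def render(template: Template) -> str:
    # Branch on the element type directly; no assumptions about interleaving,
    # emptiness, or ordering are needed.
    parts = []
    for arg in template.args:
        match arg:
            case str() as s:
                parts.append(s)
            case Interpolation() as interp:
                value = convert(interp.value, interp.conv)
                parts.append(format(value, interp.format_spec))
    return "".join(parts)
```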
[1] assuming it still gets implemented ↩︎
> Putting the edge case first rather than last has a genuine practical benefit

After updating that examples repo, I guess if I had an aesthetic preference between `pairs()` and `pairs_s_i()` it’d be for the `s_i` variant, since I can’t shake the “first thing is a string” mindset from today’s `Template.args`. (But they’re all just fine!)
Yeah, I agree that just args is easier to use. It’s just that it’s harder to type statically — you end up having to use a runtime type check.
Also, if you want just the strings, it’s expensive to extract those from `args`, unless you have access to `.strings` as well. (`args[::2]` still has the union type.)
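A quick illustration of that typing point (a sketch, assuming `args` is typed as `tuple[str | Interpolation, ...]`):

```python
from typing import cast

maybe_strings = template.args[::2]   # still tuple[str | Interpolation, ...]
                                     # as far as a type checker is concerned

# You end up either casting...
strings = cast(tuple[str, ...], template.args[::2])
# ...or filtering at runtime, as suggested above:
strings = tuple(s for s in template.args if isinstance(s, str))
```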
The thread is too long to try and dig up evidence, but IIRC I came up with the “interpolation first” idea in response to super-type-safe proposals where it was proposed to move the final string out of the loop, e.g.
```python
for string, interp in tmpl.pairs:
    process_string(string)
    process_interp(interp)
process_string(tmpl.final_string)
```
This was flagged as a potential bug magnet because it’s easy to forget that final call (and I agree that it’s very awkward). I think the first response was to have the final string be included as a (string, None) tuple, which somehow still felt awkward (I don’t recall why) so I proposed to move the special case to the first (interp, string) tuple.
But to be honest, my heart isn’t into the type-safe solutions that much. I’d prefer args to produce strings and interpolations, leaving out empty strings – that’s how I personally apparently think about f-strings and t-strings. But the other PEP authors (presumably influenced by the previous discussion thread) seem to feel strongly about strictly alternating.
The worst of all worlds IMO would be to make args strictly alternating in the implementation without specifying that – in practice people use what works, not what the spec says, so if the implementation alternates (which is trivial to assess by observing its actual behavior), people will code according to that, regardless of whether the spec says it’s not guaranteed.
I’d be okay with having an args (specified either way) and separate strings (a tuple of strings) and interpolations (a tuple of interpolations).
FWIW: aesthetically I prefer walking over `.args` and type checking over either of the `pairs()` approaches – but I don’t think any of the three are bad, especially given that writing code to process templates (rather than calling code that processes them) is the uncommon case.
This said, the fact that the `pairs()` approaches hide `args` entirely and save us from specifying alternation outright seems like a win.
Maybe it’s worth stepping back a little?
Looking at javascript-land, tagged template functions are typed as `tag(strings: TemplateStringsArray, ...values: Any) -> Any`. Javascript guarantees `strings` is length `n+1`, where `n` is the length of `values`. `strings` so structured is very commonly used as a cache key by developers.
In a different direction, following the thread of “let’s just have `args` but not have alternation”. Some consequences:
- Template equality is no longer tied to `args`. That is, `assert t"hello " + t"world" == t"hello world"` should probably still hold, but presumably in one case `args` is `("hello ", "world")` and in the other case `args` is `("hello world",)`. Under the hood, `__eq__` will still need to coalesce neighboring strings (see the sketch at the end of this post). Also, as an implementation detail, it can decide to use or drop empty strings.
- The collection of strings culled from `tuple(s for s in template.args if isinstance(s, str))` is no longer a useful cache key. For instance, if we’re dropping empty strings, it will be `()` for both `t"{foo}"` and `t"{foo}{bar}"`. Or if we aren’t dropping empty strings, it will be `("hello", "")` for `t"hello" + t""` and `("hello",)` for `t"hello"`, even though those ideally result in the same key. Developers will need to implement coalescing and alternation themselves in order to derive a useful key. (Or we’d need to provide a separate `strings` that has this property.)
I don’t love either of these outcomes.
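Here is the coalescing step referenced above, as a hedged sketch (the `_coalesced` helper is hypothetical and assumes `args` is a mixed sequence of strings and interpolations):

```python
def _coalesced(args):
    # Merge adjacent string segments and drop empty ones, so that
    # ("hello ", "world") and ("hello world",) compare as the same shape.
    out: list[str | Interpolation] = []
    for arg in args:
        if isinstance(arg, str):
            if not arg:
                continue
            if out and isinstance(out[-1], str):
                out[-1] = out[-1] + arg
                continue
        out.append(arg)
    return tuple(out)
```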
> I don’t love either of these outcomes.

What’s wrong with letting + do the work instead of ==?
> What’s wrong with letting + do the work instead of ==?

But we can construct templates directly too. Shouldn’t `assert Template("hello ", "world") == Template("hello world")` hold as well?
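For reference, “letting `+` do the work” might look something like this hedged sketch (the `_from_parts` alternate constructor is hypothetical, and the strings/interpolations representation is the one discussed earlier in the thread):

```python
def __add__(self, other):
    if not isinstance(other, Template):
        return NotImplemented
    # Coalesce at the seam: the last string of self merges with the first
    # string of other, preserving the "N interpolations, N+1 strings" shape,
    # so equality can then be a plain field-by-field comparison.
    strings = (
        self.strings[:-1]
        + (self.strings[-1] + other.strings[0],)
        + other.strings[1:]
    )
    return Template._from_parts(strings, self.interpolations + other.interpolations)
```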