> That `None` check would be annoying and expensive

It may well be annoying, but `is None` checks are really cheap.
Considering the two main edge cases that make either of the pairwise iteration options awkward (assuming the “there are always leading and trailing string segments defined, even if one or both of them are empty” principle is retained):
"just a string"
("just a string", None)
(None, "just a string")
"{"just an interpolation"}"
("", Interpolation(...))
, ("", None)
(None, "")
, (Interpolation(...), "")
Given that there isn’t much of a difference in the level of awkwardness there, I find Guido’s point about the advantages of consumers being able to get the special case out of the way before starting the main iteration loop compelling.
The (interpolation, string) pair order does make it tempting to special case the leading string, though. Something like:
```python
for field, string in tmpl:
    if field is not None:
        process_field(field)
    process_string(string)
```

```python
process_string(tmpl.leading_string)
for field, string in zip(tmpl.interpolations, tmpl.trailing_strings):
    process_field(field)
    process_string(string)
```
And then define the main iterator as skipping that first `(None, "")` pair when there’s no leading text:
```python
def __iter__(self):
    if self.leading_string:
        yield None, self.leading_string
    yield from zip(self.interpolations, self.trailing_strings)
```
Hm, I prefer one right way rather than several different ways. I prefer the fewest number of fields, with no redundancy.
FWIW, we have some kind of precedent in `re.split` (which I was just using today to solve a similar kind of design problem).
```python
>>> re.split(r"(\{.*?\})", "a string {with}{interp}olations{.}")
['a string ', '{with}', '', '{interp}', 'olations', '{.}', '']
```
There’s probably a rule defined somewhere, but just from observation, it guarantees a string at either end (since it’s splitting, this makes sense), and also in between adjacent matches.
Again, it’s not quite the same thing as what we’re doing here. But I’d feel pretty comfortable with saying “we’re just doing it this way because `re.split` does and we couldn’t agree on something better”.
Incidentally, I also wrote an alternating loop to do the processing in my case, and didn’t particularly have any trouble getting it right (my first thought was to do the zip/`[::2]` trick, but I wasn’t confident in my head so I just did the simple loop):
```python
is_sep = False
for i in split_list:
    if is_sep:
        ...  # do one thing
    else:
        ...  # do the other
    is_sep = not is_sep
```
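For readers wondering what the zip/`[::2]` trick would look like, here is a hedged sketch built on the `re.split` output shown above; `handle_text` and `handle_sep` are placeholder names, not anything proposed in this thread:

```python
# Sketch: with a capturing re.split, even indexes are plain text and odd
# indexes are the captured separators, and there is always one more text
# segment than separator.
texts = split_list[::2]
seps = split_list[1::2]
for text, sep in zip(texts, seps):   # zip stops at the shorter sequence
    handle_text(text)
    handle_sep(sep)
handle_text(texts[-1])               # handle the trailing text segment
```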
I like this – an approach that forces developers to confront the oddball case up front!
Putting it all together, that leaves us with `Template` defined roughly as:
```python
from collections.abc import Iterable

class Template:
    @property
    def strings(self) -> tuple[str, ...]:
        """Return N+1 strings, where N is the number of interpolations."""
        ...

    @property
    def interpolations(self) -> tuple[Interpolation, ...]:
        """Return N interpolations, N >= 0."""
        ...

    @property
    def pairs(self) -> Iterable[tuple[Interpolation | None, str]]:
        """Return interpolation/string pairs; the first pair has no interpolation."""
        ...

    # `args` is no longer defined
```
Would we like to define an `__iter__()` directly on `Template`, in addition to (or rather than) `pairs()`?
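If we did want both, one possibility (just a sketch, not something the PEP currently specifies) is for `__iter__()` to delegate to `pairs`:

```python
# Sketch only: iterating the template directly yields the same pairs as the
# `pairs` property above, so `for interp, string in tmpl:` just works.
def __iter__(self):
    return iter(self.pairs)
```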
FWIW I do think that pretty much any approach to walking a Template’s contents is going to be awkward somehow. I suppose the silver lining is that writing code to process Templates is probably the uncommon case; most developers will instead use a pre-existing template processing library and won’t need to confront the awkwardness directly.
This definition means that the type checker will complain that the interpolation is potentially None, unless you use `cast(Interpolation, interpolation)` or `assert isinstance(interpolation, Interpolation)`.

Edit: you can of course use a `# type: ignore` comment.
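To make the friction concrete, here is a hedged sketch of the “lift the special case out of the loop” pattern that trips the checker (`process_string` and `process_interp` are placeholder handlers like the ones used elsewhere in this thread):

```python
# The programmer knows every remaining pair has a real interpolation once the
# first one is consumed, but the declared element type is still
# `Interpolation | None`, so the checker needs an explicit narrowing step.
it = iter(template.pairs)
_none, leading = next(it)        # first pair is (None, leading_string)
process_string(leading)
for interp, string in it:
    assert isinstance(interp, Interpolation)  # or cast(), or a type: ignore
    process_interp(interp)
    process_string(string)
```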
Thank you for the deep dive, Mark!
Yes! It seems like we’re heading that way.
Fixed in the draft PEP; thanks.
Removed. (And: whoops! Thanks.)
Earlier versions of the PEP proposed `types`, `collections`, and even `collections.abc` (back when we were thinking the new types might be abstract or even protocols). Various feedback, both in person at the Bellevue core dev sprint and in previous discussion threads, got us to avoid these.

This said, I think we’re happy with pretty much any choice here – `types` seems totally fine so long as the community can come to agreement.
(Hrm… about the only affirmative case I can make for `templatelib` is that there are possibly related methods (like a new `convert(value: object, conv: str) -> str` method being discussed earlier in this thread, or something like our current `from_format()` example) that might find a natural home there. Down the road, if we ever consider shipping template processing batteries with future versions of Python, maybe `templatelib.shell`, `templatelib.sql`, etc. would make some kind of sense. I don’t know how much I believe this affirmative case, though!)
I’ll update the PEP to use `TBD` for now. Perhaps this should be a question for the steering council?
We should probably reference an example. The `from_format()` method in our examples codebase shows a good use case: taking a string originally destined for use in `str.format()` and instead constructing a `Template` directly.
While I don’t expect it to be common, it probably makes sense to leave open the possibility of writing code that consumes a Template, transforms it in some generic way, and returns a new Template. (This sort of chaining is not unheard of in Javascript tagged template-processing libraries, for instance.)
For me, it starts from thinking of t-strings as an extension to f-strings. This assert holds:
```python
cheese = "emmental"
assert f"Try some " + f"{cheese}" == f"Try some {cheese}"
```
I think most devs will intuitively expect this to also hold:
```python
assert t"Try some " + t"{cheese}" == t"Try some {cheese}"
```
So we land on a fairly straightforward definition of `__eq__()`, at which point we’re also basically cornered on our choice of `__hash__()`.
As for `__hash__()` when interpolation values don’t support it: I don’t see this as much different than `__hash__()` for tuples when their contents don’t support it. A `Template` is hashable if and only if its interpolations and their values are; otherwise, not.
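To make that concrete, here is a hedged sketch of the kind of definition this implies (my illustration, not the PEP’s implementation; it assumes `+` coalesces adjacent string segments so the concatenation assert above holds, and the `_key()` helper is hypothetical):

```python
class Template:
    ...

    def _key(self):
        # Hypothetical helper: the strings plus the observable parts of each
        # interpolation.
        return (
            self.strings,
            tuple((i.value, i.conv, i.format_spec) for i in self.interpolations),
        )

    def __eq__(self, other):
        if not isinstance(other, Template):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Raises TypeError if any interpolation value is unhashable, exactly
        # like hashing a tuple with unhashable contents.
        return hash(self._key())
```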
Working with Lysandros’ prototype implementation and building our examples repo, this has all felt pretty good in practice.
Given that we already have a `format()` built-in that only takes the `format_spec`, I feel like we probably want to keep these separate in the `Interpolation` type. (As an aside: there’s no `convert()` equivalent to `format()` in Python today; the PEP provides an example implementation. It’s tiny, but I could still see wanting to ship it with Python directly, as discussed elsewhere in this thread.)
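For readers who haven’t seen it, the helper in question is roughly this kind of thing (a sketch of the idea; the PEP’s own example may differ in details such as its exact signature):

```python
def convert(value: object, conv: str | None) -> object:
    """Apply an f-string-style conversion (!s, !r, !a) to a value."""
    if conv == "s":
        return str(value)
    if conv == "r":
        return repr(value)
    if conv == "a":
        return ascii(value)
    return value  # no conversion requested
```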
Yes, devs will need to do some kind of check – `is None` seems simplest – or otherwise circumvent checking.
Yes, although for that sort of caching, my hunch is that `Template.strings` will be the more commonly used key.
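To illustrate the kind of caching I mean (a hedged sketch; `compile_query` stands in for some hypothetical, expensive per-template-shape work):

```python
# Every template with the same literal text shares one cache entry, regardless
# of the interpolated values, because `strings` ignores the values entirely.
_cache: dict[tuple[str, ...], object] = {}

def compiled_form(template: Template):
    key = template.strings
    if key not in _cache:
        _cache[key] = compile_query(template)  # hypothetical expensive step
    return _cache[key]
```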
As a follow-up, we created a branch of the examples repo that implements `pairs()` and updates the examples to use it rather than walking `template.args`.
It seems fine!
For comparison, here’s the `f()` example from the PEP, walking `args`:
```python
def f(template: Template) -> str:
    parts = []
    for arg in template.args:
        if isinstance(arg, str):
            parts.append(arg)
        else:
            value = convert(arg.value, arg.conv)
            value = format(value, arg.format_spec)
            parts.append(value)
    return "".join(parts)
```
and here’s `f()` using `pairs()`:
```python
def f(template: Template) -> str:
    parts = []
    for i, s in template.pairs:
        if i is not None:
            value = convert(i.value, i.conv)
            value = format(value, i.format_spec)
            parts.append(value)
        parts.append(s)
    return "".join(parts)
```
and – why not? – here’s `f()` using `pairs_s_i()`, which puts the exceptional case at the end instead:
```python
def f(template: Template) -> str:
    parts = []
    for s, i in template.pairs_s_i:
        parts.append(s)
        if i is not None:
            value = convert(i.value, i.conv)
            value = format(value, i.format_spec)
            parts.append(value)
    return "".join(parts)
```
I understood Mark’s question here as being “Since template instances are immutable, why isn’t all of their initialisation being handled in `__new__`?”

Edit: To be clear, I have the same question, I just hadn’t noticed that the API summary in PEP 750 wasn’t already using `__new__`.
> I understood Mark’s question here as being “Since template instances are immutable, why isn’t all of their initialisation being handled in `__new__`?”

Oh, sheesh. And of course it is `__new__`; the PEP has it wrong. Thanks! Will fix.
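(For readers following along: the usual reason immutable types do their setup in `__new__` is that by the time `__init__` runs the instance already exists, and an immutable type has typically disabled attribute assignment. A generic sketch of that pattern, unrelated to the PEP’s actual constructor signature:)

```python
class Frozen:
    __slots__ = ("value",)

    def __new__(cls, value):
        self = super().__new__(cls)
        # Bypass the blocked __setattr__ during construction only.
        object.__setattr__(self, "value", value)
        return self

    def __setattr__(self, name, attr_value):
        raise AttributeError(f"{type(self).__name__} instances are immutable")
```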
I’ve been trying to work out why the “interpolation first” pairwise iteration idea inspires a feeling of “That’s really clever, I don’t like it”. Putting the edge case first rather than last has a genuine practical benefit in making it easy to lift the `None` check out of the loop (although typecheckers might complain about attempts to do that without appropriately typed properties), so if I’m going to object, I’d like my objection to have more behind it than just “it feels weird”.
I think it’s just the fact that we have plenty of precedent for padding with `None` values to make sequence lengths align, but little or no precedent for injecting values at the start of one of the sequences to make their lengths match.
From the point of view of forcing the exceptional case to be handled, the key there is to always emit it: even if a template ends with an interpolation field, the final `("", None)` pair would trip up a naive loop just as effectively as a leading `(None, "")` pair when using the other order.
For the “avoid the inline `None` check” iteration pattern, I don’t see a straightforward way to cleanly enable it without dedicated, appropriately typed properties anyway, and that works with either pairwise order (a `tmpl.complete_pairs` property together with either `tmpl.leading_string` or `tmpl.trailing_string`, depending on the chosen pairwise order).
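Spelled out as a sketch (the property names are the ones from the paragraph above, not anything in the PEP; this version assumes the (interpolation, string) order):

```python
# Hypothetical, fully typed property with no Optional in the pair type:
#     complete_pairs: Iterable[tuple[Interpolation, str]]
#     (e.g. implemented as zip(self.interpolations, self.trailing_strings))
#
# Consumer code: the leading string is handled once, before the loop, and no
# inline None check is needed inside it.
process_string(tmpl.leading_string)
for field, string in tmpl.complete_pairs:
    process_field(field)
    process_string(string)
```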
Maybe I am unique in this, but I really hate any of these suggestions to solve the iteration with zipped-up tuples or a known interleaving pattern. IMO the only two clean suggestions I have seen in this thread are the ones using `args` directly and differentiating between literals and values with `isinstance` or with pattern matching. A few reasons why I don’t like the pairings:

- Why is there a `None` edge case I have to handle?

IMO, the best solution is for the PEP to just say “args is a list of strings and values” - no promises about interleaving, no promises about being non-empty, no promise about containing or not containing the empty string. The constructor also shouldn’t adjust this in any way, and any code that relies on the interleaving behavior [1] should be considered to rely on implementation-defined behavior.
If you want to get just all strings, spell it out: `tuple(s for s in tmpl.args if isinstance(s, str))`. Yes, this is technically slower, but we shouldn’t compromise good API design for better performance.
This will lead to more readable, cleaner code where what it does is obvious to anyone reading it - IMO one of the core goals of “pythonic” code.
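For comparison with the `f()` variants earlier in the thread, the `args`-plus-pattern-matching style I mean looks roughly like this (a sketch; `render` is just an illustrative name):

```python
def render(template: Template) -> str:
    # Branch on the element type directly; no assumptions about interleaving,
    # emptiness, or ordering are needed.
    parts = []
    for arg in template.args:
        match arg:
            case str() as s:
                parts.append(s)
            case Interpolation() as interp:
                value = convert(interp.value, interp.conv)
                parts.append(format(value, interp.format_spec))
    return "".join(parts)
```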
[1] assuming it still gets implemented ↩︎
> Putting the edge case first rather than last has a genuine practical benefit

After updating that examples repo, I guess if I had an aesthetic preference between `pairs()` and `pairs_s_i()` it’d be for the `s_i` variant, since I can’t shake the “first thing is a string” mindset from today’s `Template.args`. (But they’re all just fine!)
Yeah, I agree that just args is easier to use. It’s just that it’s harder to type statically — you end up having to use a runtime type check.
Also, if you want just the strings, it’s expensive to extract those from `args`, unless you have access to `.strings` as well. (`args[::2]` still has the union type.)
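A quick illustration of that typing point (a sketch, assuming `args` is typed as `tuple[str | Interpolation, ...]`):

```python
from typing import cast

maybe_strings = template.args[::2]   # still tuple[str | Interpolation, ...]
                                     # as far as a type checker is concerned

# You end up either casting...
strings = cast(tuple[str, ...], template.args[::2])
# ...or filtering at runtime, as suggested above:
strings = tuple(s for s in template.args if isinstance(s, str))
```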
The thread is too long to try and dig up evidence, but IIRC I came up with the “interpolation first” idea in response to super-type-safe proposals where it was proposed to move the final string out of the loop, e.g.
```python
for string, interp in tmpl.pairs:
    process_string(string)
    process_interp(interp)
process_string(tmpl.final_string)
```
This was flagged as a potential bug magnet because it’s easy to forget that final call (and I agree that it’s very awkward). I think the first response was to have the final string be included as a (string, None) tuple, which somehow still felt awkward (I don’t recall why) so I proposed to move the special case to the first (interp, string) tuple.
But to be honest, my heart isn’t into the type-safe solutions that much. I’d prefer args to produce strings and interpolations, leaving out empty strings – that’s how I personally apparently think about f-strings and t-strings. But the other PEP authors (presumably influenced by the previous discussion thread) seem to feel strongly about strictly alternating.
The worst of all worlds IMO would be to make args strictly alternating in the implementation without specifying that – in practice people use what works, not what the spec says, so if the implementation alternates (which is trivial to assess by observing its actual behavior), people will code according to that, regardless of whether the spec says it’s not guaranteed.
I’d be okay with having an args (specified either way) and separate strings (a tuple of strings) and interpolations (a tuple of interpolations).
FWIW: aesthetically I prefer walking over `.args` and type checking over either of the `pairs()` approaches – but I don’t think any of the three are bad, especially given that writing code to process templates (rather than calling code that processes them) is the uncommon case.
This said, the fact that the `pairs()` approaches hide `args` entirely and save us from specifying alternation outright seems like a win.
Maybe it’s worth stepping back a little?
Looking at javascript-land, tagged template functions are typed as `tag(strings: TemplateStringsArray, ...values: Any) -> Any`. Javascript guarantees `strings` is length `n+1`, where `n` is the length of `values`. `strings` so structured is very commonly used as a cache key by developers.
In a different direction, following the thread of “let’s just have `args` but not have alternation”. Some consequences:
- Template equality is no longer tied to `args`. That is, `assert t"hello " + t"world" == t"hello world"` should probably still hold, but presumably in one case `args` is `("hello ", "world")` and in the other case `args` is `("hello world",)`. Under the hood, `__eq__` will still need to coalesce neighboring strings (see the sketch at the end of this post). Also, as an implementation detail, it can decide to use or drop empty strings.
- The collection of strings culled from `tuple(s for s in template.args if isinstance(s, str))` is no longer a useful cache key. For instance, if we’re dropping empty strings, it will be `()` for both `t"{foo}"` and `t"{foo}{bar}"`. Or if we aren’t dropping empty strings, it will be `("hello", "")` for `t"hello" + t""` and `("hello",)` for `t"hello"`, even though those ideally result in the same key. Developers will need to implement coalescing and alternation themselves in order to derive a useful key. (Or we’d need to provide a separate `strings` that has this property.)
I don’t love either of these outcomes.
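Here is the coalescing step referenced above, as a hedged sketch (the `_coalesced` helper is hypothetical and assumes `args` is a mixed sequence of strings and interpolations):

```python
def _coalesced(args):
    # Merge adjacent string segments and drop empty ones, so that
    # ("hello ", "world") and ("hello world",) compare as the same shape.
    out: list[str | Interpolation] = []
    for arg in args:
        if isinstance(arg, str):
            if not arg:
                continue
            if out and isinstance(out[-1], str):
                out[-1] = out[-1] + arg
                continue
        out.append(arg)
    return tuple(out)
```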
> I don’t love either of these outcomes.

What’s wrong with letting + do the work instead of ==?
> What’s wrong with letting + do the work instead of ==?

But we can construct templates directly too. Shouldn’t `assert Template("hello ", "world") == Template("hello world")` hold as well?
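For reference, “letting `+` do the work” might look something like this hedged sketch (the `_from_parts` alternate constructor is hypothetical, and the strings/interpolations representation is the one discussed earlier in the thread):

```python
def __add__(self, other):
    if not isinstance(other, Template):
        return NotImplemented
    # Coalesce at the seam: the last string of self merges with the first
    # string of other, preserving the "N interpolations, N+1 strings" shape,
    # so equality can then be a plain field-by-field comparison.
    strings = (
        self.strings[:-1]
        + (self.strings[-1] + other.strings[0],)
        + other.strings[1:]
    )
    return Template._from_parts(strings, self.interpolations + other.interpolations)
```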