Relaxing `t-string` conversion restriction

Currently:

>>> a = 1
>>> t'{a!something}'
SyntaxError: t-string: invalid conversion character 'something': expected 's', 'r', or 'a'

I propose relaxing this restriction to allow any valid identifier, i.e. `[a-zA-Z_][a-zA-Z0-9_]*`.


Why this might be a good idea (besides the use cases):

  1. If one wants to keep this restriction, re-adding it is trivial, e.g. assert conversion in ('s', 'r', 'a').
  2. Any future conversion extensions to str.format / f-strings / etc. will not require extra work for t-strings. Given that this is a customisable toolbox (at least that is the promise I read into it), it would be less coupled and more generic.
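As a sketch of point 1, a hypothetical handler-side guard (the name `check_conversion` is invented) could look like this; note that the shorter `conversion in 'sra'` would be too loose once multi-character conversions exist, since substring matching makes `'sr' in 'sra'` true:

```python
def check_conversion(conversion: str) -> str:
    # Re-impose the s/r/a restriction inside a custom t-string handler.
    # Membership is tested against a tuple rather than the string 'sra',
    # because 'sr' in 'sra' is True via substring matching.
    if conversion not in ('s', 'r', 'a'):
        raise ValueError(f"invalid conversion {conversion!r}")
    return conversion

print(check_conversion('r'))  # r
```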

At the same time, the same could be done for _string.formatter_parser, which currently accepts any single character without restricting it to a subset.

This way, the two tools used for advanced / customisable formatting would be in sync:

  1. t-strings (immediate variable substitution)
  2. string.Formatter (delayed variable substitution)

Both would then provide a consistent backdoor for custom conversions.

1 Like

But you haven’t told us about the use cases.

I believe that in most cases you can simply use format_spec instead, which does allow any arbitrary string, i.e. t'{a:something}'.
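For illustration, in f-strings and str.format the format spec already reaches the object's `__format__` verbatim, so arbitrary spec strings work today (with t-strings, the raw spec string is simply handed to the processing code instead). The `Angle` class and the `something` spec here are made up:

```python
class Angle:
    def __init__(self, deg):
        self.deg = deg

    def __format__(self, spec):
        # The spec string arrives unchanged, so any custom
        # mini-language can be implemented here.
        if spec == 'something':
            return f'{self.deg} deg'
        return str(self.deg)

print(f'{Angle(90):something}')  # 90 deg
```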

That said, I do agree that this self-imposed limitation on the conversion string seems unnecessary given that the motivation behind the t-string is to generalize the f-string. But it’s just that in the use cases I can come up with myself a custom format_spec is enough.

It’s also worth noting that in the Rejected Ideas section of PEP-750 it does say that the rejection of an arbitrary conversion string is simply to adhere to the specs of the f-string, and that an arbitrary conversion string can be proposed in a separate PEP.

5 Likes

E.g.: Type conversion for f-string
C formatting has many more conversions. Although they might not all be worth adding to f-strings themselves, allowing users to extend conversions in appropriate places could be an option.

Personally, I have implemented some of them for my string.Formatter.

Thanks for this.

1 Like

I see in that thread you suggested the following conversions:

To me, those belong in the format spec rather than a conversion.

A format spec tells the formatter how you want an object to be formatted into a string, while a conversion is more about encoding the output after the formatting is done.

But then yeah I can see how it might make sense to allow custom encoding.

In C formatting the conversion specifier is part of the format string, so it isn’t really a case that supports your proposal.

PEP 737 proposes adding a !t formatter, but this idea landed in the Rejected Ideas.

1 Like

I think it is the other way round.

Conversion is something that fundamentally converts the object.
Format spec is just the details of how to format the final object (+ alignment).

Their order is also this way. Conversion is applied before formatting.
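A quick f-string illustration of that ordering: the conversion is applied first, and the format spec then formats the converted result:

```python
# repr("hi") -> "'hi'" (4 chars), which is then right-aligned to width 8
print(f'{"hi"!r:>8}')
```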

Well… Ok, there is no one-to-one relationship. But if it was found to be useful in C, it kind of gives a bit of weight to functionality itself, regardless in what form it materialises in Python.

In either case, the main point is pure extensibility for me.


And I think it might be a good move in general.

  1. t-strings obey more general protocol.
  2. string.Formatter allows extensions according to it as well.
  3. While f-strings, str.format, CustomStringFormatter, TStringHandler are applications.

So this is really about relaxing restriction for more general protocol as appropriate.
Whether it is a single character (as per current string.Formatter) / identifier or something else - I am open to ideas.

P.S. identifier string would break backwards compatibility for string.Formatter, but I really doubt that anyone has implemented any custom conversion that is punctuation or number.

1 Like

Ah you’re right. I got the order backwards. I was confused by the fact that the format spec in f'{1:f}' converts the integer to a float so there’s no need for a conversion spec there. But this confusion also shows how easy it is for a format_spec to do the job of conversion.

So the question remains as to what role a custom conversion is meant to play when a custom format_spec can be made to convert types just as well.

2 Likes

I think it should ideally retain its current purpose as closely as possible:
a) conversion is for fundamental conversions
b) format_spec is type specific

(b) has no restrictions and theoretically everything can be done with it.

However, as conversion is already there, it can be helpful to make things cleaner by separation of concerns.


I think a happy middle would be to relax the restriction to a single letter: [a-zA-Z].

Providing more freedom would likely cause more confusion than benefit.
If there are two ways, both of which can be parameterised, there is a risk of people picking either one arbitrarily without giving it enough thought.

I make frequent use of these with string.Formatter, and it would be lovely if I could continue using them with t-strings:

!n = obj.__qualname__
!N = obj.__module__:obj.__qualname__
!t = type(obj).__qualname__
!T = type(obj).__module__:type(obj).__qualname__

The four above are my current set. But there could be many others, e.g.:

!v = getattr(obj, '__version__', '0.0.0')

I am not sure what I will add in the future, but being able to have this set in general is an advantage to me.
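For context, a minimal sketch of how such a set works today on top of string.Formatter, whose parser already accepts any single conversion character (the class name and method bodies are illustrative, not the actual implementation described above):

```python
import string

class QualnameFormatter(string.Formatter):
    def convert_field(self, value, conversion):
        # Custom single-character conversions in the spirit of the
        # !n / !t / !T set described above.
        if conversion == 'n':
            return value.__qualname__
        if conversion == 't':
            return type(value).__qualname__
        if conversion == 'T':
            t = type(value)
            return f'{t.__module__}:{t.__qualname__}'
        return super().convert_field(value, conversion)

fmt = QualnameFormatter()
print(fmt.format('{0!t}', 42))   # int
print(fmt.format('{0!T}', 42))   # builtins:int
```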


So this is not proposing to add extra conversions to existing end applications (f-strings, str.format, etc), but rather relaxing the restriction in places that are designed for user customisation. Namely, t-strings and string.Formatter.

string.Formatter already allows any single character.

I propose relaxing t-string conversion to any single letter - [a-zA-Z].
I think string.Formatter is more permissive than it needs to be in this regard:

In [31]: list(_string.formatter_parser('{!' + chr(1) + '}'))
Out[31]: [('', '', '', '\x01')]

Did a bit more work on this and I think that any-single-character might be a good option.
Same as string.Formatter allows.
And in line with format_spec, which has no restrictions on its contents.


format_spec is already quite syntax-heavy.
Incorporating new things into it without breaking existing functionality can be a piece of work.

Thus, conversion can be a fairly good place to add custom things, e.g. comma-delimited iterable values:

iterable = [1, 0, 1, 2]
# desired: '1, 0, 1, 2'
# possible implementation:
t = t'{iterable!,}'

So I think there is no harm in allowing any single character for the user to play with.

Also, string.Formatter already allows exactly that, so this would bring the two in sync without any extra work.

And the check (or the absence of it) itself is much simpler - just take the next character.
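A runnable sketch of the comma conversion using string.Formatter, which already passes punctuation through as a conversion character (the class name is invented):

```python
import string

class CommaFormatter(string.Formatter):
    def convert_field(self, value, conversion):
        # ',' is not a valid conversion in f-strings, but
        # string.Formatter's parser hands it through untouched.
        if conversion == ',':
            return ', '.join(map(str, value))
        return super().convert_field(value, conversion)

print(CommaFormatter().format('{0!,}', [1, 0, 1, 2]))  # 1, 0, 1, 2
```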

1 Like

Well, not exactly the same.
format_spec just consumes until the next }.

conversion could do the same in the future if needed - consume until : or }.
It could then be argued that any single character is too loose, and that : and } would need to be excluded.
However, this is not needed: a routine for more than one character can be added later, and it would look like:

  1. Take any next character
  2. Consume until : or }

Thus:
{expr!:chars:.2f} would be a valid format with conversion = :chars
{expr!}chars:.2f} would be a valid format with conversion = }chars

So I am not suggesting to do this.
I just note that allowing any single character is future-proof in this regard.


Thus, my last proposition stands - any single character for conversion of t-strings.

To be specific: today you can implement this in your template processing function:

t'{obj:!n}' # obj.__qualname__
t'{obj:!n:^20}' # format(obj.__qualname__, '^20')
t'{iterable:!,}'  # my_comma_formatter(iterable)
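Since the t-string examples above need Python 3.14, here is an equivalent, testable sketch of the same "!-prefixed format spec" encoding on top of string.Formatter (the class name and the '!conv:spec' convention are invented for illustration):

```python
import string

class BangSpecFormatter(string.Formatter):
    def format_field(self, value, format_spec):
        # Treat a spec such as '!n' or '!n:^20' as a custom conversion
        # followed by an optional real format spec.
        if format_spec.startswith('!'):
            conv, _, spec = format_spec[1:].partition(':')
            if conv == 'n':
                value = value.__qualname__
            return format(value, spec)
        return super().format_field(value, format_spec)

fmt = BangSpecFormatter()
print(fmt.format('{0:!n}', dict.fromkeys))      # dict.fromkeys
print(fmt.format('{0:!n:^20}', dict.fromkeys))  # centered in 20 chars
```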

As far as I can see, this proposal has the following advantages:

  • saves one character
  • makes it straightforward to use format specs that start with ! (but how common are those?)

Are there any more? These two seem to be worth the extra complexity/extensibility.

2 Likes

It also prevents the user from combining multiple conversions. E.g. with the current syntax you could do t'{obj!r:!n}', and it is very unclear (to me, at least) what actually happens: is the repr applied and then obj.__qualname__ (which would error)? Is the !r ignored, or the !n? Or is obj.__qualname__ applied and then its repr taken?

Obviously, if you want to allow multiple conversions for some reason, you can already do so with the current syntax, and you could use the same trick even if arbitrary conversions were allowed. But at least you’re explicitly making that an option.

1 Like

Yeah, it is doable via format_spec.
However, there is inconvenience of:

[[fill]align]
fill    - any char
align   - "<" | ">" | "=" | "^"

thus, {:!>} is ambiguous.
One would then need to prohibit ^<>= and add extra logic for it.
Doable of course, but it is a bit of effort.
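The ambiguity is easy to demonstrate: in the format-spec mini-language, a character before an alignment symbol is taken as the fill character:

```python
# '!' is parsed as the fill character, '>' as right alignment:
print(format(5, '!>4'))  # !!!5
```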
From a user’s POV, I think it is much more convenient to keep it consistent with string.Formatter and be able to add a single-character conversion, which is simple and robust.

And of course, the issue above still exists for extensions to format_spec. But at least there is some separation of concerns, and moving appropriate conversions into conversion makes things cleaner. Maybe that is enough for a decent share of cases, so that, given conversion is extensible, dabbling with format_spec is not needed at all.


I quite like being able to rely on mental model:

  1. conversion is transformation
  2. format_spec is formatting of final type

I think it is good to be able to rely on this when extending without needing to hack things.


Yeah, there is a bit of that.
And as usual, a bit more than I initially expected.

But nothing major really.
Decoupling from the f-string’s fstring_conversion and introducing tstring_conversion.
ast.Interpolation holds the conversion character’s ordinal without restrictions - so no issues there.

I think it’s a mistake to diverge the conversion characters from what is supported in f-strings and str.format. I’m currently working on a PEP that will probably add a conversion character. I haven’t thought through if it makes sense for t-strings, but surely other conversions will make sense for them. For example, what if t-strings were in the language when !a was added, but this proposal had been in effect. Would t-strings just never be able to support ascii conversions?

I think conversions should be in the realm of the implementation, and user extensions should only be in the format specs.

2 Likes

But format_spec is already permissive.
How is conversion case any different in this regard?

I don’t think t-strings support or don’t support anything - they just parse the string and let the user decide what to do with the parts. There is a utility function to apply the conversion, but it is an optional default, nothing more.

All this suggests is relaxing the restriction so that conversion passes any character through.

Can you disclose what it is?

Speaking here just for f-strings and str.format: Because format_spec is tied to a type, and conversion can be applied to any and every type. For example, there’s a different ‘language’ for format_spec for str than there is for datetime.datetime. The format_spec is designed to be implemented by each type.

Good point: I’d forgotten that t-strings assign no meaning to the conversion. Still, I think that conceptually it’s related to !r, !s, and !a from str.format and f-strings (else why was it included at all?). But this does weaken my argument considerably.

Soon. I don’t want to jump the gun on the other PEP contributors, and I don’t want to sidetrack this discussion.

1 Like

Yes, this is how I see it as well.
Each serves its purpose.

And while a t-string passes any format_spec through, conversion is restricted.

Of course, making things restricted is safer in general, but the cost seems reasonable:

  1. If implemented, it just falls into the same category as format_spec. I.e. say a user has their own conversion !t implemented and the stdlib later adds !t; the user needs to change the letter. But this is already the case with format_spec, so I think it adds more value than it causes actual problems. E.g. how many new conversions have been added to the stdlib in the last 10 years?
  2. string.Formatter has been allowing any character for a long time now and I don’t think there were any issues.

Thus, I think this is fairly safe.

You get to specify that. For a precedent: in f-strings, the conversion is applied before formatting.

This – however you spell it – is a very useful operation. I use it whenever there should be quotes around the name; it gives me handling of “weird” non-identifier names for free.

Either prohibit ^<>= (or non-identifiers in general), or always parse !> as conversion and require !s:!> for filling with exclamation marks.[1]


  1. Again, you don’t need it that often!!!!!!!!!! ↩︎

1 Like

I think this is the case for all {} formatting instances in stdlib.

I think that relaxing the conversion-character restriction would provide some convenience in simpler cases (and make things consistent with string.Formatter).

However, for implementing more advanced features, one will inevitably have to venture into format_spec. E.g.:

  1. Multiple conversions
  2. Parameterised conversions

Regarding consistency:

I think consistent restriction level across {} extensibility tools is a good thing.

Currently, there is a bit of friction.

E.g. someone implementing (actual) templates can make use of parts from both. For instance, _string.formatter_parser can be replaced with ast.parse to allow full-syntax variable pick-up from a namespace, with the rest of the functionality staying the same:

import ast

format_string = '{a[0](1, 2) + 1!r:fmt}'
# Re-parse the classic format string as a t-string literal (Python 3.14+):
ast_tstring = ast.parse(f't{format_string!r}', mode='eval').body

# assert_safe(ast_tstring)

for ast_interpolation in ast_tstring.values:
    expr = ast_interpolation.str
    conversion = ast_interpolation.conversion  # ordinal of the character, -1 if absent
    ...
...

namespace = {'a': {0: lambda a, b: a + b}}
value = eval(expr, namespace)
converted = apply_custom_conversion(value, chr(conversion))
...

If all things are consistent, it is smooth sailing.
Otherwise, if there are inconsistencies, one needs to investigate, consider more variables, and weigh additional pros and cons.


So I don’t think this is something that would “allow something that cannot currently be done”, but rather “something that enhances the user experience and eliminates friction”.