Relaxing `t-string` conversion restriction

Currently:

>>> a = 1
>>> t'{a!something}'
SyntaxError: t-string: invalid conversion character 'something': expected 's', 'r', or 'a'

I propose relaxing this restriction and allowing any valid identifier, i.e. [a-zA-Z_][a-zA-Z0-9_]*.


Why this might be a good idea (besides the use cases):

  1. If one wants to have this restriction, adding it is trivial: assert conversion in (None, 's', 'r', 'a').
  2. Any further conversion extensions to str.format / f-strings / etc. will not require extra work for t-strings. Given that t-strings are a customisable toolbox (at least that is the promise I read into them), this would make them less coupled and more generic.
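
Point 1 could look roughly like the sketch below. It is only an illustration: the objects are hypothetical stand-ins built with SimpleNamespace that mimic the value, conversion and format_spec attributes PEP 750 gives to interpolations, so the snippet runs without t-string support, and check_conversion is an invented helper name.

```python
from types import SimpleNamespace

# None means "no conversion was given".
ALLOWED = {None, "s", "r", "a"}


def check_conversion(interpolation):
    """Reject any conversion outside the f-string set (hypothetical helper)."""
    if interpolation.conversion not in ALLOWED:
        raise ValueError(f"invalid conversion {interpolation.conversion!r}")
    return interpolation


# Stand-ins for what a relaxed t'{a!something}' might carry (assumption).
ok = SimpleNamespace(value=1, conversion="r", format_spec="")
custom = SimpleNamespace(value=1, conversion="something", format_spec="")

check_conversion(ok)  # passes silently
try:
    check_conversion(custom)
except ValueError as exc:
    print(exc)  # invalid conversion 'something'
```

A handler that wants the current behaviour would call such a check on every interpolation; a handler that wants custom conversions would simply skip it.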

At the same time, the same could be done for _string.formatter_parser,
which currently accepts any single character without restricting it to a subset.

This way, the two tools used for advanced / customisable formatting would be in sync:

  1. t-strings (immediate variable substitution)
  2. string.Formatter (delayed variable substitution)

Both would then provide a consistent backdoor for custom conversions.

But you haven’t told us about the use cases.

I believe that in most cases you can simply use format_spec instead, which does allow any arbitrary string, i.e. t'{a:something}'.
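
As a rough illustration of the format_spec route, the sketch below uses an f-string rather than a t-string so that it runs on current Python versions. The Version class and its 'short' spec are made up for the example; the only mechanism relied on is that whatever follows the colon is passed to the object's __format__ method.

```python
class Version:
    """Toy type whose __format__ understands a made-up 'short' spec."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __format__(self, spec):
        if spec == "short":
            return str(self.major)
        if spec == "":
            return f"{self.major}.{self.minor}"
        raise ValueError(f"unknown format spec {spec!r}")


v = Version(3, 14)
full = f"{v}"          # '3.14'
short = f"{v:short}"   # '3'
print(full, short)
```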

That said, I do agree that this self-imposed limitation on the conversion string seems unnecessary, given that the motivation behind the t-string is to generalize the f-string. It’s just that in the use cases I can come up with myself, a custom format_spec is enough.

It’s also worth noting that the Rejected Ideas section of PEP 750 says that arbitrary conversion strings were rejected simply to adhere to the specs of the f-string, and that an arbitrary conversion string can be proposed in a separate PEP.

E.g.: Type convertion for f-string
C formatting has many more conversions. Although they might not be that useful to add to f-strings, I suspect that allowing users to extend the set in appropriate places could be an option.

Personally, I have implemented some of them for my string.Formatter.

Thanks for this.

I see in that thread you suggested the following conversions:

To me, those belong in the format spec rather than a conversion.

A format spec tells the formatter how you want an object to be formatted into a string, while a conversion is more about encoding the output after the formatting is done.

But then yeah I can see how it might make sense to allow custom encoding.

In C formatting the conversion specifier is part of the format string, so it isn’t really a case that supports your proposal.

PEP 737 proposed adding a !t formatter, but this idea landed in its Rejected Ideas section.

I think it is the other way round.

Conversion is something that fundamentally converts the object.
The format spec is just the details of how to format the final object (plus alignment).

Their order of application reflects this too: conversion is applied before formatting.
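
A small check of that order, using only built-in f-string behaviour (the variable names are arbitrary):

```python
value = 3.14159

# '!r' runs first and yields the string '3.14159';
# the spec '>10' is then applied to that string, not to the float.
padded = f"{value!r:>10}"
same = format(repr(value), ">10")
print(padded == same)  # True

# A float-only spec no longer applies once the value has become a str.
try:
    f"{value!s:.2f}"
except ValueError as exc:
    error = str(exc)
    print(error)
```

The second case fails precisely because the spec is handed to the converted object (a str) rather than to the original float.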

Well… OK, there is no one-to-one relationship. But if it was found to be useful in C, that gives some weight to the functionality itself, regardless of the form in which it materialises in Python.

Either way, the main point for me is pure extensibility.


And I think it might be a good move in general.

  1. t-strings obey a more general protocol.
  2. string.Formatter allows extensions according to it as well.
  3. f-strings, str.format, CustomStringFormatter and TStringHandler, in turn, are applications of it.

So this is really about relaxing the restriction for the more general protocol, as appropriate.
Whether it is a single character (as string.Formatter currently allows), an identifier, or something else: I am open to ideas.

P.S. An identifier string would break backwards compatibility for string.Formatter, but I really doubt that anyone has implemented a custom conversion that is a punctuation character or a digit.

Ah you’re right. I got the order backwards. I was confused by the fact that the format spec in f'{1:f}' converts the integer to a float so there’s no need for a conversion spec there. But this confusion also shows how easy it is for a format_spec to do the job of conversion.
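
For reference, this is the built-in behaviour being described, where the format spec alone changes how an int is rendered:

```python
as_float = f"{1:f}"   # int rendered with a float presentation type
as_hex = f"{255:x}"   # int rendered as hexadecimal text
print(as_float, as_hex)  # 1.000000 ff
```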

So the question remains as to what role a custom conversion is meant to play when a custom format_spec can be made to convert types just as well.

I think it should ideally retain its current purpose as closely as possible:
a) conversion is for fundamental conversions
b) format_spec is type specific

(b) has no restrictions and theoretically everything can be done with it.

However, as conversion is already there, it can be helpful to make things cleaner by separation of concerns.


I think a happy medium would be to relax the restriction to a single letter: [a-zA-Z].

Providing more freedom would likely cause more confusion than benefit.
If there are two mechanisms and both can be parameterised, I think there is a risk of people picking either one arbitrarily without giving it enough thought.

I use these frequently with string.Formatter, and it would be lovely if I could continue using them with t-strings:

!n = obj.__qualname__
!N = obj.__module__:obj.__qualname__
!t = type(obj).__qualname__
!T = type(obj).__module__:type(obj).__qualname__

The four above are my current set. But there can be many others, e.g.:

!v = getattr(obj, '__version__', '0.0.0')

I am not sure what I will add in the future, but being able to have this set in general is an advantage to me.
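
For readers who have not extended string.Formatter before, here is a minimal sketch of how conversions like these can be wired in by overriding convert_field. The class name NameFormatter is made up for the example, and this is not necessarily how my own formatter is structured.

```python
import string


class NameFormatter(string.Formatter):
    """string.Formatter with extra single-letter conversions (hypothetical)."""

    def convert_field(self, value, conversion):
        if conversion == "n":
            return value.__qualname__
        if conversion == "N":
            return f"{value.__module__}:{value.__qualname__}"
        if conversion == "t":
            return type(value).__qualname__
        if conversion == "T":
            cls = type(value)
            return f"{cls.__module__}:{cls.__qualname__}"
        if conversion == "v":
            return getattr(value, "__version__", "0.0.0")
        # Fall back to the standard None / 's' / 'r' / 'a' handling.
        return super().convert_field(value, conversion)


fmt = NameFormatter()
print(fmt.format("{0!t}", 1.5))               # float
print(fmt.format("{0!T}", 1.5))               # builtins:float
print(fmt.format("{0!N}", string.Formatter))  # string:Formatter
print(fmt.format("{0!t:>8}|", 1.5))           # '   float|'
```

The last call shows the separation of concerns: the conversion picks what to show, and the format spec only aligns the resulting string.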


So this is not proposing to add extra conversions to existing end applications (f-strings, str.format, etc), but rather relaxing the restriction in places that are designed for user customisation. Namely, t-strings and string.Formatter.

string.Formatter already allows any single character.

I propose relaxing t-string conversion to any single letter - [a-zA-Z].
I think string.Formatter is more permissive than it needs to be in this regard:

In [31]: list(_string.formatter_parser('{!' + chr(1) + '}'))
Out[31]: [('', '', '', '\x01')]
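
The same can be observed through the public API: string.Formatter.parse is a thin wrapper around _string.formatter_parser. A short sketch (the field name x is arbitrary):

```python
import string

parse = string.Formatter().parse

# Each item is (literal_text, field_name, format_spec, conversion).
comma = list(parse("{x!,}"))
digit = list(parse("{x!1:>5}"))
print(comma)  # [('', 'x', '', ',')]
print(digit)  # [('', 'x', '>5', '1')]

# More than one character after '!' is rejected by the parser.
try:
    list(parse("{x!ab}"))
    multi_char_ok = True
except ValueError as exc:
    multi_char_ok = False
    print(exc)
```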

I did a bit more work on this, and I now think that allowing any single character might be a good option.
It is the same as what string.Formatter allows,
and it is in line with format_spec, which has no restrictions on its contents.


format_spec is quite syntax-heavy already.
Incorporating new things into it without breaking existing functionality can take quite a bit of work.

Thus, conversion can be a fairly good place to add custom things. E.g. comma delimited iterable values:

iterable = [1, 0, 1, 2]
# desired: '1, 0, 1, 2'
# possible implementation:
t = t'{iterable!,}'

So I think there is no harm in allowing any single character for the user to play with.

Also, string.Formatter already allows exactly that - this would make these 2 in sync without any extra work.
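
Since string.Formatter accepts the comma today, the example above can already be sketched with it. JoinFormatter is a made-up name, and a t-string handler would have to do the equivalent work itself:

```python
import string


class JoinFormatter(string.Formatter):
    """Treats '!,' as 'join the items with a comma and a space' (hypothetical)."""

    def convert_field(self, value, conversion):
        if conversion == ",":
            return ", ".join(map(str, value))
        return super().convert_field(value, conversion)


iterable = [1, 0, 1, 2]
result = JoinFormatter().format("{0!,}", iterable)
print(result)  # 1, 0, 1, 2
```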

And the check (or the absence of it) itself is much simpler - just take the next character.
