Format String Syntax specification differs from actual behaviour

pyctrl · February 23, 2024, 4:24pm

Hello,

I’m working with “Format String Syntax” specification, more specifically the grammar part:

So I’m talking about new-style formatting (example: "{} some template with {data}".format(1, data=[1, 2, 3])).

As I understand it, the “field_name” (if present) starts with “arg_name”.
And “arg_name” can only be one of:

empty
identifier
digits

So I don’t expect any other “arg_name” values passed to the str.format() method to be valid.
Following the grammar and the new-style formatting mechanics, all named fields in a template would fall under “arg_name” option №2 from the list above (identifiers).

And I expected all “identifier” named fields to be validated with str.isidentifier(), so that any name for which it returns False would produce an error similar to this:

In [30]: def f(**kwargs):
    ...:     print(kwargs)
    ...:

In [31]: d = {None: 12, 12: 212121}

In [32]: f(**d)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[32], line 1
----> 1 f(**d)

TypeError: keywords must be strings

But I found out that this is not true. Here is an example:

In [29]: "{a-b} {-}".format(**{"-": "_-_", "a-b": "ABC"})
Out[29]: 'ABC _-_'

I wrote a template that is invalid (in terms of the grammar above) and formatted it using the kwargs mechanics, and it succeeded.

I think it’s related to the **kwargs mechanics of function calls, which allow passing any dict that has only string keys. These mechanics have no “isidentifier” constraint.
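A minimal sketch of that difference, assuming CPython 3.x: `**` unpacking only requires the keys to be strings, so non-identifier names reach both an ordinary function and str.format(). The helper name `show_kwargs` is hypothetical and exists only for this illustration.

```python
def show_kwargs(**kwargs):
    # Hypothetical helper: returns the keyword names it received, sorted.
    return sorted(kwargs)


names = {"-": "_-_", "a-b": "ABC"}

# Neither key is a valid Python identifier ...
assert not any(name.isidentifier() for name in names)

# ... yet ** unpacking accepts them, because it only checks that keys are str.
received = show_kwargs(**names)

# str.format() looks the same names up in its keyword arguments.
result = "{a-b} {-}".format(**names)

print(received, result)
```

So the call itself never gets a chance to reject these names; any “isidentifier” check would have to live in the template parsing.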

It seems that such an “isidentifier” constraint is missing in the str.format() implementation.
Or there is an issue in the documentation.

Can you please comment on this situation?
I’m wondering whether this is a bug in the implementation that could be fixed, or expected behavior with a docs issue.

P.S. It really matters for the library I’m working on.

MegaIng · February 23, 2024, 4:48pm

The actual behavior is that attribute_name and the first field can be anything that doesn’t contain . or ], for example "{-.-1[-].0}" is valid, if a bit hard to read and unlikely to work ^[1]. Also note that negative numbers ("{-1}") are treated as attempted attribute/map key access, not as numeric indices.
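A small sketch of the negative-number case, assuming current CPython behaviour as described above: "-1" is not made of digits only, so it is looked up as a string key rather than used as a position or index.

```python
# "-1" is not all digits, so it is treated as a keyword name.
as_keyword = "{-1}".format(**{"-1": "last?"})

# With only positional arguments the lookup fails with KeyError, not IndexError.
try:
    "{-1}".format("a", "b")
except KeyError as exc:
    positional_error = repr(exc)

# Inside brackets, "-1" is passed on as the string "-1": it works as a
# dict key but not as a negative list index.
from_dict = "{0[-1]}".format({"-1": "dict value"})
try:
    "{0[-1]}".format(["a", "b"])
except TypeError:
    list_index_failed = True

print(as_keyword, positional_error, from_dict, list_index_failed)
```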

I don’t know if the behavior is intended, but I doubt that it will be changed. I would say the docs should be updated.

replacement_field ::=  "{" [field_name] ["!" conversion] [":" format_spec] "}"
field_name        ::=  arg_name ("." attribute_name | "[" element_index "]")*
arg_name          ::=  [digit+ | attribute_name]
attribute_name    ::=  <any source character except "]" or "."> +
element_index     ::=  digit+ | index_string
index_string      ::=  <any source character except "]"> +
conversion        ::=  "r" | "s" | "a"
format_spec       ::=  format-spec:format_spec

This should be more accurate to the implementation.

May I ask what you are doing so that this behavior is relevant for you?

^[1] But you can construct an object so that it does work: s.format(**{"-":SimpleNamespace(**{"-1":{'-':SimpleNamespace(**{'0':"value"})}})})

pyctrl · February 23, 2024, 5:11pm

May I ask what you are doing so that this behavior is relevant for you?

Sure, I’m working on this library called “izulu” (it’s still WIP but not for long).

In short, it’s about making exception classes more advanced. One of its core parts is the error message template (based on new-style string formatting).
One of the features I’m thinking of is to provide some template validations, and the grammar is important for this feature.

MegaIng · February 23, 2024, 5:14pm

I would suggest using the string.Formatter class for this purpose rather than trying to parse the grammar yourself. The .parse method should give you all the information you need to validate.
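A minimal sketch of that suggestion, using only the documented string.Formatter.parse() method; the template text is made up for illustration.

```python
import string

template = "{} some template with {data} and {user.name!r:>10}"

# Each item is (literal_text, field_name, format_spec, conversion).
fields = [
    field_name
    for _, field_name, _, _ in string.Formatter().parse(template)
    if field_name is not None  # None marks literal text with no field after it
]

# A malformed template is rejected while parsing, before any values exist.
try:
    list(string.Formatter().parse("{unclosed"))
except ValueError:
    malformed_rejected = True

print(fields, malformed_rejected)
```

Note that parse() returns each field name as one string, so "user.name" is not split into its parts here.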

pyctrl · February 23, 2024, 5:15pm

I don’t know if the behavior is intended, but I doubt that it will be changed. I would say the docs should be updated.

I expected this.

What should I do next to sync the docs with the behavior?
Is this post enough? Or should I tag someone directly?
Or maybe go to another resource and do something there?

(This is my first interaction with the Python community, so I would appreciate guidance.)

MegaIng · February 23, 2024, 5:19pm

I think making a post in (or moving this post ^[1] into) the Documentation category of this site should be enough; there you can get more guidance. If others agree (especially people more qualified than me) that the docs should be changed, you can open a PR.

^[1] I can do that if you can’t do it.

pyctrl · February 23, 2024, 5:20pm

Yep, I know and already use this method.
But it does not do all the things I want/need.

In [39]: f = string.Formatter()

In [40]: list(f.parse("{a.b.c}"))
Out[40]: [('', 'a.b.c', '', None)]

I’d like to validate the first part of a named field (“a” in this case). That is why I was wondering about the “isidentifier” constraint.
Named fields in the template match kwargs from __init__, and I want to do some validations over the received kwargs.

string.Formatter.parse() doesn’t break the field name down any further. So this local check is relevant for my feature, and I have to process the field name manually.

I would be happy to use something built in.
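One way to do that manual step with public APIs only, as a rough sketch: it assumes the first part of a field name ends at the first "." or "[", and it uses str.isdecimal() as an approximation of the "digits" case. The helper name `leading_arg_names` is hypothetical.

```python
import re
import string


def leading_arg_names(template):
    """Return the first component of every replacement field in template."""
    names = []
    for _, field_name, _, _ in string.Formatter().parse(template):
        if field_name is None:
            continue  # literal text only, no replacement field
        # Assumption: the first component ends at the first "." or "[".
        names.append(re.split(r"[.\[]", field_name, maxsplit=1)[0])
    return names


names = leading_arg_names("{a.b.c} {items[0]} {a-b} {0.real}")

# Numeric names refer to positional arguments; the rest must match kwargs.
keyword_names = [n for n in names if n and not n.isdecimal()]

# The stricter check discussed in this thread, applied by the caller.
non_identifiers = [n for n in keyword_names if not n.isidentifier()]

print(names, keyword_names, non_identifiers)
```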

MegaIng · February 23, 2024, 5:25pm

Aha, that part isn’t exposed. Look at the source code for the string stdlib module. It uses the internal _string module. It might be a good FR to ask to expose the _string.formatter_field_name_split function for this kind of usage.
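For reference, a sketch of what that internal helper returns in CPython. `_string` is a private, undocumented module, so this is an illustration of current behaviour and not an API to rely on; it may change or be missing in other implementations.

```python
import _string  # private CPython module used by string.Formatter; not a public API

# Splits a field name into its first component and an iterator over the rest.
first, rest = _string.formatter_field_name_split("a.b[c][0]")

# Each remaining item is (is_attribute, name_or_index).
parts = list(rest)

# A numeric first component comes back as an int, not a str.
first_numeric, _ = _string.formatter_field_name_split("0.real")

print(first, parts, first_numeric)
```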

pyctrl · February 23, 2024, 5:34pm

@MegaIng thank you! I’ve noted all the advice.

For the FR, should I start a new post in Core Development? Or is there a better place?

MegaIng · February 23, 2024, 5:39pm

I don’t know if this is a large enough change to require a discussion in Ideas (but Core Development wouldn’t be the correct place either way). But I think making a post there doesn’t hurt.

steve.dower · February 26, 2024, 4:49pm

As a general rule, our documentation specifies what must be accepted, but doesn’t set an upper limit, since that’s how Python tends to operate.

For example, plenty of APIs will specify that they accept a list or a string, but will also accept other objects that look like those. Users may be “out of spec” when they use them like that, but we don’t see any reason to actively block them - we assume they know what they’re doing (often called the “we’re all consenting adults here” principle).

This means that occasionally we get stuck in a place where, when people make a fuss about the gap,^[1] we either close it in one direction or the other. Sometimes, if we can draw the exact boundary and are confident it doesn’t have any negative impacts, we’ll update the documentation. Other times, we’ll add errors and break some users so that the complainers stop complaining. At a guess, this is in the first category, but it should definitely be raised in Core Development to get a proper consensus.

If the goal is to raise an error for an invalid template, the best way to do this is to instantiate the template. If you’ve got enough information to do it on construction, then go for it. Otherwise, rely on your users doing their testing, as they should, and they’ll find their own mistakes that way.
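A minimal sketch of "instantiate the template" at construction time, assuming representative sample values are available for every field. The helper name `check_template` and the choice of caught exception types are illustrative assumptions, not part of any library.

```python
def check_template(template, **sample_kwargs):
    """Hypothetical helper: format once with sample values to surface errors early."""
    try:
        template.format(**sample_kwargs)
    except (KeyError, IndexError, AttributeError, ValueError) as exc:
        # Re-raise with the template attached so the author sees which one is bad.
        raise ValueError(f"bad template {template!r}: {exc!r}") from exc
    return template


ok = check_template("{name} failed with code {code}", name="job", code=2)

try:
    check_template("{name} failed with code {cod}", name="job", code=2)
except ValueError as exc:
    message = str(exc)

print(ok)
print(message)
```

This relies on str.format() itself for the rules, so whatever the implementation accepts or rejects is what the check accepts or rejects.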

At some point, as a library author, you get to say “this uses Python’s string formatting and follows the same rules” without having to be responsible for reimplementing them.

^[1] Not saying you’re making a fuss this time, but sometimes people do demand that we change something… anything!
