Enhance type name formatting when raising an exception: add %T format in C, and add type.__fullyqualname__

vstinner · November 7, 2023, 10:39am

Hi,

tl; dr: I propose:

Python: Add type.__fullyqualname__ read-only attribute: type.__module__ + '.' + type.__qualname__, or type.__qualname__ if type.__module__ is equal to "builtins".
C API: Add %T (type(obj).__name__) and %#T (type(obj).__fullyqualname__) formats to PyUnicode_FromFormat(), and so to PyErr_Format()

What do you think of that?

In C, it’s common to format a type name with code like:

PyErr_Format(PyExc_TypeError,
             "__format__ must return a str, not %.200s",
             Py_TYPE(result)->tp_name);

This code has multiple issues:

It cannot be used with the limited C API which cannot access PyTypeObject.tp_name member.
It’s inefficient: tp_name is a UTF-8 bytes string, it must be decoded at each call to create the Unicode error message.
(Minor issue?) Py_TYPE() returns a borrowed reference. In more complicated code, the pointer can become a dangling pointer before the type name is formatted and so we may or may not crash.
By the way, the %.200s format to truncate the name to 200 characters comes from an old (fixed) limitation of CPython which used a buffer of fixed size (ex: 500 bytes). IMO it’s bad to truncate a name without indicating that the string is truncated. Moreover, we don’t do that in Python, so we should not do it in C: code written in C should have the same behavior as Python code (see PEP 399 which is related).
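The silent truncation can be reproduced from Python, since %-formatting accepts the same precision syntax. A minimal sketch; the class name is made up for illustration and a precision of 10 stands in for 200:

```python
# Hypothetical class name, chosen only to be longer than the precision below.
class AVeryLongConfigurationRegistryProxy:
    pass


name = AVeryLongConfigurationRegistryProxy.__name__

# "%.10s" keeps at most 10 characters and gives no hint that text was dropped.
truncated = "expected str, not %.10s" % name
full = "expected str, not %s" % name

print(truncated)
print(full)
```

The truncated message gives the reader no way to tell that the name was cut.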

I propose adding %T format to PyUnicode_FromFormat(): format the object type name: similar to type(obj).__name__ in Python. The example becomes just:

PyErr_Format(PyExc_TypeError,
             "__format__ must return a str, not %T", result);

Simpler, safer, faster and shorter code!

Note: my implementation supports the %.200T format if you really love truncating type names.

In some cases, we might want to display more information about the type: the module where the type was defined and the qualified name. I propose to also add a %#T format to get type.__module__ + '.' + type.__qualname__, or just type.__qualname__ if type.__module__ is equal to "builtins".

It’s bad to add an API only accessible in C, so I also propose adding a read-only type.__fullyqualname__ attribute which formats the type name the same way: similar to repr(type) without <class ' prefix and '> suffix
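As a rough illustration of the proposed attribute, here is a sketch in pure Python. Both helper names are hypothetical (nothing below exists in the stdlib), and the comparison with repr() only holds for types that keep the default type.__repr__:

```python
import collections


def fully_qualified_name(tp: type) -> str:
    """Emulate the proposed type.__fullyqualname__ (hypothetical helper)."""
    module = tp.__module__
    qualname = tp.__qualname__
    # The proposal omits the module name for built-in types.
    if module == "builtins":
        return qualname
    return f"{module}.{qualname}"


def name_from_repr(tp: type) -> str:
    """Strip the "<class '" prefix and "'>" suffix from repr(tp)."""
    return repr(tp).removeprefix("<class '").removesuffix("'>")


for tp in (int, collections.OrderedDict):
    print(fully_qualified_name(tp), name_from_repr(tp))
```

For these two types both helpers give the same string, which is the equivalence the proposal relies on.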

I’m not sure about the name. In the past, type.__fqn__ was proposed, but this acronym doesn’t fit with other type attributes: none of them are acronyms. Such a short name might sound cryptic.

The opposite would be a fully expanded name: type.__fullyqualifiedname__ (qualified instead of qual).

I’m not fully convinced that formatting a “fully qualified type name” is needed. In general, it’s rare to define two types with the same name in a project. But it may be helpful to distinguish two types with the same “short name” (type.__name__).

There is already type.__qualname__. Maybe the C format %#T should use this one instead?

Note: in C, %t is now used by the ptrdiff_t type (ex: ptrdiff_t x = 1; printf("x=%td\n", x);).

Some people also asked to add a similar API to format a type name in Python, but I’m not sure about that.

In Python, it’s easy and reliable to get an object type: type(obj) (or obj.__class__). It’s rare (and a bad idea) to override built-in type() function in a function (and if you do it, you may have other issues).

Moreover, formatting a type name in Python is also easy and straightforward: type.__name__. Done!

Full examples (extracted from the stdlib):

raise TypeError('expected AST, got %r' % node.__class__.__name__)
raise TypeError("key: expected bytes or bytearray, but got %r" % type(key).__name__)

Still, some people asked for “a new API” to format a type name. Well, it would be possible to add T and #T formats to type.__format__. Example:

raise TypeError(f'expected AST, got {node.__class__:T}')
raise TypeError(f"key: expected bytes or bytearray, but got {type(key):T}")  # remove quotes

I’m not convinced that “a magic T format” is better or more explicit than the short and straightforward type.__name__ code.
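To make the idea concrete, the T and #T format specs can be emulated today with a metaclass. This is only a sketch: the metaclass name and the decision to raise ValueError on unknown specs are assumptions, not part of any proposal or PR:

```python
class TypeNameFormat(type):
    """Hypothetical metaclass emulating the discussed T and #T format specs."""

    def __format__(cls, spec: str) -> str:
        if spec == "":
            # An empty spec keeps today's behavior: same as str(cls).
            return str(cls)
        if spec == "T":
            return cls.__name__
        if spec == "#T":
            if cls.__module__ == "builtins":
                return cls.__qualname__
            return f"{cls.__module__}.{cls.__qualname__}"
        # Assumption: reject anything else so new specs can be added later.
        raise ValueError(f"unsupported format spec {spec!r} for a type")


class Node(metaclass=TypeNameFormat):
    pass


node = Node()
short = f"expected AST, got {type(node):T}"
full = f"expected AST, got {type(node):#T}"
print(short)
print(full)
```

Note that the format applies to the type, so the caller still has to write type(node) explicitly.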

For the “fully qualified name”, you will be able to write:

raise TypeError(f'expected AST, got {node.__class__.__fullyqualname__}')
raise TypeError(f"key: expected bytes or bytearray, but got {type(key).__fullyqualname__}")

For me, the main problem of adding a new API to Python is that the proposed API for C expects an object, whereas here in Python I’m proposing a new API for types (type(obj)). It can be surprising or be error-prone to have a similar API (T format) in C and Python, but expect a different argument (object vs type).

It was proposed to add !t formatter to get the type of an object, but Eric Smith was against this. As I wrote, getting an object type is simple in Python, especially in f-string.

See also:

New Issue: PyUnicode_FromFormat(): Add %T format to format the type name of an object · Issue #111696 · python/cpython · GitHub
My previous issue in 2018: PyUnicode_FromFormat(): add %T format for an object type name · Issue #78776 · python/cpython · GitHub
python-dev discussion in 2018: Mailman 3 bpo-34595: How to format a type name? - Python-Dev - python.org

encukou · November 7, 2023, 1:27pm

%T sounds good, I often wished we had that
Instead of adding a computed attribute, __fullyqualname__, let’s maybe put a function in e.g. inspect? It could handle functions as well as types.
The f"{type(x):#T}" sounds great too, except T might not be the right choice for a type-specific directive.

storchaka · November 7, 2023, 1:38pm

We need an API for objects and for types in C, unless you want to keep explicit Py_TYPE() calls. I propose to use # to distinguish these two variations.

For different kinds of names we can use the “size” modifier. Currently l, ll, z, t, j are supported, h and hh can be added if this is not enough. We need the following kinds:

t.__name__
t.__qualname__
t.__module__ + '.' + t.__qualname__
Same as the previous, but omit the module name if it is “builtins” or “__main__”.

It covers virtually all of current uses.
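The four kinds listed above can be sketched in Python as follows. The function name and the kind labels are made up for this illustration; the post proposes selecting them with C size modifiers instead:

```python
import collections


def format_type_name(tp: type, kind: str) -> str:
    """Render a type name in one of the four discussed styles (hypothetical)."""
    if kind == "name":
        return tp.__name__
    if kind == "qualname":
        return tp.__qualname__
    if kind == "full":
        return f"{tp.__module__}.{tp.__qualname__}"
    if kind == "full-short":
        # Omit the module name for "builtins" and "__main__".
        if tp.__module__ in ("builtins", "__main__"):
            return tp.__qualname__
        return f"{tp.__module__}.{tp.__qualname__}"
    raise ValueError(f"unknown kind: {kind!r}")


class Outer:
    class Inner:
        pass


for kind in ("name", "qualname", "full", "full-short"):
    print(kind, format_type_name(Outer.Inner, kind), format_type_name(int, kind))
```

The nested class shows where __name__ and __qualname__ differ, and int shows where the last two kinds differ.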

encukou · November 7, 2023, 2:46pm

There’s also __module__ + ':' + __qualname__, separated by a colon rather than a dot.
Separating the module from the “path” to follow using getattr eliminates guesswork when you want to import the name, see pkgutil.resolve_name.
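A small sketch of that round trip, assuming Python 3.9 or later for pkgutil.resolve_name(); the colon_name() helper is hypothetical:

```python
import collections
import pkgutil


def colon_name(tp: type) -> str:
    """Format a type as 'module:qualname' (hypothetical helper)."""
    return f"{tp.__module__}:{tp.__qualname__}"


name = colon_name(collections.OrderedDict)
# With the colon form, resolve_name() imports the part before the colon
# and then follows the attributes named after it with getattr().
resolved = pkgutil.resolve_name(name)
print(name, resolved is collections.OrderedDict)
```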

vstinner · November 7, 2023, 5:47pm

Do you have examples where you have a type instead of an object? Is it common enough? You can use PyType_GetName() for these cases, no?

There are more than 400 lines using Py_TYPE(obj)->tp_name to format error messages in the C code of Python.

storchaka · November 7, 2023, 6:20pm

There are more than 120 lines using ->tp_name without Py_TYPE(). Ratio is about 1:3.

find -name '*.c' -exec egrep '[a-z0-9]->tp_name' '{}' +

vstinner · November 7, 2023, 10:43pm

If there is a lot of C code which needs to format a type name, maybe a more generic solution would be to put the Py_TYPE() borrowed reference aside and use %T format for types. So replace:

PyErr_Format(PyExc_TypeError,
             "__format__ must return a str, not %.200s",
             Py_TYPE(result)->tp_name);

with:

PyErr_Format(PyExc_TypeError,
             "__format__ must return a str, not %T",
             Py_TYPE(result));

Example using directly a type:

            PyErr_Format(PyExc_TypeError,
                         "%.500s() takes a %zd-sequence (%zd-sequence given)",
                         type->tp_name, min_len, len);

would become:

            PyErr_Format(PyExc_TypeError,
                         "%T() takes a %zd-sequence (%zd-sequence given)",
                         type, min_len, len);

da-woods · November 8, 2023, 8:14pm

The %T format is definitely something we could have used in Cython (especially for the limited API where tp_name isn’t available).

We have usable workarounds now, but I imagine we’d switch to using it as things upgrade.

vstinner · November 9, 2023, 3:12pm

Since Python 3.9, a type instance holds a strong reference to its type. While formatting an error message with PyErr_Format(), we are already making the assumption that we are holding a strong reference to the object that we are formatting, and so indirectly to its type.

For static types, well, the reference count doesn’t matter since it’s not possible to delete/deallocate a static type. Python built-in types are even immortal (What Is Dead May Never Die).

In my previous attempt to avoid tp_name in 2018, I tried eagerly to avoid any possible borrowed references. But well, maybe some borrowed references are safe “under some conditions”. Using a borrowed reference to a type while formatting an object type name sounds safe for example.

vstinner · November 9, 2023, 3:13pm

“Making the limited API more usable” is my main motivation for this change. Currently, it’s a burden to format an object type in an error message. I’m facing this issue in code generated by Argument Clinic when targeting the limited C API (the current implementation is broken, it doesn’t compile, but it’s not used so it’s ok-ish).

vstinner · November 9, 2023, 10:57pm

I propose adding the bare minimum, non-controversial and most important API: add %T format to PyUnicode_FromFormat() to format a type name (get type.__name__). The argument must be a type, not an object. It would immediately benefit Python (for Argument Clinic with the limited C API, and to consider converting the grp extension to the limited C API) and Cython.

Once the %T format is added and used, we may see better what the “remaining use cases” not covered by %T are, and discuss whether it’s worth extending the API. Currently, none of the discussed APIs exist, and people already manage to write code creating error messages which format type names. So there is not a strong need to add more APIs.

For me, the most important use case here is to get rid of code directly reading the PyTypeObject.tp_name member. The goals are to make the %T format usable in the limited C API and to prepare a migration path to make the PyTypeObject structure opaque (remove members from the public API).

Later, we can extend the API to add %#T format, consider adding formats to type.__format__(), add variants, add “fully qualified name”, etc. It’s compatible with adding %T right now.

I’m still not convinced that it is worth it to add f"expect str, got {type(obj):T}" (add T format to type.__format__()), since f"expect str, got {type(obj).__name__}" works and already exists.

I’m not against it, I’m just not convinced. Maybe if we add an alternative #T format which would format a type name as module.qualname, it would be worth it. But if this format is available through an inspect function or a type attribute (such as type.__fullyqualname__), why not call the inspect function or read the type attribute instead?

Multiple formats were proposed for a “fully qualified name”:

module.qualname, or qualname if module is equal to "builtins"
module.qualname, or qualname if module is equal to "builtins" or "__main__"
Variant using colon: module:qualname or qualname if module is equal to "builtins" (or "__main__")

I proposed adding type.__fullyqualname__ attribute, @encukou would prefer an inspect function.

Something was not mentioned recently: we can also consider changing str(type) to return the fully qualified name, so output similar to repr(type) but without the <class ' prefix and '> suffix.

Then @storchaka asked for more formats such as qualname (type.__qualname__).

malemburg · November 10, 2023, 9:13am

I like %T, but none of the other options, changes or additions

vstinner · November 14, 2023, 10:56am

I now invite interested people to review my PR which adds %T format to PyUnicode_FromFormat(): gh-111696: Add %T format to PyUnicode_FromFormat() by vstinner · Pull Request #111703 · python/cpython · GitHub

storchaka · November 14, 2023, 11:40am

The size modifiers can be used as format specifiers in type.__format__(), without “T”. E.g. PyUnicode_FromFormat("%zT", Py_TYPE(obj)) in C and f"{type(obj):z}" in Python. An empty format specifier should still be equivalent to str().

It means that the size modifiers should be mandatory for %T. It is also less ambiguous. Currently the C code uses Py_TYPE(obj)->tp_name, which in some cases is equivalent to the fully qualified name, and in other cases to the short name.

vstinner · November 14, 2023, 12:11pm

I would prefer to have a short and simple %T format for the most common case: render type.__name__.

If you want to make sure that tomorrow it’s possible to add new format specifiers without breaking backward compatibility, I can add explicit checks to reject format specifiers by raising an exception.

I propose replacing all Py_TYPE(obj)->tp_name with %T (so type.__name__). Later, we can decide if in some cases, rendering type.__qualname__ or even include the type module (“fully qualified name”) would be better. I propose to reserve %#T for such future usage if we decide to do that.

An alternative option would be to render exactly Py_TYPE(obj)->tp_name when the %T format is used. But I dislike this option since, as I wrote before, C code should behave like Python code: it should not be possible to do something in C which is not possible in Python. In Python, the most common pattern is type(obj).__name__ (or variants of that which give the same string).

storchaka · November 14, 2023, 12:26pm

But in C the most common pattern is different, and we discuss the C API feature. It is equivalent to the fully qualified name for extension types. Even some Python code tries to emulate this, but it is cumbersome.

vstinner · November 14, 2023, 12:53pm

How is f"{type(obj):z}" better than f"{type(obj).__name__}"? Is it because it’s shorter? Is it more convenient? Do you want to replace existing f"{type(obj).__name__}" code with f"{type(obj):z}"?

I don’t see the need to change the Python API. I don’t think that it’s important to have exactly the same API in Python and in C.

storchaka · November 14, 2023, 1:10pm

What if it is f"{type(obj).__module__}.{type(obj).__qualname__}"?

vstinner · November 15, 2023, 11:02pm

I wrote a draft PR to change str(type). The PR is backward incompatible and requires changing many modules and tests:

enum
functools
optparse
pdb
xmlrpc.server
test_dataclasses
test_descrtut
test_cmd_line_script

Having to replace type(value) with repr(type(value)) and replace f"{cls} ..." with f"{cls!r} ..." to keep the same behavior as Python 3.12 sounds “unpleasant”.
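For reference, this is the current behavior that such code depends on: str() and repr() of a type give the same text, so the plain and !r forms are interchangeable today. A minimal sketch:

```python
import collections

cls = collections.OrderedDict

# Today both forms render the "<class '...'>" text.
as_str = f"{cls}"
as_repr = f"{cls!r}"
print(as_str)
print(as_repr)
```

With the draft change, only the !r form would keep the <class '...'> wrapper, which is why existing f"{cls} ..." code would have to be updated.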

In the past, when similar changes were done, we got many complaints. Sometimes changes were reverted, like str() and/or repr() changes in the enum module.

I don’t think that changing str(type) is a reasonable approach.

I propose a PR to add type.__fullyqualname__ read-only attribute and PyType_GetFullyQualName() function. This PR is fully backward compatible: it doesn’t impact existing code.

If this PR is merged, we can consider adding two formats to PyUnicode_FromFormat():

"%T" formats type.__name__
"%#T" formats type.__fullyqualname__

Currently, PyErr_Format(exc, "... %s ...", type->tp_name) is different depending on the type:

type.__name__ for types implemented in Python (class MyType: ...).
type.__fullyqualname__ for static types and heap types defined in C.

I don’t think that type.__qualname__ is currently used when a type name is formatted in C using type->tp_name.

In C, PyType_GetQualName(type) can be called to format type.__qualname__.
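The difference can be observed from Python through any error message that CPython builds from tp_name. The sketch below assumes that the len() error message is one of them and that datetime.date is provided by the C accelerator module; Point is a hypothetical class, and the exact wording of the messages is not guaranteed:

```python
import datetime


class Point:  # hypothetical type implemented in Python
    pass


def len_error_message(obj) -> str:
    """Return the TypeError message raised by len(obj)."""
    try:
        len(obj)
    except TypeError as exc:
        return str(exc)
    raise AssertionError("len() unexpectedly succeeded")


# Compare whether the module name shows up in each message.
print(len_error_message(Point()))
print(len_error_message(datetime.date(2023, 11, 15)))
```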

vstinner · November 16, 2023, 2:23am

I tried to implement that, but it looks surprising if repr(type) and type.__fullyqualname__ don’t use the same separator:

>>> import collections
>>> collections.OrderedDict
<class 'collections.OrderedDict'>
>>> collections.OrderedDict.__fullyqualname__
'collections:OrderedDict'

The colon in collections:OrderedDict looks like a typo. I prefer consistency and use a dot (.) in __fullyqualname__ as well:

>>> collections.OrderedDict
<class 'collections.OrderedDict'>
>>> collections.OrderedDict.__fullyqualname__
'collections.OrderedDict'

I’m not sure about the “parse a type name” use case, since a type already has separated attributes to get the different parts of its name:

>>> collections.OrderedDict.__module__
'collections'
>>> collections.OrderedDict.__qualname__
'OrderedDict'
>>> collections.OrderedDict.__name__
'OrderedDict'

I looked at existing stdlib code formatting a fully qualified type name using __module__ and __qualname__: all of it uses the dot (.) as separator. See my PR (especially the second commit).
