Enhance type name formatting when raising an exception: add %T format in C, and add type.__fullyqualname__

Hi,

tl; dr: I propose:

  • Python: Add type.__fullyqualname__ read-only attribute: type.__module__ + '.' + type.__qualname__, or type.__qualname__ if type.__module__ is equal to "builtins".
  • C API: Add %T (type(obj).__name__) and %#T (type(obj).__fullyqualname__) formats to PyUnicode_FromFormat(), and so to PyErr_Format()

What do you think of that?


In C, it’s common to format a type name with code like:

PyErr_Format(PyExc_TypeError,
             "__format__ must return a str, not %.200s",
             Py_TYPE(result)->tp_name);

This code has multiple issues:

  • It cannot be used with the limited C API which cannot access PyTypeObject.tp_name member.
  • It’s inefficient: tp_name is a UTF-8 bytes string, it must be decoded at each call to create the Unicode error message.
  • (Minor issue?) Py_TYPE() returns a borrowed reference. In more complicated code, the pointer can become a dangling pointer before the type name is formatted and so we may or may not crash.
  • By the way, the %.200s format, which truncates the name to 200 characters, comes from an old (fixed) limitation of CPython which used a buffer of fixed size (ex: 500 bytes). IMO it’s bad to truncate a name without indicating that the string is truncated. Moreover, we don’t do that in Python, so we should not do it in C: code written in C should have the same behavior as Python code (see PEP 399 which is related).
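To illustrate the silent truncation mentioned in the last point, here is a small Python sketch. The % operator accepts the same precision syntax as C. The type name, the 12-character limit and the shorten() helper are made up for the illustration, they are not part of any proposal.

```python
# Hypothetical long type name, only for illustration.
type_name = "VeryLongGeneratedTypeNameFromSomeFramework"

# A precision in %-formatting cuts the string without any marker,
# like %.200s does in C.
truncated = "must return a str, not %.12s" % type_name


def shorten(name, limit=12):
    """Return name unchanged if it fits, else cut it and append '...'."""
    return name if len(name) <= limit else name[:limit] + "..."


# One possible alternative: make the truncation visible to the reader.
marked = "must return a str, not %s" % shorten(type_name)

print(truncated)
print(marked)
```

With the first form the reader cannot tell whether the name was cut, which is the problem described above.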

I propose adding a %T format to PyUnicode_FromFormat() which formats the object type name, similar to type(obj).__name__ in Python. The example becomes just:

PyErr_Format(PyExc_TypeError,
             "__format__ must return a str, not %T", result);

Simpler, safer, faster and shorter code!

Note: my implementation supports the %.200T format if you really love truncating type names :slight_smile:


In some cases, we might want to display more information about the type: the module where the type was defined and the qualified name. I also propose adding a %#T format to get type.__module__ + '.' + type.__qualname__, or just type.__qualname__ if type.__module__ is equal to "builtins".

It’s bad to add an API only accessible in C, so I also propose adding a read-only type.__fullyqualname__ attribute which formats the type name the same way: similar to repr(type) without <class ' prefix and '> suffix :slight_smile:
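A rough Python sketch of the rule described above. The helper name fully_qualified_name() is hypothetical, the proposed attribute does not exist yet. It only assumes the existing __module__ and __qualname__ attributes.

```python
import collections


def fully_qualified_name(tp):
    """Sketch of the proposed type.__fullyqualname__ (hypothetical helper)."""
    if tp.__module__ == "builtins":
        return tp.__qualname__
    return f"{tp.__module__}.{tp.__qualname__}"


for tp in (int, collections.OrderedDict):
    name = fully_qualified_name(tp)
    # Compare with repr(type) without its "<class '" prefix and "'>" suffix.
    print(name, repr(tp) == f"<class '{name}'>")
```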

I’m not sure about the name. In the past, type.__fqn__ was proposed, but this acronym doesn’t fit with other type attributes: none of them are acronyms. Such a short name might sound cryptic.

The opposite would be a fully expanded name: type.__fullyqualifiedname__ (qualified instead of qual). :smiley:

I’m not fully convinced that formatting a “fully qualified type name” is needed. In general, it’s rare to define two types with the same name in a project. But it may be helpful to distinguish two types with the same “short name” (type.__name__).

There is already type.__qualname__. Maybe the C format %#T should use this one instead?

Note: in C, %t is now used as the length modifier for the ptrdiff_t type (ex: ptrdiff_t x = 1; printf("x=%td\n", x);).


Some people also asked to add a similar API to format a type name in Python, but I’m not sure about that.

In Python, it’s easy and reliable to get an object type: type(obj) (or obj.__class__). It’s rare (and a bad idea) to override built-in type() function in a function (and if you do it, you may have other issues).

Moreover, formatting a type name in Python is also easy and straightforward: type.__name__. Done!

Full examples (extracted from the stdlib):

raise TypeError('expected AST, got %r' % node.__class__.__name__)
raise TypeError("key: expected bytes or bytearray, but got %r" % type(key).__name__)

Still, some people asked for “a new API” to format a type name. Well, it would be possible to add T and #T formats to type.__format__. Example:

raise TypeError(f'expected AST, got {node.__class__:T}')
raise TypeError(f"key: expected bytes or bytearray, but got {type(key):T}")  # remove quotes

I’m not convinced that “a magic T format” is better or more explicit than the short and straightforward type.__name__ code.
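To make the idea concrete without changing type itself, the T and #T formats can be emulated with a metaclass. This is only a sketch: NameFormatMeta and Node are hypothetical names, and the real proposal would put this logic in type.__format__.

```python
class NameFormatMeta(type):
    """Hypothetical metaclass emulating 'T' and '#T' format specs on types."""

    def __format__(cls, spec):
        if spec == "T":
            return cls.__name__
        if spec == "#T":
            if cls.__module__ == "builtins":
                return cls.__qualname__
            return f"{cls.__module__}.{cls.__qualname__}"
        # Any other spec: defer to the default object.__format__ behavior.
        return super().__format__(spec)


class Node(metaclass=NameFormatMeta):
    pass


node = Node()
print(f"expected AST, got {type(node):T}")
print(f"expected AST, got {type(node):#T}")
```

Note that the format applies to the type, so the caller still has to write type(node) explicitly.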

For the “fully qualified name”, you will be able to write:

raise TypeError(f'expected AST, got {node.__class__.__fullyqualname__}')
raise TypeError(f"key: expected bytes or bytearray, but got {type(key).__fullyqualname__}")

For me, the main problem of adding a new API to Python is that the proposed API for C expects an object, whereas here in Python I’m proposing a new API for types (type(obj)). It can be surprising or error-prone to have a similar API (T format) in C and Python that expects a different argument (object vs type).

It was proposed to add !t formatter to get the type of an object, but Eric Smith was against this. As I wrote, getting an object type is simple in Python, especially in f-string.



%T sounds good, I often wished we had that :slight_smile:
Instead of adding a computed attribute, __fullyqualname__, let’s maybe put a function in e.g. inspect? It could handle functions as well as types.
The f"{type(x):#T}" sounds great too, except T might not be the right choice for a type-specific directive.


We need API for objects and for types in C, unless you want to keep explicit Py_TYPE() calls. I propose to use # to distinguish these two variations.

For different kinds of names we can use the “size” modifier. Currently l, ll, z, t, j are supported, h and hh can be added if this is not enough. We need the following kinds:

  1. t.__name__
  2. t.__qualname__
  3. t.__module__ + '.' + t.__qualname__
  4. Same as the previous, but omit the module name if it is “builtins” or “__main__”.

It covers virtually all of current uses.
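For reference, the four kinds of names listed above can be written in Python as follows. The helper type_name_kinds() and its dictionary keys are only illustrative.

```python
import collections


def type_name_kinds(tp):
    """Return the four kinds of names listed above (illustrative helper)."""
    module, qualname = tp.__module__, tp.__qualname__
    full = f"{module}.{qualname}"
    return {
        "name": tp.__name__,
        "qualname": qualname,
        "module.qualname": full,
        # Kind 4: omit the module name for "builtins" and "__main__".
        "short module.qualname": (
            qualname if module in ("builtins", "__main__") else full
        ),
    }


for tp in (int, collections.OrderedDict):
    print(type_name_kinds(tp))
```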


There’s also __module__ + ':' + __qualname__, separated by a colon rather than a dot.
Separating the module from the “path” to follow using getattr eliminates guesswork when you want to import the name, see pkgutil.resolve_name.
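A short sketch of that round trip, using pkgutil.resolve_name() (available since Python 3.9). The colon_name() helper is hypothetical.

```python
import collections
import pkgutil


def colon_name(tp):
    """Format a type as 'module:qualname' (illustrative helper)."""
    return f"{tp.__module__}:{tp.__qualname__}"


name = colon_name(collections.OrderedDict)

# With the colon form, resolve_name() imports the part before the colon
# and then follows the part after it with getattr(), without guessing
# where the module path ends.
resolved = pkgutil.resolve_name(name)
print(name, resolved is collections.OrderedDict)
```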

Do you have examples where you have a type instead of an object? Is it common enough? You can use PyType_GetName() for these cases, no?

There are more than 400 lines using Py_TYPE(obj)->tp_name to format error messages in the C code of Python.

There are more than 120 lines using ->tp_name without Py_TYPE(). Ratio is about 1:3.

find -name '*.c' -exec egrep '[a-z0-9]->tp_name' '{}' +

If there is a lot of C code which needs to format a type name, maybe a more generic solution would be to put the Py_TYPE() borrowed reference aside and use %T format for types. So replace:

PyErr_Format(PyExc_TypeError,
             "__format__ must return a str, not %.200s",
             Py_TYPE(result)->tp_name);

with:

PyErr_Format(PyExc_TypeError,
             "__format__ must return a str, not %T",
             Py_TYPE(result));

Example using directly a type:

            PyErr_Format(PyExc_TypeError,
                         "%.500s() takes a %zd-sequence (%zd-sequence given)",
                         type->tp_name, min_len, len);

would become:

            PyErr_Format(PyExc_TypeError,
                         "%T() takes a %zd-sequence (%zd-sequence given)",
                         type, min_len, len);

The %T format is definitely something we could have used in Cython (especially for the limited API where tp_name isn’t available).

We have usable workarounds now, but I imagine we’d switch to using it as things upgrade.


Since Python 3.8, an instance of a heap type holds a strong reference to its type. While formatting an error message with PyErr_Format(), we are already making the assumption that we are holding a strong reference to the object that we are formatting, and so indirectly to its type.

For static types, well, the reference count doesn’t matter since it’s not possible to delete/deallocate a static type. Python built-in types are even immortal (What Is Dead May Never Die).

In my previous attempt to avoid tp_name in 2018, I tried eagerly to avoid any possible borrowed references. But well, maybe some borrowed references are safe “under some conditions”. Using a borrowed reference to a type while formatting an object type name sounds safe for example.

“Making the limited API more usable” is my main motivation for this change. Currently, it’s a burden to format an object type in an error message. I’m facing this issue in code generated by Argument Clinic when targeting the limited C API (the current implementation is broken, it doesn’t compile, but it’s not used so it’s ok-ish).

I propose adding the bare minimum, non-controversial and most important API: add a %T format to PyUnicode_FromFormat() to format a type name (get type.__name__). The argument must be a type, not an object. It would immediately benefit Python (for Argument Clinic with the limited C API, and to consider converting the grp extension to the limited C API) and Cython.

Once the %T format is added and used, we may see better what the “remaining use cases” not covered by the %T format are, and discuss whether it’s worth extending the API. Currently, none of the discussed APIs exist, and people already manage to write code creating error messages which format type names :slight_smile: So there is not a strong need to add more APIs.

For me, the most important use case here is to get rid of code that reads the PyTypeObject.tp_name member directly. The goal is to make the %T format usable in the limited C API and to prepare a migration path to make the PyTypeObject structure opaque (remove members from the public API).

Later, we can extend the API to add %#T format, consider adding formats to type.__format__(), add variants, add “fully qualified name”, etc. It’s compatible with adding %T right now.


I’m still not convinced that it is worth adding f"expect str, got {type(obj):T}" (add a T format to type.__format__()), since f"expect str, got {type(obj).__name__}" works and already exists.

I’m not against it, I’m just not convinced. Maybe if we add an alternative #T format which would format a type name as module.qualname, it would be worth it. If this format is available through an inspect function or a type attribute (such as type.__fullyqualname__), why not call the inspect function or read the type attribute instead?

Multiple formats were proposed for a “fully qualified name”:

  • module.qualname, or qualname if module is equal to "builtins"
  • module.qualname, or qualname if module is equal to "builtins" or "__main__"
  • Variant using colon: module:qualname or qualname if module is equal to "builtins" (or "__main__")

I proposed adding a type.__fullyqualname__ attribute; @encukou would prefer an inspect function.

Something was not mentioned recently: we can also consider changing str(type) to return the fully qualified name, so an output similar to repr(type) but without the <class ' prefix and '> suffix.

Then @storchaka asked for more formats, such as qualname (type.__qualname__).

I like %T, but none of the other options, changes or additions :slight_smile:

I now invite interested people to review my PR which adds %T format to PyUnicode_FromFormat(): gh-111696: Add %T format to PyUnicode_FromFormat() by vstinner · Pull Request #111703 · python/cpython · GitHub

The size modifiers can be used as format specifiers in type.__format__(), without “T”. E.g. PyUnicode_FromFormat("%zT", Py_TYPE(obj)) in C and f"{type(obj):z}" in Python. An empty format specifier should still be equivalent to str().

It means that the size modifiers should be mandatory for %T. It is also less ambiguous. Currently the C code uses Py_TYPE(obj)->tp_name, which in some cases is equivalent to the fully qualified name, and in other cases to the short name.

I would prefer to have a short and simple %T format for the most common case: render type.__name__.

If you want to make sure that tomorrow it’s possible to add new format specifiers without breaking backward compatibility, I can add explicit checks to reject format specifiers by raising an exception.

I propose replacing all Py_TYPE(obj)->tp_name with %T (so type.__name__). Later, we can decide if in some cases, rendering type.__qualname__ or even including the type’s module (“fully qualified name”) would be better. I propose to reserve %#T for such future usage if we decide to do that.


An alternative option would be to render exactly Py_TYPE(obj)->tp_name when the %T format is used. But I dislike this option since, as I wrote before, we should have C code which behaves like Python code: it should not be possible to do something in C which is not possible in Python. In Python, the most common pattern is type(obj).__name__ (or variants of that which give the same string).

But in C the most common pattern is different, and we are discussing a C API feature. It is equivalent to the fully qualified name for extension types. Even some Python code tries to emulate this, but it is cumbersome.

How is f"{type(obj):z}" better than f"{type(obj).__name__}"? Is it because it’s shorter? Is it more convenient? Do you want to replace existing f"{type(obj).__name__}" code with f"{type(obj):z}"?

I don’t see the need to change the Python API. I don’t think that it’s important to have exactly the same API in Python and in C.


What if it is f"{type(obj).__module__}.{type(obj).__qualname__}"?

I wrote a draft PR to change str(type). The PR is backward incompatible and requires changing many modules and tests:

  • enum
  • functools
  • optparse
  • pdb
  • xmlrpc.server
  • test_dataclasses
  • test_descrtut
  • test_cmd_line_script

Having to replace type(value) with repr(type(value)) and replace f"{cls} ..." with f"{cls!r} ..." to keep the same behavior as in Python 3.12 sounds “unpleasant”.
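A small sketch of what is at stake. Today str() and repr() of a type produce the same text, so both spellings below print the same message. The variable name cls and the message are only illustrative.

```python
import collections

cls = collections.OrderedDict

# Today, str(type) and repr(type) give the same text, so f"{cls}" and
# f"{cls!r}" are interchangeable.
current = f"{cls} is not supported"
explicit = f"{cls!r} is not supported"
print(current)
print(explicit)
print(current == explicit)
```

If str(type) returned the fully qualified name instead, only the second spelling would keep its current output.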

In the past, when similar changes were done, we got many complaints. Sometimes changes were reverted, like str() and/or repr() changes in the enum module.

I don’t think that changing str(type) is a reasonable approach.

I propose a PR to add type.__fullyqualname__ read-only attribute and PyType_GetFullyQualName() function. This PR is fully backward compatible: it doesn’t impact existing code.

If this PR is merged, we can consider adding two formats to PyUnicode_FromFormat():

  • "%T" formats type.__name__
  • "%#T" formats type.__fullyqualname__

Currently, PyErr_Format(exc, "... %s ...", type->tp_name) is different depending on the type:

  • type.__name__ for types implemented in Python (class MyType: ...).
  • type.__fullyqualname__ for types implemented in C (static types and heap types).

I don’t think that type.__qualname__ is currently used when a type name is formatted in C using type->tp_name.
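The difference can be observed from Python through an error message. The sketch below assumes that the “unsupported operand” message is built from tp_name, and that collections.OrderedDict is the C implementation, which is my understanding of current CPython. MyType and binop_error_message() are hypothetical names, and the exact output may differ between versions and implementations.

```python
import collections


class MyType:
    """A type implemented in Python."""


def binop_error_message(obj):
    """Return the TypeError message raised by `1 + obj`, or None."""
    try:
        1 + obj
    except TypeError as exc:
        return str(exc)
    return None


# Assumption: the message uses tp_name, so a Python class is typically shown
# by its short name and a C type by the name stored in its tp_name slot.
python_msg = binop_error_message(MyType())
c_msg = binop_error_message(collections.OrderedDict())
print(python_msg)
print(c_msg)
```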

In C, PyType_GetQualName(type) can be called to format type.__qualname__.

I tried to implement that, but it looks surprising if repr(type) and type.__fullyqualname__ don’t use the same separator:

>>> import collections
>>> collections.OrderedDict
<class 'collections.OrderedDict'>
>>> collections.OrderedDict.__fullyqualname__
'collections:OrderedDict'

The colon in collections:OrderedDict looks like a typo. I prefer consistency and use a dot (.) in __fullyqualname__ as well:

>>> collections.OrderedDict
<class 'collections.OrderedDict'>
>>> collections.OrderedDict.__fullyqualname__
'collections.OrderedDict'

I’m not sure about the “parse a type name” use case, since a type already has separated attributes to get the different parts of its name:

>>> collections.OrderedDict.__module__
'collections'
>>> collections.OrderedDict.__qualname__
'OrderedDict'
>>> collections.OrderedDict.__name__
'OrderedDict'

I looked at existing stdlib code formatting a fully qualified type name using __module__ and __qualname__: all of it uses the dot (.) as separator. See my PR (especially the second commit).