PEP 737 – Unify type name formatting

vstinner · November 29, 2023, 1:00pm

PEP 737 – Unify type name formatting is awaiting your review.

Abstract:

Add new convenient APIs to format type names the same way in Python and in C. No longer format type names differently depending on how types are implemented. Also, put an end to truncating type names in C. The new C API is compatible with the limited C API.

I proposed these changes 3 times: in 2011 (no longer truncate type names), in 2018, and more recently in 2023. What has changed since my first proposal is that the rationale is now stronger with the limited C API: the limited C API lacks a convenient API to format a type name. Moreover, I decided to keep the Py_TYPE() borrowed reference, since it is in fact safe to use a borrowed reference to the type of an object to format an error message. Using Py_TYPE() makes the C API more convenient to use, and it covers more cases (see details in the PEP).

Previously, I tried to discuss each issue separately: no longer truncate type names, format type names in Python (ex: add a __fully_qualified_name__ attribute), format type names in C (add a %T format). The discussions quickly went into details but missed the overall picture. So I wrote a PEP to make the overall picture easier to get (a single document containing all the information).

There are two topics that I didn’t include in the PEP.

(1) Add __fully_qualified_name__ attribute to more types: functions and methods (which have a __module__ attribute), and generators and coroutines (which don’t have a __module__ attribute). I chose to limit the scope of the PEP to types to make it easier to take a decision on the PEP.

(2) Recommend using the type fully qualified name in error messages and in __repr__() methods. Changing existing error messages to replace the short name with the fully qualified name is a backward incompatible change. Moreover, some people may prefer to stick to the type short name. Depending on who reads the error messages (users, or sysadmins/developers), the short name or the fully qualified name may be more appropriate.

ronaldoussoren · November 29, 2023, 3:05pm

I like the general idea because it will make formatting type names more consistent and can remove some cargo culting w.r.t. truncated type names.

The PEP states that the borrowed reference returned by Py_TYPE is safe to use during formatting. I’m pretty sure that’s not necessarily true: in edge cases, user code can run between the call to Py_TYPE and the use of its result.

For example:

PyErr_Format(PyExc_ValueError,
             "Unexpected value %R of type %T",
             result, Py_TYPE(result));

Here the __repr__ of result is evaluated before the type is used and can result in the type being garbage collected.


class Innocent:
    pass

def _helper():
    class Guilty:
        def __repr__(self):
            self.__class__ = Innocent
            return "A Guilty Instance"
    return Guilty()

value = _helper()

Using value with the PyErr_Format call mentioned earlier should result in using a garbage collected value.
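
The effect can be observed from pure Python, without crashing anything. The following is a minimal sketch, assuming CPython's reference counting and cyclic garbage collector; the weak reference and the `guilty_ref` name are only there for illustration. It shows that once `__repr__` has swapped `__class__`, nothing keeps the Guilty class alive, which is what would leave a borrowed `Py_TYPE()` pointer dangling in C.

```python
import gc
import weakref


class Innocent:
    pass


def _helper():
    class Guilty:
        def __repr__(self):
            # Swap the instance's class: the instance was the only
            # long-lived owner of a reference to Guilty.
            self.__class__ = Innocent
            return "A Guilty Instance"
    return Guilty()


value = _helper()
# Track the Guilty class without keeping it alive.
guilty_ref = weakref.ref(type(value))

text = repr(value)  # runs Guilty.__repr__, which reassigns __class__
gc.collect()        # classes sit in reference cycles, so ask the GC

print(text)
print(type(value).__name__)
print("Guilty collected:", guilty_ref() is None)
```

In C, a pointer obtained from `Py_TYPE(result)` before the `%R` formatting would still point at the memory of the class that has just been freed here.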

eric.snow · November 29, 2023, 5:00pm

The type qualified name (type.__qualname__ ) is only used at a single place, by the type.__repr__() implementation.

It is also used by PyErr_Display() (AKA sys.excepthook) and traceback.TracebackException.format_exception_only() (and thus .format(), etc.). [1]

The type name should be read using PyType_GetName() , PyType_GetQualName() and PyType_GetModule() functions which are less convenient to use.

FYI, I recently added _PyType_GetModuleName() to the internal API. It may make sense to move that to the public API and even the limited API.

Add type.__fully_qualified_name__ attribute.

For the bike shed: why not make it shorter and closer to the existing attribute, e.g. __full_qualname__?

or type.__qualname__ if type.__module__ is not a string or is equal to "builtins" .

Shouldn’t that also apply for the __main__ module?

You do mention __main__ later in the rejected ideas section, but the point of the PEP seems to be that we want a consistent presentation of the “full” qualified name for types. You mention that pdb omits __main__. Well, so does traceback.TracebackException (and PyErr_Display()).

If the goal is consistency, shouldn’t we then normalize for __main__, either updating pdb/traceback to show it or updating the rest to not?

[1] As of the recent core sprint, traceback.TracebackException.format() is now used for the bulk of the PyErr_Display() implementation.

Jelle · November 29, 2023, 5:25pm

We discussed some options in a PR previously. Victor originally had __fullyqualname__, but Guido and I both thought that sounded off. He also suggested __fqname__, but I thought that was too cryptic. I suggested __fully_qualified_name__ as an unambiguous, clear name, but I acknowledge it’s rather long.

I could get behind __full_qualname__.

We shouldn’t need to worry about backward compatibility in the text of error messages. I do think it’s appropriate for the PEP to recommend that error messages should by default use the fully qualified name, though there may be circumstances where a different approach works better.

vstinner · November 29, 2023, 6:27pm

I tested your code and by adding gc.collect(), sadly, I confirm that I can crash Python with it:

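import gc

class Innocent:
    pass

def _helper():
    class Guilty:
        def __repr__(self):
            self.__class__ = Innocent
            # Force a collection while the C caller still holds
            # the borrowed reference to Guilty.
            gc.collect()
            return "A Guilty Instance"
    return Guilty()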

I tried hard to convince myself that using borrowed references is fine in this case. Sadly, you proved that it’s wrong. “Safe” and “borrowed references” don’t seem to go together.

An alternative is to have two formats, one for objects (%T, T stands for Type), one for types (%N, N stands for Name):

%T formats type(arg).__name__
%#T formats type(arg).__fully_qualified_name__
%N formats arg.__name__ – arg must be a type
%#N formats arg.__fully_qualified_name__ – arg must be a type

Maybe %N and %#N can have a fast-path for types, but otherwise get the __name__ and __fully_qualified_name__ attribute, and so work on any object which has these attributes: functions, methods, coroutines and generators. I don’t know, maybe it makes no sense.
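
To make the four directives listed above concrete, here is a rough Python emulation of what each one would produce, as proposed in this message. It is only a sketch: `emulate_format()` is a hypothetical helper, and since `__fully_qualified_name__` is only a proposal, it is approximated as `__module__` + "." + `__qualname__` (ignoring any special case for `builtins`).

```python
import datetime


def emulate_format(directive, arg):
    """Emulate in Python what the proposed C format `directive` would print."""
    if directive in ("%T", "%#T"):
        cls = type(arg)  # arg is any object
    elif directive in ("%N", "%#N"):
        if not isinstance(arg, type):
            raise TypeError(f"{directive} expects a type")
        cls = arg        # arg is already a type
    else:
        raise ValueError(f"unknown directive: {directive!r}")
    if "#" in directive:
        # Assumption: stand-in for the proposed __fully_qualified_name__.
        return f"{cls.__module__}.{cls.__qualname__}"
    return cls.__name__


delta = datetime.timedelta(days=1)
print(emulate_format("%T", delta))                # timedelta
print(emulate_format("%#T", delta))               # datetime.timedelta
print(emulate_format("%N", datetime.timedelta))   # timedelta
print(emulate_format("%#N", datetime.timedelta))  # datetime.timedelta
```

The point of the split is that the `%T` variants take the object itself, so the caller never has to hold a borrowed reference to its type.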

vstinner · November 29, 2023, 6:39pm

I wasn’t aware that these functions format exception class names without the "__main__." prefix when the exception’s module is __main__. That’s good to know, thanks! You can add sys.unraisablehook to your list: it also skips the __main__ module when formatting the exception class name.

I’m not sure about unifying this.

Skipping __main__ in type.__fully_qualified_name__ would make error messages and repr() shorter for types and exceptions defined in the __main__ script.

If we change type.__fully_qualified_name__, should we also modify type.__repr__() to skip the __main__ module for consistency? In short, type.__repr__() returns f"<class '{self.__fully_qualified_name__}'>".

vstinner · November 29, 2023, 6:46pm

Should we actively update the stdlib to replace the type short name with the type fully qualified name in all error messages? Even if we can argue that error messages are not part of the Python backward compatibility contract, I expect that any error message change will impact at least one project. The same applies if we change a type’s __repr__() method to replace the type short name with the type fully qualified name. Some tests rely on the exact/full representation of an object (“even if they should not”).

When I wrote the PEP, I was surprised by the error message:

list indices must be integers or slices, not date

What is “date”? Where does it come from? “date” name is quite generic, I expect that in any large project, you can have multiple different types with the same (short) name, defined in different modules.

I would prefer to get the fully qualified name datetime.date, rather than the short name date.
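
The ambiguity is easy to reproduce with two unrelated classes that share a short name. A small sketch, where the `billing` and `calendar_utils` module names are made up for the example and assigned by hand rather than coming from real modules:

```python
def make_date_class(module_name):
    """Create a class named 'date' that claims to live in `module_name`."""
    return type("date", (), {"__module__": module_name})


billing_date = make_date_class("billing")
calendar_date = make_date_class("calendar_utils")


def short_name(cls):
    return cls.__name__


def dotted_name(cls):
    return f"{cls.__module__}.{cls.__qualname__}"


# The short names are indistinguishable in an error message...
print(short_name(billing_date), short_name(calendar_date))
# ...while the module-qualified names tell the two classes apart.
print(dotted_name(billing_date), dotted_name(calendar_date))
```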

ronaldoussoren · November 30, 2023, 8:49am

Sorry.

That would work and simplifies formatting even more.

ronaldoussoren · November 30, 2023, 8:52am

What’s the preferred format to use when formatting errors? I’d expect that using the fully qualified name is often the most useful, maybe switch around the two options, e.g. %T for the fully qualified name and %#T for the short name.

vstinner · November 30, 2023, 12:05pm

It’s hard to answer this question: “it depends” (see below).

IMO the best we can do is to define some recommendations for new code. I would prefer to recommend using the type fully qualified name for repr(). For error messages, as I wrote previously, I think that I also prefer the fully qualified name: format datetime.date, instead of just date which can be misleading.

Error messages in the stdlib:

Python usually uses __name__: the type short name.
C uses the tp_name member:
- the type fully qualified name for C types
- the type short name for Python types.

__repr__() methods in the stdlib:

Python: some methods use __name__: short name
Python: some methods use the fully qualified name
C: it seems like most methods use __name__: short name.
By the way, some methods hardcode their type name, but the short name, not the fully qualified name (the module is omitted).

Currently, it’s hard to format a type name with its fully qualified name. So I don’t think that we should look at how types are formatted right now, but instead think about what the “right” formatting should be for most use cases.

There is no public nor internal C API to get a type fully qualified name. repr(type) is not what you want: it’s formatted as <class '...'> which cannot be used directly to format a repr() string. Without an API, I don’t see how C code can prefer the fully qualified name. But as Eric wrote above, some C functions do format types with their fully qualified name, manually!

encukou · November 30, 2023, 4:24pm

Thank you for writing a PEP to focus the discussion!

As you know, my main concern is about the rejection of the colon separator – the format used by pkgutil.resolve_name or python -m inspect CLI, but also packaging entry points for example. (Interestingly, the format was also added as a unification…)

It is already tricky to get a type from its qualified name. The type qualified name already uses the dot (.) separator between different parts: class name, <locals>, nested class name, etc.

The colon separator is not consistent with dot separator used in a module fully qualified name (module.__name__).

This is not true. The colon separates the module you import from the qualname where you use getattr for the individual parts. Each of the halves uses dots internally.
(And yes, you can’t use <locals> for obvious reasons. But nested classes? Those are actually the main reason this format was added.)
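
The split described here can be tried with `pkgutil.resolve_name()` (available since Python 3.9). In the sketch below, `colon_name()` is a hypothetical helper, not an existing API, and `json.decoder.JSONDecoder.decode` is just a convenient stdlib object whose module path and qualified name both contain dots.

```python
import json.decoder
import pkgutil


def colon_name(obj):
    """Format obj as 'module:qualname' (hypothetical helper)."""
    return f"{obj.__module__}:{obj.__qualname__}"


method = json.decoder.JSONDecoder.decode
name = colon_name(method)
print(name)  # json.decoder:JSONDecoder.decode

# Everything before the colon is imported as a module; everything after
# it is looked up with getattr(), one dotted part at a time.
resolved = pkgutil.resolve_name(name)
print(resolved is method)
```

With the all-dots spelling `json.decoder.JSONDecoder.decode`, a resolver has to guess where the module path ends and the attribute path begins.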

My worry is that if we add the kind of shortcuts this PEP specifies, we’re implicitly saying this is the “one obvious way to do it”. We are discouraging all other formats. If we add __fully_qualified_name__, it will be much harder to add a similar attribute in the future. And the colon-separated format cannot be replaced by one of the 3 unified formats in the PEP: it’s not just a stylistic variant.

I know the PEP focuses on solving issues for the standard library only. But the reach of a new public attribute is much wider than that.

From elsewhere in the PEP:

Type names must not be truncated. For example, the %.100s format should be avoided: use the %s format instead (or %T and %#T formats in C).

Is this a recommendation for new (or touched) code, or an invitation to change all/most occurrences of %.100s?
(I’m worried that while the PEP claims to be backwards compatible, related changes could not be. Error message texts aren’t covered by PEP-387, but I think that large-scale changes should be discussed even – or especially – if they aren’t covered by a policy. The discussion here makes me less worried, but it would be better if the PEP said this explicitly.)

Reusing dot and colon characters for a different purpose can be misleading and make the format parser more complicated.

Which format parser are you talking about here?

IMO, it would be good to make assigning to __class__ expensive in order to make Py_TYPE correct. Py_TYPE is used all over the place, after all. See Idea: Make `Py_TYPE(obj)` outlive `obj` · Issue #38 · capi-workgroup/api-evolution · GitHub

eric.snow · December 1, 2023, 4:57pm

Somewhat related: PEP 395 – Qualified Names for Modules | peps.python.org (withdrawn).

vstinner · December 1, 2023, 5:08pm

I prepared a PEP 737 change to address comments.

Changes:

Add %N and %#N formats.
The %T and %#T formats now expect an object instead of a type.
Exchange %T and %#T formats: %T now formats the fully qualified name.
Recommend using the type fully qualified name in error messages and in __repr__() methods when writing new code.

If __main__ is omitted, is there a risk of having other types with the same short name in other modules? Does "__main__." make the type unambiguous?

Another attribute name which was not proposed so far: type.__fullname__

I updated the PEP to recommend using the fully qualified name when writing new code. I modified %T format to use the fully qualified name.

In my update, I made it explicit: I would like to modify the whole stdlib to no longer truncate type names. Type names longer than 100 characters are unlikely, so this specific change should not affect anyone in practice.

I’m thinking about Python/formatter_unicode.c which uses a regular specification for format() string. The grammar of this specification can be found in the Python documentation.

encukou · December 4, 2023, 7:30am

Aha, the “standard format specifier” for “string, int, and float”. I agree that adding type formatting would complicate it, but I don’t see a reason to add it to this parser.
(FWIW, the grammar for this parser is elsewhere in the documentation.)

vstinner · December 5, 2023, 11:20am

I updated PEP 737:

Add %N and %#N formats.
The %T and %#T formats now expect an object, instead of a type.
Exchange %T and %#T formats: %T now formats the fully qualified name.
Recommend using the type fully qualified name in error messages and in __repr__() methods in new code.
Skip the __main__ module in the fully qualified name. Recommend calling repr(type) or using f"{type.__module__}.{type.__qualname__}" format to include the __main__ module.
Add “Code in the standard library is updated to no longer truncate type names.” Make the plan more explicit.
Complete the “Backward Compatibility” section.
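
Put together, the rule for the proposed attribute after this update can be approximated in pure Python. The function below is a sketch under a hypothetical name, not the PEP's implementation: it returns the qualified name alone when `__module__` is not a string or is `builtins` or `__main__`, and `module.qualname` otherwise. The `Script` class is created by hand to simulate a type defined in the `__main__` script.

```python
import datetime


def fully_qualified_name(cls):
    """Approximate the proposed type.__fully_qualified_name__ (sketch)."""
    module = cls.__module__
    if not isinstance(module, str) or module in ("builtins", "__main__"):
        return cls.__qualname__
    return f"{module}.{cls.__qualname__}"


# Simulate a class defined in the __main__ script.
Script = type("Script", (), {"__module__": "__main__"})

print(fully_qualified_name(int))                 # int
print(fully_qualified_name(datetime.timedelta))  # datetime.timedelta
print(fully_qualified_name(Script))              # Script

# The explicit spelling recommended above keeps the __main__ prefix.
print(f"{Script.__module__}.{Script.__qualname__}")  # __main__.Script
```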

Read PEP 737 – Unify type name formatting for the full rationale on these changes.

The code:

    PyErr_Format(PyExc_TypeError,
                 "__format__ must return a str, not %.200s",
                 Py_TYPE(result)->tp_name);

becomes:

    PyErr_Format(PyExc_TypeError,
                 "__format__ must return a str, not %T",
                 result);

The Py_TYPE() call is gone! No more borrowed references (no more risk of crashes related to borrowed references).

encukou · December 5, 2023, 12:29pm

I see my main concern did not make it to the PEP, so, consider it repeated here.

vstinner · December 5, 2023, 1:54pm

I suppose that your main concern is using the colon as separator. I recorded your suggestion in the Use colon separator in fully qualified name section. What do you mean by repeating it here? Do you mean that the section doesn’t summarize your arguments well? Or that you disagree that the dot separator should be the recommended format?

The PEP is about unifying existing code formatting type names. Extract of this section:

In the standard library, no code formats a type fully qualified name this way.

So right, pkgutil.resolve_name() and python -m inspect expect a type fully qualified name using a colon separator. I understand that it’s more convenient to split the “type module” part from the “type qualified name” part in an unambiguous way, and that it avoids having to guess, by trial imports and getattr() calls, where the module path ends.

But that format is unique to inspect+pkgutil, everything else in the stdlib uses the dot separator, no?

Also, it’s already possible to split a fully qualified name at the dot separator, and then try to import one part, or use getattr(), to get a type. Example:

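import importlib

def resolve(fully_qualified_name):
    """Resolve a dotted name by importing modules, then using getattr()."""
    parent = None
    obj = None
    use_import = True
    for name in fully_qualified_name.split('.'):
        if use_import:
            # Build the dotted module path seen so far, e.g. "os.path".
            if parent:
                module_name = f'{parent}.{name}'
            else:
                module_name = name
            try:
                obj = importlib.import_module(module_name)
                # Remember the imported path so that sub-modules resolve.
                parent = module_name
                continue
            except ImportError:
                # Not a module: this part and the rest are attributes.
                pass

        use_import = False
        obj = getattr(obj, name)
    return obj

print(resolve('datetime.timedelta'))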

(Is there already a function doing that in the stdlib?)

I would also prefer type.__fully_qualified_name__ to be close to repr(type), even if there is now a difference: type.__fully_qualified_name__ omits the "__main__." prefix for the __main__ module.

repr(type) output can be copied/pasted in Python REPL, and you get the type (if you already imported the right module). I mean, copy the string between quotes of repr(type) output. It’s somehow a “standard” for repr() in Python (more or less respected) that the output can be used directly in regular code to get/create the same object. With type.__fully_qualified_name__, it’s even simpler: you can just copy/paste type.__fully_qualified_name__ value in the REPL.

Example:

>>> import datetime

# using repr()
>>> datetime.timedelta
<class 'datetime.timedelta'>

# copy/paste
>>> datetime.timedelta
<class 'datetime.timedelta'>

# using __fully_qualified_name__
>>> datetime.timedelta.__fully_qualified_name__
'datetime.timedelta'

# copy/paste (well, it's the same string)
>>> datetime.timedelta
<class 'datetime.timedelta'>

It’s not only about the REPL, you can also paste datetime.timedelta in your source code and “it just works” (again, if you imported the expected module).

Programming languages such as C++ and PHP use namespace::name syntax, whereas Python uses module.name syntax.

By the way, I was always confused by the thin difference between a package sub-module and a module attribute. For example, import os; os.listdir gets an attribute of a module. But from os import listdir works as well. And it’s just the same syntax to get a sub-module: from os import path. Wait, is os.path a module attribute or a sub-module? Well, does it really matter? import os; os.path just works which makes things even more confusing.

Right, the PEP intent is to use the same (or at least a similar) format for type names in the stdlib.

You can already write f"{type.__module__}:{type.__qualname__}". You’re right that not providing a built-in method or attribute discourages using this format.

Copy of merwok’s message (highlight is mine):

The need to resolve a dotted name to a Python object is spreading in the stdlib: pydoc has locate and resolve, packaging has util.resolve_name, unittest has something else, etc. For the benefit of stdlib maintainers as well as the community, I think such functionality should be exposed publicly by the inspect module.

If the inspect module uses this format, why not add a function to the inspect module to format a type name in the format that it expects, instead of adding an attribute or a method to type? For example, add this function to inspect:

def type_name(cls):
    return f"{cls.__module__}:{cls.__qualname__}"

The unittest module uses the dot separator. It has the unittest.util.strclass() function:

def strclass(cls):
    return "%s.%s" % (cls.__module__, cls.__qualname__)

The typing module also uses the dot separator. It has a private _type_repr() function.

PEP 737 scope is limited to types: coroutines, generators, functions and methods are not covered by the PEP on purpose. Entry points can be functions, and maybe other types. Maybe a utility to format all accepted types to the expected format is needed? Or maybe the hypothetical inspect function discussed above can cover this use case as well?

encukou · December 6, 2023, 1:39pm

Thanks for the reply.

Yes, colon as the separator. Let me summarize to make sure we understand each other, even if you reject my thinking.

Adding an attribute to a core type (especially an attribute that doesn’t expose new information) is, IMO, a rather big deal: I don’t think the rationale should be limited only to the needs of the standard library, nor only to formatting.
Adding an attribute implicitly discourages any other way to format types.

Yes. That’s the “guesswork” that the colon format eliminates.

Yes, pkgutil.resolve_name: this format is intended for backward compatibility only, the docs explain why it’s inadequate.

Yup! Maybe it is.

There are (at least) two variants of a “fully qualified name” (though you’re right that the one with the dot . is better for human-readable descriptions and error messages).
Come to think of it, I also can’t recall a dunder attribute that exposes information readily available in other dunders.
In my mind, this situation maps better to __format__ directives than a new attribute. So forgive me for scrutinizing the reasoning against format directives:

The PEP still says that dot and colon are already used in “the format specification” and that reusing them can “make the format parser more complicated”, without mentioning which specification or parser is meant (the one that’s only used for str, int, float and complex, despite current docs claiming it’s for “most built-in types”) or why that parser is relevant to type.

That leaves the claim that using short formats requires users to refer to format documentation. That’s a good point – though it doesn’t seem to hurt the str/int/float mini-language.
This line of reasoning leads to options like f"{type(obj):%name}" and f"{type(obj):%module.%name}".

vstinner · December 7, 2023, 9:24am

The question of the separator reminds me of the datetime.datetime.isoformat() method: even if it’s an ISO format (ISO 8601), the method has an option to change the separator between the date and the time. The default separator is the ISO 8601 “T” separator.

In all the time the type name has been discussed (since 2018), the only variant of the fully qualified name that has been proposed is the colon separator. Maybe the type.__fully_qualified_name__ attribute can become a method with an optional separator: type.fully_qualified_name(sep='.'). It would also address Petr’s concern that it’s the first time that an attribute only computes a value based on other attributes (format a string).

type.fully_qualified_name(':') would format module:qualname.

encukou · December 7, 2023, 10:52am

And that option should be compared with another rejected idea – adding an inspect.fully_qualified_name function.

I can’t quickly decide if the need to format types is

specialized enough that you should explicitly import it when you need it, or
everyday enough to get an abbreviated __format__ directive.

Perhaps these categories overlap, and either (or both) of those would be good.

Note that an inspect.fully_qualified_name would also work on functions (and anything else with the name-related dunders).
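
As a rough illustration of that idea combined with the optional separator suggested earlier, here is a sketch of such a function. The name `qualified_name()` and its `sep` parameter are assumptions for the example; no such function exists in `inspect`. It relies only on the `__qualname__` and `__module__` dunders, and falls back to the qualified name alone when `__module__` is missing or not a string (as for generators).

```python
import datetime
import json


def qualified_name(obj, sep="."):
    """Join obj's module and qualified name with `sep` (hypothetical helper)."""
    qualname = obj.__qualname__
    module = getattr(obj, "__module__", None)
    if not isinstance(module, str):
        # Generators and coroutines have no __module__ attribute.
        return qualname
    return f"{module}{sep}{qualname}"


def countdown():
    yield 1


gen = countdown()

print(qualified_name(datetime.timedelta))           # datetime.timedelta
print(qualified_name(datetime.timedelta, sep=":"))  # datetime:timedelta
print(qualified_name(json.dumps))                   # json.dumps
print(qualified_name(gen) == gen.__qualname__)      # True
```

Unlike the proposed attribute, this sketch does not drop the `builtins` or `__main__` prefix; whether a general-purpose helper should do so is one of the open questions in this thread.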
