PEP 737 – Unify type name formatting

vstinner · December 7, 2023, 2:04pm

The drawback of adding a function to the inspect module (compared to a type method) is the need to import the inspect function to use it.

Adding an inspect import can slow down the startup of an application (if inspect wasn’t already imported). By default, it’s not imported. Importing it imports 35 modules in total (*).

Moreover, if it’s added to inspect, users may want to pass other objects, not only types. Which may not be an issue, maybe later we will support more object types such as functions?

Maybe adding both, type.fully_qualified_name(sep='.') (dot separator by default) and inspect.fully_qualified_name() (colon separator), makes sense. I don’t know.

(*) 35 modules:

_ast
_collections
_functools
_opcode, _opcode_metadata
_operator
_sre
_tokenize
ast
collections, collections.abc
contextlib
copyreg
dis
encodings.latin_1
enum
functools
importlib, importlib._bootstrap, importlib._bootstrap_external, importlib.machinery
inspect
itertools
keyword
opcode
operator
re, re._casefix, re._compiler, re._constants, re._parser
reprlib
token
tokenize
types
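
For anyone who wants to reproduce the count, here is a minimal sketch. It starts a fresh interpreter and compares sys.modules before and after the import. The helper name count_new_modules is made up for this illustration, and the exact number depends on the Python version and on what the interpreter already imported at startup, so it will not necessarily be 35.

```python
import subprocess
import sys


def count_new_modules(module_name: str) -> int:
    """Return how many entries importing `module_name` adds to sys.modules.

    The measurement runs in a fresh child interpreter so that modules
    already imported by the current process do not hide the cost.
    """
    probe = (
        "import sys\n"
        "before = set(sys.modules)\n"
        f"import {module_name}\n"
        "print(len(set(sys.modules) - before))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    # The count varies between Python versions and platforms.
    print(count_new_modules("inspect"))
```

To measure time rather than module count, running the interpreter with -X importtime prints a per-module breakdown.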

vstinner · December 8, 2023, 2:20pm

Draft of the 3rd PEP version: PEP 737: type.fully_qualified_name() method by vstinner · Pull Request #3572 · python/peps · GitHub

Replace type.__fully_qualified_name__ attribute with type.fully_qualified_name(sep='.') method.
Add PyType_GetModuleName() function.

The last point which is left aside (put in the “Rejected Ideas”) is to add formats to type.__format__(), such as:

raise TypeError(f"expected str, got {type(obj):T}")
# or
raise TypeError(f"expected str, got {type(obj):z}")

instead of:

raise TypeError(f"expected str, got {type(obj).fully_qualified_name()}")
# or
raise TypeError(f"expected str, got {type(obj).__name__}")

It would be nice to decide if it’s worth it or not to be able to finalize the PEP.
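
To make the difference between the existing attributes concrete, here is a small sketch using only attributes that exist today. Outer, Inner and name_parts are hypothetical names chosen for the illustration.

```python
import collections


class Outer:
    class Inner:
        pass


def name_parts(tp):
    """Return (short name, qualified name, module name) of a type."""
    return tp.__name__, tp.__qualname__, tp.__module__


if __name__ == "__main__":
    print(name_parts(int))                      # ('int', 'int', 'builtins')
    print(name_parts(collections.OrderedDict))  # ('OrderedDict', 'OrderedDict', 'collections')
    # For a nested class the short name drops the enclosing class,
    # while the qualified name keeps it.
    print(name_parts(Outer.Inner))
```

An error message built from __name__ alone would say "Inner" for the nested class, which is why the module and qualified name together are more helpful when debugging.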

Oh right, I added it to the PEP as well. I was confused by existing PyType_GetModule() which returns a module object, instead of the module name (string).

encukou · December 12, 2023, 3:34pm

Thank you. This addresses my concerns. I still think it should be a function or format specifier, but now that you have all the arguments for those ideas you’re in a good place to reject them.

Why does it not have double underscores?
There is precedent for that: type.mro. But that looks like a historical accident; usually the non-dunder namespace should be left to users.

vstinner · December 12, 2023, 4:27pm

It’s explained in the section: Other proposed APIs to get a type fully qualified name.

The trend is to move away from legacy naming convention without underscores towards underscore separator:

sys: old getrecursionlimit(), new get_int_max_str_digits()
threading: old isAlive(), new is_alive()
environment variables (https://discuss.python.org/t/change-environment-variable-style/35180): old PYTHONPATH, new PYTHON_CPU_COUNT
etc.

AlexWaygood · December 12, 2023, 7:49pm

That section seems to explain why the name has underscores between the words – i.e. why the method is named type.fully_qualified_name() rather than type.fullyqualifiedname(). But I believe @encukou is asking (and I may have the same question) why the method doesn’t have double underscores at the beginning and end – i.e. why the method is named type.fully_qualified_name() rather than type.__fully_qualified_name__()?

h-vetinari · December 12, 2023, 10:35pm

I think it’s a bit counter-intuitive that the shorter name has the longer format string. Why not make %T expand to the short name and %#T expand to the FQN? Same goes for %N.

ronaldoussoren · December 14, 2023, 9:23am

Because the long name is preferred for messages.

vstinner · December 14, 2023, 11:32am

Oh. Most Python built-in types have methods without leading/trailing underscores: int.from_bytes(), float.hex(), str.strip(), dict.items(), etc. Usually, the __xxx__() naming is used for protocols: methods which should not be called directly, such as __add__(), which is used by x + y, or __enter__(), which is used by with obj: ....

In Python, type is the base type of all types. Adding an attribute or a method to type adds it to all types. If adding a method (without dunder) to type is an issue, I would prefer to stick to the type.__fully_qualified_name__ attribute (and reject the draft PR which switches to a method). I suppose that adding a dunder (__xxx__) attribute to type is less likely to affect existing code.

In addition to the type.__fully_qualified_name__ attribute, an option would be to also add an inspect.fully_qualified_name(type) function to format a type fully qualified name with the colon separator.

The PEP title is “Unify type name formatting”. The PEP recommends using the type fully qualified name in error messages and in __repr__() methods in new code: Recommend using the type fully qualified name. So the %T and %N formats use the fully qualified name to implement this recommendation.

AlexWaygood · December 14, 2023, 1:24pm

If we were designing Python from scratch, maybe it would have made sense to call the method fully_qualified_name(). But adding it now, I think there is too much risk of it conflicting with pre-existing user-defined methods or attributes with that name on type subclasses (and there are obviously many subclasses of type) if we give it that name. I think any new method or attribute should have a dunder name, since dunder names are explicitly reserved for use by Python’s internals and the stdlib.
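
A minimal sketch of the kind of conflict described here. Since type itself cannot be patched from Python code, FutureType stands in for a future type that has gained the non-dunder method, and LegacyMeta stands in for an existing third-party metaclass that already uses the same name for something else. All names in the block are hypothetical.

```python
class FutureType(type):
    """Stand-in for `type` after hypothetically gaining a non-dunder method."""

    def fully_qualified_name(cls, sep="."):
        return f"{cls.__module__}{sep}{cls.__qualname__}"


class LegacyMeta(FutureType):
    """Hypothetical pre-existing metaclass that already uses the same name."""

    @property
    def fully_qualified_name(cls):
        # Older user code exposing a plain string, not a method.
        return cls.__name__.lower()


class Plain(metaclass=FutureType):
    pass


class Legacy(metaclass=LegacyMeta):
    pass


def describe(tp):
    """Generic code written against the new method."""
    return tp.fully_qualified_name(sep=":")


if __name__ == "__main__":
    print(describe(Plain))
    try:
        describe(Legacy)
    except TypeError as exc:
        # The user-defined property returns a str, which is not callable.
        print("conflict:", exc)
```

A dunder name would not rule out this kind of clash, but it would make it much less likely in practice.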

pitrou · December 14, 2023, 1:53pm

While I agree that dunder names are less likely to collide with third-party usage, is there a reference for dunder names being “explicitly reserved for use by Python’s internals and the stdlib”? I can definitely point to third-party dunder methods and related protocols (for example The array interface protocol — NumPy v1.26 Manual).

AlexWaygood · December 14, 2023, 1:58pm

Yes, it’s documented here: 2. Lexical analysis — Python 3.12.1 documentation. Admittedly, I always find it far harder than it should be to dig out that link. (I also agree that it’s a shame that lots of third-party libraries are apparently unaware of this.)

vstinner · December 14, 2023, 9:29pm

And now something completely different! I wrote a draft PR to add formats. Changes:

Add type.__format__() method.
Add more formats to PyUnicode_FromFormat().
Add PyType_GetModuleName() function.

Before, I tried hard to keep the proposed API as simple as possible. I failed to satisfy all use cases. So I gave up and added format strings. Just two examples (there are more in the PEP):

Python: f"{type(obj):N}" formats type(obj).__fully_qualified_name__.
C: PyErr_Format(PyExc_TypeError, "got %hT", obj) formats type(obj).__name__.

In C, I tried to avoid conflicts with existing printf() formats. I reused existing h (short name), l (qualified name) and z (module name) size modifiers, as proposed by Serhiy before.

In Python, I tried to pick letters easier to remember: n (short name), q (qualified name), m (module name), N (fully qualified Name).

C and Python formats are different. C format combines a size modifier and a format, such as %hT. Python format (for now) is always just a letter, such as n.

The # alternative form now uses the colon (:) separator between the module name and the qualified name.

Format a type:

| C object | C type | Python | Result |
|----------|--------|--------|--------|
| `%hT` | `%hN` | `:n` | Type short name |
| `%lT` | `%lN` | `:q` | Type qualified name |
| `%zT` | `%zN` | `:m` | Type module name |
| `%T` | `%N` | `:N` | Type fully qualified name |
| `%#T` | `%#N` | `:#N` | Type fully qualified name, colon separator |

I’m not sure about n and N in Python, lower-case and upper-case N. Maybe n should be replaced with s, where s stands for short name, to avoid confusion? It can already be tricky to distinguish n and m letters which can look similar.

In Python, these proposed formats are just a compact syntax to format a type. You can obviously access type attributes directly in f-strings. Examples:

f"{type(obj).__name__}"
f"{type(obj).__qualname__}"
f"{type(obj).__fully_qualified_name__}"
f"{type(obj).__module__}:{type(obj).__qualname__}"

I prefer accessing attributes directly, but the compact syntax has enough supporters, so I decided to propose these formats.
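
As a rough illustration of how these Python format codes would behave, here is a pure-Python approximation. It is only a sketch, not the draft implementation: type.__format__() cannot be replaced from Python code, so the logic lives in a hypothetical metaclass (FormattableType) and only affects classes that use it. It assumes the module name is omitted for builtins and __main__.

```python
class FormattableType(type):
    """Hypothetical metaclass approximating the proposed type.__format__()."""

    def __format__(cls, spec):
        if spec == "":
            return str(cls)
        if spec == "n":  # short name
            return cls.__name__
        if spec == "q":  # qualified name
            return cls.__qualname__
        if spec == "m":  # module name
            return cls.__module__
        if spec in ("N", "#N"):  # fully qualified name
            # Assumption: the module is omitted for builtins and __main__.
            if cls.__module__ in ("builtins", "__main__"):
                return cls.__qualname__
            sep = ":" if spec == "#N" else "."
            return f"{cls.__module__}{sep}{cls.__qualname__}"
        raise ValueError(f"unknown format spec {spec!r} for a type")


class Point(metaclass=FormattableType):
    pass


if __name__ == "__main__":
    for spec in ("n", "q", "m", "N", "#N"):
        print(spec, "->", format(Point, spec))
```

Inside an f-string the same codes are written as f"{Point:n}" or f"{Point:#N}".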

jamestwebber · December 14, 2023, 9:50pm

Are you proposing to add five specifiers for types to the string format specification? That seems excessive. I’ll also add that n is already used for number. edit: and s is string, of course

vstinner · December 14, 2023, 10:02pm

The existing string format specification is used by numbers: __format__() methods of int, float, complex, and decimal.Decimal.

I propose to have a different format spec only used by type.__format__(). Are there reasons to use the same format spec for all stdlib types?

There are other existing stdlib types which use a different format spec, such as datetime.datetime:

>>> import datetime; d=datetime.datetime.now(); f"{d:at %Hh%M}"
'at 22h56'

Other examples:

ipaddress.IPv4Address: b, X, x and n formats
fractions.Fraction: uses the same format spec as int
enum.Enum: similar to format(str(value), format_spec)

jamestwebber · December 14, 2023, 10:28pm

Ah I think I had a bad mental model for how formatting works… It calls __format__ on the object with any relevant code and if the code is invalid that’d be an error.

As opposed to my incorrect idea that it sees a code and checks for the correct type. Never mind!
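
A short sketch of that dispatch, using only stdlib types (the ipaddress format codes need Python 3.9 or later). The helper try_format is a made-up name for the illustration. The same spec string is handed to each object's own __format__(), and a spec the object does not understand raises an error.

```python
import datetime
import ipaddress


def try_format(value, spec):
    """Format `value` with `spec`, returning the error text on failure."""
    try:
        return format(value, spec)
    except (TypeError, ValueError) as exc:
        return f"{type(exc).__name__}: {exc}"


moment = datetime.datetime(2023, 12, 14, 22, 56)
address = ipaddress.IPv4Address("192.168.0.1")

if __name__ == "__main__":
    # "x" is interpreted by int.__format__() and by the address's __format__().
    print(try_format(255, "x"))
    print(try_format(address, "x"))
    # datetime.__format__() treats the spec as a strftime() pattern.
    print(try_format(moment, "at %Hh%M"))
    # The same pattern means nothing to int.__format__(), so it is rejected.
    print(try_format(255, "at %Hh%M"))
```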

tungol · December 25, 2023, 2:38am

My two cents: It’s true that recent dunders have internal underscores, and that’s a good thing for clarity, but they all have only a single internal underscore whereas __fully_qualified_name__ has two, on top of ‘qualified’ being somewhat long already. I like __fullname__: it seems perfectly clear, it’s concise, and I think the symmetry with the existing __qualname__ justifies skipping the internal underscore. __full_qualname__ also seems like an improvement over __fully_qualified_name__ to me, but not a strong improvement.

vstinner · January 9, 2024, 2:34pm

@encukou @eric.snow @storchaka: It seems like you are in favor of “adding formats”. You didn’t react to these specific proposed formats so far. Currently, it’s still a draft to update the PEP. Do you have an opinion on it?

encukou · January 9, 2024, 4:00pm

It seems to me that these aren’t really worth it:

{type:n} instead of {type.__name__}
{type:q} instead of {type.__qualname__}
{type:m} instead of {type.__module__}

But the ones for the fully-qualified name hide enough complexity to make the shortcut worth it. And they’re also the ones we want to encourage – what better way to do that than give them a shortcut:

{type:N} instead of {type.__module__}.{type.__qualname__} (with module omitted for builtins/main)
{type:#N} instead of {type.__module__}:{type.__qualname__} (with module omitted for builtins/main)

(N might not be the best name, that’s up for bikeshedding.)

Similarly for C – I don’t think the shorter ones should be used widely (for new code at least). Also, Py_TYPE(obj) is rather trivial to do, I don’t think shortening it to a single letter buys us much. (Unless you want to deprecate Py_TYPE – that’s a whole other discussion.)

vstinner · January 9, 2024, 5:09pm

Getting a type name in C with the limited C API to format an error message is not trivial, see details in the PEP. For example, it requires extra error-handling code, which is itself not trivial. The limited C API doesn’t have access to PyTypeObject members.

encukou · January 10, 2024, 9:05am

Sure, getting an attribute in C is, sadly, not trivial. But why add a shortcut for __name__, in the same PEP that recommends not using __name__ in new code?