Naming consistency for various types

tungol · November 7, 2024, 3:49am

A number of classes self-report names which are not an importable name for that type.
The inconsistency means that the classes can’t be pickled and typeshed can’t match their display name, and it can cause minor confusion. I’m interested in getting these to be consistent where possible, so this thread is for any discussion before I go anywhere with that. The issue can be solved either by adding an importable alias for the type or by changing what the type names itself. Both of these approaches have been used in similar situations in the past.
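
To make the pickling effect concrete, here is a minimal sketch using types.FunctionType, which reports itself as builtins.function. The helper name can_pickle_class is hypothetical and exists only for this illustration.

```python
import pickle
import types


def can_pickle_class(tp):
    """Return True if the class object itself (not an instance) can be pickled."""
    try:
        pickle.dumps(tp)
    except pickle.PicklingError:
        return False
    return True


# Classes are pickled by reference: pickle records __module__ and __qualname__
# and expects to find the very same class again under that name.
print(types.FunctionType.__module__, types.FunctionType.__qualname__)
print(can_pickle_class(types.FunctionType))  # self-reported name is not importable
print(can_pickle_class(int))                 # self-reported name is importable
```

The second call fails only because the lookup of the self-reported name fails, which is the mismatch described above.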

Here’s all the ones I am aware of, broken up by category. I’m not expecting or proposing to create consistent naming for all of these, just trying to be complete:

These are types defined in C where the assigned name doesn’t match the import name:

_json.make_encoder calls itself _json.Encoder
_json.make_scanner calls itself _json.Scanner
pyexpat.XMLParserType / xml.parsers.expat.XMLParserType calls itself pyexpat.xmlparser
signal.ItimerError calls itself signal.itimer_error
_tkinter.TkappType calls itself _tkinter.tkapp
_tkinter.TkttType calls itself _tkinter.tktimertoken

The json types make sense to me as Encoder/Scanner and I think those should be added as a new alias. For the others, the current importable name seems more standard and I’d lean towards updating the internal name to match.

Several mismatches related to the new _interpqueues and _interpreters modules:

_interpqueues.QueueError calls itself test.support.interpreters.QueueError
_interpqueues.QueueNotFoundError calls itself test.support.interpreters.QueueNotFoundError
_interpreters.InterpreterError calls itself interpreters.InterpreterError
_interpreters.InterpreterNotFoundError calls itself interpreters.InterpreterNotFoundError
_interpreters.NotShareableError calls itself interpreters.NotShareableError

The interpreters.* errors won’t be an issue anymore once an interpreters module is added and they can be imported from there. I’m not certain what’s going on with the naming on the queue errors.

These are named tuples. They’re all private, and probably low importance but low risk to harmonize the names.

functools._CacheInfo calls itself functools.CacheInfo
shutil._ntuple_diskusage calls itself shutil.usage
urllib.parse._DefragResultBase calls itself urllib.parse.DefragResult
urllib.parse._ParseResultBase calls itself urllib.parse.ParseResult
urllib.parse._SplitResultBase calls itself urllib.parse.SplitResult

After that, there’s a bunch of builtin types which can’t be imported from builtins but have been given an official importable location in types. Making their self-reported names match an importable name is probably somewhere between touchy and impossible because of how old and important they are. But it would be nice.

types.AsyncGeneratorType calls itself builtins.async_generator
types.BuiltinFunctionType / types.BuiltinMethodType calls itself builtins.builtin_function_or_method
types.CellType calls itself builtins.cell
types.ClassMethodDescriptorType calls itself builtins.classmethod_descriptor
types.CodeType calls itself builtins.code
types.CoroutineType calls itself builtins.coroutine
types.FrameType calls itself builtins.frame
types.FunctionType / types.LambdaType calls itself builtins.function
types.GeneratorType calls itself builtins.generator
types.GetSetDescriptorType calls itself builtins.getset_descriptor
types.MappingProxyType calls itself builtins.mappingproxy
types.MemberDescriptorType calls itself builtins.member_descriptor
types.MethodDescriptorType calls itself builtins.method_descriptor
types.MethodType calls itself builtins.method
types.MethodWrapperType calls itself builtins.method-wrapper
types.ModuleType calls itself builtins.module
types.TracebackType calls itself builtins.traceback
types.WrapperDescriptorType calls itself builtins.wrapper_descriptor
types.CapsuleType calls itself builtins.PyCapsule
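
A list like the one above can be regenerated with a short script. This is only a sketch: the helper names self_reported_name and name_resolves_to are made up for illustration, and the exact output depends on the Python version it runs on.

```python
import importlib
import types

_MISSING = object()


def self_reported_name(tp):
    """The dotted name a class claims for itself."""
    return f"{tp.__module__}.{tp.__qualname__}"


def name_resolves_to(tp):
    """True if importing tp.__module__ and walking tp.__qualname__ yields tp."""
    try:
        obj = importlib.import_module(tp.__module__)
    except ImportError:
        return False
    for part in tp.__qualname__.split("."):
        obj = getattr(obj, part, _MISSING)
        if obj is _MISSING:
            return False
    return obj is tp


# Collect every *Type class in the types module whose own name does not
# lead back to the class.
mismatched = {
    name: self_reported_name(obj)
    for name, obj in list(vars(types).items())
    if name.endswith("Type") and isinstance(obj, type) and not name_resolves_to(obj)
}

for name, reported in sorted(mismatched.items()):
    print(f"types.{name} calls itself {reported}")
```

The same check could be pointed at any other module by swapping out vars(types).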

These are also builtins that are not importable by the name they report, but they
have special handling in pickle to make them pickleable:

types.EllipsisType calls itself builtins.ellipsis
types.NoneType calls itself builtins.NoneType
types.NotImplementedType calls itself builtins.NotImplementedType
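
A quick way to see that special handling, sketched with type() so it does not depend on the types aliases being present:

```python
import pickle

# The classes of the three singletons; none of them is importable from
# builtins under the name it reports.
singleton_types = [type(None), type(NotImplemented), type(...)]

for tp in singleton_types:
    # pickle special-cases these classes, so the round trip returns the
    # same class object even though a plain name lookup would fail.
    round_tripped = pickle.loads(pickle.dumps(tp))
    print(tp.__qualname__, round_tripped is tp)
```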

The final category is a bunch of types in ctypes.wintypes which are created using ctypes.POINTER(). I won’t list them all here, but a representative example is ctypes.wintypes.PULONG which calls itself ctypes.wintypes.LP_c_ulong.
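
The naming comes from ctypes.POINTER() itself rather than from wintypes, so it can be seen without the Windows-specific aliases. A minimal sketch (the variable name pointer_to_ulong is just for illustration):

```python
import ctypes

# POINTER() creates the pointer class and names it "LP_" plus the name of
# the type it points to, whatever variable it is later assigned to.
pointer_to_ulong = ctypes.POINTER(ctypes.c_ulong)

print(pointer_to_ulong.__name__)
```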

An additional, slightly related category is named tuples which are never assigned to
an importable name because they’re defined inline with a class definition that inherits from them. I only noticed these because of work validating inheritance in typeshed.

ssl._ASN1Object inherits from a namedtuple that calls itself ssl._ASN1Object; typeshed calls it ssl._ASN1ObjectBase
tokenize.TokenInfo inherits from a namedtuple that calls itself tokenize.TokenInfo; typeshed calls it tokenize._TokenInfo
platform.uname_result inherits from a namedtuple that calls itself platform.uname_result_base; typeshed does not currently represent this base.
tkinter._VersionInfoType inherits from a namedtuple that calls itself tkinter._VersionInfoType; typeshed does not currently represent this base.
doctest.TestResults inherits from a namedtuple that calls itself doctest.TestResults; typeshed does not currently represent this base.

If they were given a name other than that of the class they provide a base for, typeshed’s representation could be a little closer. platform.uname_result_base is different, but it’s still tricky because it’s not a private name. Even so, these won’t be importable regardless of their name, so they’ll never be pickle-able and that small runtime effect is irrelevant here.
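
For anyone unfamiliar with the inline pattern, here is a sketch using a made-up class (Reading is hypothetical) that has the same shape as tokenize.TokenInfo:

```python
from collections import namedtuple


# The namedtuple base is created inline and never bound to a name of its own.
class Reading(namedtuple("Reading", ["sensor", "value"])):
    def describe(self):
        return f"{self.sensor}={self.value}"


# The anonymous base sits right behind the subclass in the MRO.
hidden_base = Reading.__mro__[1]

print(hidden_base.__name__)    # same name as the subclass
print(hidden_base is Reading)  # but it is a different class object
```

Since the base is only reachable through the MRO, a stub has to invent a name for it, which is why typeshed ends up with names like _TokenInfo.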

For comparison, these are previous mismatches that have been resolved which I know of:

_thread.LockType calls itself _thread.lock (_thread.lock was added in 3.13)
_ssl.SSLSession / ssl.SSLSession used to call itself _ssl.Session (changed to _ssl.SSLSession in 3.10)
_thread._ExceptHookArgs used to call itself _thread.ExceptHookArgs (changed in 3.10)
_ctypes.CFuncPtr used to call itself _ctypes.PyCFuncPtr (changed in 3.10)
weakref.CallableProxyType used to call itself builtins.weakcallableproxy (changed in 3.10)
weakref.ProxyType used to call itself builtins.weakproxy (changed in 3.10)
weakref.ReferenceType / weakref.ref used to call itself builtins.weakref (changed to weakref.ReferenceType in 3.10)
contextvars.Context used to call itself builtins.Context (changed in 3.10)
contextvars.ContextVar used to call itself builtins.ContextVar (changed in 3.10)
contextvars.Token used to call itself builtins.Token (changed in 3.10)
_struct.Struct / struct.Struct used to call itself builtins.Struct (changed to _struct.Struct in 3.9)

AlexWaygood · November 7, 2024, 10:58pm

Thanks for the great summary!

For the builtin types exposed in types specifically, there’s some previous discussion at Change names of builtin types exposed in the types module · Issue #100129 · python/cpython · GitHub

FWIW the namedtuple bases that typeshed does not currently represent feel like typeshed bugs to me: even if we can’t get the name in the stub exactly the same as the name the base class claims to have at runtime, we should at least try to incorporate the base class into the MRO in the stub with a slightly different name, unless it’s very hard to do so.

tungol · November 8, 2024, 1:35am

Agreed; I mostly mentioned it that way here to make it clear that there’s not an alternate name for the same class already in use.

encukou · November 8, 2024, 9:32am

Some of these might be intentional.
Some types shouldn’t be pickled (like a JSON encoder or a tk app).
In a lot of these cases it seems to me that typeshed should use a protocol with only the documented attributes. Some classes are private implementation details and subject to change – especially if there’s a leading underscore in the name. Making such classes easier to work with might be leading users into a trap.

tungol · November 8, 2024, 7:27pm

Maybe, but there’s little harm in it. We’re talking about pickling the class itself here, not an instance of it. At any rate, I’m not personally concerned with whether or not someone can pickle the class itself (and I’m not sure why someone would want to), but that’s one of the few real runtime effects of the name inconsistency so I think it merits pointing out.

Ultimately I am more concerned with two things: the potential confusion of names that don’t match, and the effects of (in)consistency on automated validation of typeshed (aka stubtest). Stubtest requires more exceptions and special handling when cpython itself is inconsistent, and I think that automated validation of the stubs is very important for maintaining their quality and fidelity to runtime. Where private implementation details do make it into typeshed, stubtest is one of the best tools for ensuring that any relevant changes are tracked.

I don’t believe that having a __name__ which differs from the name used to expose a class is a meaningful speedbump against bad decisions.