The Steering Council was recently asked to decide whether arbitrary strings should be allowed as `**` arguments in calls, attributes and elsewhere.
The SC ruled that allowing arbitrary strings is a feature of Python, rather than an implementation detail.
The Ideas thread has a lot of background discussion, but now that we have a SC ruling, I think a new thread is in order.
I wrote up a text about how the details should work, which goes beyond a simple SC ruling. Members of the SC generally agree with this direction, but not necessarily all the details. The nitpicking is better done in public, so here goes:
Let’s separate names and identifiers.
The name of a keyword argument, attribute, function, class, module, variable and similar can be any string. This includes, for example:
- the empty string
- a string with dots, dashes, dollars or other symbols
- a language keyword (e.g. `for`)
- a string with `\0` or other control characters
- emoji!
This is a feature of Python’s object model. It is not CPython-specific.
(The term name can be confusing when used alone. It should generally be qualified as attribute name, argument name, variable name etc. I’ll make an exception in this thread, which lumps these kinds of names together, and isn’t about other kinds of names. Better terminology would be welcome.)
Identifiers, currently documented as a synonym for “name”, are a feature of the Python syntax – a part of Python that’s separate from the object model.
While non-identifier names are inaccessible using the Python syntax, in many cases there is a string-based API to work with them, like `getattr`/`setattr`, `importlib.import_module`, `call(**...)`.
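As a rough illustration of the string-based API, here is a minimal sketch. The `Box` class, the `collect` function and the sample names are made up for this example; it only shows that names which cannot be spelled in source code can still be set, read and passed as keyword arguments.

```python
class Box:
    """Plain class used only as an attribute container (hypothetical)."""


def collect(**kwargs):
    """Return the keyword arguments exactly as received (hypothetical)."""
    return kwargs


box = Box()

# None of these can be written as `box.<name>` in source code,
# but the string-based API accepts them.
odd_names = ["", "my-attr", "price$", "for", "🐍"]
for name in odd_names:
    setattr(box, name, len(name))

# Read one of them back the same way.
for_value = getattr(box, "for")

# Unpacking a dict passes non-identifier keyword argument names.
received = collect(**{"for": 1, "not an identifier": 2, "": 3})
```

Names with embedded NULs are deliberately left out of the sketch, since that is the area where behaviour is least certain.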
Implementation-specific alternative ways to work with objects, like CPython’s C API or the AST, are also not limited to the Python syntax.
Allowing arbitrary strings should help make implementations simpler (as we don’t need potentially expensive checks), and allows straightforward bindings to other languages and object systems.
The following are implementation details, which may be different across implementations, and might change in future CPython versions (with an appropriate deprecation process):
- Allowing non-strings that compare equal to strings (including subclasses of `str`) as names.
- Allowing non-strings in namespaces (like `__dict__[3.14]`). Non-strings are not considered to be names.
- Preserving the identity of strings used as names. (For example, namespace implementations may intern the names, or not store names as Python objects at all.)
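To make the first two points concrete, this sketch shows what current CPython happens to accept. It is an observation about one implementation, not something to rely on: other implementations may reject either operation, and CPython itself could change. `Box` and `Label` are hypothetical names invented for the example.

```python
class Box:
    """Plain class used only as an attribute container (hypothetical)."""


class Label(str):
    """A str subclass; it compares and hashes like the plain string."""


box = Box()

# Implementation detail: CPython currently accepts a str subclass
# as an attribute name, and finds it again via the plain string.
setattr(box, Label("colour"), "red")
colour = getattr(box, "colour")

# Implementation detail: CPython currently lets a non-string key
# sit in an instance __dict__. Such a key is not a "name".
vars(box)[3.14] = "pi"
keys = list(vars(box))
```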
Some kinds of names may have additional restrictions. For example, module names containing a dot (`.`) will not work well with the import machinery, since the dot separates package names.
Since we’re only writing this down now, CPython might contain bugs and omissions around non-identifier names, especially ones with embedded NULs. Similarly, the documentation currently doesn’t use the terms “name” and “identifier” as defined here. These should be reported and fixed, eventually.
PEP 8 could be clarified to specify that “all names in the Python standard library MUST be ASCII-only non-keyword identifiers” (except in tests for unusual names). Third-party projects are encouraged to adopt this policy as well.
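A check for the proposed wording could look roughly like this. The helper name `is_ascii_nonkeyword_identifier` is made up for illustration; it relies only on `str.isascii`, `str.isidentifier` and `keyword.iskeyword` from the standard library.

```python
import keyword


def is_ascii_nonkeyword_identifier(name: str) -> bool:
    """Hypothetical helper: does `name` fit the proposed stdlib rule?"""
    # str.isidentifier() is True for keywords such as "for",
    # so the keyword check has to be done separately.
    return (
        name.isascii()
        and name.isidentifier()
        and not keyword.iskeyword(name)
    )


checked = {
    name: is_ascii_nonkeyword_identifier(name)
    for name in ["spam_1", "for", "", "naïve", "my-attr", "match"]
}
```

Note that `keyword.iskeyword` does not cover soft keywords such as `match`, so this sketch lets them through; whether the policy should treat them differently is an open question.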
Note that Python implementations can vary in details of what is considered a string – for example, we currently don’t specify if surrogates or “characters” outside the Unicode range are allowed. This means that the exact set of allowed names is, technically, also implementation-specific.
What are y’all’s thoughts?