Python 3 seems to have been deliberately designed so that anything valid in a C string literal is also valid in a Python string literal, with the same meaning. This is particularly true of escape sequences. For instance, in both languages, '\n' is the same as chr(0x0A). Thus C programmers will have no trouble understanding Python literals.
Python already includes a feature that C++ has and C does not (yet): the \N{name} escape, which is new in C++ 23.
My suggestion is simply that Python should incorporate the other new escape sequences of C++ 23. These would become part of the unicode-escape encoding, which is used by both the Python parser and the codecs.decode() function.
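A minimal sketch of that shared behaviour, using only escapes that Python accepts today. The sample text is an arbitrary choice for illustration; on current CPython the comparison should print True.

```python
import codecs

# The same escapes, once as a literal handled by the parser and once as
# raw bytes handed explicitly to the unicode-escape codec.
from_parser = "tab:\t nbsp:\N{NO-BREAK SPACE} delta:\u03B4"
from_codec = codecs.decode(
    rb"tab:\t nbsp:\N{NO-BREAK SPACE} delta:\u03B4", "unicode-escape"
)

print(from_parser == from_codec)
```

Because both paths go through the same decoder, a new escape form added there would presumably become available in both places at once.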
Here is an example of one new feature that currently works in Python and one that does not:
>>> '\N{NBSP}'
'\xa0'
>>> '\u{0000}'
  File "<python-input-3>", line 1
    '\u{0000}'
    ^^^^^^^^^^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
NBSP is actually an alias for NO-BREAK SPACE, and the alias is normative in the Unicode standard.
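The alias can also be checked from code. A small sketch, assuming a Python version (3.3 or later) whose unicodedata.lookup() resolves name aliases:

```python
import unicodedata

ch = unicodedata.lookup("NBSP")   # lookup() accepts aliases as well as primary names
print(hex(ord(ch)))               # code point of the character found
print(unicodedata.name(ch))       # name() reports the primary name, not the alias
```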
The escape sequences in C++ 23 which are not in C are:
\N{name or alias}
\u{hex digit(s)}
\x{hex digit(s)}
\o{octal digit(s)}
I argue that these delimited escape sequences are somewhat easier to read and make it clear exactly what the escape is composed of. For instance, compare:
"ab\xcdef" = "ab\x{cd}ef", 5 characters
"\12a" = "\o{12}a", 2 characters
"\1234" = "\o{123}4", 2 characters
Furthermore, these escapes can represent values larger than 255. For example, the Greek letter δ can be written as "\x{3B4}" rather than "\u03B4", to convey that the value is simply a number rather than a Unicode code point.
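A rough sketch of what the delimited forms would mean, written as an ordinary function rather than as a change to the codec. The name expand_delimited and the regular expression are hypothetical and purely illustrative. The sketch skips \N{...}, which Python already supports, and does not handle a doubled backslash.

```python
import re

# Hypothetical helper, not part of Python: it expands only the proposed
# \x{...}, \u{...} and \o{...} forms and leaves all other text untouched.
# Limitation: a doubled backslash before x, u or o is not treated specially.
_DELIMITED = re.compile(r"\\(?:([xu])\{([0-9A-Fa-f]+)\}|o\{([0-7]+)\})")


def expand_delimited(text):
    def repl(match):
        if match.group(1):                  # \x{...} or \u{...}: hex digits
            value = int(match.group(2), 16)
        else:                               # \o{...}: octal digits
            value = int(match.group(3), 8)
        return chr(value)                   # chr() rejects values above 0x10FFFF

    return _DELIMITED.sub(repl, text)


# Raw strings keep the backslashes so the helper sees the proposed syntax.
print(expand_delimited(r"ab\x{cd}ef") == "ab\xcdef")
print(expand_delimited(r"\o{12}a") == "\12a")
print(expand_delimited(r"\o{123}4") == "\1234")
print(expand_delimited(r"\x{3B4}") == "\u03B4")
```

Each comparison should print True: the expanded delimited forms are equal to the spellings Python accepts today, and the braces make it explicit where each escape ends.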
Someone please adopt this proposal
I don’t have the means to do any more than just post this suggestion. So if you, dear reader, see that there is consensus for the idea, then please file an issue. Thanks.