s='\hello'
s.__str__()
'\\hello'
s.__repr__()
"'\\\\hello'"
Why are there four backslashes in s.__repr__() instead of two?
This is reminiscent of The Backslash plague
Maybe the repr would be more helpful if it returned r"'\\hello'" instead, but that would break things.
Anyway, Python ‘helpfully’ keeps the backslash when the character after it does not form a recognized escape sequence, and \h is not one. Older versions (e.g. 3.9) raise at most a DeprecationWarning for this; 3.12 still keeps the backslash but prints a SyntaxWarning.
So to represent s properly without relying on the automagic, two backslashes must be typed into the string literal. But to represent the Python code containing this string literal as another string literal, each of those backslashes must again be escaped (doubling the count to 4).
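A minimal sketch of that doubling, assuming the string is written with an explicit \\ so no version-dependent warning gets in the way. The names level0, level1 and level2 are just labels made up for this illustration.

```python
BACKSLASH = '\\'         # a one-character string holding a single backslash

s = '\\hello'            # one real backslash followed by "hello"

level0 = s               # the string itself
level1 = repr(s)         # source text of a literal that evaluates to s
level2 = repr(level1)    # source text of a literal that evaluates to level1

for name, text in [("s", level0), ("repr(s)", level1), ("repr(repr(s))", level2)]:
    print(name, text, "backslashes:", text.count(BACKSLASH))
# The counts come out as 1, 2 and 4.
```

Each call to repr has to describe the previous text as a literal, so every backslash already present gets escaped once more.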
On Python 3.12 I see this:
>>> s = '\hello'
<unknown>:1: SyntaxWarning: invalid escape sequence '\h'
That is because you need to double up the \
if you mean the string to start with one of them.
>>> s = '\\hello'
>>> print(s)
\hello
>>> print(repr(s))
'\\hello'
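For what it's worth, a raw string literal is another way to get the same one-backslash string without doubling anything. A small sketch (the variable names are only for illustration):

```python
doubled = '\\hello'   # the escape sequence \\ produces one backslash
raw = r'\hello'       # raw literal: the backslash is taken as-is

print(doubled == raw)   # True
print(len(raw))         # 6
print(raw)              # \hello
```

Both spellings produce the same six-character string; only the source text differs.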
If you want to see the characters in a string in the REPL, this can be helpful:
>>> s = '\hello'
>>> list(s)
['\\', 'h', 'e', 'l', 'l', 'o']
Notice that the first character is reproduced for you as '\\'
because that is how you would type a string containing one backslash. I am using 3.11 here, so it has been “kind” by quietly guessing what I (and you) meant in the original assignment. 3.12 makes the same guess but warns about it.
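If you want to check what your own interpreter does with '\h', something like the sketch below should work. It compiles the source text at run time and records any warning; the names source, caught and namespace are made up for the example, and the except branch is only a guard in case some future version rejects the literal outright.

```python
import warnings

source = r"s = '\h'"   # source text containing the unrecognized escape \h

namespace = {}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")      # make sure nothing is filtered out
    try:
        code = compile(source, "<demo>", "exec")
    except SyntaxError:
        code = None                      # guard: a future version may refuse it

# Show whichever warning category this interpreter used.
for w in caught:
    print(w.category.__name__, w.message)

if code is not None:
    exec(code, namespace)
    print(list(namespace["s"]))          # the backslash is kept as a character
```

On 3.12 the category printed should be SyntaxWarning, matching the message quoted above; on older releases it is a DeprecationWarning instead.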
The REPL shows you the repr() of your result. s.__str__() is just s, so you see repr(s). s.__repr__() is repr(s), so you see repr(repr(s)).
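To make that concrete, here is a rough sketch that mimics the echo step. repl_display is a hypothetical helper invented for this post, standing in for what the interactive prompt does with a non-None result.

```python
s = '\\hello'

def repl_display(value):
    """Hypothetical stand-in for the prompt's echo: it shows repr(value)."""
    return repr(value)

print(repl_display(s.__str__()))    # '\\hello'
print(repl_display(s.__repr__()))   # "'\\\\hello'"

# The two identities the explanation relies on:
print(s.__str__() == s)                               # True
print(repl_display(s.__repr__()) == repr(repr(s)))    # True
```

The two printed lines match the two outputs in the original question, two backslashes and four.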
s.__str__ returns s itself. s.__repr__ returns a string representation of s. In both cases, you are additionally seeing the string representation of those strings. Compare with using print:
>>> print(s.__str__())
\hello
>>> print(s.__repr__())
'\\hello'
In the end, s contains exactly one backslash. The string representation of that string uses the escape sequence \\ to represent the backslash. Since s.__repr__() contains two backslashes, each of them is represented with \\ in its own representation.
Exercise for the reader:
Start with s = r'\"'"'" (that’s the string containing a backslash, a double quote, and a single quote, in that order).
Repeatedly apply repr to that string: s = repr(s), and count the number of backslashes that occur at each iteration.
>>> s = r'\"'"'"
>>> s
'\\"\''
>>> repr(s)
'\'\\\\"\\\'\''
>>> repr(repr(s))
'\'\\\'\\\\\\\\"\\\\\\\'\\\'\''
>>> repr(repr(repr(s)))
'\'\\\'\\\\\\\'\\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\'\\\\\\\'\\\'\''
>>> for i in range(11):
...     print(i, s.count('\\'))
...     s = repr(s)
...
0 1
1 3
2 9
3 23
4 53
5 115
6 241
7 495
8 1005
9 2027
10 4073
You lost me here.
Why did Python go with some scheme involving mixed ' and " in its raw format?
By every understanding I currently have of how strings are defined within source code:
<quote><string contents go here><matching quote to close>
But this…
r'\"'"'"
opens with a ' quote and closes with a " quote.
Which leaves me bamboozled.
It’s actually two strings, joined into one by implicit concatenation in the compiler. There’s an imaginary + between the second ' and the " that follows it: r'\"' + "'"
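A quick way to convince yourself, as a sketch (joined, explicit and spaced are just names picked for the comparison):

```python
joined = r'\"'"'"          # two adjacent literals, nothing between them
explicit = r'\"' + "'"     # the same pieces with the + written out
spaced = r'\"' "'"         # whitespace between the literals is fine too

print(joined == explicit == spaced)   # True
print(len(joined))                    # 3
print(list(joined))                   # ['\\', '"', "'"]
```

The first literal is a raw string holding a backslash and a double quote; the second is an ordinary string holding one single quote.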
It’s because it was bound to follow a reasonably simple rule and the OEIS has a name for every last one of them.
Is this part of the language specification or implementation dependent?
Ok, I’ll bite. No idea what a derangement is, but …
The double quote is just there to make sure str.__repr__ wraps the string in single quotes. Let s(n) be such a string containing a(n) backslashes and b(n) single quotes, with s(n) = repr(s(n-1)). Then repr turns every backslash into two backslashes, adds a backslash to every single quote, and adds two more single quotes. So:
a(n) = 2a(n-1) + b(n-1), and
b(n) = b(n-1) + 2
The second, with b(1) = 1, leads to b(n) = 2n - 1, so b(n-1) = 2n - 3 and by substitution a(n) = 2a(n-1) + 2n - 3, which is the recursion at the reference, given a(1) = 1.
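As a sanity check, the sketch below compares the brute-force counts with the recurrence. The closed form 2**(n+1) - 2*n - 1 is my own assumption, obtained by solving the recurrence by hand; it is not stated anywhere in this thread, so treat it as something to verify rather than a given. All function names are made up for the example.

```python
def backslash_counts(n_terms):
    """Count backslashes in s, repr(s), repr(repr(s)), ... by brute force."""
    s = r'\"' "'"                 # backslash, double quote, single quote
    counts = []
    for _ in range(n_terms):
        counts.append(s.count('\\'))
        s = repr(s)
    return counts

def recurrence(n_terms):
    """a(n) = 2*a(n-1) + b(n-1), b(n) = b(n-1) + 2, with a(1) = b(1) = 1."""
    a, b = 1, 1
    terms = []
    for _ in range(n_terms):
        terms.append(a)
        a, b = 2 * a + b, b + 2
    return terms

def closed_form(n):
    """Assumed closed form for a(n), with n starting at 1."""
    return 2 ** (n + 1) - 2 * n - 1

observed = backslash_counts(11)
print(observed)
print(observed == recurrence(11))
print(observed == [closed_form(n) for n in range(1, 12)])
```

The first line printed should reproduce the counts from the loop earlier in the thread (1, 3, 9, 23, ...), and the two comparisons show whether the recurrence and the assumed closed form agree with them.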
It is used to mean that the order is not preserved. Derangements are permutations in which no element is in its original position. A partial derangement is one where only some elements are out of their original position. Well, terminology is not always set in stone. Some might use it to mean that it is only a partial function, some for when it is not really a one-to-one function.
You can also add just whitespace to make the juxtaposition easier to see.
r'\"' "'"