Why is the __repr__ of '\hello' shown as "'\\\\hello'"?

>>> s = '\hello'
>>> s.__str__()
'\\hello'
>>> s.__repr__()
"'\\\\hello'"

Why are there four backslashes in the output of s.__repr__() instead of two?

This is reminiscent of The Backslash Plague.

Maybe the repr would be more helpful if it returned r"'\\hello'" instead, but that would break things.

Anyway, Python ‘helpfully’ keeps the backslash as a literal character, as if it had been escaped, because \h is not an escape sequence. Older versions (e.g. 3.9) do this silently by default; 3.12 still does it but prints a SyntaxWarning.

So to represent s properly without relying on that automatic behavior, two backslashes must be typed into the string literal. But to represent the Python code containing this string literal as another string literal, each of those backslashes must again be escaped, doubling the count to four.
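A minimal sketch of that doubling, counting backslashes at each level. The variable names source and nested are only for illustration, and s is written with an escaped backslash so no warning is involved:

```python
# One real backslash in the string, typed as an escaped backslash.
s = '\\hello'

print(len(s))               # 6 characters: one backslash plus "hello"
print(s.count('\\'))        # 1

source = repr(s)            # Python source text for s, quotes included
print(source)               # '\\hello'
print(source.count('\\'))   # 2

nested = repr(source)       # source text for that source text
print(nested)               # "'\\\\hello'"
print(nested.count('\\'))   # 4
```

Each repr() call produces text that would evaluate back to its argument, so every backslash already present has to be escaped once more.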

On Python 3.12 I see this:

:>>> s = '\hello'
<unknown>:1: SyntaxWarning: invalid escape sequence '\h'

That is because you need to double up the \ if you mean the string to start with one of them.

:>>> s = '\\hello'
:>>> print(s)
\hello
:>>> print(repr(s))
'\\hello'
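If you want to capture that warning from a script rather than the REPL, something like the following should work. It is only a sketch: the name "<demo>" is an arbitrary placeholder, the warning category depends on the Python version (SyntaxWarning on 3.12 and later, DeprecationWarning on 3.6 to 3.11), and a future version may reject the literal outright.

```python
import warnings

# Program text containing the questionable literal; the r-prefix keeps
# this outer string from triggering the warning itself.
source = r"s = '\hello'"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")   # record warnings hidden by default too
    code = compile(source, "<demo>", "exec")

namespace = {}
exec(code, namespace)
s = namespace["s"]

for w in caught:
    print(w.category.__name__, w.message)

# The backslash is kept, so s equals the properly escaped spelling.
print(len(s), s == '\\hello')
```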

If you want to see the characters in a string in the REPL, this can be helpful:

>>> s = '\hello'
>>> list(s)
['\\', 'h', 'e', 'l', 'l', 'o']

Notice that the first character is reproduced for you as '\\' because that is how you would type a string containing one backslash. I am using 3.11 here, so it has been “kind” by silently guessing what I (and you) meant in the original assignment. 3.12 makes the same guess but warns about it.

The REPL shows you the repr() of your result. s.__str__() is just s, so you see repr(s). s.__repr__() is repr(s), so you see repr(repr(s)).
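A small sketch of the same point outside the REPL: printing repr(x) gives the text the REPL would echo for x, and ast.literal_eval peels off one repr layer at a time. The names echo_str and echo_repr are made up for this example.

```python
import ast

s = '\\hello'                    # one real backslash

echo_str = repr(s.__str__())     # text the REPL would display for s.__str__()
echo_repr = repr(s.__repr__())   # text the REPL would display for s.__repr__()

print(echo_str)    # '\\hello'
print(echo_repr)   # "'\\\\hello'"

# Each literal_eval undoes exactly one repr layer.
assert ast.literal_eval(echo_str) == s
assert ast.literal_eval(ast.literal_eval(echo_repr)) == s
```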

s.__str__ returns s itself. s.__repr__ returns a string representation of s. In both cases, you are additionally seeing the string representation of those strings. Compare with using print:

>>> print(s.__str__())
\hello
>>> print(s.__repr__())
'\\hello'

In the end, s contains exactly one backslash. The string representation of that string uses the escape sequence \\ to represent the backslash. Since s.__repr__() contains two backslashes, each of them is represented with \\ in its own representation.

Exercise for the reader:

  1. start with the three-character string s = r'\"'"'" (that’s the string containing a backslash, a double quote, and a single quote, in that order)
  2. repeatedly apply repr to that string: s = repr(s), and count the number of backslashes that occur at each iteration
  3. explain why the numbers you get exactly match those of OEIS A183155, “The number of order-preserving partial isometries (of an n-chain) of fix zero (fix of alpha = 0).”
>>> s = r'\"'"'"
>>> s
'\\"\''
>>> repr(s)
'\'\\\\"\\\'\''
>>> repr(repr(s))
'\'\\\'\\\\\\\\"\\\\\\\'\\\'\''
>>> repr(repr(repr(s)))
'\'\\\'\\\\\\\'\\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\'\\\\\\\'\\\'\''
>>> for i in range(11):
...     print(i, s.count('\\'))
...     s = repr(s)
... 
0 1
1 3
2 9
3 23
4 53
5 115
6 241
7 495
8 1005
9 2027
10 4073

You lost me here.

Why did Python go with some scheme involving mixed ' and " in its raw format?

By every understanding I currently have, this is how strings are defined within source code:

<quote><string contents go here><matching quote to close>

But this…

r'\"'"'"

Opens with a ' quote and closes with a " quote.

Which leaves me bamboozled.

It’s actually two strings, joined into one by implicit concatenation in the compiler. There’s an imaginary + between the second ' and " characters: r'\"' + "'"
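A quick check that the three spellings agree (the names a, b and c are just for this sketch):

```python
a = r'\"'"'"        # two adjacent literals, joined by the compiler
b = r'\"' + "'"     # the same thing with an explicit +
c = '\\"\''         # a single literal using escapes

assert a == b == c
print(len(a), list(a))   # 3 ['\\', '"', "'"]
```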

It’s because it was bound to follow a reasonably simple rule and the OEIS has a name for every last one of them. :wink:

Is this part of the language specification or implementation dependent?

Ok, I’ll bite. No idea what a derangement is, but …

The double quote is just there to make sure str.__repr__ wraps the string in single quotes. Let s(n) be such a string containing a(n) backslashes and b(n) single quotes and s(n) = repr(s(n-1)). Then repr turns every backslash into two backslashes, adds a backslash to every single quote, and adds two more single quotes. So:

a(n) = 2a(n-1) + b(n-1), and
b(n) = b(n-1) + 2

The second, with b(1) = 1, leads to b(n) = 2n - 1, so b(n-1) = 2n - 3 and by substitution a(n) = 2a(n-1) + 2n - 3, which is the recurrence given at the reference, with a(1) = 1.
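Here is a rough check of that recurrence against the actual repr() output. The function names are hypothetical, and the closed form 2^(n+1) - 2n - 1 is my own solution of the recurrence rather than something quoted from the reference. Note the indexing: a(1) is the count for the original string, before any repr() is applied.

```python
def backslash_counts(levels):
    # Count backslashes in the start string and after each repeated repr().
    s = r'\"' "'"       # backslash, double quote, single quote
    counts = []
    for _ in range(levels):
        counts.append(s.count('\\'))
        s = repr(s)
    return counts


def recurrence(levels):
    # a(n) = 2*a(n-1) + 2*n - 3, starting from a(1) = 1.
    a = [1]
    for n in range(2, levels + 1):
        a.append(2 * a[-1] + 2 * n - 3)
    return a


observed = backslash_counts(11)
assert observed == recurrence(11)
# Closed form obtained by solving the recurrence.
assert observed == [2 ** (n + 1) - 2 * n - 1 for n in range(1, 12)]
print(observed)
```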

The term is used to mean that the order is not preserved. Derangements are permutations in which no element is in its original position. A partial derangement is one in which only some elements are out of their original position. Well, terminology is not always set in stone: some might use it to mean that it is only a partial function, others for when it is not really a one-to-one function.

You can also add just whitespace to make the juxtaposition easier to see.

r'\"' "'"