This is not changed behaviour. It has always worked this way.
The string that you are seeing is correct. You asked it to take a string consisting of the letter a, a pound sign, and the letter b, and replace the pound sign with a single backslash followed by a pound sign. The expected result, of course, is the letter a, a backslash, a pound sign, and b, in order.
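As a quick check, here is a minimal sketch that builds that expected result piece by piece and compares it with the output of the replacement. It does the replacement two ways, with str.replace and with str.maketrans/str.translate; the variable names are only for illustration.

```python
# Build the expected result piece by piece: 'a', ONE backslash, '#', 'b'.
expected = 'a' + '\\' + '#' + 'b'

# Two ways to replace '#' with a backslash followed by '#'.
via_replace = 'a#b'.replace('#', r'\#')
via_translate = 'a#b'.translate(str.maketrans({'#': r'\#'}))

print(via_replace == expected)    # True
print(via_translate == expected)  # True
print(len(expected))              # 4
```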
The output that you get is exactly that. It contains one backslash. The REPL displays two backslashes because it is showing you the repr of the string: a representation that is valid Python source code for it.
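The echo in the REPL uses repr(), while print() writes out the characters themselves. A minimal comparison, using chr(92) to get a backslash without writing any escape at all:

```python
s = 'a' + chr(92) + '#b'  # chr(92) is the backslash character

print(repr(s))  # 'a\\#b'  (source-code form, as the REPL echoes it)
print(s)        # a\#b     (the actual contents)
print(len(s))   # 4
```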
It is just like the quotation marks: none of the strings involved in this example contain any, and in particular they don't start or end with them. The quotes are only part of the Python syntax needed to mark that it's a string.
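The same check works for the quotes. In this short sketch the quotation marks show up in repr(), but they are not characters of the string:

```python
s = 'ab'

print(len(s))        # 2: the quotes are not part of the string
print(s[0], s[-1])   # a b
print(repr(s))       # 'ab': repr adds the quotes so the result is valid source code
print(len(repr(s)))  # 4
```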
When you use the r prefix to describe a string, that is not a different kind of string. It is a different syntax for describing strings. Therefore, when you perform string operations with that string, the computed result won't be a different kind of string either, because there aren't different kinds of strings to choose from in the first place.
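A minimal sketch of that point: the raw literal and the ordinary escaped literal produce equal objects of the same type, and so does the result of using them in a string operation.

```python
raw = r'\#'    # raw literal syntax
plain = '\\#'  # ordinary literal, with the backslash escaped

print(raw == plain)            # True: the same two characters
print(type(raw), type(plain))  # <class 'str'> <class 'str'>
print(len(raw))                # 2

# Using the raw literal in an operation still yields a plain str.
result = 'a#b'.translate(str.maketrans({'#': raw}))
print(type(result))            # <class 'str'>
```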
When Python displays the representation of a string, it chooses one particular canonical form. It doesn’t matter how the string was constructed; it only matters what the string contains. It doesn’t matter that there are countless ways to create the same string; Python will choose the same representation for given string contents, and its rules for choosing that representation have been the same for as long as I can remember.
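To illustrate the canonical form, here is a small sketch with several different spellings of a string holding a single tab character. They all compare equal, and they all get the same repr:

```python
# Five different ways to write a string containing one tab character.
spellings = ['\t', '\x09', '\011', '\u0009', chr(9)]

print(all(s == '\t' for s in spellings))  # True
print(len({repr(s) for s in spellings}))  # 1: a single canonical form
print(repr(chr(9)))                       # '\t'
```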
Some other examples:
>>> r'x' # like I was saying, not a different kind of string
'x'
>>> '\x40' # contains a symbol that can be shown directly, so it is
'@'
>>> '\x09' # converted to the preferred form to escape a tab
'\t'
>>> '\"' # this does not actually have a backslash in it!
'"'
>>> '\'' # the quote type changes, to simplify the representation
"'"
>>> "'\"" # single quotes are preferred when the string includes both types
'\'"'
Those come from my 3.8 installation, BTW. But testing them on 2.7, or even older versions, wouldn't change anything. The only possible difference is the availability of raw string literals, and even those are older than 2.7, IIRC.
Here are some more tests we can do (roughly the ones that were suggested on the issue you opened on the tracker):
>>> text = 'a#b'.translate(str.maketrans({'#': r'\#'}))
>>> text
'a\\#b'
>>> len(text)
4
>>> list(text)
['a', '\\', '#', 'b']
>>> for c in text:
...     print(c)
...
a
\
#
b
The string contains four characters. When we split the string into a list of characters (implicitly, by asking list() to iterate over it and build a list from the elements), we see that the second one (at index 1) is displayed as '\\'. That means a single backslash, not two. We can see that clearly when we use an explicit for loop to print the characters of the string one at a time. We only see one backslash, because there is only one backslash.
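One more way to confirm it is to inspect the character at index 1 directly. This sketch uses ord() and the standard unicodedata module to show that it is one character, the backslash (U+005C, REVERSE SOLIDUS):

```python
import unicodedata

text = 'a#b'.translate(str.maketrans({'#': r'\#'}))
second = text[1]

print(len(second))               # 1
print(ord(second))               # 92
print(unicodedata.name(second))  # REVERSE SOLIDUS
print(text.count('\\'))          # 1: only one backslash in the whole string
```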