Python str.translate() changed behaviour

On Ubuntu 20.04 I did not have this problem. I was possibly running Python 3.8.2 there, but it is too late to find out.
I have now switched to Debian 11. When I open a terminal and run:

janne@ubuntu:~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 11 (bullseye)
Release:	11
Codename:	bullseye
janne@ubuntu:~$ python3 --version
Python 3.9.2
janne@ubuntu:~$ python3
Python 3.9.2 (default, Feb 28 2021, 17:03:44) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 'a#b'.translate(str.maketrans({'#': r'\#'}))
'a\\#b'
>>> 
janne@ubuntu:~$

I expected a single backslash. With Python 3.9.2 I got two backslashes.
I don’t know how to fix this.

See Changed behaviour of str.translate() #106889

What you are seeing is the repr of the value.
Try both of these to see the difference:


x = 'a#b'.translate(str.maketrans({'#': r'\#'}))
print(x)
print(repr(x))

This is not changed behaviour. It has always worked this way.

The string that you are seeing is correct. You asked it to take a string consisting of the letter a, a pound sign and the letter b, and replace the pound sign with a single backslash followed by a pound sign. The expected result, of course, is the letter a, a backslash, a pound sign, and b, in order.

The output that you get is exactly that. It contains one backslash. The REPL displays two backslashes because it is showing you a Python-source-code-compatible representation of the string.

In the same way, none of the strings involved in this example contain any quotation marks, and in particular they don’t start or end with quotation marks. Those are only part of the Python syntax needed to mark that it’s a string.
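One way to convince yourself that the quotes and the doubled backslash belong to the source-code form rather than to the string is to parse the repr back into a value. This is only a small sketch; the names shown and round_trip are mine, not anything from the original report.

```python
import ast

# The string from the question: 'a', one backslash, '#', 'b'.
text = 'a#b'.translate(str.maketrans({'#': r'\#'}))

shown = repr(text)                     # the text the REPL echoes back
round_trip = ast.literal_eval(shown)   # parse that text as a Python literal again

print(shown)                   # 'a\\#b'  (quotes and escape are part of the notation)
print(round_trip == text)      # True
print(len(text), len(shown))   # 4 7
```

The repr is three characters longer than the string itself: two quotation marks plus the extra backslash used for escaping.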

When you use the r prefix to describe a string, that is not a different kind of string. It is a different syntax for describing strings. When you perform string operations with that string, therefore, the computed result won’t be a different kind of string, because there aren’t different kinds of strings to choose from in the first place.
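To make that concrete, here is a quick check that a raw literal and an ordinary literal with an escaped backslash describe the same str object contents. The variable names are just for illustration.

```python
raw = r'\#'       # raw literal: the backslash is taken literally
escaped = '\\#'   # ordinary literal: the backslash is written as an escape

print(raw == escaped)     # True
print(type(raw) is str)   # True, there is no separate "raw string" type
print(len(raw))           # 2, one backslash and one '#'
```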

When Python displays the representation of a string, it chooses one particular canonical form. It doesn’t matter how the string was constructed; it only matters what the string contains. It doesn’t matter that there are countless ways to create the same string; Python will choose the same representation for given string contents, and its rules for choosing that representation have been the same for as long as I can remember.
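As an illustration, the following sketch builds the same four-character string in several different ways (the list of candidates is my own choice) and shows that they all get the same representation:

```python
# Several ways of constructing 'a', backslash, '#', 'b'.
candidates = [
    r'a\#b',                                        # raw literal
    'a\\#b',                                        # escaped backslash
    'a' + chr(92) + '#b',                           # backslash by code point
    'a\x5c#b',                                      # backslash by hex escape
    'a#b'.replace('#', r'\#'),                      # str.replace
    'a#b'.translate(str.maketrans({'#': r'\#'})),   # the call from the question
]

for s in candidates:
    print(repr(s))            # 'a\\#b' every time

print(len(set(candidates)))   # 1, they are all the same string
```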

Some other examples:

>>> r'x' # like I was saying, not a different kind of string
'x'
>>> '\x40' # contains a symbol that can be shown directly, so it is
'@'
>>> '\x09' # converted to the preferred form to escape a tab
'\t'
>>> '\"' # this does not actually have a backslash in it!
'"'
>>> '\'' # the quote type changes, to simplify the representation
"'"
>>> "'\"" # single quotes are preferred when the string includes both types
'\'"'

Those come from my 3.8 installation, BTW. But testing it on 2.7, or even older versions, wouldn’t change anything - aside from the availability of raw string literals, but even those are older than 2.7 IIRC.

Here are some more tests we can do (roughly the ones that were suggested on the issue you opened on the tracker):

>>> text = 'a#b'.translate(str.maketrans({'#': r'\#'}))
>>> text
'a\\#b'
>>> len(text)
4
>>> list(text)
['a', '\\', '#', 'b']
>>> for c in text:
...     print(c)
... 
a
\
#
b

The string contains four characters. When we split the string into a list of characters (implicitly, by asking list to iterate over it and collect each character as an element), we see that the second one (at index 1) is displayed as '\\'. That means a single backslash, not two. We can see that clearly when we use an explicit for loop to print the characters of the string one at a time. We only see one backslash, because there is only one backslash.
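If you would rather check this without relying on how anything is displayed, you can count the backslashes or look at the code points directly. A minimal sketch, with variable names of my own:

```python
text = 'a#b'.translate(str.maketrans({'#': r'\#'}))

backslashes = text.count('\\')         # count occurrences of a single backslash
code_points = [ord(c) for c in text]   # numeric value of each character

print(backslashes)   # 1
print(code_points)   # [97, 92, 35, 98]; 92 is the backslash and appears once
```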