Confused about documentation of io.TextIOWrapper encoding


The io.TextIOWrapper documentation claims that the default encoding is locale.getencoding(). The getencoding function ignores UTF-8 mode. But I thought the whole point of UTF-8 mode was to make UTF-8 encoding the default, especially for text IO.


$ LC_ALL=en_US.ISO8859-1 python -X utf8
Python 3.11.0rc2 (main, Sep 13 2022, 00:00:00) [GCC 12.2.1 20220819 (Red Hat 12.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from locale import getencoding
>>> from pathlib import Path
>>> getencoding()
>>> Path("utf8.txt").read_text(encoding='ISO-8859-1')
>>> Path("utf8.txt").read_text(encoding='UTF-8')
>>> Path("utf8.txt").read_text()

I see that locale.getpreferredencoding(False) was changed to locale.getencoding() in this part of the documentation when encoding="locale" was made to ignore UTF-8 mode (@methane). But doesn't that mean it is encoding="locale" that resolves to locale.getencoding(), not encoding=None?

I may be missing something, but as far as I can tell, that particular line should indeed say locale.getpreferredencoding(False). The code in question appears to behave like locale.getpreferredencoding(False), i.e. it does not ignore UTF-8 mode, and this also matches the relevant PEP and the locale.getpreferredencoding and locale.getencoding documentation.

IIRC, @vstinner was also involved in requesting this particular change, along with Inada-san, so perhaps he would also be able to comment.

With encoding=None, TextIOWrapper uses UTF-8 if UTF-8 mode is enabled; otherwise it uses locale.getencoding().
So it is just a documentation issue.
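
For anyone who wants to check this locally, here is a rough sketch of the rule as described above, compared against what TextIOWrapper actually reports. The helper names described_default_encoding and same_codec are made up for illustration and are not part of the stdlib. locale.getencoding() only exists on Python 3.11+, so the sketch assumes that falling back to locale.getpreferredencoding(False) is acceptable on older versions.

```python
import codecs
import io
import locale
import sys


def described_default_encoding():
    """Hypothetical helper: the default described in this thread for encoding=None."""
    if sys.flags.utf8_mode:
        # UTF-8 mode takes priority over the locale encoding.
        return "utf-8"
    # locale.getencoding() was added in Python 3.11; on older versions
    # fall back to getpreferredencoding(False) (an assumption of this sketch).
    getencoding = getattr(locale, "getencoding", None)
    if getencoding is not None:
        return getencoding()
    return locale.getpreferredencoding(False)


def same_codec(a, b):
    """Compare encoding names after normalization (e.g. 'UTF-8' vs 'utf-8')."""
    return codecs.lookup(a).name == codecs.lookup(b).name


# No encoding argument, i.e. encoding=None.
wrapper = io.TextIOWrapper(io.BytesIO())

print("utf8_mode:", sys.flags.utf8_mode)
print("TextIOWrapper default:", wrapper.encoding)
print("matches description:",
      same_codec(wrapper.encoding, described_default_encoding()))
```

Running this with and without -X utf8 under a non-UTF-8 locale (as in the session at the top of the thread) should make the difference visible: the wrapper's default follows UTF-8 mode, while locale.getencoding() does not.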
