Switching to UTF-8 as the default file encoding could unmask an encoding problem that went unnoticed when the default was the process ANSI code page. Python should try to make it easier to diagnose and resolve such problems. This helps to reduce the pain of switching to UTF-8 as the default.
One thing we can improve, and something we really should have implemented from the outset, is to provide a simple way to use the active code page(s) of the current console session for standard I/O, which relates to the suggestion to work around problems by setting PYTHONIOENCODING. Using the active code page is a paradigm from MS-DOS. It’s still used by some Windows console applications, so it needs to be supported. (I don’t want Python to use this behavior by default, however. I prefer to use UTF-8 or the ANSI code page of either the user locale or the system locale, depending on the context.)
I suggested the addition of a “console” pseudo-encoding for this. It’s not a real encoding because it resolves to the current input or output code-page encoding of the console session (e.g. “cp850”). This makes it simple to work around a legacy encoding problem by setting PYTHONIOENCODING=console, or by spawning a child process with subprocess.Popen(args, stdin=PIPE, stdout=PIPE, stderr=PIPE, encoding='console').
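To make the second workaround concrete, here’s a minimal sketch of how it could look, assuming the proposed “console” pseudo-encoding were available (it isn’t today), and using a placeholder command name:

```python
import subprocess
from subprocess import PIPE

# Hypothetical: assumes the proposed "console" pseudo-encoding exists.
# The child is a legacy console tool that reads and writes in the
# session's active code pages (e.g. cp850), not UTF-8 or the ANSI
# code page.
proc = subprocess.Popen(
    ["legacy_tool.exe"],                  # placeholder command
    stdin=PIPE, stdout=PIPE, stderr=PIPE,
    encoding="console",                   # would resolve per stream at open time
)
out, err = proc.communicate()
print(out)
```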
The “console” encoding could be evaluated internally by a new _Py_console_encoding(fd) function, for Windows only. This would always call GetConsoleCP() for stdin (0) and GetConsoleOutputCP() for stdout (1) and stderr (2), regardless of the file’s existence or type. If there’s no console session, return None and let the caller decide how to handle it. For file descriptors above 2, explicitly check for an existing input or output console file to determine whether to use GetConsoleCP() or GetConsoleOutputCP().
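As a rough illustration of the intended behavior (not the actual C implementation), here is the same logic in Python via ctypes; the handling of descriptors above 2 is only stubbed out:

```python
import ctypes

kernel32 = ctypes.windll.kernel32  # Windows only

def console_encoding(fd):
    """Sketch of the proposed _Py_console_encoding(fd) behavior.

    Returns a codec name such as "cp850", or None when there's no
    console session. Illustrative only; the real function would be
    implemented in C as part of interpreter/stream setup.
    """
    if fd == 0:
        cp = kernel32.GetConsoleCP()        # console input code page
    elif fd in (1, 2):
        cp = kernel32.GetConsoleOutputCP()  # console output code page
    else:
        # A real implementation would check whether fd refers to a
        # console input or output file (e.g. by querying the underlying
        # handle) before choosing which API to call.
        return None
    return f"cp{cp}" if cp else None        # 0 means no console session
```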
Since 3.8, interpreter initialization only supports one standard I/O encoding (i.e. config->stdio_encoding), which defaults to the locale encoding. That there’s only one standard I/O encoding is a non-issue, given that the “console” encoding is evaluated separately for input and output streams. The same applies to the single encoding parameter of subprocess.
The “console” encoding could be supported more generally by a codec search function that calls _winapi.GetConsoleCP(). This could also support “conin” and “conout” encodings that respectively use _winapi.GetConsoleCP() and _winapi.GetConsoleOutputCP(). Python standard I/O providers such as subprocess.Popen could evaluate the generic “console” encoding as “conin” for stdin and “conout” for stdout and stderr.
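A registered codec search function along those lines might look roughly like this (using ctypes here, since the _winapi wrappers are themselves part of the proposal):

```python
import codecs
import ctypes

kernel32 = ctypes.windll.kernel32  # Windows only

def _console_search(name):
    """Resolve the proposed "conin"/"conout" pseudo-encodings to the
    console session's current input/output code page."""
    if name == "conin":
        cp = kernel32.GetConsoleCP()
    elif name == "conout":
        cp = kernel32.GetConsoleOutputCP()
    else:
        return None          # not ours; let other search functions try
    if not cp:
        return None          # no console session
    return codecs.lookup(f"cp{cp}")

codecs.register(_console_search)
```

One wrinkle is that the codecs registry caches lookups, so a sketch like this would pin “conin” and “conout” to whatever code pages were active at first use; a real implementation would have to handle the code page changing mid-session (e.g. via chcp).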
I suggested exposing the standard I/O encoding via sys.getstdioencoding() and sys.getstdioencodeerrors(). Setting PYTHONIOENCODING would thus override UTF-8 mode not only for sys.std*, but also for anything that uses sys.getstdioencoding(). I think this should include the default encoding used by subprocess.Popen.
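For illustration, a standard I/O provider could fall back on the proposed accessors roughly like this (neither sys function exists today; both are part of the suggestion above):

```python
import sys

def _default_text_params(encoding=None, errors=None):
    # Hypothetical: sys.getstdioencoding() and sys.getstdioencodeerrors()
    # are the proposed APIs. PYTHONIOENCODING (or UTF-8 mode) would then
    # flow through to any provider that defers to them when no explicit
    # encoding is given.
    if encoding is None:
        encoding = sys.getstdioencoding()
    if errors is None:
        errors = sys.getstdioencodeerrors()
    return encoding, errors
```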