Switching to UTF-8 as the default file encoding could unmask an encoding problem that went unnoticed when the default was the process ANSI code page. Python should try to make it easier to diagnose and resolve such problems. This helps to reduce the pain of switching to UTF-8 as the default.
One thing we can improve, and something we really should have implemented from the outset, is a simple way to use the active code page(s) of the current console session for standard I/O. This relates to the suggestion to work around problems by setting
PYTHONIOENCODING. Using the active code page is a paradigm from MS-DOS. It’s still used by some Windows console applications, so it needs to be supported. (I don’t want Python to use this behavior by default, however. I prefer to use UTF-8 or the ANSI code page of either the user locale or the system locale, depending on the context.)
I suggested the addition of a “console” pseudo-encoding for this. It’s not a real encoding because it resolves to the current input or output code-page encoding of the console session (e.g. “cp850”). This makes it simple to work around a legacy encoding problem by setting
PYTHONIOENCODING=console, or by spawning a child process with
subprocess.Popen(args, stdin=PIPE, stdout=PIPE, stderr=PIPE, encoding='console').
The “console” encoding could be evaluated internally by a new
_Py_console_encoding(fd) function, for Windows only. This would always call
GetConsoleCP() for stdin (0) and
GetConsoleOutputCP() for stdout (1) and stderr (2), regardless of the file’s existence or type. If there’s no console session, return
None, and let the caller decide how to handle it. For file descriptors above 2, explicitly check for an existing input or output console file to determine whether to use GetConsoleCP() or GetConsoleOutputCP().
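The proposed function would be implemented in C inside CPython, but its logic can be sketched at the Python level with ctypes. The name console_encoding and the fd-based dispatch below are illustrative only; GetConsoleCP() and GetConsoleOutputCP() are the real Windows console APIs.

```python
import sys

def console_encoding(fd):
    """Rough Python-level sketch of the proposed _Py_console_encoding(fd).

    Returns a code-page encoding name such as "cp850", or None if the
    pseudo-encoding cannot be resolved for this file descriptor.
    """
    if sys.platform != "win32":
        return None  # the "console" pseudo-encoding is Windows-only
    import ctypes
    kernel32 = ctypes.WinDLL("kernel32")
    if fd == 0:
        cp = kernel32.GetConsoleCP()        # input code page for stdin
    elif fd in (1, 2):
        cp = kernel32.GetConsoleOutputCP()  # output code page for stdout/stderr
    else:
        # For descriptors above 2, a real implementation would first
        # check that fd refers to a console input or output file.
        return None
    if not cp:
        return None  # no console session attached to this process
    return f"cp{cp}"
```

On a non-Windows system, or in a process with no attached console, the sketch returns None and leaves the fallback policy to the caller, mirroring the behavior described above.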
Since 3.8, interpreter initialization supports only one standard I/O encoding (i.e.
config->stdio_encoding), which defaults to the locale encoding. Having just one standard I/O encoding is a non-issue given per-stream evaluation of the “console” encoding. The same applies to the single
encoding parameter of subprocess.
The “console” encoding could be supported more generally by a codec search function that calls
_winapi.GetConsoleCP(). This could also support “conin” and “conout” encodings that respectively use
_winapi.GetConsoleCP() and _winapi.GetConsoleOutputCP(). Python standard I/O providers such as
subprocess.Popen could evaluate the generic “console” encoding as “conin” for stdin and “conout” for stdout and stderr.
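A codec search function along these lines can be sketched in pure Python. The helper names below are illustrative, ctypes stands in for the proposed _winapi functions, and mapping the generic “console” name to the output code page inside the search function is a simplification; as the text suggests, a standard I/O provider would instead resolve “console” per stream before the codec lookup.

```python
import codecs
import sys

def _console_code_page(kind):
    # Illustrative helper: query the console's active input ("in") or
    # output ("out") code page; only meaningful on Windows.
    if sys.platform != "win32":
        return None
    import ctypes
    kernel32 = ctypes.WinDLL("kernel32")
    cp = kernel32.GetConsoleCP() if kind == "in" else kernel32.GetConsoleOutputCP()
    return f"cp{cp}" if cp else None  # 0 means no console session

def console_search(name):
    # Codec search function: resolve the pseudo-encodings to the real
    # code-page codec currently active in the console session.
    kinds = {"conin": "in", "conout": "out", "console": "out"}
    if name not in kinds:
        return None
    target = _console_code_page(kinds[name])
    return codecs.lookup(target) if target else None

codecs.register(console_search)

def resolve_stdio_encoding(encoding, fd):
    # How a provider like subprocess.Popen could evaluate the generic
    # "console" encoding per stream: "conin" for stdin, "conout" otherwise.
    if encoding == "console":
        return "conin" if fd == 0 else "conout"
    return encoding
```

With the search function registered, codecs.lookup("conout") resolves to the console's current output code-page codec on Windows and raises LookupError elsewhere, so code that handles unknown encodings already degrades gracefully.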
I suggested exposing the standard I/O encoding via
sys.getstdioencoding(). PYTHONIOENCODING would thus override UTF-8 mode for not only
sys.std*, but also anything that uses
sys.getstdioencoding(). I think this should include the default encoding used by