PEP 597: Enable UTF-8 mode by default on Windows

No, it can only be changed in exactly two ways. An administrator can change the system encoding to UTF-8, but this will likely break many applications. Or an individual application can set its “activeCodePage” to UTF-8 in its manifest.

I brought up locale conversion because it’s a significant component of the startup behavior in Unix, which tries to coerce to a UTF-8 locale if “C” or “POSIX” is configured, and it only falls back on UTF-8 mode if coercing fails or is disabled. To do something similar in Windows, we would need to set the “activeCodePage” to UTF-8 in an alternate base executable and use a “python.exe” launcher to run the appropriate base executable depending on whether coercion is enabled (e.g. “python_utf8.exe” vs “python_locale.exe”). This affects the entire process. It sets the Windows multibyte API to use UTF-8, and the CRT will also set its default locale to use UTF-8 in this case. It’s the closest we can get to the effect of locale coercion in Unix, except that, since there’s no environment-variable support, we have no consistent, reliable way to influence child processes.
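To make the launcher idea concrete, here is a rough sketch of the selection step only. Nothing like this exists today: the executable names are just the examples from above, and reusing “PYTHONCOERCECLOCALE” (with “0” meaning disabled, as on Unix) as the switch is purely my assumption for illustration.

```python
# Hypothetical names: the two executables are only the examples mentioned
# above, and reusing the Unix variable on Windows is an assumption.
UTF8_EXE = "python_utf8.exe"
LOCALE_EXE = "python_locale.exe"
COERCION_VAR = "PYTHONCOERCECLOCALE"


def choose_base_executable(environ):
    """Return the name of the base executable a launcher would start."""
    # Mirror the Unix convention where a value of "0" disables coercion.
    if environ.get(COERCION_VAR) == "0":
        return LOCALE_EXE
    # Otherwise pick the build whose manifest sets activeCodePage to UTF-8.
    return UTF8_EXE
```

A launcher would call something like `choose_base_executable(os.environ)` and then start that executable. Note that the choice is made per launch, so a child process that is started directly, without going through the launcher, is not affected by it.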

Maybe you’re confusing the console’s input and output codepages with the active codepage of a process (CP_ACP). Once set at startup, the active codepage is locked in for the remainder of the process lifetime (barring low-level hacking of the PEB and private data structures in ntdll). It’s the encoding used by the Windows multibyte-string (i.e. ANSI) API, and many programs use it as a default encoding for files and other I/O, but it is not related to console I/O.
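If it helps to see the distinction, the three values can be queried separately. This is a small sketch using ctypes; the function name and dictionary keys are mine, and it simply returns None when not running on Windows.

```python
import ctypes
import sys


def code_pages():
    """Return the process and console codepages on Windows, else None."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    return {
        # Active codepage of the process (CP_ACP), fixed at startup.
        "active": kernel32.GetACP(),
        # Console codepages; these report 0 if no console is attached.
        "console_input": kernel32.GetConsoleCP(),
        "console_output": kernel32.GetConsoleOutputCP(),
    }
```

Running `chcp 65001` in a console session before calling `code_pages()` should change the two console values but leave the active codepage as it was.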

The console runs in its own process (conhost.exe) and has an unrelated system for its multibyte-string API. For example, ReadConsoleA, despite the “A” suffix, does not necessarily use our process’s active codepage (CP_ACP). Console files (i.e. files opened on “\Device\ConDrv” that are used for I/O, such as “Input”, “Output”, “CurrentIn”, and “CurrentOut”) are internally UTF-16 buffers in the console host process (in practice they’re handled as UCS-2, but they should be UTF-16). For the convenience of programs that use multibyte strings, the console host has an input (reading) codepage and an output (writing) codepage, which apply to multibyte-string API functions such as the generic ReadFile and WriteFile and the console-specific ReadConsoleA and WriteConsoleA. Since 3.6, Python core no longer uses these console codepages because we do our own transcoding between UTF-8 and UTF-16 and use the console’s wide-character API.
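The transcoding step itself is easy to illustrate in pure Python. This is only a sketch of the idea, not how CPython implements its console I/O, and the helper names are made up. It also shows why the UCS-2 vs UTF-16 distinction matters: a character outside the BMP takes two 16-bit code units.

```python
def utf8_to_utf16_units(data):
    """Decode UTF-8 bytes and return the equivalent UTF-16 code units."""
    raw = data.decode("utf-8").encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def utf16_units_to_utf8(units):
    """Convert a sequence of UTF-16 code units back to UTF-8 bytes."""
    raw = b"".join(unit.to_bytes(2, "little") for unit in units)
    return raw.decode("utf-16-le").encode("utf-8")


# A BMP character is one code unit; U+1F600 needs a surrogate pair.
print(utf8_to_utf16_units("é".encode("utf-8")))           # [233]
print(utf8_to_utf16_units("\U0001F600".encode("utf-8")))  # [55357, 56832]
```

Code that treats the buffer as UCS-2 sees that surrogate pair as two separate characters, which is the practical difference between the two.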