PEP 597: Use UTF-8 for default text file encoding

With PowerShell, or Python, or any other application that supports the Unicode APIs for writing to/reading from the console.

cmd.exe and many of the tools you’d run from it are not such applications.

(If you’re an end-user, then you don’t get to choose to use the console in any way other than how your applications will let you. It’s only as a developer that you get to choose.)

Just my anecdotal 2c:

I’ve been a Windows-only user for the past 10+ years, and the only time I’ve written or read anything in an encoding other than UTF-8 was when burning subtitles created by others into video. In those cases the encoding can only be guessed, so I used chardet.

Having the default be UTF-8 would have saved me lots of pain over the years.
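chardet guesses an encoding statistically. When a file is almost certainly either UTF-8 or the legacy Windows code page, a stdlib-only trial decode is often enough. The sketch below is a swapped-in technique, not chardet; the `guess_decode` helper and its candidate list (`utf-8-sig`, then `cp1252`) are hypothetical choices for illustration.

```python
from __future__ import annotations

# Hypothetical helper: a crude stdlib-only alternative to chardet that tries
# a fixed list of encodings in order and keeps the first clean decode.
DEFAULT_CANDIDATES = ("utf-8-sig", "cp1252")


def guess_decode(data: bytes, candidates: tuple[str, ...] = DEFAULT_CANDIDATES) -> tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes without error."""
    for encoding in candidates:
        try:
            # "utf-8-sig" also accepts plain UTF-8 and strips a leading BOM.
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this never raises.
    return data.decode("latin-1"), "latin-1"


print(guess_decode("Größe".encode("utf-8")))   # ('Größe', 'utf-8-sig')
print(guess_decode("Größe".encode("cp1252")))  # ('Größe', 'cp1252')
```

Because cp1252 accepts most byte sequences, a file in some other legacy encoding (Shift_JIS, say) may "succeed" as cp1252 and come out garbled; that is exactly the case where a statistical detector like chardet earns its keep.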


Thanks for your reaction.
While looking at the aws-cli repository for the discussion in another thread, I found this issue too.

It’s clear that this is a common bug, and that many Windows users suffer because the default encoding is not UTF-8.
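To make the bug concrete: when `encoding=` is omitted (and UTF-8 mode is off), `open()` falls back to `locale.getpreferredencoding(False)`, which on many Windows installations is a legacy code page such as cp1252. The sketch below assumes cp1252 as that default and names it explicitly, so it behaves the same on any platform; the sample text and file name are arbitrary.

```python
import locale
import os
import tempfile

# With encoding= omitted (and UTF-8 mode off), open() uses this value.
print("platform default:", locale.getpreferredencoding(False))

text = "naïve café"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    # Write the file as UTF-8, as most editors and non-Windows tools do.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Simulate a Windows default of cp1252 by naming it explicitly.
    with open(path, encoding="cp1252") as f:
        garbled = f.read()

    # Passing encoding explicitly removes the dependence on the platform.
    with open(path, encoding="utf-8") as f:
        correct = f.read()

print(garbled)  # naÃ¯ve cafÃ©
print(correct)  # naïve café
```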

On the other hand, it’s far from clear how many (or how few) Windows users would suffer from a backward-incompatible change in the mid-2020s. Showing that almost nobody would be affected is a devil’s proof: it means proving a negative.

So the second version of my PEP 597 proposes an environment variable to configure the default text encoding. If it is accepted, you will be able to change the default text encoding yourself, and we can postpone the discussion about when to change the default of the “default text encoding”.

But we already have PYTHONUTF8. The most important part of PEP 597 is explaining why UTF-8 mode is not enough for Windows users.

So, if you would like to contribute to this discussion, it would be very helpful to try UTF-8 mode now (perhaps together with chcp 65001).
If it is enough, we don’t need to add yet another configuration option.
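One way to try this is to start Python with `-X utf8` (or set `PYTHONUTF8=1`) and confirm that the mode is actually active. The sketch below reports `sys.flags.utf8_mode` and the preferred encoding in the current process, then launches a child interpreter with `-X utf8` for comparison. The `report` helper is just for illustration, and `chcp 65001` would still have to be run separately in cmd.exe.

```python
import locale
import subprocess
import sys


def report() -> dict:
    """Collect settings that UTF-8 mode (PEP 540) is expected to change."""
    return {
        "utf8_mode": sys.flags.utf8_mode,  # 1 when UTF-8 mode is on
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


print("this process:", report())

# Run a child interpreter with UTF-8 mode forced on via -X utf8
# (the command-line equivalent of PYTHONUTF8=1) and compare.
# codecs.lookup(...).name normalizes the spelling to "utf-8".
child_code = (
    "import codecs, locale, sys; "
    "print(sys.flags.utf8_mode, codecs.lookup(locale.getpreferredencoding(False)).name)"
)
out = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
).stdout
print("with -X utf8:", out.strip())  # 1 utf-8
```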