Before updating the PEP, I want to reply to this part.
TL;DR: There are so many non-UTF-8 files in Japanese that they created a strong motivation for "UTF-8 by default, everywhere, every time". There is no contradiction here.
I, and many experienced developers in Japan, suffered through a complex situation before UTF-8 became dominant.
We used at least three major encodings (cp932 for Windows, euc_jp for Unix, and ISO-2022-JP for IRC, e-mail, etc.). cp932 uses 0x5c (the backslash) as the second byte of some multibyte characters, and many applications broke because they treated it as an escape character. ISO-2022 is stateful. Converting between encodings was always lossy (no round-tripping). And legacy mobile phones added custom "emoji" in unused code areas. It was a nightmare age.
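Both problems are easy to reproduce with Python's codecs. The classic example below uses 表 (U+8868), whose cp932 second byte is the ASCII backslash, and あ (U+3042), which shows ISO-2022-JP's stateful escape sequences:

```python
# cp932: the second byte of "表" is 0x5C, the ASCII backslash.
# Byte-oriented code that treats 0x5C as an escape character
# corrupts such text.
data = "表".encode("cp932")
print(data)           # b'\x95\\'
print(0x5C in data)   # True: a backslash byte inside one character

# ISO-2022-JP: stateful escape sequences switch character sets
# mid-stream, so you cannot decode from an arbitrary offset.
jis = "あ".encode("iso2022_jp")
print(jis)            # b'\x1b$B$"\x1b(B'
```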
Now we ("modern" users) are happy with UTF-8. We use UTF-8 every day, and we teach new programmers with UTF-8 only. Of course, we are sometimes forced to use cp932 on Windows (and that's one of the major reasons Windows is disliked…), but we use UTF-8 in all other cases.
And changing the default encoding doesn't mean we can no longer handle legacy text files.
Note that text files live longer than any "locale".
From about 2005 to 2015, some users used a UTF-8 locale while others used a euc_jp locale, even on the same server, and there were cp932 and ISO-2022-JP files on that same server too.
A single default text encoding never worked well for legacy text files on such systems. We always checked the encoding of the text files manually. That's why I think "always specify the encoding when handling legacy text files" makes sense.
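In practice this just means passing `encoding=` explicitly instead of relying on whatever the locale default happens to be. A minimal sketch (the file name and cp932 content here are illustrative assumptions):

```python
import pathlib

# Hypothetical legacy file, written in cp932 for illustration.
path = pathlib.Path("legacy.txt")
path.write_bytes("こんにちは".encode("cp932"))

# Specify the encoding explicitly when reading legacy text files,
# rather than depending on the locale/default encoding.
with path.open(encoding="cp932") as f:
    text = f.read()
print(text)  # こんにちは
```

The same code works regardless of which locale the server (or the default encoding of Python itself) is configured with.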
Of course, there are many systems where a "single legacy encoding" works fine. We shouldn't dismiss them. But Python is not a language only for them, and we shouldn't dismiss the people who are happy with "UTF-8 by default" either.
This is the background of my motivation. I hope it explains why "believing UTF-8 is the best default encoding" doesn't mean "dismissing non-UTF-8 users".