PEP 597: Enable UTF-8 mode by default on Windows

One further data point to consider: Rust apparently uses lossy conversion from UTF-8 if you request a string version of subprocess output without handling the conversion yourself. Does anyone have any data on how well that works in practice? Do Rust programs use stdout_str successfully, or do they normally work at the bytes level?

I still keep this idea because:

  • The 65001 code page can be considered a sign that the user wants to use UTF-8.
  • Some commands, including dir, will output UTF-8 in this case.

Cons are:

  • When executing Python on Windows from WSL, the user might be surprised that the legacy encoding is still used. I hope this is fixed in a future version of Windows.
  • pythonw still uses the legacy encoding.

While this is not ideal, it seems safer than enabling UTF-8 mode by default regardless of the code page.
What do you think about this idea?

There is no safe default encoding. Some commands follow the console code page. Some commands use the legacy encoding regardless of the console code page. And some commands always use UTF-8.

But when the console CP is 65001, UTF-8 will be safer than the ANSI code page, because many basic commands in the cmd.exe shell (e.g. dir, echo, etc.) use UTF-8.

PowerShell has different behavior, but the subprocess module uses cmd.exe as the default shell.
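For illustration, here is a rough sketch (the helper and its use of ctypes are mine, not part of the PEP) of what picking a decoding for cmd.exe output based on the console output code page could look like:

import ctypes
import locale
import subprocess

def run_dir(path="."):
    # Hypothetical helper: use UTF-8 when the console output code page
    # is 65001, otherwise fall back to the ANSI code page. A detached
    # process gets 0 from GetConsoleOutputCP(), which also falls through
    # to the ANSI branch.
    cp = ctypes.windll.kernel32.GetConsoleOutputCP()
    enc = "utf-8" if cp == 65001 else locale.getpreferredencoding(False)
    result = subprocess.run("dir " + path, shell=True, capture_output=True)
    return result.stdout.decode(enc, errors="replace")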

You mean when CMD is writing output from its internal commands to a non-console file such as a pipe or disk file, without specifying the /U option that makes it write UTF-16. Encodings that use the console input and output codepage could be of limited use. Some applications always use OEM or ANSI for I/O, and we have those covered, but some use either the console input or output codepage.

However, if a program is run as a DETACHED_PROCESS, GetConsoleCP and GetConsoleOutputCP both return 0 (i.e. CP_ACP) because there is no console. Also, if it’s run with CREATE_NEW_CONSOLE or CREATE_NO_WINDOW, the new console will use whatever the current default or per-window-title codepage is. In these cases, the input or output codepage of our own console is irrelevant.


In Windows 10, applications have the option of setting the per-process ANSI/OEM codepages to UTF-8 using the manifest “activeCodePage” setting. In this case, the CRT also defaults to UTF-8 for setlocale(category, ""), as opposed to its normal use of the ANSI codepage from the user (not system) locale.

If anyone is thinking about implementing this for a distribution of Python, I would definitely think twice about using the preferred encoding (UTF-8 now) as the default for the subprocess module. UTF-8 isn’t common enough to justify making it the default for subprocess. Maybe create a new “system_ansi” encoding for this case. The system-locale ANSI codepage can always be queried via GetLocaleInfoEx(LOCALE_NAME_SYSTEM_DEFAULT, LOCALE_IDEFAULTANSICODEPAGE, ...), whereas GetACP() is overridden by the “activeCodePage” setting.
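As a rough sketch of what that query looks like from Python (the helper below is hypothetical and uses ctypes; the constants are from winnls.h):

import ctypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

LOCALE_NAME_SYSTEM_DEFAULT = "!x-sys-default-locale"
LOCALE_IDEFAULTANSICODEPAGE = 0x1004
LOCALE_RETURN_NUMBER = 0x20000000

def system_ansi_codepage():
    # Query the system-locale ANSI code page directly. Unlike GetACP(),
    # this is not affected by a per-process "activeCodePage" override.
    cp = ctypes.c_uint(0)
    chars = ctypes.sizeof(cp) // ctypes.sizeof(ctypes.c_wchar)
    ok = kernel32.GetLocaleInfoEx(
        LOCALE_NAME_SYSTEM_DEFAULT,
        LOCALE_IDEFAULTANSICODEPAGE | LOCALE_RETURN_NUMBER,
        ctypes.byref(cp),
        chars,
    )
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    return cp.value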

You are right, and that’s why I chose “Enable UTF-8 mode by default” in this PEP. The console code page is fragile. It is not a perfect approach.

But it is still a nice signal for detecting that the user wants to use UTF-8. “chcp 65001” is a very widely used hack for using UTF-8 tools on Windows.
Only people who prefer UTF-8 to the legacy encoding use “chcp 65001”, I suppose.

If Microsoft introduces some better flag, I will update this PEP to use it. I’m waiting to see what Microsoft introduces next.

I thought about it, but I prefer UTF-8 mode because we can use “mbcs” for the legacy system encoding.

How about changing “mbcs” from GetACP to the system ANSI codepage?
Since we have not used “activeCodePage” yet, this is a backward compatible change, isn’t it?

The core of the problem is that encoding is an application setting, not a system setting. Any signal from the system about what encoding should be used refers to how your application communicates with the system. (And Windows prefers UTF-16 for that, which is why the configuration setting/code page is deprecated.)

What encoding an application uses is up to the application. In Python’s case, we only get to choose the default, but then we should expect the application (script) to override it. Unfortunately, that rarely happened.

Changing the default is a massively breaking change. It’s a 4.x change, in my opinion, not a 3.y. It’ll break cache and configuration files that apps expect to survive updates, or cross versions (pip.ini, tox.ini, setup.cfg are easy examples).

To change the default, we need to start warning when people use the default. Because it’s not just an application setting, but a library and framework setting too. Every level of program has an expectation of how to read its own files, and they all need to be prepared for it to change - the user can’t just override it one day and expect everything to work. We need to be telling library developers that they need to specify an encoding, and show them how to handle forwards compatibility if they need to handle old versions of their own files.

Alternatively, we could make a more intelligent default decoder for Windows that will read UTF-8 until it fails, then assume ACP. Because that’s what we’re going to tell libraries to do anyway, so may as well make it easy on them.
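A crude sketch of that fallback (the helper name is made up, and a real implementation would live in the io/codecs machinery rather than re-reading the whole file):

def read_text_utf8_or_acp(path):
    # Try UTF-8 first; only if decoding fails, assume the process ANSI
    # code page ("mbcs" on Windows).
    with open(path, "rb") as f:
        raw = f.read()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("mbcs")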

subprocess is a special case, because the encoding there is an agreement between two applications, including if they both agree to use the ACP. In that case, I’d prefer to not have any default encoding, so if you want str rather than bytes you have to specify it yourself.

That’s also what I’d like for open(), but it’s far too late to force that.

I regularly make the argument that if you don’t specify an encoding when you write text to a file, you can’t possibly read it back. If we change the default, everyone is going to learn that very quickly and painfully.
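A minimal demonstration, assuming a machine whose ANSI code page is cp1252:

with open("note.txt", "w") as f:   # encoding defaults to cp1252 on this machine
    f.write("café")

with open("note.txt", encoding="utf-8") as f:
    f.read()                       # raises UnicodeDecodeError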

-1 on this PEP.

I can’t speak for others, but I usually use only std::process::Command (subprocess is a third-party crate). Output is strictly bytes-only.

What do you think about enabling UTF-8 mode automatically when the console code page is 65001 (or based on some better flag, if Microsoft introduces one)?

I can not wait for Python 4.x. More and more people, including children in kindergarten, are starting to learn Python. UTF-8 must be the default for them. And setting an environment variable is a complex step for them.

On the other hand, the current default has broken these files many times, because the legacy encoding can not represent some paths and tool authors forget that the default encoding is not always UTF-8.

If the default decoding becomes that, do you think we can change the default encoding for writing files to UTF-8?

After the default encoding is changed to UTF-8, everyone can omit encoding without pain. They are not forced to learn what an “encoding” is until they are required to handle a legacy encoding.
I want to provide that experience to people who start learning Python in the 2020s.

Not true. Consider:

  • Old files that were created before the default changed (as Steve says, persistent configuration and data files are likely the biggest risk here, rather than user-created files).
  • Older applications that haven’t been updated. For example, mingw (gcc) on Windows still uses msvcrt. Will msvcrt be updated to cleanly handle cp 65001? Will mingw be updated to use a newer CRT (given that the key issue here is licensing, and being “part of the OS”)? Will older applications be recompiled with the revised mingw?

At the point where essentially every user and every application uses UTF-8, then people can ignore encodings (for a while, at least :slightly_smiling_face:). But it’s not about defaults, it’s about actual usage.


Note that I was replying to “if you don’t specify an encoding when you write text to a file, you can’t possibly read it back” here.

People using legacy applications (e.g. for automation) shouldn’t use UTF-8 mode.

On the other hand, for people who are new to Python, data science, or web development, almost all text is UTF-8 already, and almost every omitted “encoding” option is a bug. UTF-8 mode makes them happy.

In my experience, people new to Python are generally fine with the current defaults. (Although I admit I’m in a country which is mostly-ASCII, so my experience is privileged in this regard).

Anyhow, I’ll let others comment here. I’m -1 on this proposal, I’ll leave it at that.

In my experience from a not-only-ASCII country, beginners are not at all fine with Python’s defaults on Windows.
Since code editors use UTF-8 and Python I/O defaults to something else*, they often can’t correctly read in a text file they just saved – unless they’re careful about encodings.

* (That’s not necessarily some common “national codepage”. Many people set computers to English because it has the best localization, and I often meet multilingual people who moved in from neighboring countries.)

In my experience, “always use encoding='utf-8' when opening text files” is the best advice I can give to beginners. When they want to share their files with non-Windows systems or differently-configured Windows, having everything in utf-8 solves a lot of problems (compared to trying to properly keep track of and convert encodings of every piece of text). And exceptions are fine: when they meet a non-UTF8 file (most commonly when interacting with government data/APIs, required by some crusty old standard to use the national encoding), they’re in a good place to learn about encodings. Arguably, teaching encodings separately from file I/O basics is better pedagogy.

IMO, viewing non-UTF8 applications (like mingw or old config files) as exceptions that need to be treated specially is not much different from the status quo, where basically anything is an exception to be treated specially – as soon as you share files. Specifically, if your tox.ini or setup.cfg could be shared across systems, relying on a machine-local encoding is already a bug (unless you rely on a smart sharing tool).
But we can start treating these as exceptions now, rather than wait until everything uses UTF-8.

Whereas now, people learn about the problem very slowly and – eventually – painfully.

If the default changes to a single specific encoding, then everyone will magically start writing their files with a known encoding.
Consider a person who designs a system with a configuration file, but doesn’t know or care about encodings at first. When someone points out the issue, they could just say “Oh, it’s UTF-8! Thanks for asking, let me document that!” rather than “Oops, it’s system-dependent! I should have saved the encoding into the file, or specified it explicitly. Sorry!”

IMO it can be a deprecation-cycle change.
If not specifying an encoding is either a hard-to-detect-at-first data-loss bug, or can be fixed easily since you know the encoding, we should start issuing warnings now.
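As a sketch of the kind of warning I mean (a throwaway shim one could drop into a test suite today, not a proposed implementation):

import builtins
import warnings

_original_open = builtins.open

def _warning_open(file, *args, **kwargs):
    # Warn whenever a text-mode open() relies on the locale-dependent
    # default encoding.
    mode = args[0] if args else kwargs.get("mode", "r")
    encoding_given = len(args) >= 3 or kwargs.get("encoding") is not None
    if "b" not in mode and not encoding_given:
        warnings.warn("open() called without an explicit encoding", stacklevel=2)
    return _original_open(file, *args, **kwargs)

builtins.open = _warning_open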


I’m not too keen on UTF-8 encoding; for me it wastes disk space.

Most Chinese characters are in the BMP; one such character needs to be represented by 3 bytes in UTF-8, but with local encodings (GBK/GB18030), it only takes 2 bytes.

I tested a textbook; the file sizes were:

  • GBK: 116 KB
  • UTF-8: 172 KB

If there are many such files, the storage space is considerable.
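The per-character difference is easy to check (with an arbitrary four-character string):

>>> len("汉字编码".encode("gbk"))      # 2 bytes per character
8
>>> len("汉字编码".encode("utf-8"))    # 3 bytes per character for BMP CJK
12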

A concern is backward compatibility; maybe a lot of code will be broken.
How about adding an open_utf8() convenience function?
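Something like this, just one possible shape for the suggested convenience function (the filename is only an example):

import functools

# open() with the text encoding fixed to UTF-8; everything else unchanged.
open_utf8 = functools.partial(open, encoding="utf-8")

with open_utf8("README.md") as f:
    text = f.read()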

I agree, but the only reasonable warning is “you did not specify an encoding in this call to open”. We can’t base it on the data being decoded (this is what was so bad about str+bytes and auto-decoding); the warning has to appear regardless, on all platforms and in all test suites.

Users can’t necessarily fix the warning themselves, which makes them noise on the level of the invalid escape sequence warning. So the warning needs to reach library maintainers.

At the end of it, only unmaintained libraries will rely on the default encoding to read back their own files, as everyone else will be warned/linted into specifying an encoding. So then we can decide whether to break all the unmaintained code or not - historically, we’ve decided not to break it.

It’s certainly a multi-year effort, including marketing to make people aware that the change is coming and why, none of which is covered in the proposal. PEPs 528 and 529 could slip by (in part because of existing years of deprecation of bytes paths), but I don’t believe this one can without directly causing data loss.

Perhaps we should see whether Jupyter would launch Python kernels in UTF-8 mode? That would reach a lot of people who have to worry about encodings (though they’d probably prefer Latin-1, as they generally just want best effort to read an existing file…)

Maybe that’s the first step? If Pylint, flake8, etc. started to lint on not specifying an ‘encoding’ argument to open() that would hopefully start nudging people slowly towards better practices such that we can then flip on a deprecation warning in a few years and change the semantics in Python 4.

Python uses the active process codepage (i.e. GetACP()) as the preferred encoding. This is normally the legacy codepage of the system locale. The locale in C, on the other hand, uses the user locale. Recently, the CRT has staked out a middle-ground position. If the process active codepage is UTF-8, setlocale(category, "") uses UTF-8 instead of the legacy codepage from the user locale. For example:

C:\>python -c "from locale import *; print(setlocale(LC_CTYPE, ''))"
English_Canada.1252

C:\>python.utf8 -c "from locale import *; print(setlocale(LC_CTYPE, ''))"
English_Canada.utf8

Switching the whole process to UTF-8 (not just Python code) during testing may help to highlight code that makes fragile assumptions. Code that assumes that a given default encoding (e.g. process, locale, preferred encoding) is the legacy codepage from the system or user locale will hopefully fail hard on an encoding error and get fixed. However, to be fair, in some cases it could also mask programming errors. This includes extension modules and linked DLLs, which may call MBS ‘ANSI’ functions such as CreateFileA. Python core is guilty of this mistake. _winapi.CreateFile calls CreateFileA with a UTF-8 string. :frowning_face: If the active codepage is a legacy encoding, Windows will happily ‘decode’ the UTF-8 string as mojibake, but this at least can potentially get flagged as an error. If the active codepage is UTF-8, however, this bad code won’t raise any red flags because it just happens to be using the right encoding.

Suggestion

We could replace “python[w].exe” in an installation with a launcher that runs the base executable. This generalizes how venv virtual environments are currently implemented. The base executable would be distributed as both UTF-8 and locale variants, maybe named “python[w].utf8.exe” and “python[w].locale.exe”. The UTF-8 version would set the “activeCodePage” to UTF-8 in the embedded manifest. The launcher would detect UTF-8 mode when it’s enabled by “-X utf8”, PYTHONUTF8, or a new “UTF8” registry value (not set by default). The PYTHONUTF8 environment variable would take precedence over the “UTF8” registry value. The system “py” launcher would also be updated to support this scheme and thus bypass the middle-man launcher.

Also, at startup in the base process, if GetACP() returns CP_UTF8 (65001) and UTF-8 mode isn’t explicitly disabled by the environment/registry, then automatically enable it. Thus UTF-8 mode is automatically set, if it’s not explicitly disabled, when either “python[w].utf8.exe” is run directly or when the system locale is configured to use UTF-8. This supports any code that special cases UTF-8 mode, such as setting the preferred encoding value to “utf-8” instead of “cp65001”.
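In Python terms, the intended rule is roughly as follows (a sketch of the decision only; the real check would happen in C during startup, and this ignores the “-X utf8” option and the registry value for brevity):

import ctypes
import os

CP_UTF8 = 65001

def utf8_mode_enabled():
    # Explicit settings win; otherwise enable UTF-8 mode automatically
    # when the process active code page is already UTF-8.
    env = os.environ.get("PYTHONUTF8")
    if env == "0":
        return False
    if env == "1":
        return True
    return ctypes.windll.kernel32.GetACP() == CP_UTF8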

New “system_ansi”, “system_oem”, “user_ansi”, and “user_oem” encodings could be added that use the system locale and user locale legacy codepages, which can be queried directly via GetLocaleInfoEx. “mbcs” (alias “ansi”, “dbcs”) and “oem” would continue to use the process active codepages from GetACP() and GetOEMCP().

If UTF-8 mode eventually becomes the default (maybe after Windows 8.1 support ends on 2023-01-10 – Python 3.12?), then the only required change would be to set the registry “UTF8” value to true for a particular installation.

So if I’m understanding it, the proposal is now roughly to have UTF8 mode enabled when the user has cp65001 set. Is that correct?

If so, then how is telling people “set cp65001” any different from telling people “set PYTHONUTF8=1”? Both involve making an environment configuration change, and both would have exactly the same impact on Python. Where they differ is in how they affect other aspects of the system, and it’s not up to us to tell users how to configure the non-Python parts of their system.

By the way - one issue here is that the proposal is about enabling UTF-8 mode, but it doesn’t really explain what problems this would solve. I’ve no idea whether any of the encoding issues I’ve encountered over the years would have been fixed by UTF-8 mode, for example. It’s hard to discuss the question without a proper feel for what it will achieve in practice.


Is this in reference to Inada’s suggestion to “enable UTF-8 mode automatically when console code page is 65001”? A few console applications, including CMD, use the console input or output codepage even for non-console files. If they’re run without a console (i.e. as a detached process), they usually just default to the process active codepage because GetConsoleCP() and GetConsoleOutputCP() return 0 (i.e. CP_ACP).

It’s a weird decision. The encoding of a particular file used by a process shouldn’t determine the preferred I/O encoding of the whole process. I’d reluctantly accept this if Microsoft’s own programs consistently did it, and if it was documented as such, but they don’t, and it’s not documented as such.

Many console applications use the legacy OEM codepage of a process for non-console I/O, including:

  • icacls.exe
  • attrib.exe
  • tree.com
  • fc.exe
  • net.exe

When their output is redirected to a pipe or disk file, non-OEM characters in strings (typically filepaths) get corrupted by best-fit and default character ("?") translation.
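A rough illustration using Python’s codecs (errors="replace" only models the default-character part; Windows’ own conversion also applies best-fit mappings, which Python does not emulate):

>>> "résumé_файл.txt".encode("cp437", errors="replace").decode("cp437")
'résumé_????.txt'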

Otherwise, most applications, especially GUI applications, use the process active codepage, assuming they don’t use the wide-character API with UTF-16. This sets a common codepage for non-Unicode text on a multi-user system. If a user’s preferred language isn’t compatible with the active codepage, then a program should at least warn the user and provide a chance to switch to UTF-8 or UTF-16. Even Notepad gets this right.


In Windows 10, both the active codepage and active OEM codepage can be forced to UTF-8. This affects the entire process, including extension modules and DLLs. It’s useful in testing, such as looking for cases where the encoding of files isn’t explicit, or cases where multibyte-string API calls use the wrong encoding of arguments. The MBS API is far less forgiving when UTF-8 is the active codepage. It still doesn’t use a strict decoding of arguments, but invalid sequences get mapped to the replacement character (U+FFFD), so there’s no chance that a string (commonly a filepath) with the wrong encoding will silently round-trip to and from an incorrect Unicode string, as is often the case with legacy codepages.

That said, my suggestion in a previous reply to force the active codepage when UTF-8 mode is enabled was looking at the wrong setting. Forcing the active codepage to UTF-8 is much closer to Unix PYTHONCOERCECLOCALE. Except it’s coercing the system and user locales away from their legacy ANSI codepage instead of coercing the POSIX and C locales away from ASCII. Also, there’s no support for an LC_CTYPE environment variable in Windows to coerce the locale of child processes.

Yes. I want an easy way to enable UTF-8 mode for environments which are popular/recommended for new Python users. Jupyter is executed from the command line, VSCode, and PyCharm.

My idea is adding a “set the PYTHONUTF8=1 environment variable” option to the installer. But the installer is not used when Python is installed from conda or the Windows Store.

Latin-1 can read UTF-8, Shift-JIS, etc. without any error. It makes Python 3 more dangerous than Python 2.
I rarely use latin-1 when I want byte-transparent behavior, but it must be considered a hack. It shouldn’t be recommended, especially for new users.

Maybe it is too early to discuss this.

Currently, chcp 65001 (or [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding in PowerShell Core) is a hack for users who want to use tools like TeX, node.js, GHC, etc…

I expect many terminals will have a “UTF-8 session” by default in the near future. For example, see this issue on Windows Terminal.

So I want to discuss: is there something we should fix before enabling UTF-8 mode automatically, or is UTF-8 mode already ready for the “UTF-8 console on Windows” era?

Much sample code in books, and much production code, omits the encoding option even though it expects UTF-8. For example, when reading JSON, Markdown, reST, and CSV files downloaded from the internet.

I wrote an example of this in the PEP. Even packaging.python.org and the json module omit the encoding option when reading README.md and JSON files.
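For example, the common pattern looks like this (data.json standing in for any UTF-8 file downloaded from the internet), compared with the explicit spelling that UTF-8 mode would make unnecessary:

import json

# Relies on the locale default encoding: fine where that is UTF-8,
# broken on Windows when the file contains non-ASCII text.
with open("data.json") as f:
    data = json.load(f)

# The explicit version:
with open("data.json", encoding="utf-8") as f:
    data = json.load(f)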