In Python 3.12, legacy_text_encoding=1 is the default, so the default behavior is unchanged.
Users can set legacy_text_encoding=0 to test their applications with the new behavior.
In some later version, Python will switch the default to legacy_text_encoding=0 and make UTF-8
the default. I am not choosing a specific version for now. We need to discuss the deprecation period in another topic.
Relation to PEP 540 UTF-8 Mode.
legacy_text_encoding=0 is very similar to UTF-8 mode, but there is a significant difference:
legacy_text_encoding=0 just changes the default text encoding. The locale encoding is still used in several places (e.g. fsencoding and TTY), and locale.getpreferredencoding(False) still returns the locale encoding (e.g. as set by LC_CTYPE).
UTF-8 mode emulates Python running in a UTF-8 locale. fsencoding, the TTY encoding, and locale.getpreferredencoding(False) are all UTF-8 regardless of the actual locale.
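To see which of these values differ on a given machine, something like the following rough sketch can be run with and without UTF-8 mode. It only reads values that exist today; legacy_text_encoding is just a proposal, so it is not queried here. The function name encoding_report is made up for illustration.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python currently uses in different places."""
    return {
        # True when the interpreter was started in UTF-8 mode (PEP 540).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Locale encoding, unless UTF-8 mode overrides it.
        "preferred": locale.getpreferredencoding(False),
        # Encoding used for file names (the "fsencoding" mentioned above).
        "filesystem": sys.getfilesystemencoding(),
        # Encoding of standard output; may be None if stdout was replaced.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name}: {value}")
```

Running it once normally and once with `python -X utf8` shows which entries UTF-8 mode overrides on that platform.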
legacy_text_encoding will cover most use cases of UTF-8 mode.
But UTF-8 mode would still be useful for some environments like Android.
What do you think of this idea?
Of course, adding yet another encoding option may confuse users.
We could change the behavior of UTF-8 mode instead, but that would break some existing use cases of UTF-8 mode.
This sounds a bit fragile to me. Whether or not a stream is a TTY can depend on subtleties (for example piping to cat or grep) and it’s not very nice to users if the default encoding changes based on such subtleties.
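As a small illustration of how easily the TTY status flips, here is a sketch that starts a child interpreter with its stdout connected to a pipe, which is roughly what piping to cat or grep does. The helper name probe_piped_stdout is hypothetical, and the encoding the child reports depends on the platform and Python version.

```python
import subprocess
import sys

# Code executed by the child: report whether stdout is a TTY and its encoding.
CHILD = "import sys; print(sys.stdout.isatty(), sys.stdout.encoding)"


def probe_piped_stdout():
    """Run a child interpreter whose stdout is a pipe and return what it sees."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,  # stdout becomes a pipe, not a terminal
        text=True,
        check=True,
    )
    isatty_text, encoding = result.stdout.split()
    return isatty_text == "True", encoding


if __name__ == "__main__":
    is_tty, encoding = probe_piped_stdout()
    print(f"child stdout is a TTY: {is_tty}, encoding: {encoding}")
    print(f"parent stdout is a TTY: {sys.stdout.isatty()}, encoding: {sys.stdout.encoding}")
```

Run from an interactive terminal, the parent typically reports a TTY while the child does not, even though both are the same interpreter on the same machine.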
I don’t really have an opinion here (and hence haven’t voted) other than to say we seem to be continually adding more and more complexity (UTF-8 mode, this new text encoding mode, etc, etc). I feel like we’d be better just making this a clean break. If we’re confident that the end result (UTF-8 everywhere) is worth the cost, then let’s just get on with it and do it. If we’re not confident, then let’s wait.
At a minimum, can anyone clearly state what conditions would have to apply for us to simply switch to UTF-8 everywhere? (“All operating systems that we care about use UTF-8 throughout”, for example).
People can already experiment with UTF-8 mode to figure out whether
their applications work with this eventual new default, and we should
point people in that direction instead, rather than introducing a new
way to keep the existing behavior.
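For anyone who wants to try this, a minimal sketch of such an experiment follows. It launches a child interpreter with the existing -X utf8 option and reads back what the child reports. The helper name run_in_utf8_mode is made up for this example; a real test would run the application's own entry point instead of the one-line probe.

```python
import os
import subprocess
import sys

# Probe executed in the child: UTF-8 mode flag, locale encoding, fs encoding.
CHILD = (
    "import sys, locale; "
    "print(sys.flags.utf8_mode, "
    "locale.getpreferredencoding(False), "
    "sys.getfilesystemencoding())"
)


def run_in_utf8_mode():
    """Start a child interpreter with -X utf8 and return the values it reports."""
    # Drop PYTHONUTF8 so only the command-line option decides the mode.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUTF8"}
    result = subprocess.run(
        [sys.executable, "-X", "utf8", "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
        env=env,
    )
    flag, preferred, filesystem = result.stdout.split()
    return int(flag), preferred, filesystem


if __name__ == "__main__":
    print(run_in_utf8_mode())
```

Setting the environment variable PYTHONUTF8=1 is the other existing way to enable the same mode without changing the command line.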
FWIW: I have been using UTF-8 mode for several years now and it works
much better than relying on locales, OS env vars, UI settings, etc.