Hi, all.
Here is the second edition of PEP 597. (The first edition is here.)
In the previous edition, I proposed changing the default text encoding to UTF-8. But that was a backward-incompatible change, and raising DeprecationWarning whenever the encoding argument is omitted is a huge pain.
This time, I propose using UTF-8 mode by enabling it by default on Windows.
I am not proposing this for Python 3.9, because I first want to get feedback from users about UTF-8 mode on Windows. I have already documented UTF-8 mode on Windows (link). I want to recommend UTF-8 mode to Windows users in 2020.
Abstract
This PEP proposes enabling UTF-8 mode [#]_ by default on
Windows.
The goal of this PEP is to give Windows users the same “UTF-8 by
default” experience that Unix users already have.
Motivation
UTF-8 is the best encoding nowadays
Popular text editors like VS Code use UTF-8 by default.
Even Microsoft Notepad has used UTF-8 by default since the Windows 10
May 2019 Update.
Additionally, the default encoding of Python source files is UTF-8.
We can assume that most Python programmers use UTF-8 for most text
files.
Python is one of the most popular first programming languages.
New programmers may not know about encodings. If the default encoding
for text files is UTF-8, they can postpone learning about encodings
until they need to handle a legacy encoding.
People assume the default encoding is UTF-8 already
Developers using macOS or Linux may forget that the default encoding
is not always UTF-8.
For example, long_description = open("README.md").read() in
setup.py is a common mistake. On Windows, open() decodes the file
with the legacy ANSI code page instead of UTF-8. As a result, many
Windows users cannot install the package if the README.md file
contains even one emoji or any other non-ASCII character.
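The robust pattern for setup.py is to pass the encoding explicitly, so the result no longer depends on the platform's locale. The helper name read_readme below is hypothetical, and the temporary directory exists only to make the example self-contained. This is a minimal sketch, not code taken from any particular project:

```python
import tempfile
from pathlib import Path


def read_readme(path):
    """Read a README file as UTF-8, independent of the locale encoding."""
    # open(path).read() would decode with locale.getpreferredencoding(False),
    # which is the ANSI code page on Windows unless UTF-8 mode is enabled.
    with open(path, encoding="utf-8") as f:
        return f.read()


# Self-contained demo: write a README containing non-ASCII text, read it back.
with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    readme.write_text("# Demo caf\u00e9 \U0001F40D\n", encoding="utf-8")
    long_description = read_readme(readme)
    print(ascii(long_description))  # ascii() keeps the output printable anywhere
```

Passing encoding="utf-8" works the same with or without UTF-8 mode, so packages can adopt it today.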
Even Python experts assume that the default encoding is UTF-8.
This assumption creates bugs that happen only on Windows; see [#]_ and [#]_.
Changing the default text encoding to UTF-8 will help many Windows
users.
Specification
Enable UTF-8 mode on Windows unless it is disabled explicitly.
UTF-8 mode affects these areas:
- locale.getpreferredencoding returns “UTF-8”.
- open, subprocess.Popen, pathlib.Path.read_text, ZipFile.open,
  and many other functions use UTF-8 when the encoding option is
  omitted.
- The stdio always uses “UTF-8”.

  - Console I/O already uses “UTF-8” [#]_, so this only matters
    when the stdio streams are redirected (for example, to a file
    or a pipe).
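To see which of these behaviors applies to a running interpreter, you can check sys.flags.utf8_mode. You can also compare the preferred locale encoding with the encoding open() chooses when no encoding argument is given. This is a minimal sketch; the helper names utf8_mode_enabled and preferred_encoding are illustrative only and are not part of the proposal:

```python
import codecs
import locale
import sys
import tempfile


def utf8_mode_enabled():
    """Return True when the interpreter runs in UTF-8 mode (PEP 540)."""
    return sys.flags.utf8_mode == 1


def preferred_encoding():
    """Normalized codec name of locale.getpreferredencoding(False)."""
    # UTF-8 mode makes this report UTF-8 regardless of the ANSI code page.
    return codecs.lookup(locale.getpreferredencoding(False)).name


# open() with no encoding argument picks up the same preferred encoding.
with tempfile.TemporaryFile("w+") as f:
    print("UTF-8 mode:", utf8_mode_enabled())
    print("locale.getpreferredencoding(False):", preferred_encoding())
    print("open() default:", codecs.lookup(f.encoding).name)
```

codecs.lookup() is used only to normalize spellings such as “UTF-8” and “utf-8” before comparing them.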
On the other hand, UTF-8 mode does not affect the “mbcs” encoding.
Users can still use the system encoding (the ANSI code page) by
choosing the “mbcs” encoding explicitly.
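Code that really needs the ANSI code page can request it by name instead of relying on the default. Because the “mbcs” codec exists only on Windows, this sketch falls back to UTF-8 on other platforms. That fallback, and the helper name choose_encoding, are assumptions made for illustration:

```python
import sys
import tempfile


def choose_encoding(legacy_ansi=False):
    """Pick an explicit encoding instead of relying on open()'s default.

    legacy_ansi=True asks for the Windows ANSI code page via the "mbcs"
    codec, which UTF-8 mode leaves untouched. The UTF-8 fallback on
    non-Windows platforms is an assumption of this sketch.
    """
    if legacy_ansi and sys.platform == "win32":
        return "mbcs"
    return "utf-8"


# Write and read back a file in the legacy encoding a hypothetical old tool expects.
with tempfile.TemporaryFile("w+", encoding=choose_encoding(legacy_ansi=True)) as f:
    f.write("plain ASCII text\n")
    f.seek(0)
    print(f.read(), end="")
```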
Backwards Compatibility
Some existing applications that assume the default text encoding is the
system encoding (a.k.a. the ANSI encoding) will be broken by this change.
For backward compatibility, users can disable UTF-8 mode with the
PYTHONUTF8=0 environment variable or the -Xutf8=0 command-line option.
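Both opt-out switches can be checked from a child interpreter. The sketch below starts sys.executable with each switch and reads back sys.flags.utf8_mode. It spells the option as two arguments, "-X" and "utf8=0", which is equivalent to -Xutf8=0. The helper name child_utf8_mode is hypothetical:

```python
import os
import subprocess
import sys


def child_utf8_mode(extra_args=(), env_overrides=None):
    """Start a child interpreter and return its sys.flags.utf8_mode."""
    env = dict(os.environ)
    if env_overrides:
        env.update(env_overrides)
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", "import sys; print(sys.flags.utf8_mode)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


print(child_utf8_mode(["-X", "utf8=0"]))                    # command-line opt-out
print(child_utf8_mode(env_overrides={"PYTHONUTF8": "0"}))  # environment opt-out
```

The -X option takes precedence over PYTHONUTF8, so a command-line setting wins even when the environment variable is set.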
Rejected Ideas
Change the default encoding of TextIOWrapper to “UTF-8”
This idea would always change the default encoding to UTF-8, regardless
of platform, locale, and environment variables.
While this idea looks ideal in terms of consistency, it would cause
backward compatibility problems.
Using UTF-8 mode seems better than adding yet another backward
compatibility option like PYTHONLEGACYWINDOWSSTDIO.
Reference Implementation
To be written.
References
.. [#] `PEP 540 -- Add a new UTF-8 Mode <https://www.python.org/dev/peps/pep-0540/>`_
.. [#] https://github.com/pypa/packaging.python.org/pull/682
.. [#] https://bugs.python.org/issue33684
.. [#] `PEP 528 -- Change Windows console encoding to UTF-8 <https://www.python.org/dev/peps/pep-0528/>`_