This is a draft of the third version of PEP 597.
In this version, I propose just adding an option to raise a warning.
I am still deciding whether to introduce a new option for opting in to this warning or to just use dev mode.
This PEP proposes:
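* TextIOWrapper raises a PendingDeprecationWarning when the encoding option is not specified and dev mode is enabled.
* Add an encoding="locale" option to TextIOWrapper. It behaves like encoding=None but doesn’t raise a warning.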
People assume the default encoding is UTF-8
Developers using macOS or Linux may forget that the default encoding
is not always UTF-8.
Writing long_description = open("README.md").read() in
setup.py is a common mistake. Many Windows users cannot install
the package if there is at least one emoji or any other non-ASCII
character in the README.md file.
I found 489 packages that use non-ASCII characters in their README,
and 82 of them cannot be installed from the source package
when the locale encoding is ASCII [#]_.
Another example is
Some people expect UTF-8 to be used by default, but the locale encoding is
actually used. [#]_
Even Python experts assume that the default encoding is UTF-8.
This creates bugs that happen only on Windows. See [#]_ and [#]_.
Raising a warning when the
encoding option is omitted will
help to find such mistakes.
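The README mistake described above can be reproduced on any platform with a minimal sketch like the following. The file content and the variable names are made up for illustration, and the ASCII case is simulated by passing encoding="ascii" explicitly rather than by changing the locale.

```python
import os
import tempfile

# Hypothetical README content containing a non-ASCII character (an emoji).
readme_text = "My package \U0001F680\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "README.md")
    with open(path, "w", encoding="utf-8") as f:
        f.write(readme_text)

    # Simulate a machine whose default text encoding is ASCII.
    # open(path).read() would behave like this on such a machine.
    try:
        with open(path, encoding="ascii") as f:
            f.read()
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # Passing the encoding explicitly works regardless of the locale.
    with open(path, encoding="utf-8") as f:
        long_description = f.read()
```

Specifying encoding="utf-8" in setup.py avoids the dependency on the locale of the machine that installs the package.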
Prepare to change the default encoding to UTF-8
We chose to use the locale encoding as the default text encoding
in Python 3.0, but UTF-8 has been adopted very widely since then.
We might change the default text encoding to UTF-8 in the future,
but this change will affect many applications and libraries.
If we start raising the warning by default, DeprecationWarning will be
raised across these applications and libraries, which will be too noisy.
While this PEP doesn’t cover the change, it will help to reduce
the number of DeprecationWarnings raised in the future.
Raising a PendingDeprecationWarning
TextIOWrapper raises the
PendingDeprecationWarning when the
encoding option is omitted, and dev mode is enabled.
When encoding="locale" is specified to TextIOWrapper, it
behaves the same as
encoding=None. In detail, the encoding is
This option can be used to suppress the PendingDeprecationWarning.
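As an illustration, the option might be used as in the sketch below. It assumes an interpreter that accepts encoding="locale" (CPython 3.10 and later do). The fallback branch for older interpreters is an assumption added only to keep the example runnable, not part of the proposal.

```python
import io
import locale

raw = io.BytesIO(b"hello\n")

try:
    # Explicitly request the locale encoding, so no warning is needed.
    wrapper = io.TextIOWrapper(raw, encoding="locale")
except LookupError:
    # Interpreters without this option treat "locale" as an unknown codec.
    # Fall back to asking the locale module directly (illustrative only).
    wrapper = io.TextIOWrapper(raw, encoding=locale.getpreferredencoding(False))

text = wrapper.read()
```

Code that really wants the locale encoding can state that intent this way, while code that wants UTF-8 should pass encoding="utf-8".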
TextIOWrapper is used indirectly in most cases. For example,
pathlib.Path.read_text() uses it. Warning about these
functions doesn’t make sense; the callers of these functions should be warned instead.
io.text_encoding(encoding, stacklevel=1) is a helper function for this purpose.
A pure Python implementation would look like this::

    import sys

    def text_encoding(encoding, stacklevel=1):
        """
        Helper function to choose the text encoding.

        When encoding is not None, just return it.
        Otherwise, return the default text encoding ("locale" for now),
        and raise a PendingDeprecationWarning in dev mode.

        This function can be used in APIs having encoding=None option.
        But please consider encoding="utf-8" for new APIs.
        """
        if encoding is None:
            if sys.flags.dev_mode:
                import warnings
                warnings.warn(
                    "'encoding' option is not specified. The default encoding "
                    "will be changed to 'utf-8' in the future",
                    PendingDeprecationWarning, stacklevel + 2)
            encoding = "locale"
        return encoding

pathlib.Path.read_text() can use this function like this::

    def read_text(self, encoding=None, errors=None):
        """
        Open the file in text mode, read it, and close the file.
        """
        encoding = io.text_encoding(encoding)
        with self.open(mode='r', encoding=encoding, errors=errors) as f:
            return f.read()

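To see why the helper adds 2 to stacklevel, here is a small self-contained sketch. The text_encoding below is a simplified stand-in for the proposed helper that always warns (the dev mode check is dropped so the demo runs without -X dev). load_text and application_code are hypothetical names used only for illustration.

```python
import warnings


def text_encoding(encoding, stacklevel=1):
    # Simplified stand-in for the proposed io.text_encoding().
    # Assumption: it always warns, instead of only in dev mode.
    if encoding is None:
        warnings.warn(
            "'encoding' option is not specified.",
            PendingDeprecationWarning, stacklevel + 2)
        encoding = "locale"
    return encoding


def load_text(data, encoding=None):
    # Hypothetical library API that accepts encoding=None.
    return text_encoding(encoding)


def application_code():
    return load_text("x")  # the warning is attributed to this line


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = application_code()

# The recorded warning carries the location of the library's caller.
print(caught[0].category.__name__, caught[0].lineno)
```

With stacklevel + 2, the warning skips both the helper and the library function, so it is reported at the call site in application code, which is where the missing encoding argument can be fixed.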
While the subprocess module uses TextIOWrapper, it doesn’t raise
PendingDeprecationWarning. It uses the “locale” encoding
“locale” is not a codec alias
We don’t add “locale” as a codec alias because the locale can be
changed at runtime.
encoding=None. This behavior can not be implemented in
subprocess module doesn’t warn
The default encoding for PIPE is related to the encoding of the stdio.
It should be discussed later.
This document has been placed in the public domain.