This is a draft of the third version of PEP 597.
In this version, I propose only adding an option to raise a warning.
I am still considering whether to introduce a new option to opt in to this warning or to just use dev mode.
Abstract
This PEP proposes:
- TextIOWrapper raises a PendingDeprecationWarning when the encoding option is not specified and dev mode is enabled.
- Add an encoding="locale" option to TextIOWrapper. It behaves like encoding=None but doesn't raise a warning.
Motivation
People assume the default encoding is UTF-8
Developers using macOS or Linux may forget that the default encoding
is not always UTF-8.
For example, long_description = open("README.md").read() in setup.py is a common mistake. Many Windows users cannot install
the package if there is at least one emoji or any other non-ASCII
character in the README.md file.
I found 489 packages that use non-ASCII characters in their README,
and 82 of them cannot be installed from the source package
when the locale encoding is ASCII [#]_.
.. [#] GitHub - methane/pep597-pypi-ascii
Another example is logging.basicConfig(filename="log.txt").
Some people expect UTF-8 to be used by default, but the locale encoding is
actually used. [#]_
.. [#] Issue 37111: Logging - Inconsistent behaviour when handling unicode - Python tracker
Even Python experts assume that the default encoding is UTF-8.
This creates bugs that happen only on Windows. See [#]_ [#]_.
.. [#] Use utf-8 to read README by methane · Pull Request #682 · pypa/packaging.python.org · GitHub
.. [#] Issue 33684: parse failed for mutibytes characters, encode will show in \xxx - Python tracker
Raising a warning when the encoding option is omitted will
help find such mistakes.
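The README failure described above can be reproduced with a minimal sketch. It does not change the process locale; instead it passes encoding="ascii" explicitly as a stand-in for a platform whose locale encoding is ASCII (an assumption made only so the example behaves the same everywhere). The file contents are hypothetical.

```python
import os
import tempfile

# Hypothetical README containing a non-ASCII character, saved as UTF-8.
readme_text = "My package \N{SNAKE}\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "README.md")
    with open(path, "w", encoding="utf-8") as f:
        f.write(readme_text)

    # Stand-in for open(path).read() on a system whose locale encoding
    # is ASCII: decoding the UTF-8 bytes fails.
    try:
        with open(path, encoding="ascii") as f:
            f.read()
        ascii_read_failed = False
    except UnicodeDecodeError:
        ascii_read_failed = True

    # Passing the encoding explicitly gives the same result on every platform.
    with open(path, encoding="utf-8") as f:
        utf8_result = f.read()
```

Spelling out encoding="utf-8" is the fix that the proposed warning is meant to nudge authors toward.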
Prepare to change the default encoding to UTF-8
We chose to use the locale encoding as the default text encoding
in Python 3.0. But UTF-8 has been adopted very widely since then.
We might change the default text encoding to UTF-8 in the future.
But this change will affect many applications and libraries.
Many DeprecationWarnings would be raised if we started raising
the warning by default, which would be too noisy.
While this PEP doesn’t cover that change, it will help to reduce
the number of DeprecationWarnings in the future.
Specification
Raising a PendingDeprecationWarning
TextIOWrapper raises the PendingDeprecationWarning when the encoding option is omitted and dev mode is enabled.
encoding="locale" option
When encoding="locale" is passed to TextIOWrapper, it
behaves the same as encoding=None. In detail, the encoding is
chosen by trying the following, in order:
1. os.device_encoding(buffer.fileno())
2. locale.getpreferredencoding(False)
This option can be used to suppress the PendingDeprecationWarning.
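The lookup order can be approximated in pure Python. This is only a sketch of the two steps named above, not the actual TextIOWrapper code; the function name resolve_default_encoding is hypothetical.

```python
import io
import locale
import os


def resolve_default_encoding(buffer):
    """Approximate how encoding=None (and the proposed "locale") picks a codec."""
    encoding = None
    try:
        # Step 1: use the terminal's encoding when the buffer is a tty.
        encoding = os.device_encoding(buffer.fileno())
    except (AttributeError, OSError):
        # Buffers without a real file descriptor (e.g. io.BytesIO) skip step 1.
        pass
    if encoding is None:
        # Step 2: fall back to the locale's preferred encoding.
        encoding = locale.getpreferredencoding(False)
    return encoding


# An in-memory buffer has no file descriptor, so only step 2 applies.
in_memory_encoding = resolve_default_encoding(io.BytesIO(b""))
```

Because the result depends on the terminal and the current locale, the same script can pick different encodings on different machines.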
io.text_encoding
TextIOWrapper is used indirectly in most cases. For example, open()
and pathlib.Path.read_text() use it. Warning inside these
functions doesn’t make sense; the caller of these functions should be warned instead.
io.text_encoding(encoding, stacklevel=1) is a helper function for this purpose.
A pure Python implementation will look like this::
    import sys

    def text_encoding(encoding, stacklevel=1):
        """
        Helper function to choose the text encoding.

        When encoding is not None, just return it.
        Otherwise, return the default text encoding ("locale" for now),
        and raise a PendingDeprecationWarning in dev mode.

        This function can be used in APIs having encoding=None option.
        But please consider encoding="utf-8" for new APIs.
        """
        if encoding is None:
            if sys.flags.dev_mode:
                import warnings
                warnings.warn(
                    "'encoding' option is not specified. The default encoding "
                    "will be changed to 'utf-8' in the future",
                    PendingDeprecationWarning, stacklevel + 2)
            encoding = "locale"
        return encoding
pathlib.Path.read_text() can use this function like this::
    def read_text(self, encoding=None, errors=None):
        """
        Open the file in text mode, read it, and close the file.
        """
        encoding = io.text_encoding(encoding)
        with self.open(mode='r', encoding=encoding, errors=errors) as f:
            return f.read()
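To see how the helper behaves from a caller's point of view, here is a self-contained sketch. It copies the draft helper but drops the dev-mode check so that the warning is observable in a normal interpreter; that change is an assumption made for illustration only. load_settings and collect_warnings are hypothetical names, and load_settings stands in for any API with an encoding=None option.

```python
import warnings


def text_encoding(encoding, stacklevel=1):
    # Same logic as the draft helper, minus the dev-mode check so the
    # warning is observable in a normal interpreter (an assumption for
    # this sketch only).
    if encoding is None:
        warnings.warn(
            "'encoding' option is not specified. The default encoding "
            "will be changed to 'utf-8' in the future",
            PendingDeprecationWarning, stacklevel + 2)
        encoding = "locale"
    return encoding


def load_settings(path, encoding=None):
    # Hypothetical library API with an encoding=None option.
    # It only resolves the encoding; it does not open the file.
    encoding = text_encoding(encoding)
    return path, encoding


def collect_warnings(func, *args, **kwargs):
    # Run func and return (result, list of warnings it emitted).
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    return result, caught


implicit_result, implicit_warnings = collect_warnings(load_settings, "app.cfg")
explicit_result, explicit_warnings = collect_warnings(
    load_settings, "app.cfg", encoding="utf-8")
locale_result, locale_warnings = collect_warnings(
    load_settings, "app.cfg", encoding="locale")
```

Only the call that omits encoding produces a warning. With stacklevel + 2, the warning is attributed to the code that called load_settings rather than to load_settings or the helper itself, which is the point of routing the check through a shared function.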
subprocess module
While the subprocess module uses TextIOWrapper, it doesn’t raise
PendingDeprecationWarning. It uses the “locale” encoding
by default.
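Callers who do not want to depend on that default can already pass an encoding to subprocess explicitly. The sketch below assumes the child is a CPython interpreter and uses the PYTHONIOENCODING environment variable so that the child writes UTF-8 to the pipe; the command itself is hypothetical.

```python
import os
import subprocess
import sys

# Ask the child interpreter to write UTF-8 to its stdout.
# PYTHONIOENCODING is a standard CPython environment variable.
child_env = {**os.environ, "PYTHONIOENCODING": "utf-8"}

completed = subprocess.run(
    # The raw string keeps the \u escape intact so the child decodes it.
    [sys.executable, "-c", r'print("caf\u00e9")'],
    capture_output=True,
    encoding="utf-8",  # explicit: do not rely on the locale default
    env=child_env,
    check=True,
)
child_output = completed.stdout
```

Both ends of the pipe must agree on the encoding, which is why the child's output encoding is pinned as well.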
Rationale
“locale” is not a codec alias
We don’t add “locale” to the codec aliases because the locale can be
changed at runtime.
Additionally, TextIOWrapper checks os.device_encoding()
when encoding=None. This behavior cannot be implemented in
a codec.
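A quick check, assuming a stock CPython with no extra codec search functions registered, shows that "locale" is not a name the codec registry resolves. The helper name is_codec_name is hypothetical.

```python
import codecs


def is_codec_name(name):
    """Return True if the codec registry knows this encoding name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


utf8_is_codec = is_codec_name("utf-8")
locale_is_codec = is_codec_name("locale")
```

Under this proposal "locale" would be a special value interpreted by TextIOWrapper itself, so codecs.lookup() would keep rejecting it.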
subprocess module doesn’t warn
The default encoding for PIPE is related to the encoding of the standard I/O streams.
It should be discussed later.
Reference Implementation
Copyright
This document has been placed in the public domain.