PEP-597: Raise a Warning when encoding is omitted

This is a draft of the third version of PEP 597.

In this version, I propose just adding an option to raise a warning.

I am still deciding whether to introduce a new option to opt in to this warning or to just use dev mode.


Abstract

This PEP proposes:

  • TextIOWrapper raises a PendingDeprecationWarning when the
    encoding option is not specified, and dev mode is enabled.

  • Add an encoding="locale" option to TextIOWrapper. It behaves
    like encoding=None but doesn’t raise a warning.

Motivation

People assume the default encoding is UTF-8

Developers using macOS or Linux may forget that the default encoding
is not always UTF-8.

For example, long_description = open("README.md").read() in
setup.py is a common mistake. Many Windows users cannot install
the package if there is at least one emoji or any other non-ASCII
character in the README.md file.
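
The failure can be reproduced on any platform by decoding a UTF-8 README with an ASCII codec. The following is only a sketch: the temporary file and the explicit encoding="ascii" stand in for a machine whose locale encoding is ASCII.

```python
import os
import tempfile

readme_text = "My package \U0001F680\n"  # contains one emoji (non-ASCII)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "README.md")
    with open(path, "w", encoding="utf-8") as f:
        f.write(readme_text)

    # Simulate a locale whose encoding is ASCII: decoding fails.
    try:
        with open(path, encoding="ascii") as f:
            f.read()
        ascii_ok = True
    except UnicodeDecodeError:
        ascii_ok = False

    # Passing the encoding explicitly works regardless of the locale.
    with open(path, encoding="utf-8") as f:
        long_description = f.read()
```

Writing open("README.md", encoding="utf-8") in setup.py avoids the dependency on the locale.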

I found 489 packages that use non-ASCII characters in their README,
and 82 of them cannot be installed from the source package
when the locale encoding is ASCII [#]_.

.. [#] https://github.com/methane/pep597-pypi-ascii

Another example is logging.basicConfig(filename="log.txt").
Some people expect UTF-8 to be used by default, but the locale
encoding is actually used. [#]_

.. [#] https://bugs.python.org/issue37111
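
One way to avoid the locale dependency in logging is to create the file handler yourself and pass the encoding explicitly. This is a minimal sketch; the logger name and the temporary log path are made up for the example.

```python
import logging
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "log.txt")

# FileHandler accepts an explicit encoding, so the log file is UTF-8
# no matter what the locale encoding is.
handler = logging.FileHandler(log_path, encoding="utf-8")
logger = logging.getLogger("pep597_example")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info("caf\u00e9")

logger.removeHandler(handler)
handler.close()

with open(log_path, encoding="utf-8") as f:
    logged = f.read()
```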

Even Python experts assume that the default encoding is UTF-8.
This creates bugs that happen only on Windows. See [#]_ [#]_.

.. [#] https://github.com/pypa/packaging.python.org/pull/682
.. [#] https://bugs.python.org/issue33684

Raising a warning when the encoding option is omitted will
help to find such mistakes.

Prepare to change the default encoding to UTF-8

We chose to use the locale encoding as the default text encoding
in Python 3.0. But UTF-8 has been adopted very widely since then.

We might change the default text encoding to UTF-8 in the future.
But this change will affect many applications and libraries.
Many DeprecationWarnings would be raised if we started raising
the warning by default. It would be too noisy.

While this PEP doesn’t cover the change, this PEP will help to reduce
the number of DeprecationWarning in the future.

Specification

Raising a PendingDeprecationWarning

TextIOWrapper raises the PendingDeprecationWarning when the
encoding option is omitted, and dev mode is enabled.

encoding="locale" option

When encoding="locale" is passed to TextIOWrapper, it
behaves the same as encoding=None. In detail, the encoding is
chosen by:

  1. os.device_encoding(buffer.fileno())
  2. locale.getpreferredencoding(False)

This option can be used to suppress the PendingDeprecationWarning.
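
The two-step lookup above can be sketched in pure Python as follows. The function name resolve_locale_encoding is hypothetical and only illustrates the order of the checks; it is not the actual TextIOWrapper code.

```python
import locale
import os


def resolve_locale_encoding(fileno):
    """Sketch of how the encoding is chosen for encoding="locale"."""
    # 1. If the descriptor is connected to a terminal, use its encoding.
    #    os.device_encoding() returns None otherwise.
    encoding = os.device_encoding(fileno)
    if encoding is None:
        # 2. Fall back to the locale encoding without calling setlocale().
        encoding = locale.getpreferredencoding(False)
    return encoding
```

For a regular file, step 1 yields None, so the result is the locale encoding.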

io.text_encoding

TextIOWrapper is used indirectly in most cases. For example, open() and pathlib.Path.read_text() use it. Warning inside these
functions doesn’t make sense; the caller of these functions should be warned instead.

io.text_encoding(encoding, stacklevel=1) is a helper function for this purpose.
A pure Python implementation would look like this::

   import sys

   def text_encoding(encoding, stacklevel=1):
       """
       Helper function to choose the text encoding.

       When encoding is not None, just return it.
       Otherwise, return the default text encoding ("locale" for now),
       and raise a PendingDeprecationWarning in dev mode.

       This function can be used in APIs having encoding=None option.
       But please consider encoding="utf-8" for new APIs.
       """
       if encoding is None:
           if sys.flags.dev_mode:
               import warnings
               warnings.warn(
                       "'encoding' option is not specified. The default encoding "
                       "will be changed to 'utf-8' in the future",
                       PendingDeprecationWarning, stacklevel + 2)
           encoding = "locale"
       return encoding

pathlib.Path.read_text() can use this function like this::

   def read_text(self, encoding=None, errors=None):
       """
       Open the file in text mode, read it, and close the file.
       """
       encoding = io.text_encoding(encoding)
       with self.open(mode='r', encoding=encoding, errors=errors) as f:
           return f.read()
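
A third-party API with an encoding=None parameter could use the helper in the same way. The sketch below is self-contained so that it can be run without dev mode: it carries its own copy of the helper, with a hypothetical warn parameter standing in for sys.flags.dev_mode. The load_text function is made up, and it maps "locale" to the locale encoding by hand because open() in current releases may not accept encoding="locale".

```python
import locale
import warnings


def text_encoding(encoding, stacklevel=1, *, warn=True):
    # `warn` is an assumption of this sketch; the draft checks
    # sys.flags.dev_mode instead.
    if encoding is None:
        if warn:
            warnings.warn(
                "'encoding' option is not specified. The default encoding "
                "will be changed to 'utf-8' in the future",
                PendingDeprecationWarning, stacklevel + 2)
        encoding = "locale"
    return encoding


def load_text(path, encoding=None):
    """Hypothetical library function that reads a text file."""
    # stacklevel + 2 skips text_encoding() and load_text(), so the
    # warning is attributed to the code that called load_text().
    encoding = text_encoding(encoding)
    if encoding == "locale":
        encoding = locale.getpreferredencoding(False)
    with open(path, encoding=encoding) as f:
        return f.read()
```

Callers that pass encoding="utf-8" or encoding="locale" explicitly get no warning.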

subprocess module

While the subprocess module uses TextIOWrapper, it doesn’t raise
PendingDeprecationWarning. It uses the “locale” encoding
by default.

Rationale

“locale” is not a codec alias

We don’t add “locale” as a codec alias because the locale can be
changed at runtime.

Additionally, TextIOWrapper checks os.device_encoding()
when encoding=None. This behavior cannot be implemented in
a codec.

subprocess module doesn’t warn

The default encoding for PIPE is related to the encoding of the stdio.
It should be discussed later.

Reference Implementation

Copyright

This document has been placed in the public domain.

If we add a dedicated option like PYTHONWARNTEXTENCODING, users need to use it with an option like -Wd because DeprecationWarning is suppressed by default.
So, enabling this warning with dev mode looks easy and simple for users.

On the other hand, if some users don’t like this warning but want to use dev mode, the dedicated option is better.

I like this proposal, and I think it should just be a regular deprecation warning (no extra options for it). Great job :+1:

Maybe we should also emphasise that the plan is to eventually bring back a default value of (presumably) UTF-8. But we need to deprecate the old default first because of the high risk of data loss when the change happens.

I am afraid that the warnings would be too noisy. How about this plan?

  • Python 3.9a6 – Implement the PEP with regular warning (no option).
  • Python 3.9bN – We may remove the warning from the 3.9 branch, depending on feedback.
  • Python 3.10~ – DeprecationWarning

In this plan, Python 3.9 may not raise DeprecationWarning. But Python 3.9 supports encoding="locale" option and io.text_encoding(). Users can use them in Python 3.9+ code in the future.

By the way, PendingDeprecationWarning looks better than DeprecationWarning at the moment. We don’t have an actual plan to change the default encoding yet.

This has the potential to be another case where working code in libraries generates warnings that end users of those libraries end up needing to deal with. And libraries that still support Python 2 will have to switch to io.open, as open doesn’t have an encoding argument in Python 2.

I just did a quick check of pip, and we have a few such cases. And our vendored dependencies have quite a lot as well. I doubt all of those will get fixed for 3.9, so if I’m understanding correctly, pip will be flagging this warning to 3.9 users.

I don’t know the best answer here, but please be aware of the “end user who has no real control over the libraries used in apps they need” issue when deciding how to make this transition.

I’m not against this idea in principle, I’ve just had bad experience in the past of being stuck with annoying warnings for extended periods.

I can never remember whether these warnings are on or off by default. Off by default is fine, and turned on with other deprecation warnings.

Chances are end users will be impacted negatively by this (eventually) if their dependencies haven’t been updated, so it’s probably not terrible to warn them too.

I’m also a big believer in being noisier during prereleases. So let’s go on by default as soon as it lands, then turn them off for the final release (maybe this should just be the overall policy?)

DeprecationWarning and PendingDeprecationWarning are suppressed by default.
So end users will not see the warnings.
But it makes sense to me. We should wait to enable the warning by default until we fix all warnings in the stdlib, tests, and bundled pip.

I updated the draft to exclude the subprocess module.

I found that the Python test suite uses subprocess heavily to run Python in a child process. The locale encoding is used here for now because Python uses the locale encoding for stdio.

Should we change the stdio encoding when we change the default text encoding? I don’t want to discuss it for now. That’s why I exclude PIPE encoding in this PEP.

Thanks, I was in the same situation as @steve.dower - I can never remember whether these are off or on by default :slightly_smiling_face:

The locale encoding is only used for process pipes because that’s the current TextIOWrapper default. So when you change one it should change the other.

If subprocess keeps the locale default then you won’t be able to communicate with subprocesses of itself (except in bytes mode) unless you also avoid changing TextIOWrapper. Since the latter is the point, I think you need to change both.

This PEP provides a way to suppress the warning without breaking backward compatibility.
The subprocess module can use encoding = encoding or "locale" and pass it to TextIOWrapper.

sys.stdin, sys.stdout, and sys.stderr do not use the default text encoding either. (ref)

So if subprocess changes the default encoding, we need to change the stdio encoding too, or we cannot communicate with a child Python process by default.

I am not against changing the default PIPE encoding. But I want to postpone warning about the PIPE encoding because we cannot provide a recommended way to communicate with a child process in text mode.

subprocess.run([sys.executable, script], encoding="utf-8")   # (1)
subprocess.run([sys.executable, script], encoding="locale")  # (2)
subprocess.run([sys.executable, script], encoding=sys.stdin.encoding)  # (3)
  • (1) may not work now, because the child Python process will use the locale encoding for stdio.
  • (2) might not work in Python 4.0 if Python changes the stdio encoding to UTF-8.
  • (3) may not work when the current process doesn’t have a valid stdin.

subprocess will raise the warning after we decide how future Python versions will change the default encodings.

On the other hand, warning when users open text files (JSON, YAML, TOML, Markdown, reST, etc.) without an encoding is worthwhile even though we have not decided to change the default text encoding yet.

Okay, that’s fair enough.

I wonder if we could safely add PYTHONIOENCODING to the environment for subprocesses, to help close the divide for at least our own processes? That seems like something that could go into 3.9 anyway in Popen.
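
From the calling side, that idea can already be tried by hand. In this rough sketch, the caller sets PYTHONIOENCODING for the child and decodes the pipe with the same encoding; the child script is a made-up one-liner.

```python
import os
import subprocess
import sys

# Force the child's stdio to UTF-8, then decode its output as UTF-8.
child_env = {**os.environ, "PYTHONIOENCODING": "utf-8"}
result = subprocess.run(
    [sys.executable, "-c", "print('\\u00e9')"],  # child prints one non-ASCII char
    env=child_env,
    stdout=subprocess.PIPE,
    encoding="utf-8",
    check=True,
)
child_output = result.stdout.strip()
```

Because both sides agree on UTF-8, the result does not depend on the locale encoding of either process.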

I am worried that I cannot fix all warnings in the stdlib and tests at once.
(Current progress: https://github.com/python/cpython/pull/19481/files)

So I think I cannot enable it by default as soon as it lands.
I think we should enable the warning by default after 3.9b1 and before 3.10a1.

In some cases we may have to expose new encoding/errors parameters - anywhere we’re decoding from a user provided file that doesn’t have a clear spec. And those will likely need a default or a warning.

But it’s probably worth getting the PEP more widely reviewed and accepted before sinking too much effort in. Once the idea is approved, we can share the effort of fixing warnings.

Great idea! I’ll repeat what I said here: PEP 597: Use UTF-8 for default text file encoding
