Please try PYTHONWARNDEFAULTENCODING (PEP 597)

Hi, folks.

I added the PYTHONWARNDEFAULTENCODING environment variable to Python 3.10 to find open() calls made without the encoding argument.

I want to discuss the next step for Python 3.12. I have two options now:

a. Show EncodingWarning by default (or show it the way DeprecationWarning is shown), and change the default encoding much later.
b. Change the default encoding to UTF-8 and provide an option to go back to the legacy default encoding.

But please don’t start the discussion yet. Instead, please try PYTHONWARNDEFAULTENCODING=1 and see what the EncodingWarnings look like.

  • How useful is it? Do you find any of the following issues?
    • Real issue – opening files with the default encoding although they should be opened as UTF-8.
    • Potential issue – similar to a real issue, but in local scripts that only ever run on UTF-8 Unix systems.
    • Forward compatibility issue – opening files with the default encoding where the file really is in the locale encoding. These need encoding="locale" before the default encoding is changed to UTF-8.
    • Non-issue – opening files with the default encoding, but the file is always ASCII.
  • How noisy is it? Should we show it to developers the way DeprecationWarning is shown?
    • Do you think every Python developer should see EncodingWarning before we change the default encoding?
    • Or does it offer little benefit while making Python developers ignore other warnings?
    • (Please write your thoughts, but please don’t start a discussion in this thread. I will create a discussion thread later this year.)

Please give us your feedback!

All of my code runs on systems where the default encoding is UTF-8. I’ve tried the environment variable with a number of scripts, and most of the warnings are non-issues or potential issues (as you define them).

The warnings are pure noise for me and won’t result in code changes. Enabling this warning by default would result in me looking for ways to turn it off because it’s just noise. That is unlike DeprecationWarnings, which I tend to turn on during development because they help me find code that needs attention to stay future-proof.


In Pylint v2.10 we added a new check for this, unspecified-encoding (W1514), which is enabled by default. It might not cover all cases, but it should help with the transition.


Thank you. I didn’t know about it.
I’m very interested in how many pylint users will ignore the warning.

Hello @methane,

I tried rebuilding all Python packages in Fedora with Python 3.10 with PYTHONWARNDEFAULTENCODING enabled, and compared the results with those built with regular Python 3.10.

143 out of 3891 (3.67%) packages failed to build.

You can find build logs here:
https://copr.fedorainfracloud.org/coprs/g/python/python3.10-with-WDE/packages/
This is the control COPR with regular Python 3.10:
https://copr.fedorainfracloud.org/coprs/g/python/python3.10-without-WDE/packages/

List of failed packages

asv
borgbackup
brd
Cython
dnf
grpc
gyp
kitty
mrchem
numpy
opencv
openvswitch
pew
pre-commit
pybind11
pyparsing
pytest
python3-pytest-asyncio
python-aiohttp
python-ansible-compat
python-ansible-pygments
python-ansible-runner
python-apsw
python-argcomplete
python-ase
python-astor
python-astropy
python-autopep8
python-avocado
python-bidict
python-biopython
python-build
python-cachelib
python-cheroot
python-cherrypy
python-click
python-colcon-argcomplete
python-colcon-bash
python-colcon-cd
python-colcon-cmake
python-colcon-core
python-colcon-defaults
python-colcon-devtools
python-colcon-ed
python-colcon-library-path
python-colcon-metadata
python-colcon-mixin
python-colcon-notification
python-colcon-output
python-colcon-package-information
python-colcon-package-selection
python-colcon-parallel-executor
python-colcon-powershell
python-colcon-python-setup-py
python-colcon-recursive-crawl
python-colcon-ros
python-colcon-test-result
python-colcon-zsh
python-colorspacious
python-cram
python-dask
python-dateutil
python-django
python-docutils
python-email-validator
python-enrich
python-eventlet
python-execnet
python-fastapi
python-filecheck
python-fissix
python-flake8
python-flask
python-fslpy
python-hypothesis
python-hypothesmith
python-itsdangerous
python-javaproperties
python-jinja2
python-josepy
python-libpysal
python-libsass
python-mako
python-markupsafe
python-mirakuru
python-mplcairo
python-mplcursors
python-nagiosplugin
python-nbclient
python-neurom
python-nilearn
python-nose
python-nose2
python-patsy
python-pep517
python-pluggy
python-ply
python-port-for
python-pydocstyle
python-pyqtgraph
python-pytest-bdd
python-pytest-forked
python-pytest-regressions
python-pytest-toolbox
python-pytest-xdist
python-pytest-xprocess
python-requests
python-scripttest
python-setuptools
python-setuptools_scm
python-sh
python-simplejson
python-starlette
python-subprocess-tee
python-suds
python-tblib
python-tornado
python-towncrier
python-tqdm
python-trio
python-troveclient
python-uvicorn
python-versioningit
python-virtualenv
python-watchdog
python-watchgod
python-werkzeug
python-xmlschema
python-yapf
python-yaql
python-yarl
renderdoc
rss2email
sagemath
snakemake
spglib
subversion
sugar-paint
swid-tools
xonsh
xrootd
yamllint
z3


Hello again Inada-san, thanks for bringing this up and I’m looking forward to continuing to see this process move forward.

Oops, I didn’t see the “please don’t start the discussion yet” admonition on the first pass as I was sequentially going through your post—it might be wise to simply not include those parts that you aren’t yet looking for comment on, to avoid any potential for confusion. As such I’ve collapsed my response in a details block, to preserve it for the future while avoiding derailing the discussion now.

This would be a much safer option than switching to UTF-8 by default after having only an optional warning enabled by a special -X option, and it would give developers a more reasonable amount of time to fix their code. At present, it is likely that very, very few are actually seeing these warnings; no projects I know of have yet turned them on even in CI (though I have done some experimental one-off runs with it to catch outstanding issues). Simply switching to UTF-8 could cause existing projects that depend on the legacy behavior, either explicitly or implicitly (e.g. reading previous output using a locale-dependent encoding), to silently break.

Something like the following timeline might work:

  • In 3.12 (3.10 + 2), show EncodingWarning like DeprecationWarning, in __main__ and like other warnings with -W default, -X dev, etc. This will allow developers with proper python invocation, pytest or CI testing configs to catch and fix these issues, without either causing extra noise for users or having to enable special bespoke interpreter options.
  • In 3.14 (3.10 + 4), display the warning by default to all users.
  • In 3.16 (3.10 + 6, i.e. when all supported versions of Python incorporate encoding="locale"), make omitting an explicit encoding an error.
  • In some future version (4.0?) allow encoding to be unspecified again, with UTF-8 as default.

Overall, we found it quite useful for spotting these issues (which can and do cause many real problems for users who frequently work on non-*nix platforms, like myself), and I was able to spot several with it. However, a few practical limitations have restricted its usefulness so far.

We found a number of those, both in our own codebases, and in others.

I don’t really deal with too many of those, and we try our best to make our code correct, explicit and cross-platform, so this wasn’t an issue for us.

Forward compatibility was the biggest issue for us, but not because of encoding="locale"; at least in our use cases, that wasn’t really needed yet (though we could foresee some where it would help). The actual problem was that EncodingWarning was a brand-new warning that occurred in a number of dependencies outside our immediate control. We therefore had no way of silencing it in our Pytest config or our Python invocation without one of three problems. It would either break other warnings, or cause our test suite to error out (since we use -W error by default to ensure non-silenced warnings are actually seen and dealt with), or be very imprecise and potentially silence other desired warnings:

  • The Pytest config is static, so without a hacky script rewriting it for different Python versions, we couldn’t add a warning filter for it there, or else it would result in the test suite erroring out completely on Python versions <3.10.
  • We also couldn’t reliably add it via a warnings filter passed with -W (which is needed to avoid errors that occur before Pytest has fully initialized and its hooks fire), since -W does not support much of the filterwarnings syntax required for reliable but precise silencing.
  • Finally, we couldn’t add a manual filterwarnings call with branches for different Python versions in a Pytest hook, because that either fires too late to silence early warnings or gets overridden anyway.

As such, it was useful for a manual pass to catch warnings in our libraries/applications and direct deps, but it is not yet useful to incorporate into routine test runs, which would make it much more broadly applicable.

While it doesn’t totally fix this problem, a staged approach to gradually enabling this (as proposed above), potentially combined with considering making EncodingWarning a subclass of another warning (DeprecationWarning, PendingDeprecationWarning and/or FutureWarning) would help to ameliorate these impacts over time.

In our view, it is much better to be explicit even when the file is currently ASCII, since that may not always remain true, so we still consider the warning useful here. Also, technically speaking, the locale encoding isn’t actually guaranteed to be 100% ASCII-compatible, though it almost certainly is on essentially every platform with a modern version of Python.

Yes. I suggest something like:

  • In 3.12 (3.10 + 2), show EncodingWarning like DeprecationWarning, in __main__ and like other warnings with -W default, -X dev, etc. This will allow developers with proper python invocation, pytest or CI testing configs to catch and fix these issues, without either causing extra noise for users or having to enable special bespoke interpreter options.
  • In 3.14 (3.10 + 4), display the warning by default to all users.
  • In 3.16 (3.10 + 6, i.e. when all supported versions of Python incorporate encoding="locale"), make omitting an explicit encoding an error.
  • Then in a later version, change the default encoding

You could perhaps skip a year, or even a step, but this timeline would ensure a smooth deprecation process.

I don’t think so. Like other warnings, it represents a real problem in many (though not all) cases, and it certainly flags code that can and should be made more explicit. At least in our experience, it wasn’t overwhelmingly more common than other types of warnings.

I’m not 100% clear where you want to draw the boundary here between the two, but I’ve interpreted it as only commenting on the latter set of bullets rather than your plans above, and not responding to others whose feedback and views sharply differ from my own experience.


Just to clarify your feedback for those reading: what does this number represent? I’m assuming part of the “build” is running each project’s test suite; is that run with -W error (thus resulting in the failures)? Or are the failures due to something else?

Thank you for your feedback!

I was worried about starting a long thread here, because it would make the feedback on PEP 597 hard to follow.

Writing your thoughts in your feedback is OK and welcome. But please don’t argue for or against others’ feedback and thoughts. (Using the like button is OK because it doesn’t add noise.)

I haven’t looked at all of the errors, but in the two I did look at, tests that check subprocess output fail because they don’t ignore stderr.


Could you try PYTHONUTF8=1 instead of PYTHONWARNDEFAULTENCODING=1 next time?

I want to know how many modules will be broken by PEP 686.
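To confirm that PYTHONUTF8=1 actually takes effect in the processes a build spawns, you can ask a child interpreter which encoding open() chooses when none is given. This sketch only shows what the variable changes, not how to measure breakage, and the helper name is hypothetical.

```python
import codecs
import os
import subprocess
import sys

# Child script: report whether UTF-8 Mode is on and which encoding open()
# picks when encoding= is omitted.
CHILD = (
    "import os, sys\n"
    "with open(os.devnull) as f:\n"
    "    print(sys.flags.utf8_mode, f.encoding)\n"
)


def default_encoding_in_child(utf8_mode: bool) -> tuple[int, str]:
    """Return (sys.flags.utf8_mode, default open() encoding) of a child
    started with PYTHONUTF8=1 or PYTHONUTF8=0."""
    env = dict(os.environ, PYTHONUTF8="1" if utf8_mode else "0")
    out = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    ).stdout.split()
    flag, encoding = int(out[0]), out[1]
    # Normalise spellings such as "UTF-8" or "utf8" to the codec's canonical name.
    return flag, codecs.lookup(encoding).name
```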