This is the second time we are having the same discussion about changing the default encoding. There are basically two groups: supporters of the status quo, who are fine with the currently used encoding, and supporters of UTF-8 everywhere.
The problem with Python 3 was that there was no easy way to switch from Python 2 to Python 3, and that the migration was mandatory. Maybe we need to find a way to support “Python 2” (current encoding) and “Python 3” (UTF-8) in the same Python. For example, if I trust my environment and understand what I’m doing, how can I easily enable the UTF-8 Mode from my Python script?
There is no sys.set_utf8_mode(True) for technical reasons: the encoding cannot be changed at runtime. Maybe we need a helper function somewhere to opt in to UTF-8 Mode in an application? The function would re-execute Python with UTF-8 Mode, but only for the re-executed process (no effect on child processes: it doesn’t set the PYTHONUTF8 env var).
import sys, os, subprocess

def enforce_utf8_mode(enabled=True):
    # this function should be carefully designed to prevent fork-bomb...
    if sys.flags.utf8_mode == enabled:
        return
    opt = 'utf8' if enabled else 'utf8=0'
    argv = [sys.executable, '-X', opt]
    argv.extend(subprocess._args_from_interpreter_flags())
    argv.extend(sys.argv)
    os.execv(argv[0], argv)
    # should not return

enforce_utf8_mode()
print(sys.flags.utf8_mode)
enforce_utf8_mode(False) ensures that the code runs with UTF-8 Mode disabled.
Maybe we should somehow ensure that an application doesn’t call the function twice. This function should not be used by Python modules (“import …” should not execute it!).
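One possible guard against re-executing in a loop, as a rough sketch only: pass a marker through an extra -X option and refuse to re-execute if the marker is already present. The marker name utf8_reexec and the helper names plan_reexec() and enforce_utf8_mode_once() are made up for illustration. The sketch relies on CPython keeping unknown -X options in sys._xoptions, and for brevity it does not forward the other interpreter flags. The marker is a command line option, not an environment variable, so child processes do not inherit it.

```python
import os
import sys

# Hypothetical marker name; it only needs to be unique to this helper.
REEXEC_MARKER = "utf8_reexec"


def plan_reexec(want_utf8, current_utf8, xoptions, executable, script_argv):
    """Return the argv to re-execute with, or None if the mode already matches.

    Raises RuntimeError if a previous re-execution did not change the mode,
    instead of re-executing forever.
    """
    if bool(current_utf8) == bool(want_utf8):
        return None
    if REEXEC_MARKER in xoptions:
        raise RuntimeError("re-executed once already, but UTF-8 Mode did not change")
    opt = "utf8" if want_utf8 else "utf8=0"
    return [executable, "-X", opt, "-X", REEXEC_MARKER, *script_argv]


def enforce_utf8_mode_once(enabled=True):
    # Assumption: unknown -X options end up in sys._xoptions (CPython behavior).
    argv = plan_reexec(enabled, sys.flags.utf8_mode, sys._xoptions,
                       sys.executable, sys.argv)
    if argv is not None:
        os.execv(argv[0], argv)  # replaces the current process on success
```

Keeping the decision in a pure function (plan_reexec) makes it possible to test the guard without actually replacing the process.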
Python modules using open() without specifying an encoding would use the ANSI code page by default, but UTF-8 if enforce_utf8_mode() is called.
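To see that effect without touching the current process, here is a small sketch that starts a child interpreter with -X utf8 or -X utf8=0 and reports which encoding a plain open() picks there. The probe() helper and the PROBE snippet are illustrative names only. With UTF-8 Mode disabled, the reported encoding depends on the platform and locale.

```python
import subprocess
import sys

# Code run in the child: report the UTF-8 Mode flag and the default open() encoding.
PROBE = (
    "import os, sys\n"
    "f = open(os.devnull)\n"
    "print(sys.flags.utf8_mode, f.encoding.lower())\n"
    "f.close()\n"
)


def probe(utf8_enabled):
    """Run a child interpreter and return (utf8_mode, default open() encoding)."""
    opt = "utf8" if utf8_enabled else "utf8=0"
    result = subprocess.run(
        [sys.executable, "-X", opt, "-c", PROBE],
        capture_output=True, text=True, check=True,
    )
    mode, encoding = result.stdout.split()
    return int(mode), encoding


if __name__ == "__main__":
    print("UTF-8 Mode on: ", probe(True))
    print("UTF-8 Mode off:", probe(False))
```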
Does it sound like a bad idea?
Old solutions:
- https://docs.python.org/dev/library/sys.html#sys._enablelegacywindowsfsencoding
- PYTHONFSENCODING env var and sys.setfilesystemencoding(): https://vstinner.github.io/painful-history-python-filesystem-encoding.html