subprocess.Popen should just deprecate any way to switch to text mode other than specifying encoding. Then once those are gone, the only way users encounter encoding is by explicitly setting it.
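To make that concrete, here is a rough sketch of what the surviving spelling would look like. Today text mode is also switched on by text=True, universal_newlines=True or passing errors; under this suggestion only the encoding form below would remain. The child command is made up for the example and writes UTF-8 bytes directly so the result doesn't depend on the child's own stdout encoding.

```python
import subprocess
import sys

# Hypothetical child process: emits UTF-8 bytes regardless of its locale.
child_code = "import sys; sys.stdout.buffer.write('h\\u00e9llo'.encode('utf-8'))"

# Text mode is requested only by naming the encoding explicitly.
proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    encoding="utf-8",
)
out, _ = proc.communicate()  # out is str because encoding was given
```

Without encoding (and without the other text-mode switches), out would be bytes and the caller would decode it themselves.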
For file reads, I’d still like to see a smarter default encoding that:
- starts in UTF-8
- if there’s a BOM, silently consumes it (like utf-8-sig) and commits to the UTF-8 codec
- at the first invalid UTF-8 character, raises a warning (visible by default) and switches to locale
Anyone who specifies an encoding doesn’t get this behaviour, so if you don’t want the warning then the “workaround” is to specify the encoding when you open the file. We can also have an invisible-by-default warning anytime you open a file without specifying the encoding, so that developers get a warning regardless of the content of the file. That warning can suggest specifying encoding, while the other warning could suggest converting the file to UTF-8.
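A minimal sketch of the read-side behaviour, working on a whole bytes object rather than a stream to keep it short. The name decode_smart and its fallback parameter are invented for illustration, and two details are my own assumptions rather than anything settled: "commits to UTF-8" is read as "decode strictly after a BOM", and the bytes before the first invalid one are kept as UTF-8 while the rest goes through the locale encoding.

```python
import codecs
import locale
import warnings
from typing import Optional


def decode_smart(data: bytes, fallback: Optional[str] = None) -> str:
    """Hypothetical helper: UTF-8 first, locale encoding after the first bad byte."""
    if fallback is None:
        fallback = locale.getpreferredencoding(False)

    # A BOM is consumed silently and commits us to strict UTF-8 (assumption).
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")

    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # UnicodeWarning is not in the default ignore list, so it is shown by default.
        warnings.warn(
            f"invalid UTF-8 at byte {exc.start}; switching to {fallback!r}. "
            "Consider converting the file to UTF-8.",
            UnicodeWarning,
            stacklevel=2,
        )
        # Keep the valid UTF-8 prefix, decode the remainder with the fallback.
        valid, rest = data[:exc.start], data[exc.start:]
        return valid.decode("utf-8") + rest.decode(fallback)
```

A caller who passes an explicit encoding would never reach this path, which is the whole "workaround".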
For file writes I don’t think we have a choice but to deprecate (essentially) open(file, "w"): no encoding argument or 'b' mode means you don’t get to open a file anymore. After a full deprecation, we can bring it back defaulting to a different encoding.
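For illustration, these are the two spellings that would keep working for writes under that deprecation: text mode with an explicit encoding, or binary mode where the caller encodes. The path is a throwaway temp file invented for the example.

```python
import os
import tempfile

# Throwaway location, only for this example.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Text mode with the encoding spelled out.
with open(path, "w", encoding="utf-8") as f:
    f.write("caf\u00e9\n")

# Binary mode: the caller does the encoding.
with open(path, "ab") as f:
    f.write("na\u00efve\n".encode("utf-8"))

# Reading back, again with an explicit encoding.
with open(path, encoding="utf-8") as f:
    content = f.read()
```

The bare open(path, "w") is the only form that would stop working.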
Text encoding is just too complex for us to guess correctly. We’ve already provided the APIs for callers to get it right, so we probably just have to force them to use them.