The subprocess module has a text=False option. When text=True is passed, the locale encoding is currently used.
Instead of changing the default encoding, we can deprecate the text option.
subprocess.check_output(["ls", "-l"], encoding="utf-8") is a little longer than subprocess.check_output(["ls", "-l"], text=True).
But the difference is smaller than the difference between open(file) and open(file, encoding="utf-8").
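For reference, a minimal sketch of the spellings being compared. It uses sys.executable as the child command instead of ls so that it runs on any platform; that substitution is only for illustration.

```python
import subprocess
import sys

# Child command chosen for portability (an assumption, not the "ls -l" from the post).
cmd = [sys.executable, "-c", "print('hello')"]

# Default (text=False): the output is returned as bytes.
out_bytes = subprocess.check_output(cmd)

# text=True: the output is decoded with the locale encoding (current behaviour).
out_text = subprocess.check_output(cmd, text=True)

# Explicit encoding: slightly longer to type, but the mechanism is spelled out.
out_utf8 = subprocess.check_output(cmd, encoding="utf-8")

print(type(out_bytes).__name__, type(out_text).__name__, type(out_utf8).__name__)
```

For ASCII-only output like this the two text variants agree; they can only differ once the child emits non-ASCII bytes.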
I think that instead of deprecation it would be better to specify that text=True means the native system encoding, and that if encoding is given, text should not be specified (phrased after the sock parameter of loop.create_connection).
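A rough sketch of what that rule could look like, written as a wrapper. The name check_output_strict and the error message are hypothetical; subprocess itself currently accepts both arguments together.

```python
import subprocess
import sys


def check_output_strict(cmd, *, text=None, encoding=None, **kwargs):
    """Hypothetical wrapper: text= and encoding= are mutually exclusive."""
    if text is not None and encoding is not None:
        # Mirrors the style of the host/port vs. sock check in create_connection.
        raise ValueError("text and encoding cannot be specified at the same time")
    return subprocess.check_output(cmd, text=text, encoding=encoding, **kwargs)


cmd = [sys.executable, "-c", "print('hi')"]
print(check_output_strict(cmd, text=True).strip())          # native encoding
print(check_output_strict(cmd, encoding="utf-8").strip())   # explicit encoding
```

Under this sketch, passing both text=True and encoding="utf-8" raises ValueError instead of silently letting encoding win.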
On one side, I’m for this deprecation. Encoding is not some automagic stuff that just works somehow; a programmer must explicitly agree on the encoding between processes, especially when the child either inputs/outputs the content of a text file in whatever single-byte encoding it’s saved in, or explicitly documents its stdio encoding.
However, deprecation means that text=... will be removed in a couple of versions, and it’ll break countless Python 3 tutorials and books.
Practicality beats purity - being able to get the text output of a command is a common need, and the text approach is (in my experience at least) pretty much always correct.
If we’re confident that encoding="utf-8" is better than encoding="locale", then we should just change the effect of text=True (and the default encoding of TextIOWrapper).
Change the encoding used by text=True to UTF-8 like open(filename), or
Deprecate text.
When users want to use the locale encoding, they need to specify encoding="locale" anyway.
But I think the default encoding of PIPE should be consistent with the default encoding of stdio.
If we don’t change the stdio encoding when changing the default text encoding, no change is needed for text=True.
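One way to see the PIPE/stdio consistency point: if the child is itself Python, running it with -X utf8 (UTF-8 mode) makes its stdio use UTF-8 regardless of locale (unless PYTHONIOENCODING overrides it), and the parent can then read the pipe with a matching encoding. The child command here is an assumption for illustration.

```python
import subprocess
import sys

# Child runs in UTF-8 mode, so its stdout is encoded as UTF-8 regardless of locale.
# chr(233) is "é"; using chr() avoids escaping issues inside the -c string.
cmd = [sys.executable, "-X", "utf8", "-c", "print(chr(233))"]

# The parent reads the pipe with the same encoding the child writes.
out = subprocess.check_output(cmd, encoding="utf-8")
print(out.strip())
```

If the parent used a different encoding from the child's stdio here, the non-ASCII character could be mis-decoded or raise an error, which is the inconsistency being warned about.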
I spotted the legacy_text_encoding thread after I’d read this one. Sorry. The discussion is getting split over too many threads to follow easily.
IMO, we shouldn’t deprecate text=True, because it expresses the user’s intent clearly. Even if it’s identical to a particular encoding (currently locale, maybe utf-8 in future), it’s still valuable because it expresses intent rather than mechanism. It’s actually the encoding parameter that should be the special case, for handling the odd cases where the default that text=True uses isn’t correct.
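To make the "odd case" concrete, here is a small sketch where the child writes Latin-1 bytes, so the caller has to name the mechanism with encoding=. The child command is made up for the example.

```python
import subprocess
import sys

# Child writes the bytes for "café" in Latin-1 (0xE9 is "é") straight to stdout.
cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(bytes([99, 97, 102, 233]))",
]

# The odd case: the caller knows the child's encoding and says so explicitly.
out = subprocess.check_output(cmd, encoding="latin-1")
print(out)

# Decoding the same bytes as UTF-8 fails, because a lone 0xE9 is not valid UTF-8.
try:
    subprocess.check_output(cmd, encoding="utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)
```

For the common case, where the child follows the system default, text=True alone would still say what the caller means.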
I’m sorry about that. I expected that the main topic (JEP 400) would fill up with discussion about the warning, so I created some subtopics to focus on the technical decisions I need to make to write a new PEP.
My current preference is:
Use UTF-8 mode, instead of adding a new (legacy_text_encoding) option.
Keep the text option of subprocess, but have it emit an opt-in EncodingWarning and use UTF-8 as the default.
What’s the point of a warning? Presumably the message given by such a warning is “you shouldn’t be using this option”, in which case we should just admit what we’re doing and make it a deprecation warning. If it’s not a deprecation, then how should a user prevent the warning without removing the text=True option? (Note that I am not talking about suppressing the warning via the warnings module, but about taking the warning seriously and doing what is recommended to stop getting the warning in the first place).
I think we should formally deprecate it, but be clear that we’re not removing it, simply changing its definition to "exactly the same as passing encoding='utf-8', errors='strict'".
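Roughly, the two phases could be sketched like this. resolve_text_args and its after_transition flag are hypothetical names used only to illustrate the idea; nothing like this exists in subprocess.

```python
import locale
import subprocess
import sys
import warnings


def resolve_text_args(text=False, encoding=None, errors=None, *, after_transition=False):
    """Hypothetical helper: decide which encoding/errors text=True stands for."""
    if not text or encoding is not None:
        # Binary mode or an explicit encoding: nothing to reinterpret.
        return encoding, errors
    if after_transition:
        # Proposed end state: text=True is shorthand for UTF-8 with strict errors.
        return "utf-8", errors or "strict"
    # Deprecation period: keep the locale behaviour but alert the caller.
    warnings.warn(
        "text=True will switch to encoding='utf-8'; pass encoding= explicitly",
        DeprecationWarning,
        stacklevel=2,
    )
    return locale.getpreferredencoding(False), errors


cmd = [sys.executable, "-c", "print('ok')"]
enc, err = resolve_text_args(text=True, after_transition=True)
out = subprocess.check_output(cmd, encoding=enc, errors=err)
print(enc, err, out.strip())
```

Callers who already pass encoding= never hit the warning branch, which is also the recommended way to make the warning go away.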
Currently this is not what it means, and I don’t think we can change it without providing a deprecation period so that existing callers are alerted to the upcoming change, but that doesn’t necessarily mean it will be removed. (And yes, I do think people should stop using this option entirely and should pass the encoding they expect the data to be in, but there’s no point breaking code that usually doesn’t corrupt user data just because it sometimes might… we aren’t operating at Microsoft levels of safety here, though perhaps we should… this approach certainly wouldn’t fly for me at work…)