Deprecating `text` option in subprocess

This thread is a spin-off from "JEP 400: UTF-8 by Default" and future of Python.

The subprocess module has a text option, which is off by default. When text=True is passed, the locale encoding is used for now.
Instead of changing the default encoding, we can deprecate the text option.

subprocess.check_output(["ls", "-l"], encoding="utf-8") is a little longer than subprocess.check_output(["ls", "-l"], text=True).
But the difference is smaller than the difference between open(file) and open(file, encoding="utf-8").
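
As a rough illustration of the two spellings being compared, here is a minimal sketch. The child command is a hypothetical Python one-liner standing in for `["ls", "-l"]` so that it runs on any platform, and it only emits ASCII, which is why both calls agree here.

```python
import subprocess
import sys

# Hypothetical child command: a Python one-liner that prints ASCII text,
# used in place of ["ls", "-l"] so the example runs on any platform.
cmd = [sys.executable, "-c", "print('hello')"]

# Shorter spelling: decodes with the default (currently locale) encoding.
with_text = subprocess.check_output(cmd, text=True)

# Longer spelling: the encoding is stated explicitly at the call site.
with_encoding = subprocess.check_output(cmd, encoding="utf-8")

print(repr(with_text), repr(with_encoding))
```

The two results can differ once the child emits non-ASCII bytes and the locale encoding is not UTF-8.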

What do you think about deprecating the text option?

  • Deprecate text option
  • Keep text option

I think that instead of deprecation it would be better to specify that text=True means the native system encoding, and that if encoding is given, text should not be specified (phrasing modeled on the sock parameter of loop.create_connection).

On one side, I’m for this deprecation. Encoding is not some automagic stuff that just works somehow; a programmer must explicitly make the processes agree on it, especially when the child either reads or writes the content of a text file in whatever single-byte encoding it’s saved in, or explicitly documents its stdio encoding.

However, deprecation means that text=... will be removed in a couple of versions, and it’ll break countless Python 3 tutorials and books.

Deprecation doesn’t necessarily mean that it will be removed any time soon, just that its use is actively discouraged.

Surely you mean subprocess.check_output(["ls", "-l"], encoding="locale")? UTF-8 is not equivalent to text=True.

I’m -1 on this, because:

  1. It’s easy to make a mistake here and not replace text with the correct equivalent. (As you demonstrated).
  2. The locale encoding is obscure (it’s not documented in the codecs documentation, "Codec registry and base classes", for example).
  3. Practicality beats purity - being able to get the text output of a command is a common need, and the text approach is (in my experience at least) pretty much always correct.

If we’re confident that encoding="utf-8" is better than encoding="locale", then we should just change the effect of text=True (and the default encoding of TextIOWrapper).
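
To make the non-equivalence concrete, here is a small sketch. The child is a hypothetical one-liner that writes the UTF-8 bytes for "é", and latin-1 is used purely as a stand-in for a non-UTF-8 locale encoding (an assumption for illustration, not something from this thread).

```python
import locale
import subprocess
import sys

# Hypothetical child: writes the UTF-8 bytes for "é" straight to stdout.
child = "import sys; sys.stdout.buffer.write(b'\\xc3\\xa9')"
cmd = [sys.executable, "-c", child]

# Decoding with the encoding the child actually used.
as_utf8 = subprocess.check_output(cmd, encoding="utf-8")

# latin-1 stands in for a legacy locale encoding (an assumption).
as_latin1 = subprocess.check_output(cmd, encoding="latin-1")

print(repr(as_utf8))
print(repr(as_latin1))

# The encoding this interpreter reports for the current locale.
print(locale.getpreferredencoding(False))
```

The same bytes decode to different strings, so swapping text=True for encoding="utf-8" only preserves behaviour where the locale encoding already is UTF-8.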

I meant

  • Change the encoding used by text=True to UTF-8 like open(filename), or
  • Deprecate text.

When users want to use the locale encoding, they need to specify encoding="locale" anyway.

But I think the default encoding of PIPE should be consistent with the default encoding of stdio.
If we don’t change the stdio encoding when changing the default text encoding, no change is needed for text=True.

See "Add legacy_text_encoding option to make UTF-8 default" (#3 by methane) for discussion about stdio.

I spotted the legacy_text_encoding thread after I’d read this one. Sorry. The discussion is getting split over too many threads to follow easily 🙁

IMO, we shouldn’t deprecate text=True, because it expresses the user’s intent clearly. Even if it’s identical to a particular encoding (currently locale, maybe utf-8 in future), it’s still valuable because it expresses intent rather than mechanism. It’s actually the encoding parameter that should be the special case, for handling the odd cases where the default that text=True uses isn’t correct.

I’m sorry about that. I expected that the main topic (JEP 400) would be filled with discussion about the warning, so I created some subtopics to focus on the technical decisions I need to make to write a new PEP.

My current preference is:

  • Use UTF-8 mode, instead of adding a new option (legacy_text_encoding).
  • Keep the text option of subprocess, but have it emit an opt-in EncodingWarning and use UTF-8 by default.

What’s the point of a warning? Presumably the message given by such a warning is “you shouldn’t be using this option”, in which case we should just admit what we’re doing and make it a deprecation warning. If it’s not a deprecation, then how should a user prevent the warning without removing the text=True option? (Note that I am not talking about suppressing the warning via the warnings module, but about taking the warning seriously and doing what is recommended to stop getting the warning in the first place).

Note that EncodingWarning is opt-in.

Users who use only UTF-8 environments don’t need to enable EncodingWarning.

When users want to check where the default encoding is used before enabling UTF-8 mode, they can enable EncodingWarning.
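
A minimal sketch of that opt-in, assuming Python 3.10 or later, where the `-X warn_default_encoding` flag exists. The child here is a hypothetical one-liner that calls open() without an encoding argument; it stands in for any code that relies on the default text encoding.

```python
import subprocess
import sys

# Hypothetical child: opens a file without passing an encoding argument.
child = "import os; open(os.devnull).close()"

# Run the child with the opt-in flag and capture what it writes to stderr.
result = subprocess.run(
    [sys.executable, "-X", "warn_default_encoding", "-c", child],
    stderr=subprocess.PIPE,
    encoding="utf-8",
)

# On Python 3.10+, stderr is expected to mention EncodingWarning.
print(result.stderr)
```

Without the flag (or the PYTHONWARNDEFAULTENCODING environment variable), the same child runs silently, which is the sense in which the warning is opt-in.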

I posted PEP 686. I don’t propose deprecating the text option in the PEP.

I think we should formally deprecate it, but be clear that we’re not removing it, simply changing its definition to "exactly the same as passing encoding='utf-8', errors='strict'".

Currently this is not what it means, and I don’t think we can change it without providing a deprecation period so that existing callers are alerted to the upcoming change, but it doesn’t necessarily mean it will be removed. (And yes, I do think people should stop using this option entirely and should pass the encoding they expect the data to be in, but there’s no point breaking code that usually doesn’t corrupt user data just because it sometimes might… we aren’t operating at Microsoft levels of safety here (though perhaps we should… this approach certainly wouldn’t fly for me at work…) )
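
For reference, here is a sketch of what encoding='utf-8' with errors='strict' does when the child emits bytes that are not valid UTF-8. The child is a hypothetical one-liner, and the errors='replace' call is included only for contrast; it is not part of the proposal above.

```python
import subprocess
import sys

# Hypothetical child: emits a byte (0xFF) that is never valid in UTF-8.
child = "import sys; sys.stdout.buffer.write(b'\\xff')"
cmd = [sys.executable, "-c", child]

# Strict decoding refuses the invalid byte instead of guessing.
try:
    subprocess.check_output(cmd, encoding="utf-8", errors="strict")
    outcome = "decoded"
except UnicodeDecodeError:
    outcome = "UnicodeDecodeError"

# A lenient handler substitutes U+FFFD instead of raising.
lenient = subprocess.check_output(cmd, encoding="utf-8", errors="replace")

print(outcome, repr(lenient))
```

Under the strict definition, callers who currently get mis-decoded text from a non-UTF-8 child would get an exception instead, which is part of why a transition period matters.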