Forcing sys.stdin/stdout/stderr encoding/newline behavior programmatically

I learned the hard way that Python does not always use UTF-8 for sys.stdin/stderr/stdout.
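
A quick way to see what a particular interpreter picked is to inspect the stream attributes. This is only an illustrative sketch; describe_stdio is a hypothetical helper name, not part of the standard library.

```python
import sys


def describe_stdio():
    """Return (encoding, errors) for each of the three standard streams."""
    settings = {}
    for name in ("stdin", "stdout", "stderr"):
        stream = getattr(sys, name)
        # A stream can be None or a replacement object without these
        # attributes, so fall back to None instead of raising.
        settings[name] = (
            getattr(stream, "encoding", None),
            getattr(stream, "errors", None),
        )
    return settings


if __name__ == "__main__":
    for name, (encoding, errors) in describe_stdio().items():
        print(name, encoding, errors)
```

Running this once in a terminal and once with the output redirected to a file or pipe may show different values, depending on the platform and locale.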

To make my own tools always work properly as pipes using UTF-8 encoding, and also handle newlines properly (as open would do by default), I would like to set this programmatically for all Python versions 3.6 or later. Some solutions suggest using environment variables or python command-line options, but that would require additional thought and action by the user, which I want to avoid.

There are many different recipes for this around, and I find them confusing: I am not sure what exactly the implications of the different methods are.

For Python version 3.7 or later this seems to be easy:

sys.stdout.reconfigure(encoding="utf-8", newline=None)

But for Python 3.6 or earlier, the following suggestions have been made:

  1. sys.stdout = open(sys.stdout.fileno(), mode="w", encoding="utf-8", newline=None, buffering=1)
  2. sys.stdout = io.TextIOWrapper(sys.stdout.buffer, newline=None, encoding="utf-8", line_buffering=True)
  3. sys.stdout = io.TextIOWrapper(sys.stdout.fileno(), newline=None, encoding="utf-8", line_buffering=True)

Is the approach above for 3.7 and later the correct/recommended one, and which, if any, of the three alternatives for 3.6 and earlier is the correct/recommended one?

What is the reason that sys.stdin does not work like open(..., mode="r", newline=None) by default, i.e. why do stdin/stdout/stderr not use the universal newline setting by default?

1 and 2 should work, but it may be safer to modify them:

  • sys.stdout = open(os.dup(sys.stdout.fileno()), mode="w", encoding="utf-8", newline=None, buffering=1)
    sys.stdout = open(sys.stdout.fileno(), mode="w", encoding="utf-8", newline=None, buffering=1, closefd=False)
  • sys.stdout = io.TextIOWrapper(sys.stdout.detach(), newline=None, encoding="utf-8", line_buffering=True)

3 should not work.
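
A minimal sketch of why alternative 3 fails, using a pipe as a stand-in so the real stdout is left alone. io.TextIOWrapper expects a buffered binary stream rather than an integer file descriptor; the explicit io.FileIO and io.BufferedWriter layering below is roughly what open() builds for a text-mode file. The variable names are illustrative only.

```python
import io
import os

read_fd, write_fd = os.pipe()

# Passing a bare file descriptor (an int) to TextIOWrapper fails, because
# an int has none of the methods of a binary stream.
try:
    io.TextIOWrapper(write_fd, encoding="utf-8", newline=None)
    wrapped_int = True
except (AttributeError, TypeError):
    wrapped_int = False

# Building the layers explicitly works: raw file -> buffer -> text wrapper.
raw = io.FileIO(write_fd, mode="w")
text = io.TextIOWrapper(
    io.BufferedWriter(raw),
    encoding="utf-8",
    newline="\n",  # no newline translation, so the bytes are the same on every OS
    line_buffering=True,
)
text.write("grüß\n")
text.close()  # closes the buffer, the raw file and write_fd

data = os.read(read_fd, 100)
os.close(read_fd)
```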

No single method works in absolutely all cases (if you reconfigure the streams multiple times, also replace sys.__stdout__ and sys.__stderr__, run in IDLE, etc.), which is why the reconfigure() method was added in the first place. But in your case (reconfiguring only once at the beginning, before any output, and keeping sys.__stdout__ and sys.__stderr__ intact), I think any of them should work.
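
One way the two approaches could be combined is sketched below: use reconfigure() where it exists and fall back to detach() otherwise. The helper names utf8_text_stream and force_utf8_stdio are hypothetical. The sketch passes the stream's current error handler along explicitly, because reconfigure() falls back to errors="strict" when an encoding is given without errors.

```python
import io
import sys


def utf8_text_stream(stream, line_buffering=True):
    """Return a UTF-8, newline=None text stream over the same binary buffer."""
    errors = getattr(stream, "errors", None) or "strict"
    if hasattr(stream, "reconfigure"):
        # Python 3.7+: change the settings of the existing wrapper in place.
        stream.reconfigure(
            encoding="utf-8",
            errors=errors,
            newline=None,
            line_buffering=line_buffering,
        )
        return stream
    # Python 3.6: move the binary buffer into a new wrapper. The old
    # wrapper is unusable after detach().
    return io.TextIOWrapper(
        stream.detach(),
        encoding="utf-8",
        errors=errors,
        newline=None,
        line_buffering=line_buffering,
    )


def force_utf8_stdio():
    """Call once at startup, before anything has been read or written."""
    sys.stdin = utf8_text_stream(sys.stdin, line_buffering=False)
    sys.stdout = utf8_text_stream(sys.stdout)
    sys.stderr = utf8_text_stream(sys.stderr)
```

Note that on the 3.6 path sys.__stdout__ and sys.__stderr__ still refer to the old, detached wrappers, so anything that uses them afterwards will fail.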

But it does work that way by default!

Thank you!
Just to understand your suggestions better: why would os.dup or closefd=False be safer if I use sys.stdout as I normally would, i.e. without ever calling close() on it? What happens if I actually close the original file descriptor, and why is this something I would not want?

Regarding the default newline mode of stdin/stdout/stderr: I have seen claims on Stack Overflow that they do not use universal newline mode by default, but I cannot remember whether this was related to an older Python version or whether the claim was simply wrong.
However, I could not find where this is documented; it does not appear to be mentioned in the documentation I looked at.

I prefer the version with sys.stdout.detach(). It attaches the existing buffer object to a new text wrapper, and it doesn’t mess around with duplicating or closing file descriptors.
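
To illustrate what closefd and os.dup change, here is a minimal sketch that uses a pipe in place of the real stdout descriptor. fd_is_open is a hypothetical helper introduced only for this example.

```python
import os


def fd_is_open(fd):
    """Report whether fd still refers to an open file descriptor."""
    try:
        os.fstat(fd)
    except OSError:
        return False
    return True


read_fd, write_fd = os.pipe()

# closefd=False: closing the file object leaves the descriptor open.
borrowed = open(write_fd, mode="w", encoding="utf-8", closefd=False)
borrowed.close()
open_after_borrowed = fd_is_open(write_fd)

# os.dup(): the file object owns a duplicate, so the original stays open.
duplicate = open(os.dup(write_fd), mode="w", encoding="utf-8")
duplicate.close()
open_after_duplicate = fd_is_open(write_fd)

# Default closefd=True: closing the file object closes the descriptor itself.
owner = open(write_fd, mode="w", encoding="utf-8")
owner.close()
open_after_owner = fd_is_open(write_fd)

os.close(read_fd)
```

With the plain open(sys.stdout.fileno(), ...) form, the new file object would own the descriptor, so closing it (or its finalization) would also close the descriptor for anything else still using it, such as sys.__stdout__.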

On Windows, standard I/O uses universal newlines. On POSIX, standard I/O uses “\n” as the newline character.
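
The difference between the two newline modes can be observed without touching the real streams, by wrapping an in-memory buffer. This is a small sketch; written_bytes and read_text are hypothetical helper names.

```python
import io
import os


def written_bytes(newline):
    """Write two lines through a text wrapper and return the raw bytes."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    text.write("a\nb\n")
    text.flush()
    return raw.getvalue()


def read_text(data, newline):
    """Decode raw bytes through a text wrapper with the given newline mode."""
    text = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline=newline)
    return text.read()


mixed = b"a\r\nb\rc\n"

# newline=None (universal newlines): "\n" is written as os.linesep, and
# "\r\n" and "\r" are translated to "\n" when reading.
universal_out = written_bytes(None)
universal_in = read_text(mixed, None)

# newline="\n": no translation in either direction.
plain_out = written_bytes("\n")
plain_in = read_text(mixed, "\n")

expected_universal_out = ("a" + os.linesep + "b" + os.linesep).encode("utf-8")
```

On POSIX, os.linesep is "\n", so the two write modes produce the same bytes there; the read side differs on every platform.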

Thank you for this information - could you perhaps point me to where this is documented? I was unable to find this in the docs.
I am not a fan of behaviour like that, where how a program behaves depends on the system or other situational properties. Luckily, this is otherwise usually not the case with Python.
For example, the behaviour you describe could lead to surprises when the same file is shared and processed on Windows vs. Linux. On the other hand, that difference would not exist if the file were opened with open, because that has the same default newline behaviour on both operating systems.

I couldn’t find where the behavior is documented, if it is. Instead, I relied on the source of create_stdio() in “Python/pylifecycle.c”.

Wow, thanks.
Definitely something that SHOULD be documented, though.