Handling `sys.stdin.read()` in Non-Blocking Mode

giosiragusa · July 31, 2024, 2:32pm

Hello Python Community,

I’m seeking your input on an issue related to the behavior of sys.stdin.read() when stdin is set to non-blocking mode and no input is available. This discussion is based on issue #109523 in the Python GitHub repository. During the sprint at EuroPython 2024, I created a draft pull request to address this issue, and I would appreciate your feedback on the approach I’ve taken.

Current Behavior

When sys.stdin is set to non-blocking mode, calling sys.stdin.read() with no input currently results in a TypeError. This behavior seems inconsistent with the expected behavior for non-blocking I/O operations.
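A minimal sketch of how this can be reproduced without touching the real stdin, assuming a platform where os.set_blocking() works on pipes. The helper name read_when_empty is made up for illustration. It wraps the read end of a pipe in the same TextIOWrapper stack that sys.stdin uses and reports what read() does. The outcome may differ between Python versions, so the sketch reports the result rather than assuming it.

```python
import os


def read_when_empty():
    """Call read() on a non-blocking text stream that has no data yet.

    Hypothetical helper for illustration: returns ("raised", exception name)
    or ("returned", value) so the behavior can be inspected.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)  # same idea as making stdin non-blocking
    # open() on a file descriptor builds TextIOWrapper -> BufferedReader -> FileIO.
    with open(read_fd, "r", encoding="utf-8") as text_stream:
        try:
            return ("returned", text_stream.read())
        except (TypeError, BlockingIOError) as exc:
            return ("raised", type(exc).__name__)
        finally:
            # Keep the writer open until after read(), so "no data" is not EOF.
            os.close(write_fd)


if __name__ == "__main__":
    print(read_when_empty())
```

Because the sketch uses a plain pipe rather than sys.stdin, it also shows that the behavior belongs to the text layer in general.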

Options for Handling This Behavior

Option 1: Return None

Description: Modify sys.stdin.read() to return None when no data is available.
Consistency: Aligns with io.RawIOBase.read(), which returns None in non-blocking mode when no bytes are available.
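
For reference, the raw-layer convention that Option 1 points to can be sketched as follows. This assumes a pipe as a stand-in for stdin, and raw_read_states is a hypothetical helper name. It shows how io.FileIO keeps "no data yet" (None) apart from end of file (an empty bytes object).

```python
import io
import os


def raw_read_states():
    """Return what FileIO.read() gives for 'no data yet', 'data', and 'EOF'."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    with io.FileIO(read_fd, "r") as raw:
        nothing_yet = raw.read(16)   # writer still open, no bytes: None
        os.write(write_fd, b"hi")
        some_data = raw.read(16)     # bytes are available: b"hi"
        os.close(write_fd)
        end_of_file = raw.read(16)   # writer closed, nothing left: b""
    return nothing_yet, some_data, end_of_file


if __name__ == "__main__":
    print(raw_read_states())
```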

Option 2: Raise an Exception

Description: Raise a BlockingIOError when sys.stdin.read() is called with no input available.
Consistency: Mirrors io.BufferedIOBase.read(), which raises an exception under similar conditions.

Considerations

Consistency Across I/O Methods: Should we follow the RawIOBase model of returning None, or the BufferedIOBase model of raising an exception?
User Expectations: Would users expect None to indicate no data, or is an exception a clearer signal in non-blocking scenarios?
Documentation: How should this behavior be documented to ensure clarity for developers using non-blocking text I/O?

Seeking Community Feedback

Your feedback is invaluable in helping us determine the most intuitive and consistent behavior for sys.stdin.read() in non-blocking mode. Which approach do you think is best, and are there other factors or edge cases we should consider?

Thank you for your input!
Giovanni

srittau · July 31, 2024, 3:12pm

There are similar concerns for gh-122179/gh-122183, where hashlib.file_digest() can’t handle non-blocking mode properly.

AndersMunch · August 1, 2024, 3:07pm

I’d say the natural behaviour is returning the empty string. That way there’s no special case. It’s just read returning a string containing the available data in every case.

cmaloney · August 2, 2024, 4:56am

I’m a fan of an exception for this, likely BlockingIOError. For me personally, stdin, stdout, and stderr have a lot of variation based on context, which has led to debugging unexpected behavior[1], and I worry that more one-off behavior will create more unexpected cases. I would really like the behavior around stdin to not differ from TextIO, which always sits on top of a BufferedIO. I think that area of the Python I/O stack is already fairly intricate and doesn’t always match expectations even with just blocking I/O[2].

Reading through PEP 3116, non-blocking behavior was definitely thought about in the Python 3.0 I/O stack. For Raw I/O, it explicitly calls out that no bytes returned from a read means end of file, and that non-blocking I/O isn’t supported above the Raw I/O layer. For text I see particular challenges around encodings: stdin comes from external sources, some of which may use multi-byte encodings such as UTF-8, and in UTF-8 land just getting one byte from a read system call doesn’t mean a text character has been finished or can be emitted. To me that implies that emitting a character from the text stream (or an empty string / no character) doesn’t line up with ‘there is data, just not a full character yet’.
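
The multi-byte point can be illustrated with the standard library’s incremental decoder. This is only a sketch of the decoding problem, not of how TextIOWrapper is implemented, and decode_in_pieces is a hypothetical helper name.

```python
import codecs


def decode_in_pieces(chunks, encoding="utf-8"):
    """Feed byte chunks to an incremental decoder, one read at a time."""
    decoder = codecs.getincrementaldecoder(encoding)()
    return [decoder.decode(chunk) for chunk in chunks]


if __name__ == "__main__":
    encoded = "é".encode("utf-8")  # two bytes: b"\xc3\xa9"
    # The first byte alone yields no text; the character only appears
    # once the second byte has arrived.
    print(decode_in_pieces([encoded[:1], encoded[1:]]))  # ['', 'é']
```

After the first chunk the decoder has consumed data but has nothing to emit, which is the ‘there is data, just not a full character yet’ state that neither an empty string nor None describes well.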

[1] PYTHONUNBUFFERED / -u mode, stdout buffering print() output when run as a subprocess, line buffering of stdout when run interactively / at a tty, and stderr always being line-buffered (see the sys.stdout docs). Most recently, when trying to strace a Python binary that used print() to add some “markers”, I needed to get out a debugger to find the answer to “why are there writes to stderr but none to stdout until interpreter shutdown”.
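
The “no writes until shutdown” effect can be sketched with a pipe standing in for a redirected stdout. This is an illustration under that assumption, with marker_visibility as a hypothetical helper name. Because a pipe is not a tty, the text stream is block-buffered, so the marker reaches the pipe only on flush().

```python
import io
import os


def marker_visibility():
    """Show that a print() to a piped text stream is held back until flush()."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    with io.FileIO(read_fd, "r") as raw_reader, \
            open(write_fd, "w", encoding="utf-8", newline="\n") as writer:
        print("marker", file=writer)
        before_flush = raw_reader.read(64)  # nothing written to the pipe yet: None
        writer.flush()
        after_flush = raw_reader.read(64)   # now visible: b"marker\n"
    return before_flush, after_flush


if __name__ == "__main__":
    print(marker_visibility())
```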
[2] I see value in having a way to make stdin/stdout/stderr more non-blocking, but they already have multiple implementations (_io, _pyio, and _WinConsoleIO), as well as different behavior based on how Python is invoked. There is quite a bit of code which tries to get non-blocking behavior via read1, but there have also been a number of issues with that over time (e.g. with PEP 475, read1 will make more than one read system call under some circumstances, though probably ones callers would be okay with).

It would be really nice to update BufferedIO and TextIO to work well for non-blocking I/O, but I am worried about layering in more complexity to “get to the right result” and making common Python blocking usage slower, when “reading a whole file’s bytes” in a single .read() already has ~15% overhead with C _io.BufferedIO for a file in the filesystem cache on Linux. For the _pyio case, there’s an issue that it doesn’t have access to the same _Py_read primitive, which does “read into provided buffer”, and has to build that on top of os.read (which always allocates a new buffer). os.read returning an “empty byte string” indicates EOF, whereas with _Py_read / a readinto the “EOF” and “Would Block” cases could be separated.
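
A rough sketch of that distinction from pure Python. _Py_read is internal to the C implementation, so io.FileIO.readinto is used here as the closest public stand-in, and would_block_vs_eof is a hypothetical helper name. On a non-blocking pipe, os.read signals “would block” with an exception and EOF with empty bytes, while readinto fills a caller-provided buffer and separates the two cases as None and 0.

```python
import io
import os


def would_block_vs_eof():
    """Compare os.read() and FileIO.readinto() on an empty, then closed, pipe."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    buffer = bytearray(16)  # caller-provided buffer, reusable across reads
    with io.FileIO(read_fd, "r") as raw:
        try:
            os.read(read_fd, 16)
            os_read_empty = "returned"
        except BlockingIOError:
            os_read_empty = "BlockingIOError"
        readinto_empty = raw.readinto(buffer)  # would block: None
        os.close(write_fd)
        os_read_eof = os.read(read_fd, 16)     # EOF: b"" (a newly allocated object)
        readinto_eof = raw.readinto(buffer)    # EOF: 0 bytes read
    return os_read_empty, readinto_empty, os_read_eof, readinto_eof


if __name__ == "__main__":
    print(would_block_vs_eof())
```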

giosiragusa · August 8, 2024, 10:18am

Thank you all for sharing your thoughts and insights on this topic. The discussion has been active for over a week, and there are clearly diverse opinions on the best approach. After considering the feedback, I tend to agree with @cmaloney that raising an exception could be a clearer response.

As this is my first post in the community, I’m not sure how we should proceed. Could a moderator or admin provide some guidance on the next steps?

Thank you in advance for your assistance.

Giovanni

gpshead · August 9, 2024, 9:20pm

Overall I agree. Including the use of BlockingIOError for this. That actually makes sense as opposed to the “this case was never considered” smell of the existing TypeError.

Next steps: We need a PR implementing this. The existing Draft PR can be changed to raise rather than return None, or a new draft PR can be opened and the old one closed.

Note that this is TextIOWrapper behavior; it won’t be sys.stdin specific.

Let’s keep further discussion for the specific implementation on the issue itself.