Looking for help understanding io.py on a high level

Hello all, my first post here!

I’m looking for clarification on how the io.py module of the Python 3 standard library works at a very high/conceptual level, and how the average user would use it in practice.

I’ve read the documentation from start to finish twice, and the following is my understanding based on what I inferred from the docs coupled with a very basic understanding of object-oriented programming.

The module provides a small number of top level callables such as io.open() (an alias for the built-in open()) that can be used to create a file object. It seems to be the case that open() can be used to create a wide range of file objects for many different applications, and that the actual object it returns depends on the parameters passed to it by the programmer. In some cases the returned object will actually be an interface to another, lower level, file object that is created simultaneously.

Objects returned by open() are instances of classes defined within io.py, and the particular stack of classes from which an object is constructed is determined by the parameters passed to open().
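A minimal sketch of that idea, assuming CPython and a throwaway temporary file. The helper name describe_open_modes is made up for illustration; it just records which class open() hands back for a few mode/buffering combinations, plus the lower-level objects sitting underneath a text file.

```python
import os
import tempfile


def describe_open_modes():
    """Return the class names of the objects open() gives back for a few modes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        results = {}
        with open(path, "w") as f:
            # Text mode: a text layer wrapping a buffered layer wrapping a raw file.
            results["text"] = type(f).__name__
            results["text_buffer"] = type(f.buffer).__name__
            results["text_raw"] = type(f.buffer.raw).__name__
        with open(path, "rb") as f:
            results["binary_read"] = type(f).__name__
        with open(path, "wb") as f:
            results["binary_write"] = type(f).__name__
        with open(path, "rb", buffering=0) as f:
            # Unbuffered binary mode returns the raw file object directly.
            results["unbuffered"] = type(f).__name__
        return results
    finally:
        os.remove(path)


layers = describe_open_modes()
for name, cls in layers.items():
    print(f"{name}: {cls}")
```

On CPython this should show TextIOWrapper for text mode (on top of a BufferedWriter, on top of a FileIO), BufferedReader or BufferedWriter for buffered binary mode, and a bare FileIO when buffering=0.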

It seems to me that the average user can accomplish much using only the built-in open() and does not need to understand the io.py class hierarchy in order to create file objects that can be used to perform I/O operations on a wide range of virtual devices (not just the standard OS level files).

I’d be grateful if someone could weigh in on this and let me know if I’ve understood the theory correctly!

Edit: I may have misunderstood. It seems that perhaps open() is only used for creating a file object attached to an actual file, but I’m still unsure.

That is all correct: open() is the main thing you would use to open a real file. The main reason you’d use most of the other classes is to subclass them when writing your own file objects. For instance, the zipfile module gives back file objects for the files inside the zip, which decompress as you read from them. Since file objects have a lot of methods, by subclassing these bases you only need to implement a few, and then the others are implemented generically for you.
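To illustrate the subclassing point, here is a rough sketch rather than anything taken from zipfile itself. UpperCaseReader is a hypothetical class that upper-cases bytes as they are read, standing in for something like on-the-fly decompression. It only implements readable() and readinto(); the buffering, decoding, and line-reading methods come from the io base classes and wrappers.

```python
import io


class UpperCaseReader(io.RawIOBase):
    """Hypothetical read-only stream that upper-cases bytes from another binary stream."""

    def __init__(self, source):
        super().__init__()
        self._source = source

    def readable(self):
        return True

    def readinto(self, buffer):
        # Fill the caller's buffer with transformed data and report how many
        # bytes were written; returning 0 signals end of file.
        chunk = self._source.read(len(buffer))
        buffer[:len(chunk)] = chunk.upper()
        return len(chunk)


raw = UpperCaseReader(io.BytesIO(b"first line\nsecond line\n"))
# BufferedReader and TextIOWrapper add buffering, decoding, and readlines()
# without the subclass having to define any of them.
text = io.TextIOWrapper(io.BufferedReader(raw), encoding="ascii")
lines = text.readlines()
print(lines)
```

Nothing in the subclass mentions lines or text, yet readlines() works because the generic machinery is layered on top of readinto().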

Other than open(), there are three classes that you would often use directly. StringIO and BytesIO are text and binary files, respectively, that just read and write a block of memory. They’re also useful for efficiently building up a big string from smaller components. There’s also TextIOWrapper, which takes a binary file and decodes it as a text file on the fly. Most text file objects you get are probably this, so they don’t need to reimplement encodings and newline conversion.
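A short sketch of those three classes in use; the sample strings are made up for illustration.

```python
import io

# StringIO: an in-memory text file, handy for assembling a large string piece by piece.
out = io.StringIO()
for i in range(3):
    out.write(f"row {i}\n")
built = out.getvalue()

# BytesIO: the binary equivalent, backed by a block of memory.
binary = io.BytesIO("caf\u00e9\n".encode("utf-8"))

# TextIOWrapper: decodes a binary file object as text on the fly.
wrapper = io.TextIOWrapper(binary, encoding="utf-8")
decoded = wrapper.read()

print(repr(built))
print(repr(decoded))
```

Neither object touches the filesystem, which is why these classes are constructed directly instead of going through open().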


Thanks! It’s quite hard to get a feel for how it works in a practical sense just from the documentation alone. One thing that I now realise (or perhaps ‘believe’ as I’m not 100% sure) is that the backing store bound to a file object returned by open() will always be a ‘real’ file, whereas, like you pointed out, an alternative class defined in io.py must be used to construct a file object bound to another type of backing store (such as an in memory buffer, as per your two examples).

It seems now that open() can be used to construct streams that are either buffered or unbuffered, and either text I/O or binary I/O streams, and the backing store can be a file of an arbitrary format, text or otherwise, but in the end it must still be a file.