Why not use a more descriptive enumeration for the mode then? Because duplex with mode 'r' almost always needs a comment to make it make sense. I think all these modes do unless you’re familiar with them.
I suggested that earlier but we felt in the end that the single-character str worked better overall.
Read from here onwards to see how that discussion went.
I support having a mode argument which takes a single-character str. It fits the model of the open function which, after all, is what this API is supporting.
I don’t see much of an argument for strings over enums besides that other functions in Python accept strings. Did I miss it somewhere?
In support of descriptive arguments (longer strings or enums): I think these one-character strings are really bad for readers of code, and reading code happens thousands of times more often than writing code. We should make code as easy to read as possible.
__open_duplex__(mode='r') is not easy to read.
In support of enums over strings: If someone wants to write a shim, say

def open_log_file(mode: Mode): ...

then with strings you are forced to annotate the mode by copying the Literal[...]. Therefore, you’ll want to expose the literal somewhere anyway:

type OpenModes = Literal[...]

So you’re halfway to having a type-safe enumeration anyway.
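To make the comparison concrete, here’s a minimal sketch of the two styles; the mode set ("r", "w", "a") and the names OpenMode and Mode are hypothetical, purely for illustration:

from enum import Enum
from typing import Literal

# String style: every shim must reference (or copy) the Literal alias
type OpenMode = Literal["r", "w", "a"]

def open_log_file(mode: OpenMode): ...

# Enum style: the type itself is the single source of truth
class Mode(Enum):
    READ = "r"
    WRITE = "w"
    APPEND = "a"

def open_log_file2(mode: Mode): ...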
Sorry, but why should “fitting the model” of the existing function matter? Are we suggesting that its design is ideal?
Nobody (okay, almost nobody) is going to call __open_duplex__ directly.
The only code that is going to call these dunder methods is:
- builtins.open
- maybe pathlib.Path.open
- maybe the odd function elsewhere where someone has a need to write some low-level glue code.

And in basically all of those cases, it won’t be called with mode='r', it’ll be called with mode=mode after applying an algorithm to validate the caller’s mode and turn it into the mode that needs to be passed to the dunder:
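(Something roughly like this - a hypothetical sketch using the dunder names from this thread; the exact mode set and dispatch rules here are assumptions, not the proposal’s actual algorithm:)

def _open_via_dunders(obj, mode):
    # reduce the user-facing mode string to what the dunders expect
    plus = "+" in mode
    base = mode.replace("+", "").replace("b", "")
    if base not in ("r", "w", "a", "x"):
        raise ValueError(f"invalid mode: {mode!r}")
    if plus:
        return obj.__open_duplex__(mode=base)  # e.g. 'r+' -> duplex with mode 'r'
    elif base == "r":
        return obj.__open_reader__()
    else:
        return obj.__open_writer__(mode=base)  # 'w', 'a' or 'x'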
The code that people are going to end up reading is:

open(path, mode='r+')

since that’s what would end up calling this dunder. And indeed, to the untrained eye, that’s not easy to read/understand - but it’s been around for decades and is not something that’s easy to change now.
Consistency. __open_reader__ is a dunder that allows a class to implement open. It makes sense to me to be consistent with the API of open.
You can disagree. That’s fine. Ultimately it’s @barneygale’s decision - I’m just casting my vote for my preferred option.
Good point - thanks.
Just to throw out another option: the open() docs say that ‘+’ means opening for “updating”, so perhaps we could use _updater().
(I think it’s academic - we don’t generally add type annotations in the stdlib.)
We could use io.BufferedReader, io.BufferedWriter and io.BufferedRandom, but I suppose they’re too specific. In the typing module we have BinaryIO but it’s not further differentiated. Perhaps we could add typing.BinaryInput and typing.BinaryOutput?
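For concreteness, such additions might look roughly like this - BinaryInput and BinaryOutput don’t exist in typing today, and the method sets shown are guesses at a minimal interface:

from typing import Protocol

class BinaryInput(Protocol):
    def read(self, size: int = -1, /) -> bytes: ...

class BinaryOutput(Protocol):
    def write(self, data: bytes, /) -> int: ...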
IIUC BufferedIOBase guarantees that all requested data is read/written in read() and write(), whereas this isn’t guaranteed in RawIOBase. But both of these classes are typing.BinaryIO nonetheless. Cody’s first reply explains it better.
I was trying to increase the signal-to-noise ratio for the purposes of this discussion, as we all seem to agree that it’s an int!
As suggested above, buffering could be left up to the caller (open), so the return value would simply be the raw stream, and the buffering parameter can be dropped, simplifying the implementation for callees. Is there a reason for implementers to handle buffering themselves?
Implementers may be unable to return an unbuffered stream, e.g. zipfile.ZipFile.open() can’t return anything less buffered than the ZipFile’s underlying file obj. If you nevertheless treat the __open_reader__ return value as unbuffered and wrap it in BufferedReader (for example), you introduce a pointless level of buffering IIUC.
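(A small sketch of the hazard; data.bin is a placeholder file that must already exist:)

import io

raw = io.FileIO("data.bin")       # unbuffered raw stream
inner = io.BufferedReader(raw)    # one layer of buffering - fine
outer = io.BufferedReader(inner)  # a second, pointless layer: every read
                                  # is now copied through two buffers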
edit: even so, I’m keen to drop the buffering parameter in the dunders if we can find a way to do it. I suspect that most implementations won’t be in a position to fully control buffer sizes.
If we kept the buffering parameter in our open() function but removed it from the dunders, then perhaps our open() implementation could handle buffering like this:

import io
from warnings import warn

# 'buffering' is the caller's argument; 'stream' is the dunder's return value
if buffering == -1:
    pass  # default: leave the stream as the dunder returned it
elif buffering == 0:
    warn("disabling buffering is unsupported")
elif buffering == 1:
    warn("line buffering is unsupported")
elif isinstance(stream, io.BufferedIOBase):
    warn("configuring the buffer size is unsupported")
    # because the stream is already buffered!
else:
    stream = io.BufferedReader(stream, buffering)
    # or BufferedWriter/BufferedRandom depending on mode

Arguably, some/all of the warnings should be exceptions.
edit: or even more simply:

if buffering != -1:
    raise ValueError("configuring the buffer size is unsupported")
Hmm, does BufferedReader check if the wrapped stream has read1, and use that if available? That might be the right API level for avoiding needless buffering.
What is the goal / use case / reason you want to avoid buffering?
BufferedReader prefers readinto if the “Raw I/O” provides it, to reduce allocations and copies (read1 actually defers to readinto by default). If code allocates/manages its own buffer and calls buffered.readinto(), it will generally incur only os.readinto / the read system call writing directly to the caller-provided buffer. (Note: this isn’t what the docs say currently.)
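(A short illustration, assuming a file data.bin exists; in CPython, a readinto() request larger than the internal buffer can be served by reading directly into the caller’s buffer:)

import io

raw = io.FileIO("data.bin")
buffered = io.BufferedReader(raw)  # default internal buffer is 8 KiB

buf = bytearray(256 * 1024)        # caller-managed buffer
n = buffered.readinto(buf)         # large request: data can land directly
                                   # in buf, avoiding an extra copy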
I’m +1 to just throwing an exception if specific buffering is specified, to start.
I’ve opened a PR that adjusts pathlib’s internal open()-like function to our design:

I for one like the design very much. Thank you everyone for your contributions so far!
Even if we don’t currently have corresponding text functions, I would still include binary or bytes in the method names to be future-compatible.
Quick poll just to see if there’s a clear preference. If the results are close I reserve the right to pick a winner. I’ve added buffered_reader as an option because it squares with io.BufferedReader and friends. Vote for up to four.
- __open_reader__()
- __open_bytes_reader__()
- __open_binary_reader__()
- __open_buffered_reader__()
Can we please make an opener a concept, and have a single dunder method return something following an ABC, instead of adding a dozen potential methods to openable objects?
Actually, I really like this idea, if I’m understanding what you’re asking for correctly:
- Openable objects have a single __opener__ dunder
- Calling openable.__opener__() returns an object that satisfies some Opener interface
- openable.__opener__().open_reader() would replace the current openable.__open_reader__(); similar for writer and updater (each with their mode argument); a sketch follows below
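(A minimal sketch of that shape; the names Opener, open_reader, open_writer and open_updater follow this thread, but the signatures are assumptions, not a settled API:)

from typing import BinaryIO, Protocol

class Opener(Protocol):
    def open_reader(self) -> BinaryIO: ...
    def open_writer(self, mode: str) -> BinaryIO: ...
    def open_updater(self, mode: str) -> BinaryIO: ...

class Openable(Protocol):
    def __opener__(self) -> Opener: ...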
The benefit that I see is: openable objects only need one dunder, avoiding clutter. Then, because the opener need only deal with opening, that interface could be extended with additional methods in future at any time; e.g. if buffering was added, a supports_buffering method/property/attribute could be added; or maybe a supports_mode method could be added, which could aid adding modes like c+ and x+ in future; etc. So it makes it far more future-proof.
With this feature in particular, I feel there’s a lot of areas where additional features could be desired in future, and from the beginning I was concerned about how dunders don’t really provide many great options for future-proofing, so this would solve that quite nicely.
It does make the interface a bit more complicated at first glance, but I think it’s worth it. Implementors could always choose to def __opener__(self): return self and then implement the opener interface on the class itself if they didn’t want to define a separate class, which’d only be adding 1 trivial function compared to the current proposal.
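(Continuing the hypothetical Opener sketch above, the return-self trick would look like:)

class MyPath:
    def __opener__(self):
        return self  # the object acts as its own opener
    def open_reader(self): ...
    def open_writer(self, mode): ...
    def open_updater(self, mode): ...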
@barneygale What are your thoughts on this?
Another fun detail is that io could provide ABCs and mixins for various implementations, and they wouldn’t Frankenstein the openable, as the MRO is contained in the opener.
Adding a generic __opener__ method would fail to satisfy the original intention of being typing-friendly, and it’s semantically exactly equivalent to having a simple __open__ method. supports_buffering and supports_mode have no real benefit - all the top-level open function could realistically do in such a situation is to raise a TypeError anyway - the exact same thing that a missing method or parameter would do.
I’d be concerned that an opener interface would end up meaning either:
a. more temporaries + class definitions (more code)
b. modules more often needing to make a “static” / global object that gets returned from all the openers, which may wrap something

In general, more code which needs to be run → more runtime → if it ends up in a “hot loop”, people will hand-code around it to avoid it. Two extra functions are cheaper than objects to me for that. It also reduces the number of hops / lookups / function calls.
Any reason that Opener isn’t a protocol which says you implement the two opener functions? Path implements “Opener” (the self case), or you could always write a self.opener() function if you wanted to keep it separate (open(self.opener())). That is to say, I think a dedicated “opener” is essentially equivalent to the two methods (esp. if it is required to have the two methods). It just adds another layer of indirection, which I like as optional.