Why not use a more descriptive enumeration for the mode then? Because duplex with mode 'r' almost always needs a comment to make it make sense. I think all these modes do unless you’re familiar with them.
I suggested that earlier but we felt in the end that the single-character str worked better overall.
Read from here onwards to see how that discussion went.
I support having a mode argument which takes a single-character str. It fits the model of the open function which, after all, is what this API is supporting.
I don’t see much of an argument for strings over enums besides that other functions in Python accept strings. Did I miss it somewhere?
In support of descriptive arguments (longer strings or enums): I think these one-character strings are really bad for readers of code, and reading code happens thousands of times more often than writing code. We should make code as easy to read as possible.
__open_duplex__(mode='r') is not easy to read.
In support of enums over strings: If someone wants to write a shim, say

def open_log_file(mode: Mode): ...

then with strings you are forced to annotate the mode by copying the Literal[...]. Therefore, you’ll want to expose the literal somewhere anyway:

type OpenModes = Literal[...]

So you’re halfway to having a type-safe enumeration anyway.
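To make the comparison concrete, here’s a minimal sketch of the two styles; the mode set ("r", "w", "a") and the names OpenMode and Mode are hypothetical, purely for illustration:

from enum import Enum
from typing import Literal

# String style: every shim must reference (or copy) the Literal alias
type OpenMode = Literal["r", "w", "a"]

def open_log_file(mode: OpenMode): ...

# Enum style: the type itself is the single source of truth
class Mode(Enum):
    READ = "r"
    WRITE = "w"
    APPEND = "a"

def open_log_file2(mode: Mode): ...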
Sorry, but why should “fitting the model” of the existing function matter? Are we suggesting that its design is ideal?
Nobody (okay, almost nobody) is going to call __open_duplex__ directly.
The only code that is going to call these dunder methods is:
- builtins.open
- maybe pathlib.Path.open
- maybe the odd function elsewhere where someone has a need to write some low-level glue code.

And in basically all of those cases, it won’t be called with mode='r', it’ll be called with mode=mode after applying an algorithm to validate the caller’s mode and turn it into the mode that needs to be passed to the dunder:
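(Something roughly like this - a hypothetical sketch using the dunder names from this thread; the exact mode set and dispatch rules here are assumptions, not the proposal’s actual algorithm:)

def _open_via_dunders(obj, mode):
    # reduce the user-facing mode string to what the dunders expect
    plus = "+" in mode
    base = mode.replace("+", "").replace("b", "")
    if base not in ("r", "w", "a", "x"):
        raise ValueError(f"invalid mode: {mode!r}")
    if plus:
        return obj.__open_duplex__(mode=base)  # e.g. 'r+' -> duplex with mode 'r'
    elif base == "r":
        return obj.__open_reader__()
    else:
        return obj.__open_writer__(mode=base)  # 'w', 'a' or 'x'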
The code that people are going to end up reading is:

open(path, mode='r+')

since that’s what would end up calling this dunder. And indeed, to the untrained eye, that’s not easy to read/understand - but it’s been around for decades and is not something that’s easy to change now.
Consistency. __open_reader__ is a dunder that allows a class to implement open. It makes sense to me to be consistent with the API of open.
You can disagree. That’s fine. Ultimately it’s @barneygale’s decision - I’m just casting my vote for my preferred option.
Good point - thanks.
Just to throw out another option: the open() docs say that ‘+’ means opening for “updating”, so perhaps we could use _updater().
(I think it’s academic - we don’t generally add type annotations in the stdlib.)
We could use io.BufferedReader, io.BufferedWriter and io.BufferedRandom, but I suppose they’re too specific. In the typing module we have BinaryIO but it’s not further differentiated. Perhaps we could add typing.BinaryInput and typing.BinaryOutput?
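For concreteness, such additions might look roughly like this - BinaryInput and BinaryOutput don’t exist in typing today, and the method sets shown are guesses at a minimal interface:

from typing import Protocol

class BinaryInput(Protocol):
    def read(self, size: int = -1, /) -> bytes: ...

class BinaryOutput(Protocol):
    def write(self, data: bytes, /) -> int: ...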
IIUC BufferedIOBase guarantees that all requested data is read/written in read() and write(), whereas this isn’t guaranteed in RawIOBase. But both of these classes are typing.BinaryIO nonetheless. Cody’s first reply explains it better.
I was trying to increase the signal-to-noise ratio for the purposes of this discussion, as we all seem to agree that it’s an int!
As suggested above, buffering could be left up to the caller (open), so the return value would simply be the raw stream, and the buffering parameter can be dropped, simplifying the implementation for callees. Is there a reason for implementers to handle buffering themselves?
Implementers may be unable to return an unbuffered stream, e.g. zipfile.ZipFile.open() can’t return anything less buffered than the ZipFile’s underlying file obj. If you nevertheless treat the __open_reader__ return value as unbuffered and wrap it in BufferedReader (for example), you introduce a pointless level of buffering IIUC.
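(A small sketch of the hazard; data.bin is a placeholder file that must already exist:)

import io

raw = io.FileIO("data.bin")       # unbuffered raw stream
inner = io.BufferedReader(raw)    # one layer of buffering - fine
outer = io.BufferedReader(inner)  # a second, pointless layer: every read
                                  # is now copied through two buffers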
edit: even so, I’m keen to drop the buffering parameter in the dunders if we can find a way to do it. I suspect that most implementations won’t be in a position to fully control buffer sizes.
If we kept the buffering parameter in our open() function but removed it from the dunders, then perhaps our open() implementation could handle buffering like this:

import io
from warnings import warn

# 'buffering' is the caller's argument; 'stream' is the dunder's return value
if buffering == -1:
    pass  # default: leave the stream as the dunder returned it
elif buffering == 0:
    warn("disabling buffering is unsupported")
elif buffering == 1:
    warn("line buffering is unsupported")
elif isinstance(stream, io.BufferedIOBase):
    warn("configuring the buffer size is unsupported")
    # because the stream is already buffered!
else:
    stream = io.BufferedReader(stream, buffering)
    # or BufferedWriter/BufferedRandom depending on mode

Arguably, some/all of the warnings should be exceptions.
edit: or even more simply:

if buffering != -1:
    raise ValueError("configuring the buffer size is unsupported")
Hmm, does BufferedReader check if the wrapped stream has read1, and use that if available? That might be the right API level for avoiding needless buffering.
What is the goal / use case / reason you want to avoid buffering?
BufferedReader prefers readinto if the “Raw I/O” provides it, to reduce allocations and copies (read1 actually defers to readinto by default). If code allocates/manages its own buffer and calls buffered.readinto(), it will generally incur only os.readinto / the read system call writing directly to the caller-provided buffer. (Note: this isn’t what the docs say currently.)
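(A short illustration, assuming a file data.bin exists; in CPython, a readinto() request larger than the internal buffer can be served by reading directly into the caller’s buffer:)

import io

raw = io.FileIO("data.bin")
buffered = io.BufferedReader(raw)  # default internal buffer is 8 KiB

buf = bytearray(256 * 1024)        # caller-managed buffer
n = buffered.readinto(buf)         # large request: data can land directly
                                   # in buf, avoiding an extra copy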
I’m +1 to just throwing an exception if specific buffering is specified, to start.
I’ve opened a PR that adjusts pathlib’s internal open()-like function to our design:

I for one like the design very much. Thank you everyone for your contributions so far!
Even if we don’t currently have corresponding text functions, I would still include binary or bytes in the method names to be future-compatible.
Quick poll just to see if there’s a clear preference. If the results are close I reserve the right to pick a winner. I’ve added buffered_reader as an option because it squares with io.BufferedReader and friends. Vote for up to four.
- __open_reader__()
- __open_bytes_reader__()
- __open_binary_reader__()
- __open_buffered_reader__()
Can we please make an opener a concept, and have a single dunder method return something following an ABC, instead of adding a dozen potential methods to openable objects?
Actually, I really like this idea, if I’m understanding what you’re asking for correctly:
- Openable objects have a single __opener__ dunder
- Calling openable.__opener__() returns an object that satisfies some Opener interface
- openable.__opener__().open_reader() would replace the current openable.__open_reader__(); similar for writer and updater (each with their mode argument); a sketch follows below
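(A minimal sketch of that shape; the names Opener, open_reader, open_writer and open_updater follow this thread, but the signatures are assumptions, not a settled API:)

from typing import BinaryIO, Protocol

class Opener(Protocol):
    def open_reader(self) -> BinaryIO: ...
    def open_writer(self, mode: str) -> BinaryIO: ...
    def open_updater(self, mode: str) -> BinaryIO: ...

class Openable(Protocol):
    def __opener__(self) -> Opener: ...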
The benefit that I see is: openable objects only need one dunder, avoiding clutter. Then, because the opener need only deal with opening, that interface could be extended with additional methods in future at any time; e.g. if buffering was added, a supports_buffering method/property/attribute could be added; or maybe a supports_mode method could be added, which could aid adding modes like c+ and x+ in future; etc. So it makes it far more future-proof.
With this feature in particular, I feel there’s a lot of areas where additional features could be desired in future, and from the beginning I was concerned about how dunders don’t really provide many great options for future-proofing, so this would solve that quite nicely.
It does make the interface a bit more complicated at first glance, but I think it’s worth it. Implementors could always choose to def __opener__(self): return self and then implement the opener interface on the class itself if they didn’t want to define a separate class, which’d only be adding 1 trivial function compared to the current proposal.
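(Continuing the hypothetical Opener sketch above, the return-self trick would look like:)

class MyPath:
    def __opener__(self):
        return self  # the object acts as its own opener
    def open_reader(self): ...
    def open_writer(self, mode): ...
    def open_updater(self, mode): ...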
@barneygale What are your thoughts on this?
Another fun detail is that io could provide ABCs and mixins for various implementations, and they wouldn’t Frankenstein the openable, as the MRO is contained in the opener.
Adding a generic __opener__ method would fail to satisfy the original intention of being typing-friendly, and it’s semantically exactly equivalent to having a simple __open__ method. supports_buffering and supports_mode have no real benefit - all the top-level open function could realistically do in such a situation is to raise a TypeError anyway - the exact same thing that a missing method or parameter would do.
I’d be concerned that an opener interface would end up meaning either:
a. more temporaries + class definitions (more code)
b. modules more often needing to make a “static” / global object that gets returned from all the openers, which may wrap something

In general, more code which needs to be run → more runtime → if it ends up in a “hot loop”, people will hand-code around it to avoid it. Two extra functions are cheaper than objects to me for that. It also reduces the number of hops / lookups / function calls.
Any reason that Opener isn’t a protocol which says you implement the two opener functions? Path implements “Opener” (the self case), or you could always write a self.opener() function if you wanted to keep it separate (open(self.opener())). That is to say, I think a dedicated “opener” is essentially equivalent to the two methods (esp. if it is required to have the two methods). It just adds another layer of indirection, which I like as optional.