I’m having trouble figuring out the correct way to use the
io module to implement a filter on bytes (a decompressor). There is a push strategy: inherit from
BufferedIOBase (writable), store a reference to an underlying file, and accept compressed data through
.write(). There is also a pull strategy: inherit from
RawIOBase (readable), produce decompressed data on
.read(), and refill a buffer of compressed data from the underlying readable as needed. It feels easy to miss details of exactly how these classes should behave, or to overlook a composition that would work better.
The “push” implementation pseudocode:

```python
def write(self, data):
    while data:
        decompressed, consumed = decompress(data)
        self.buffer.write(decompressed)
        data = data[consumed:]
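To make the push side concrete, here is a minimal runnable sketch (not the dezstd code from the repo below): it uses `zlib.decompressobj` as a stand-in for a zstd decompressor, and the names `PushDecompressor` and `sink` are invented for this example.

```python
import io
import zlib

class PushDecompressor(io.BufferedIOBase):
    """Push-style filter: accepts compressed bytes via write() and
    writes the decompressed bytes to an underlying sink."""

    def __init__(self, sink):
        self.sink = sink                # any writable file-like object
        self._d = zlib.decompressobj()  # stand-in for a zstd decompressor

    def writable(self):
        return True

    def write(self, data):
        # zlib's decompressobj consumes everything it is given (keeping
        # leftover input internally), so no consumed-bytes loop is needed
        self.sink.write(self._d.decompress(data))
        return len(data)
```

Usage: `PushDecompressor(io.BytesIO())` accepts compressed chunks of any size via `.write()`; the sink accumulates the plaintext.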
The “pull” implementation pseudocode (note that io’s readinto takes a single buffer argument):

```python
def readinto(self, b):
    if not self.inbuf:
        n = self.file.readinto(self.inbuf)
        # can this return 0 bytes when the file is not really at EOF?
    data, consumed = decompress(self.inbuf)
    self.inbuf = self.inbuf[consumed:]
    b[:len(data)] = data
    return len(data)
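For comparison, here is a runnable sketch of the pull side (again not the repo’s code): a `RawIOBase` whose `readinto` refills from the compressed source as needed. `zlib.decompressobj` stands in for a zstd decompressor; `max_length` caps the output so it fits in the caller’s buffer, with unconsumed input kept in `unconsumed_tail`. The names `PullDecompressor`, `source`, and `CHUNK` are invented here.

```python
import io
import zlib

class PullDecompressor(io.RawIOBase):
    """Pull-style filter: reads compressed bytes from an underlying
    source and produces decompressed bytes via readinto()."""

    CHUNK = 8192  # how much compressed input to refill at a time

    def __init__(self, source):
        self.source = source            # any readable file-like object
        self._d = zlib.decompressobj()  # stand-in for a zstd decompressor

    def readable(self):
        return True

    def readinto(self, b):
        # assumes len(b) > 0, which BufferedReader guarantees
        while True:
            # feed leftover input first, then refill from the source
            chunk = self._d.unconsumed_tail or self.source.read(self.CHUNK)
            if not chunk:
                return 0  # source exhausted and nothing buffered: EOF
            # cap output at len(b); unconsumed input stays in unconsumed_tail
            data = self._d.decompress(chunk, len(b))
            if data:
                b[: len(data)] = data
                return len(data)
            # no output yet (partial compressed frame): loop and read more
```

Wrapping it in `io.BufferedReader(PullDecompressor(f))` then provides the usual buffered `.read()` / `.readline()` interface on top of the raw `readinto`.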
The actual code: https://github.com/dholth/zstdpy/blob/master/dezstd/dezstd.py#L22
On PyPy3 7.3.1 the pull version measures slower than the push version; on CPython the two are about the same speed.
Am I missing a guide for doing cool things with the io module?