I’m having trouble figuring out the correct way to use the io module to implement a filter on bytes (a decompressor). There’s the push strategy: a writable class inheriting from BufferedIOBase that wraps an output file and accepts .write(compressed data), writing the decompressed bytes to that file. And there’s the pull strategy: a readable class inheriting from RawIOBase that produces decompressed data on .read(), refilling a buffer of compressed data from an underlying readable file as needed. It feels easy to miss details of exactly how these classes should behave, and I suspect a different composition might be better.
The “push” implementation pseudocode:
def write(self, data):
    while data:
        decompressed, consumed = decompress(data)
        self.buffer.write(decompressed)
        data = data[consumed:]
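For comparison, here is a minimal runnable sketch of the push variant. It assumes zlib from the standard library as a stand-in for the zstd decompressor, and the class name PushDecompressor and the sink argument are made up for illustration. It is not the code from the repository.

```python
import io
import zlib


class PushDecompressor(io.BufferedIOBase):
    """Accept compressed bytes via write(); forward decompressed bytes to `sink`."""

    def __init__(self, sink):
        self._sink = sink                   # any writable binary file object
        self._d = zlib.decompressobj()      # assumption: zlib stands in for zstd

    def writable(self):
        return True

    def write(self, data):
        # zlib consumes the whole chunk in one call, so the explicit
        # "while data" loop from the pseudocode is not needed here.
        self._sink.write(self._d.decompress(data))
        return len(data)                    # number of *input* bytes accepted

    def flush(self):
        self._sink.flush()


sink = io.BytesIO()
writer = PushDecompressor(sink)
payload = zlib.compress(b"hello " * 1000)
for i in range(0, len(payload), 64):        # feed the stream in small pieces
    writer.write(payload[i:i + 64])
assert sink.getvalue() == b"hello " * 1000
```

One detail that is easy to miss: write() reports how many bytes of input it accepted, not how many decompressed bytes were produced.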
The “pull” implementation pseudocode:
def readinto(self, b):
    if self.inbuf is empty:
        self.file.readinto(self.inbuf, len(self.inbuf))
        # can this return 0 bytes when the file is not really closed?
    data, consumed = decompress(self.inbuf)
    b[:len(data)] = data
    return len(data)
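A minimal runnable sketch of the pull variant, again assuming zlib as a stand-in for zstd and a blocking source. The names PullDecompressor, source and chunk_size are hypothetical. Instead of decompressing straight into b, it keeps a small buffer of decompressed output, which sidesteps the case where one chunk of compressed input produces more bytes than b can hold.

```python
import io
import zlib


class PullDecompressor(io.RawIOBase):
    """Produce decompressed bytes on read() from a compressed binary `source`."""

    def __init__(self, source, chunk_size=8192):
        self._source = source               # assumption: a blocking binary file
        self._chunk_size = chunk_size
        self._d = zlib.decompressobj()      # assumption: zlib stands in for zstd
        self._out = b""                     # decompressed bytes not yet handed out
        self._eof = False

    def readable(self):
        return True

    def readinto(self, b):
        # A compressed chunk may yield no output yet, so keep reading instead
        # of returning 0, which callers interpret as end of file.
        while not self._out and not self._eof:
            chunk = self._source.read(self._chunk_size)
            if chunk:
                self._out = self._d.decompress(chunk)
            else:
                self._out = self._d.flush() # drain whatever zlib still holds
                self._eof = True
        n = min(len(b), len(self._out))
        b[:n] = self._out[:n]
        self._out = self._out[n:]
        return n


raw = PullDecompressor(io.BytesIO(zlib.compress(b"hello " * 1000)), chunk_size=64)
reader = io.BufferedReader(raw)             # adds read(n), peek(), readline(), ...
assert reader.read() == b"hello " * 1000
```

Wrapping the raw object in io.BufferedReader is one possible composition: the raw class only has to implement readinto(), and the buffering layer supplies the rest of the read API.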
The actual code: https://github.com/dholth/zstdpy/blob/master/dezstd/dezstd.py#L22
On pypy3 v7.3.1 it looks like the pull version is slower than the push version; on CPython they are about the same speed.
Am I missing a guide for doing cool things with the io module?
Thanks,
Daniel