Thanks! I tried larger buffer sizes like `io.BufferedReader(f, buffer_size=1024*1024)`, but it did not really help; throughput stayed between 550k and 600k per second in my example.
The default `buffer_size` of `io.BufferedReader()` is 8 KiB. Increasing `buffer_size` alone may not help much.
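For reference, that default is exposed as `io.DEFAULT_BUFFER_SIZE`:

```python
import io

# Buffered classes fall back to this when no buffer_size is given.
print(io.DEFAULT_BUFFER_SIZE)  # 8192 bytes (8 KiB) on current CPython
```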
If `buffer_size` is 100 KiB, a 420 KiB read request is completed in these steps:
- Read 400 KiB from the underlying stream directly, bypassing the buffer.
- Request 100 KiB from the underlying stream and store it in the buffer. Once at least 20 KiB has been read, it stops, to avoid blocking indefinitely.
- Return the 420 KiB of data to the caller.
`BufferedReader.readline()` behaves similarly: once it reads a `'\n'`, it stops.
The role of `BufferedReader` here is more to ensure that enough bytes can be read. The underlying stream's `.read(size)` method returns *at most* `size` bytes, possibly fewer. `BufferedReader.read(300_000)` will call the underlying stream's `.read()` method repeatedly until it has accumulated 300,000 bytes (or hits EOF).
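That difference can be demonstrated with a toy raw stream (`TrickleIO` here is a made-up class, for illustration only) that hands out at most 10 bytes per call:

```python
import io

class TrickleIO(io.RawIOBase):
    """Toy raw stream that returns at most 10 bytes per readinto() call,
    like a slow pipe or socket. (Made-up class, for illustration only.)"""
    def __init__(self, data):
        self._inner = io.BytesIO(data)
    def readable(self):
        return True
    def readinto(self, b):
        chunk = self._inner.read(min(len(b), 10))
        b[:len(chunk)] = chunk
        return len(chunk)

raw_chunk = TrickleIO(b"x" * 1000).read(300)
print(len(raw_chunk))  # 10 -- the raw layer does one readinto() and returns

buffered = io.BufferedReader(TrickleIO(b"x" * 1000), buffer_size=100)
full = buffered.read(300)
print(len(full))  # 300 -- BufferedReader keeps calling the raw stream
```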
`_compression.BUFFER_SIZE` (default 8 KiB) should also be changed; it is the decompressor's input chunk size. If it stays at 8 KiB, each pass may only decompress a few dozen KiB, which can't fill `BufferedReader`'s 1 MiB buffer.
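For experimenting, the value can be patched at runtime. Note that `_compression` is a private CPython module, so this is a throwaway experiment, not a supported API:

```python
import _compression
import bz2
import io

# Private module: this attribute may change or disappear between
# CPython versions. Enlarge the decompressor's input chunk size.
_compression.BUFFER_SIZE = 128 * 1024

payload = bz2.compress(b"hello world" * 10_000)
with bz2.BZ2File(io.BytesIO(payload)) as f:
    data = f.read()
print(len(data))  # 110000
```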
Even increasing both values at the same time, the improvement is not big. Raising them from 8 KiB to 128 KiB, `.read(-1)` only speeds up from roughly 6.6 seconds to roughly 6.5 seconds. I tested the bz2/lzma modules with the `.readline()` and `.read(-1)` methods; when I have time, I will look into the reasons.
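A measurement like that can be reproduced entirely in memory, which removes disk noise from the comparison (the payload size and contents here are arbitrary):

```python
import bz2
import io
import time

# ~1 MiB of repetitive data, compressed once up front.
payload = bz2.compress(bytes(range(256)) * 4_000)

start = time.perf_counter()
with bz2.BZ2File(io.BytesIO(payload)) as f:
    n = len(f.read())  # the .read(-1) path under test
elapsed = time.perf_counter() - start
print(f".read(-1): {n} bytes in {elapsed:.3f} s")
```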
At a guess, having run into this kind of issue elsewhere in CPython, the default buffer sizes are probably too small for modern PCs and modern workflows.
Operating systems usually do read prefetching; this paper, written in 2005, introduced the prefetching mechanisms in Linux/BSD.
```
decomp = Decompressor()
while input_data := ifp.read(READ_SIZE):
    output_data = decomp.decompress(input_data)
    ofp.write(output_data)
```
If the OS does prefetching and write caching, it may overlap these tasks (read/decompress/write) to some degree.
If the default buffer sizes are to be changed, it's best to ask OS experts and people familiar with various storage devices: if a buffer is too large, it may even be slower.