Making the wheel format more flexible (for better compression/speed)

I didn’t put any thought into it beyond “what’s the highest” and figured we could tune from there; if 19 makes more sense, that’s fine with me. I’ll rerun my numbers with that.

requests:

  • current: 57k
  • tar+xz: 44k
  • tar+zstd: 45k

pip:

  • current: 1.4M
  • tar+xz: 914k
  • tar+zstd: 960k

tensorflow:

  • current: 402M
  • tar+xz: 126M
  • tar+zstd: 141M

Here are the numbers again with zstd at level 19 (xz is still at compression level 6).
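For reference, the xz side of a comparison like this can be reproduced with nothing but the standard library. A sketch with a made-up payload (the numbers above were measured on the real wheels, not this):

```python
import lzma

# Stand-in payload; the figures above came from actual wheel contents.
payload = b"def handler(request):\n    return request\n" * 25_000

# preset 6 is the xz level used for the numbers above;
# preset 9 trades more time and memory for a smaller result.
for preset in (6, 9):
    out = lzma.compress(payload, preset=preset)
    print(f"preset {preset}: {len(out)} bytes")

# Round-trip check.
assert lzma.decompress(lzma.compress(payload, preset=6)) == payload
```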

So to me the tradeoffs seem to be:

lzma:

  • Smaller without blowing up the memory on decompression.
  • Slower.
  • More memory on compression.
  • Already part of the Python standard library, so transition path is simpler.
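On that last point: the simpler transition path is concrete, since `tarfile` in the standard library already speaks xz, so producing or consuming a tar+xz archive needs no new dependency. A minimal in-memory sketch (the archive member name is made up for illustration):

```python
import io
import tarfile

# Build a tar+xz archive entirely in memory using only the stdlib.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:xz") as tf:
    data = b"print('hello')\n"
    info = tarfile.TarInfo("pkg/__init__.py")  # hypothetical member name
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))

archive = buf.getvalue()

# Reading it back is just as direct.
with tarfile.open(fileobj=io.BytesIO(archive), mode="r:xz") as tf:
    member = tf.extractfile("pkg/__init__.py").read()
print(member)
```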

zstd:

  • Much faster.
  • Somewhat larger files, but within the same ballpark.
  • Less memory on compression.
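zstd itself is not in the standard library, so as a stand-in, the same shape of tradeoff (a fast codec versus a slow, tight one) can be seen by contrasting two stdlib codecs; zlib here is only a proxy for "the faster option", not for zstd's actual numbers:

```python
import time
import zlib
import lzma

# Repetitive stand-in payload; real measurements above used real wheels.
payload = b"from wheel import metadata\n" * 50_000

t0 = time.perf_counter()
fast = zlib.compress(payload, level=6)
t_fast = time.perf_counter() - t0

t0 = time.perf_counter()
tight = lzma.compress(payload, preset=6)
t_tight = time.perf_counter() - t0

# lzma gives the smaller file; zlib finishes sooner.
print(f"zlib: {len(fast)} bytes in {t_fast:.3f}s")
print(f"lzma: {len(tight)} bytes in {t_tight:.3f}s")
```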