Making the wheel format more flexible (for better compression/speed)

I didn’t put any thought into it beyond “what’s the highest” and figured we could tune from there; if 19 makes more sense, that’s fine with me. I’ll rerun my numbers with that.

requests:

  • current: 57k
  • tar+xz: 44k
  • tar+zstd: 45k

pip:

  • current: 1.4M
  • tar+xz: 914k
  • tar+zstd: 960k

tensorflow:

  • current: 402M
  • tar+xz: 126M
  • tar+zstd: 141M

Here are the numbers again with zstd at level 19 (xz is still at compression level 6).
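For reference, the xz side of a comparison like this can be reproduced with nothing but the standard library. A sketch with a made-up payload (the numbers above were measured on the real wheels, not this):

```python
import lzma

# Stand-in payload; the figures above came from actual wheel contents.
payload = b"def handler(request):\n    return request\n" * 25_000

# preset 6 is the xz level used for the numbers above;
# preset 9 trades more time and memory for a smaller result.
for preset in (6, 9):
    out = lzma.compress(payload, preset=preset)
    print(f"preset {preset}: {len(out)} bytes")

# Round-trip check.
assert lzma.decompress(lzma.compress(payload, preset=6)) == payload
```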

So to me the tradeoffs seem to be:

lzma:

  • Smaller without blowing up the memory on decompression.
  • Slower.
  • More memory on compression.
  • Already part of the Python standard library, so transition path is simpler.
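On that last point: the simpler transition path is concrete, since `tarfile` in the standard library already speaks xz, so producing or consuming a tar+xz archive needs no new dependency. A minimal in-memory sketch (the archive member name is made up for illustration):

```python
import io
import tarfile

# Build a tar+xz archive entirely in memory using only the stdlib.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:xz") as tf:
    data = b"print('hello')\n"
    info = tarfile.TarInfo("pkg/__init__.py")  # hypothetical member name
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))

archive = buf.getvalue()

# Reading it back is just as direct.
with tarfile.open(fileobj=io.BytesIO(archive), mode="r:xz") as tf:
    member = tf.extractfile("pkg/__init__.py").read()
print(member)
```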

zstd:

  • Much faster.
  • Somewhat larger files, but within the same ballpark.
  • Less memory on compression.
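zstd itself is not in the standard library, so as a stand-in, the same shape of tradeoff (a fast codec versus a slow, tight one) can be seen by contrasting two stdlib codecs; zlib here is only a proxy for "the faster option", not for zstd's actual numbers:

```python
import time
import zlib
import lzma

# Repetitive stand-in payload; real measurements above used real wheels.
payload = b"from wheel import metadata\n" * 50_000

t0 = time.perf_counter()
fast = zlib.compress(payload, level=6)
t_fast = time.perf_counter() - t0

t0 = time.perf_counter()
tight = lzma.compress(payload, preset=6)
t_tight = time.perf_counter() - t0

# lzma gives the smaller file; zlib finishes sooner.
print(f"zlib: {len(fast)} bytes in {t_fast:.3f}s")
print(f"lzma: {len(tight)} bytes in {t_tight:.3f}s")
```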