I was curious what effect the zlib tunables would have here. zipfile uses the zlib default compression level (-1, which currently maps to level 6). So if we drop a patch into wheelfile.py (with zopfli in there as well for comparison):
import os, zipfile, zlib, zopfli

z_comp = os.environ['Z_COMPRESSION']

def _get_compressor_patch(compress_type):
    if compress_type != zipfile.ZIP_DEFLATED:
        return _get_compressor(compress_type)
    if z_comp == 'zopfli':
        return zopfli.ZopfliCompressor(zopfli.ZOPFLI_FORMAT_DEFLATE)
    # wbits=-15 gives a raw (headerless) deflate stream, as zip entries expect
    return zlib.compressobj(int(z_comp), zlib.DEFLATED, -15)

_get_compressor, zipfile._get_compressor = zipfile._get_compressor, _get_compressor_patch
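As a sanity check on the parameters the patch passes through, here is a minimal round-trip of zlib's raw-deflate mode (the `level, zlib.DEFLATED, -15` arguments above) on some repetitive sample data; the data and sizes are illustrative, not from the wheel run:

```python
import zlib

data = b"tensorflow " * 10000

def deflate(payload, level):
    # Same arguments as the patch: level, DEFLATED method, raw -15 window.
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(payload) + comp.flush()

fast = deflate(data, 1)
best = deflate(data, 9)

# A matching raw-deflate window (-15) decompresses it back intact.
assert zlib.decompress(fast, -15) == data
assert zlib.decompress(best, -15) == data
print(len(data), len(fast), len(best))
```

The negative wbits matters: zip entries store headerless deflate streams, so a default `zlib.compress` (which adds a zlib header and checksum) would not produce valid entry data.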
With that in place I used wheel pack on a tensorflow wheel at varying compression levels:
compression     user     sys   max RSS    size
deflate -1     50.23    1.04   1555476k   403M
deflate  1     23.58    0.91   1605604k   457M
deflate  2     24.79    0.97   1594144k   445M
deflate  3     28.16    1.01   1584876k   435M
deflate  4     31.30    0.94   1571500k   419M
deflate  5     37.34    0.93   1561152k   409M
deflate  6     50.17    0.98   1555252k   403M
deflate  7     60.30    0.95   1554268k   401M
deflate  8     88.70    1.12   1552852k   400M
deflate  9    115.30    1.27   1552480k   400M
zopfli       6154.12    3.24   3075700k   388M
Measured on an i5-4670 desktop PC (powersave scheduler governor, frequency boost off). The user, sys, and max RSS columns are as reported by /usr/bin/time; the size column is the humanized size per ls.
Judging by its size, the actual wheel on PyPI seems to have been generated with a compression level of around 3. Perhaps wheel should explicitly configure zlib deflate at a somewhat higher level, such as 6?
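If wheel did want to pin the level, no monkeypatching would be needed: since Python 3.7, `zipfile.ZipFile` accepts a `compresslevel` argument directly. A small sketch (the in-memory archive and filename are illustrative):

```python
import io, zipfile

payload = b"tensorflow " * 10000

# Write a zip with DEFLATE explicitly pinned at level 6.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED, compresslevel=6) as zf:
    zf.writestr("payload.txt", payload)

# Read it back to confirm the entry round-trips.
buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    assert zf.read("payload.txt") == payload
```

`compresslevel` can also be overridden per entry via `ZipFile.writestr`/`ZipFile.write`, so a tool could in principle compress large binaries harder than small metadata files.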
I also threw zopfli in there for comparison. While the resulting wheel is smaller, it took over an hour and a half to generate.