As you may know, .zip files compress each file’s content individually and do not compress any of the zip metadata, such as filenames and the central directory. This is inefficient for archives with a lot of small files, both because of the uncompressed metadata and because the compressor never has time to “get going” the way it would on a single longer input. This is one reason why we let you put files in the root of the wheel instead of giving all files a prefix.
One way to get around this problem is to put an uncompressed zip inside another zip, so that the entire inner zip (file contents and metadata together) is compressed as a single unit, similar to the nested archive structure of a .deb. This helps most for wheels with a lot of small files and a lot of long filenames. I finally got around to seeing what this would do to wheels.
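A minimal sketch of the effect, using only the standard library. The file names and contents are synthetic and made up for illustration (they are not data from the report); `build_zip` is a hypothetical helper.

```python
import io
import zipfile

def build_zip(files, compression):
    """Return the bytes of a zip archive holding `files` (name -> bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

# Synthetic input: many small, similar files with long names (an assumption
# chosen to resemble a package tree).
files = {
    f"examplepkg/subpackage_{i:03d}/module_with_a_long_name_{i:03d}.py":
        f"def function_{i}():\n    return {i}\n".encode()
    for i in range(500)
}

# Ordinary zip: every file is deflated on its own, metadata stays uncompressed.
flat = build_zip(files, zipfile.ZIP_DEFLATED)

# Nested zip: store the files uncompressed in an inner zip, then deflate the
# whole inner zip (contents plus metadata) as one member of the outer zip.
inner = build_zip(files, zipfile.ZIP_STORED)
nested = build_zip({"inner.zip": inner}, zipfile.ZIP_DEFLATED)

print("flat:", len(flat), "bytes; nested:", len(nested), "bytes")
```

On input like this the nested archive should come out much smaller, because the repeated filenames and headers are compressed along with the file contents.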
This program transforms a wheel so that everything except the .dist-info metadata goes into an inner .zip, and can compress that .zip with various algorithms: https://gist.github.com/dholth/42a9b452f024d283b7c3c34cfa40832a
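The transformation can be sketched roughly as follows. This is not the code from the gist; `nest_wheel` and the member name `inner.zip` are hypothetical, and the sketch does not update RECORD or any other metadata, so the output is not a valid installable wheel.

```python
import io
import zipfile

def nest_wheel(src, dst, inner_name="inner.zip",
               compression=zipfile.ZIP_DEFLATED):
    """Copy wheel `src` to `dst`, moving everything outside .dist-info
    into a single stored (uncompressed) inner zip.

    `src` and `dst` may be paths or binary file objects. `inner_name` is an
    assumption of this sketch, not a name defined by the wheel format.
    """
    inner_buf = io.BytesIO()
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", compression=compression) as zout:
        with zipfile.ZipFile(inner_buf, "w",
                             compression=zipfile.ZIP_STORED) as inner:
            for info in zin.infolist():
                data = zin.read(info)
                top_level = info.filename.split("/", 1)[0]
                if top_level.endswith(".dist-info"):
                    # Metadata stays directly readable in the outer zip.
                    zout.writestr(info.filename, data)
                else:
                    inner.writestr(info.filename, data)
        # The inner zip is complete here; the outer zip compresses it
        # as one unit with the chosen algorithm.
        zout.writestr(inner_name, inner_buf.getvalue())
```

Passing `compression=zipfile.ZIP_BZIP2` or `compression=zipfile.ZIP_LZMA` switches the outer algorithm, which is one way to compare the three options.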
To test, I took the top 512 packages over the last 30 days according to https://hugovk.github.io/top-pypi-packages/ and recompressed the ones that have wheels. The report shows each wheel's original size, the change in size when it is recompressed with gzip -9 (deflate, the only compression algorithm allowed in wheel), bzip2, or LZMA (xz), and the savings weighted by download count. Did I make a mistake with LZMA, or is it really that good (and slow)? https://docs.google.com/spreadsheets/d/1PPGK5m7mX5G3blpqG77AFeR59-PiXn9guOhySZow1PI
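For reference, one plausible way to compute savings weighted by download count is sketched below. The exact formula used in the spreadsheet is not stated here, so this definition is an assumption, and the numbers are invented purely for illustration.

```python
def weighted_savings(rows):
    """Fraction of total transfer saved, weighting each wheel by downloads.

    `rows` is an iterable of (original_size, new_size, downloads) tuples.
    This definition is an assumption, not necessarily the report's formula.
    """
    saved = sum(d * (orig - new) for orig, new, d in rows)
    total = sum(d * orig for orig, new, d in rows)
    return saved / total

# Hypothetical numbers: a popular wheel that shrinks a little and a rarely
# downloaded wheel that shrinks a lot.
rows = [
    (1000, 800, 10),   # 20% smaller, 10 downloads
    (2000, 1000, 1),   # 50% smaller, 1 download
]
print(weighted_savings(rows))  # 0.25
```

The point of the weighting is that a modest saving on a heavily downloaded wheel can matter more for total transfer than a large saving on a rarely downloaded one.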
Change in transfer with nested .zip: