Pre-PEP: Patch files as releases

I’ve been considering various ways to reduce the impact of packaging on file sizes and bandwidth. Improved compression of wheels can help, but for some libraries there are likely even bigger gains available by only sending what has changed between releases. This is likely especially true for libraries that wrap and bundle native code, either where the bundled native code has not changed since the previous version, or the inverse, where an update changes only the bundled native code.

Patch files as a release.

It’s relatively simple, but the idea here is that for a small change, especially in libraries that bundle large binaries that may not have changed, a patch release can consist of

  • New metadata.
  • A prior version that had a full release.
  • A mapping of hashes of wheels to the binary patch files that should be applied, and the hash each should result in.
  • The patch files must preserve the platform tag: it must be the same before and after patching.
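
As a sketch, the hash mapping might look like the following. All field names, file names, and hash values here are hypothetical, purely to make the shape of the mapping concrete; nothing here is part of an existing metadata spec.

```python
# Hypothetical patch-release metadata. Every field name, file name, and
# hash value below is illustrative only, not part of any existing spec.
patch_release = {
    "version": "1.2.1",
    "patches_version": "1.2.0",  # the prior version that had a full release
    "patches": {
        # sha256 of a 1.2.0 wheel -> the patch to apply and the expected result
        "sha256:1111deadbeef": {
            "patch": "pkg-1.2.1-cp312-cp312-manylinux_2_28_x86_64.patch",
            "result": "sha256:2222cafef00d",
        },
    },
}

# The platform tag must be identical before and after patching, so each
# patch entry maps a wheel for one platform to a new wheel for the same one.
assert patch_release["patches_version"] == "1.2.0"
```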

In terms of uploader and index actions:

  • An uploader may (but does not have to) upload a full wheel alongside the patch, but only if it matches the resulting hash of the patched wheel (it should be bit-for-bit identical).
  • A multi-uploader index should not allow other users to provide patches for packages they did not upload.
  • Indexes may (but do not have to) generate patches if an uploader has not, so long as the uploader uploaded a wheel.

Installation of this would be:

  • Use an already locally cached wheel, or fetch the wheel
  • Verify the starting hash
  • Create a new wheel by applying the binary patch
  • Verify the resulting hash
  • Install the new wheel
  • Choose how to cache this (up to the installer): it’s possible to discard one of the wheels if the patch file and knowledge of the patch are kept, or just cache both
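
The steps above can be sketched as follows. The patch format here is a toy (copy/insert opcodes, not a real delta format), just to make the verify-apply-verify flow concrete:

```python
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def apply_patch(old: bytes, ops) -> bytes:
    # Toy delta format: ("copy", offset, length) reuses bytes from the old
    # wheel; ("insert", data) adds new bytes. A real implementation would
    # use an established binary delta format instead.
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)

def install_from_patch(old_wheel: bytes, ops, start_hash: str, result_hash: str) -> bytes:
    assert sha256(old_wheel) == start_hash    # verify the starting hash
    new_wheel = apply_patch(old_wheel, ops)
    assert sha256(new_wheel) == result_hash   # verify the resulting hash
    return new_wheel                          # hand off to the normal install path

# Stand-ins for two wheels that share a large unchanged portion.
old = b"HEADER|native-blob-unchanged|version=1.0"
new = b"HEADER|native-blob-unchanged|version=1.1"
ops = [("copy", 0, len(old) - 3), ("insert", b"1.1")]
rebuilt = install_from_patch(old, ops, sha256(old), sha256(new))
assert rebuilt == new
```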

Considerations:

  • If we haven’t yet gotten to a “you can’t delete from PyPI” policy, an existing patch release should block deleting the version it patches for as long as the patch release exists.

  • From a safety standpoint, a patch generated by an index should not require a marker, given the index’s position of trust and the requirement that the result be bit-for-bit equivalent, but this might need discussion.

  • Getting the most out of this on the maintainer side may require some changes to build processes, so that native dependencies are built in a way that lets them be cached separately and reused, or otherwise built reproducibly. Changes along these lines are likely a win for these projects in terms of build time even without this PEP, and some people’s build systems already look like this.

2 Likes

This is a really cool idea. I do wonder if it’s possible to estimate the potential impact, or somehow gather relevant statistics from users or indexes.

My assumption is that in terms of file size, [previous wheel] + [patch] > [a fresh wheel] almost every time, even if it’s a very small relative difference. So how often are users installing a package into an environment with an older version that can be patched[1], versus how often a package gets installed into a fresh environment? In the latter case, they end up downloading wheel + patch [ + patch ...].

The potential downside here is that the developer starts to upload patch releases because it’s easier for them, but a substantial majority of the installs just have to download everything from scratch. It would still reduce file storage on the index, but there’s an overall increase in bandwidth usage.

Maybe an addition that would help is: an index may generate a full wheel from the corresponding patch release if it decides that’s worth doing[2]. This seems like a natural flipside to the “indexes can create a patch if they want to” suggestion.


  1. or have an older version in their cache ↩︎

  2. maybe they notice that a lot of requests are for both the previous wheel and the patch ↩︎

1 Like

Dropping some breadcrumbs: Chrome and ChromeOS use a binary delta patching mechanism to drastically reduce the size of the updates billions of users download: Software Updates: Courgette (and a link to the code)

Binary delta compression is the typical search term for these things.

1 Like

Without a doubt, yes, at least when only considering a single version. For a single version, the calculus is more interesting for disk space than for bandwidth, especially with frequently updated libraries. For libraries that are further along in stability and may have months to years between updates, it’s more likely a user will already have the previous version, and more likely that this also helps with bandwidth.

It also gets much more interesting when considering multiple successive patch versions: for example {1.0, 1.1 (patch), 1.2 (patch)}, where the minor versions change less between releases.

So it comes down to less data stored in the places that cache it (not just PyPI, but CDNs and every mirror). It could also be used in local wheel caches to make them smaller on disk if installers generate or store existing patches.

This seems reasonable, but I’d want to leave this and the other entirely up to the index to decide when it is worth it. Especially if done dynamically, this could become a space/memory/CPU/bandwidth tradeoff.

I would have thought to link to xdelta or HDiffPatch, but Courgette is probably a better example showing off the gains in software people are familiar with, so thank you for that.

zstd also has a patching mechanism. Given the existing interest in making wheels compressible with zstd in the future, I thought there’d be a benefit in using a tool already being considered for inclusion, and it was what I was considering building on top of. With that said, that only gives it a slight lean in its favor, and I’m open to other options; I haven’t compared existing options for implementation performance in enough detail yet to be sure this is the right choice, and won’t do that until I’m more sure I’ve covered the design constraints different parties would need or want.
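
zstd’s `--patch-from` mode essentially treats the old file as a compression dictionary for the new one, so the output only needs to encode what differs. The same mechanism can be illustrated with the standard library’s zlib `zdict` parameter (a real implementation would use zstd via the `zstd` CLI or a binding such as the `zstandard` package; this stdlib sketch is only to show the idea):

```python
import random
import zlib

# Two "wheels" that share a large, poorly-compressible payload and differ
# only in a small trailer.
random.seed(0)
shared = bytes(random.randrange(256) for _ in range(4000))
old = shared + b"version 1.0"
new = shared + b"version 1.1"

# "Patch" = the new file compressed against the old file as a dictionary.
c = zlib.compressobj(zdict=old)
patch = c.compress(new) + c.flush()

# Applying the patch requires bit-identical old bytes on the client side,
# which is why the starting hash must be verified first.
d = zlib.decompressobj(zdict=old)
rebuilt = d.decompress(patch) + d.flush()

assert rebuilt == new
assert len(patch) < len(zlib.compress(new))  # delta beats fresh compression
```

The dictionary trick is why the patch can be tiny even when the payload itself is incompressible: the compressor emits back-references into the old file rather than re-encoding the shared bytes.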

2 Likes

Minor note here: Courgette has been replaced by Zucchini. I’m not familiar with those algorithms, but my understanding is that they rely on decompiling PE/ELF files and diffing the instructions. The underlying goal, however, is that given an original file plus a patch, you end up with a perfect match to the file you would have downloaded from scratch.


I like that concept in general because it means that it does not complicate anything for package authors or the packaging tools. It would be up to PyPI to generate those patches, and an installer could leverage them if it already has a baseline download it can use.

I think for this system to work, it would be necessary to be able to apply the patches to a structure that tools like uv, poetry, and pip already have cached. For instance, uv today does not cache wheels but unpacked wheels, which already discards some of the information needed to retain the original wheel. That is, one would have to run the diff on the files contained in the archive instead of on the wheel itself. My suspicion is that this will also be easier than doing it on the container. One could use Courgette/Zucchini for the contained .so files and a basic text diff for Python files.

Just to make sure the complexity of doing this well isn’t missed, let me note that Fedora supported a package delta scheme (Delta RPMs) which, just from watching upgrades over the years, didn’t seem to do much after some early successes - the numbers the package manager reported at the end of an upgrade were usually minimal (single-digit percentages), and not infrequently some deltas failed to apply, and then the whole new package had to be downloaded anyway. I’m not part of the Fedora team, just a long-term user, so I didn’t worry too much about it - it only just dawned on me that I wasn’t seeing any deltarpm traffic any longer.

Turns out this wasn’t just anecdotal - Fedora dropped generating these just recently. Some thoughts justifying the change are here:

https://fedoraproject.org/wiki/Changes/Drop_Delta_RPMs

and the discussion page:

Of course every situation is different, but it may be worth considering a case where it went wrong, for learnings.

2 Likes

I also love the idea, but want to point out that MSP (the patch format that applies to Windows MSIs) doesn’t get anywhere near the use or love that it deserves. Even the incremental update features of MSIs[1] tend not to be used, and leads to more confusion and broken installs than anything else. So it’s hardly been a massive success there.

That said, Steam saves huge amounts of time and bandwidth by patching games rather than replacing entire files, so there are clearly some contexts where it is very beneficial. I think if those can be identified and then justified as applying to wheels, then the complexity argument can be overcome.

In the meantime, I don’t think there’s any harm in building the tooling independent of PyPI itself. For organisations maintaining entire clusters there’s certainly going to be value in binary diffing an environment once and deploying a patch, and possibly that’s going to be most efficient at the package level? This looks like an idea that can be implemented first without needing to standardise everything, and if it’s valuable, then people will beg for it to be standardised.


  1. Updating a specific file/component only if the MSI contains a newer version. ↩︎

1 Like

I’d also add Arch Linux to the list of packaging ecosystems that tried this for a while but were underwhelmed by how little bandwidth was actually saved and gave up. From mucking around locally with these binary delta tools, I’ve generally observed that even small changes in source code lead to massive deltas in the executable once the compiler/optimiser has done its work.

If we’re looking for size optimizations, I’d personally suggest doing what Fedora does with debug symbols – namely strip-ing the main binaries, then banishing the debug symbols into separate optional subpackages (or would-be sub-wheels in our case). I’ve seen extension modules go down an entire order of magnitude after using strip.

1 Like

Another factor to consider is that the majority of downloads are likely made from CI/CD or containerized environments with no cache or previous version installed, so none of them would benefit from this.

1 Like

Games are very well-suited to this, since they’re often a huge initial download that is persistently installed for a long time, and many of the assets won’t change in a patch.

Patch releases seem most useful for wheels that include large binaries, but I wonder how much those binaries can change in a patched version. A useful experiment would be to build patches for some of the biggest wheels on PyPI (pytorch and tensorflow come to mind) and look at those diffs to see if this would improve storage and/or bandwidth.

2 Likes

On my radar, but something I think Python wheels are positioned to handle well. I will have to consider this more in-depth as part of the analysis. I played around with some wheels I had on hand a few months ago and found meaningful gains in about half the ones that had binaries, and nearly all of the ones that did not, but that was only simple testing I did for myself to determine whether this was actually worth working on in spare time. I’ve now reached the point of considering the design enough to reach out and make sure I’m adequately covering what other people would need before implementing.

I’m also fine if we get to the point of me implementing it and the community doesn’t find the gains worth the added complexity. That isn’t wasted work; it may still be usable as a non-standard tool in some cases, and it would still serve as an exploration and documentation of approaches that have been tried before and why they were ultimately not enough.

So please do mention more things like this where you are thinking of them, I want whatever succeeds to be as good as reasonably possible, or if it fails, to fail because as good as reasonably possible wasn’t a worthwhile improvement.

This is something I considered, but there are two problems IMO

  • Courgette/Zucchini are much more expensive to compute and harder to integrate into the existing ecosystem. If the gains are high enough, and the ecosystem is open to it, it’s worth considering, but my initial “playing around with the idea” months ago mentioned above seemed worth it with simpler options. I’ll look into it more for comparison.

  • I think this also results in more overhead, reducing the number of cases it would be useful for. This might indicate that to get the most out of this, we need a flexible patch format and multiple defined strategies within the format, along with tooling smart enough to pick a strategy intelligently. I’m willing to work through the concerns of a flexible patch format that sufficiently allows different updates to use different strategies, but I’m concerned about this part in particular:

For instance, uv today does not cache wheels but unpacked wheels, which already discards some of the information needed to retain the original wheel.

I’m inclined to think that discarding the original in a non-recoverable way is a speed/soundness tradeoff that may usually be correct for uv. While it may not be incompatible with one of my design goals (reproducibility at each step), I’ll really need to look into how well specified the spreading step of wheel installation is, and this may require tightening that up, and possibly even coming to a consensus on things like directory hashing in Python packaging - both of which are things I did not want to predicate this improvement on.
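
To make the directory-hashing question concrete, here is a minimal sketch of a deterministic hash over an unpacked tree. The traversal order, path encoding, and separator choices here are exactly the kind of details that would need consensus; nothing about this particular scheme is standardized:

```python
import hashlib
import os
import tempfile

def hash_tree(root: str) -> str:
    # Hash (relative path, file contents) pairs in sorted order, so the
    # result does not depend on filesystem iteration order. Separator
    # bytes and path normalization are arbitrary choices for this sketch.
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # force deterministic descent order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            h.update(rel.encode() + b"\0")
            with open(path, "rb") as f:
                h.update(f.read() + b"\0")
    return h.hexdigest()

with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "pkg"))
    with open(os.path.join(d, "pkg", "__init__.py"), "w") as f:
        f.write("__version__ = '1.0'\n")
    digest = hash_tree(d)
    assert hash_tree(d) == digest  # stable across runs over the same tree
```

Note this deliberately ignores file metadata (permissions, timestamps); whether those belong in the hash is one of the open questions.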

This might need to interact with some other concerns you’ve brought up about what is safe to cache and not in other contexts.

I’d also add Arch Linux to the list of packaging ecosystems that tried this for a while but were underwhelmed by how little bandwidth was actually saved and gave up. From mucking around locally with these binary delta tools, I’ve generally observed that even small changes in source code lead to massive deltas in the executable once the compiler/optimiser has done its work.

Debian does it for package indices (not package files), but even then it’s only on by default for testing/unstable where the indices have a lot of churn, and past a few days worth of incremental patches it falls back to just downloading the entire file because the diffs in aggregate wind up being larger anyway.

For reference, I’ll note that Fedora uses zchunk for package list files. It’s a chunked format, allowing clients to download only the changed parts of a file – like rsync, but without needing dedicated software on the server (clients use HTTP range requests, so it’s compatible with HTTP proxies/CDNs). It only needs one file to be hosted, which decreases the load on mirrors compared to having separate “delta” files (AFAIK that was a major issue with deltarpm).
But it’s probably a no-go for PyPI: it needs a bespoke client library, and there’s some size increase. It’s suitable for frequently changed files with lots of unchanged parts; I doubt that applies to wheels.

2 Likes

(To state the obvious: the best way to avoid downloading a lot of stuff from PyPI is to make sure that metadata is easily accessible and correct. A ton of downloads today come from resolvers catastrophically backtracking and fetching entire archives.)

4 Likes

Note that PyPI supports HTTP range requests, which allows installers to only download the metadata and not the entire archive when backtracking (e.g., pip’s implementation).
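
The reason a ranged GET is enough is that a wheel is a zip, and a zip’s central directory listing (plus the end-of-central-directory record that locates it) sits at the end of the file. A local sketch of the first step a lazy remote reader performs (file names are illustrative; a real reader like pip’s would next fetch the central directory range and then the METADATA member by offset):

```python
import io
import struct
import zipfile

# Build a small wheel-like zip in memory (member names are illustrative).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("pkg/__init__.py", "x = 1\n")
    zf.writestr("pkg-1.0.dist-info/METADATA", "Name: pkg\nVersion: 1.0\n")
data = buf.getvalue()

# Simulate an HTTP range request: fetch only the tail of the file.
tail = data[-8192:]

# The end-of-central-directory (EOCD) record is found by its signature;
# it says how many entries the archive has and where the listing starts.
pos = tail.rfind(b"PK\x05\x06")
assert pos != -1
total_entries = struct.unpack("<H", tail[pos + 10:pos + 12])[0]
assert total_entries == 2  # both archive members are listed
```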

1 Like