Fallback links on PyPI for the same file

I’ve had a thought for allowing external hosting of files in PyPI while addressing previous concerns: link to the externally-hosted file in the package’s (simple-index) page on PyPI, but also link to the exact same file hosted by PyPI after it. The dependency installer (e.g. Pip) would first attempt to install from the first link[1] on the HTML page provided by PyPI (which would be the external link), then fall back to later links for the same file.
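
To make the idea concrete, here is a rough sketch of how an installer might read such a page and keep every link for a file, in page order, instead of only one. It uses only the standard library. The page content, the host names and the helper names (`SimpleIndexLinks`, `group_by_filename`) are made up for illustration; this is not how Pip does it today.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urldefrag, urlsplit


class SimpleIndexLinks(HTMLParser):
    """Collect (filename, url, sha256) for each anchor, in page order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # The simple index carries the hash as a URL fragment: #sha256=<hex>
        url, fragment = urldefrag(href)
        filename = posixpath.basename(urlsplit(url).path)
        prefix = "sha256="
        sha256 = fragment[len(prefix):] if fragment.startswith(prefix) else None
        self.links.append((filename, url, sha256))


def group_by_filename(html):
    """Map each filename to its links, first link on the page first."""
    parser = SimpleIndexLinks()
    parser.feed(html)
    grouped = {}
    for filename, url, sha256 in parser.links:
        grouped.setdefault(filename, []).append((url, sha256))
    return grouped


# Hypothetical page: the external link first, the PyPI-hosted copy after it.
PAGE = """
<a href="https://files.example.org/demo-1.0.tar.gz#sha256=abc123">demo-1.0.tar.gz</a>
<a href="https://files.pythonhosted.org/packages/demo-1.0.tar.gz#sha256=abc123">demo-1.0.tar.gz</a>
"""

candidates = group_by_filename(PAGE)
```

Here `candidates["demo-1.0.tar.gz"]` holds two links for the one file, so an installer could work through them in order.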

It seems Pip’s implementation will only attempt the first result currently, so I would assume there has to be buy-in from all the current dependency installers to support this. Correct me if I’m wrong, but it seems like Pip associates one link per candidate (i.e. file).

This would reduce PyPI’s file-storage bandwidth cost, and storage cost (see below). If there’s interest in this, I could cook up a PEP (or similar).

To address the prior concerns:

  • If the connection fails or the download time (normalised by Content-Length, which must be present) exceeds a threshold, fall back to the PyPI option (optimisation at implementer’s discretion). This addresses reliability, speed and firewalls
  • Externally-hosted links must come with a check-sum (computed by PyPI on upload). Addresses security
  • The implementation should be wholly transparent to the end-user, so no UI/UX changes at all (perhaps a flag to disable external hosting)
  • Perhaps “uploading” external file links should not be included in Twine’s features to increase the barrier to entry, not sure what this addresses
  • PyPI could store the fall-back files on an intelligently-tiered file-store (I’m used to S3 terminology), where files would be moved to a cheaper-to-store (but expensive-to-access) tier when not accessed for a while (frequent access would move them back to a high-availability tier). Addresses cost
  • PyPI could randomly check external files for existence to see whether to (temporarily?) remove the external links
  • Because the file is still being uploaded to PyPI, virus checks (does PyPI do this?) and Terms of Service agreements still happen
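
The fall-back and checksum points above could look roughly like the sketch below. Everything in it is an assumption for illustration: `open_stream` is a hypothetical callable returning `(content_length, chunks)` for a URL (in practice it might wrap `urllib.request.urlopen`), and the `min_bytes_per_second` default is an arbitrary number, not a proposed value.

```python
import hashlib
import time


class DownloadFailed(Exception):
    """Raised when a link is rejected or when every link has failed."""


def try_link(url, open_stream, expected_sha256, min_bytes_per_second,
             clock=time.monotonic):
    """Download one link, enforcing the time budget and the checksum."""
    content_length, chunks = open_stream(url)
    if content_length is None:
        raise DownloadFailed("no Content-Length, cannot compute a time budget")
    # Time budget scales with file size ("normalised by Content-Length").
    budget = content_length / min_bytes_per_second
    started = clock()
    digest = hashlib.sha256()
    body = bytearray()
    for chunk in chunks:
        if clock() - started > budget:
            raise DownloadFailed("download too slow")
        digest.update(chunk)
        body.extend(chunk)
    # The expected hash is the one PyPI computed on upload.
    if digest.hexdigest() != expected_sha256:
        raise DownloadFailed("checksum mismatch")
    return bytes(body)


def download_with_fallback(urls, open_stream, expected_sha256,
                           min_bytes_per_second=50_000):
    """Try each link in page order; return (url, data) for the first good one."""
    failures = []
    for url in urls:
        try:
            data = try_link(url, open_stream, expected_sha256,
                            min_bytes_per_second)
            return url, data
        except (OSError, DownloadFailed) as exc:
            # Connection errors, slow hosts and bad hashes all fall through
            # to the next link (eventually the PyPI-hosted copy).
            failures.append((url, str(exc)))
    raise DownloadFailed(f"all links failed: {failures}")
```

Since the failure reasons are collected rather than printed, the end-user sees nothing unless every link fails, which fits the "wholly transparent" point.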

Further concerns:

  • Privacy: external hosts can log requests with IP/agent details. Perhaps PyPI (or its CDN) can route the requests to the external providers through itself (note: not redirect)
  • I haven’t thought of a good way to report to external hosts why the dependency installer aborted a file download. A CLI warning doesn’t seem like a good idea
  • Backwards compatibility: old versions of installers would start downloading the externally hosted files. Perhaps the PyPI-hosted files should be listed first?

Previous discussion on external hosting:

  1. link ~ URL ~ internet reference ~ anchor HTML element ↩︎

I think privacy would be an important concern, so the external link will need to be proxied. Come to think of it, if the link is always proxied, maybe we don’t even need the API and client-side code to change at all: simply point clients to the proxied link, and do the fall-back there.

Downloading files through the index server would be a drastic increase in the index server’s utilisation (compute and network bandwidth). Is there some solution using the CDN?