I’ve had a thought for allowing external hosting of files on PyPI while addressing previous concerns: link to the externally-hosted file in the package’s (simple-index) page on PyPI, but also link to the exact same file hosted by PyPI after it. The dependency installer (e.g. Pip) would first attempt to download from the first link[1] on the HTML page provided by PyPI (which would be the external link), then fall back to later links for the same file.
It seems Pip’s implementation currently only attempts the first result, so I assume there would need to be buy-in from all current dependency installers to support this. Correct me if I’m wrong, but Pip seems to associate one link per candidate (i.e. file).
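As a rough illustration of the “one link per candidate” point, an installer could group every anchor on a simple-index page by filename, in page order, so the first URL is the preferred source and later ones are fall-backs. This is a minimal standard-library sketch, not Pip’s actual code: `group_links_by_filename` and the example URLs are hypothetical, and it assumes the external copy keeps the same filename as the PyPI-hosted one.

```python
from html.parser import HTMLParser
from posixpath import basename
from typing import Dict, List
from urllib.parse import urljoin, urlsplit


class _AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def group_links_by_filename(page_html: str, page_url: str) -> Dict[str, List[str]]:
    """Map each filename to all URLs offering it, first link first (hypothetical)."""
    collector = _AnchorCollector()
    collector.feed(page_html)
    collector.close()
    groups: Dict[str, List[str]] = {}
    for href in collector.hrefs:
        url = urljoin(page_url, href)                 # resolve relative links
        filename = basename(urlsplit(url).path)       # ignores the #sha256= fragment
        groups.setdefault(filename, []).append(url)   # keeps page order
    return groups


if __name__ == "__main__":
    # Hypothetical page: external link first, PyPI-hosted copy second.
    page = (
        '<a href="https://downloads.example.com/demo-1.0.tar.gz#sha256=abc">demo-1.0.tar.gz</a>\n'
        '<a href="/packages/demo-1.0.tar.gz#sha256=abc">demo-1.0.tar.gz</a>\n'
    )
    print(group_links_by_filename(page, "https://pypi.org/simple/demo/"))
```

Each group then becomes a single candidate with an ordered list of sources, instead of the first link winning and the rest being ignored.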
This would reduce PyPI’s file-hosting bandwidth and storage costs (see below). If there’s interest in this, I could cook up a PEP (or similar).
To address the prior concerns:
- If the connection fails, or the download time (normalised by content length, so the Content-Length header must be present) exceeds a threshold, fall back to the PyPI-hosted file (the exact optimisation is at the implementer’s discretion). This addresses reliability, speed and firewalls
- Externally-hosted links must come with a checksum (computed by PyPI on upload). This addresses security
- The implementation should be wholly transparent to the end user, so no UI/UX changes at all (perhaps apart from a flag to disable external hosting)
- Perhaps “uploading” external file links should be left out of Twine’s features to raise the barrier to entry; I’m not sure which concern this addresses
- PyPI could store the fall-back files in an intelligently-tiered file store (I’m used to S3 terminology), where files not accessed for a while move to a cheaper-to-store (but more expensive-to-access) tier, and frequent access moves them back to a high-availability tier. This addresses cost
- PyPI could randomly check that external files still exist, to decide whether to (temporarily?) remove the external links
- Because the file is still being uploaded to PyPI, virus checks (does PyPI do this?) and Terms of Service agreements still happen
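On the installer side, the fall-back and checksum points above could look roughly like the following. It is a sketch under assumptions, not a spec: the names (`fetch_with_deadline`, `download_with_fallback`, `DownloadError`) and the default thresholds (5 s grace period, 100 kB/s minimum throughput) are made up for illustration, and the expected SHA-256 is assumed to be the hash PyPI computed on upload. The time limit scales with `Content-Length`, and a missing header counts as a failure. The `fetch` parameter is injectable so the fall-back logic can be exercised without a network.

```python
import hashlib
import time
import urllib.request
from typing import Callable, Iterable, List

CHUNK_SIZE = 64 * 1024  # bytes read per iteration


class DownloadError(Exception):
    """A single source failed: missing header, too slow, or wrong hash."""


def fetch_with_deadline(url: str, min_bytes_per_sec: float = 100_000,
                        grace_seconds: float = 5.0) -> bytes:
    # Hypothetical threshold: a grace period plus the time the file would take
    # at min_bytes_per_sec, so the limit is normalised by Content-Length.
    # grace_seconds is also the socket timeout, so a stalled read still fails.
    with urllib.request.urlopen(url, timeout=grace_seconds) as response:
        length = response.headers.get("Content-Length")
        if length is None:
            raise DownloadError(f"{url}: no Content-Length header")
        deadline = time.monotonic() + grace_seconds + int(length) / min_bytes_per_sec
        chunks: List[bytes] = []
        while chunk := response.read(CHUNK_SIZE):
            chunks.append(chunk)
            if time.monotonic() > deadline:
                raise DownloadError(f"{url}: slower than {min_bytes_per_sec} B/s")
        return b"".join(chunks)


def download_with_fallback(urls: Iterable[str], expected_sha256: str,
                           fetch: Callable[[str], bytes] = fetch_with_deadline) -> bytes:
    """Try each URL in page order; return the first body whose SHA-256 matches."""
    errors: List[str] = []
    for url in urls:
        try:
            data = fetch(url)
        except (DownloadError, OSError) as exc:  # URLError and timeouts are OSErrors
            errors.append(f"{url}: {exc}")
            continue
        if hashlib.sha256(data).hexdigest() != expected_sha256:
            errors.append(f"{url}: sha256 mismatch")  # tampered or wrong file
            continue
        return data
    raise DownloadError("all sources failed: " + "; ".join(errors))
```

Because the hash check applies to every source, a misbehaving external host can only cost time, not inject a different file; the PyPI-hosted link further down the list is the last resort.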
Further concerns:
- Privacy: external hosts can log requests with IP/agent details. Perhaps PyPI (or its CDN) can route the requests to the external providers through itself (note: not redirect)
- I haven’t thought of a good way to report to external hosts why the dependency installer aborted a file download. A CLI warning doesn’t seem like a good idea
- Backwards compatibility: old versions of installers would start downloading the externally hosted files. Perhaps the PyPI-hosted files should be listed first?
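On the index side, the ordering question comes down to which anchor PyPI emits first for a file. A minimal sketch with hypothetical names and URLs: every link carries the same `#sha256=` fragment (the PEP 503 hash-fragment convention), so whichever copy an installer fetches can be verified.

```python
from html import escape
from typing import Optional


def render_file_links(filename: str, sha256: str, pypi_url: str,
                      external_url: Optional[str] = None,
                      pypi_first: bool = True) -> str:
    """Render the simple-index anchors for one file (hypothetical helper)."""
    urls = [pypi_url]
    if external_url is not None:
        # pypi_first=True keeps installers that only use the first link on PyPI.
        urls.insert(1 if pypi_first else 0, external_url)
    # Same hash fragment on every copy, so any source can be verified.
    return "\n".join(
        f'<a href="{escape(url)}#sha256={sha256}">{escape(filename)}</a>'
        for url in urls
    )
```

With `pypi_first=True`, old installers keep downloading from PyPI, but newer installers would then need some other signal to know which link is the external one; this sketch doesn’t settle that.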
Previous discussion on external hosting:
- External hosting linked to via PyPI - Packaging - Discussions on Python.org
- What to do about GPUs? (and the built distributions that support them) - Packaging - Discussions on Python.org
- PEP 470:
  - PEP 438, pip and --allow-external (was: “pip: cdecimal an externally hosted file and may be unreliable” from python-dev) - Distutils-SIG - python.org
  - PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting - Distutils-SIG - python.org
  - new PEP submission: transitioning to hosting release files on PYPI - Python-ideas - python.org
  - PEP 470 - Once More, with Feeling - Distutils-SIG - python.org
  - PEP470, backward compat is a … - Distutils-SIG - python.org
[1] link ~ URL ~ internet reference ~ anchor HTML element