I’ve had a thought for allowing external hosting of files in PyPI while addressing previous concerns: link to the externally-hosted file on the package’s (simple-index) page on PyPI, but also link to the exact same file hosted by PyPI after it. The dependency installer (e.g. pip) would first attempt to download from the first link on the HTML page provided by PyPI (which would be the external link), then fall back to later links for the same file.
It seems pip’s implementation currently only attempts the first result, so I assume this would need buy-in from all the current dependency installers. Correct me if I’m wrong, but it seems like pip associates one link per candidate (i.e. file).
This would reduce PyPI’s file-storage bandwidth cost, and storage cost (see below). If there’s interest in this, I could cook up a PEP (or similar).
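As a rough sketch of what "several links per file" could look like on the installer side (this is not how pip works today; `group_links_by_file`, the hosts and the hash value below are made up for illustration), the links on a simple-index page can be grouped by filename while keeping page order, so the external link is tried first and the PyPI copy after it:

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urlsplit
import posixpath


class LinkCollector(HTMLParser):
    """Collect the href of every <a> element, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def group_links_by_file(page_html):
    """Map each filename to its (url, hash_fragment) pairs, in page order.

    Hypothetical helper: assumes the filename is the last path segment of
    the URL and that the hash is carried in the URL fragment.
    """
    collector = LinkCollector()
    collector.feed(page_html)
    groups = {}
    for href in collector.hrefs:
        url, fragment = urldefrag(href)
        filename = posixpath.basename(urlsplit(url).path)
        groups.setdefault(filename, []).append((url, fragment))
    return groups


# Made-up page: the external link comes first, the PyPI-hosted copy second.
PAGE = """
<a href="https://files.example.org/demo-1.0.tar.gz#sha256=abc123">demo-1.0.tar.gz</a>
<a href="https://pypi.example/packages/demo-1.0.tar.gz#sha256=abc123">demo-1.0.tar.gz</a>
"""

groups = group_links_by_file(PAGE)
```

Here `groups["demo-1.0.tar.gz"]` holds both links for the one file, with the external one at index 0.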
To address the prior concerns:
- If the connection fails, or the download time (normalised by content-length, which must be present) exceeds a threshold, fall back to the PyPI option (optimisation at implementer’s discretion). This addresses reliability, speed and firewalls
- Externally-hosted links must come with a checksum (computed by PyPI on upload). This addresses security
- The implementation should be wholly transparent to the end-user, so no UI/UX changes at all (perhaps a flag to disable external hosting)
- Perhaps “uploading” external file links should be left out of Twine’s features, to raise the barrier to entry. Not sure what this addresses
- PyPI could store the fall-back files on an intelligently-tiered file-store (I’m used to S3 terminology), where files would be moved to a cheaper-to-store (but expensive-to-access) tier when not accessed for a while (frequent access would move them back to a high-availability tier). This addresses cost
- PyPI could randomly check external files for existence to see whether to (temporarily?) remove the external links
- Because the file is still being uploaded to PyPI, virus checks (does PyPI do this?) and Terms of Service agreements still happen
- Privacy: external hosts can log requests with IP/agent details. Perhaps PyPI (or its CDN) can route the requests to the external providers through itself (note: not redirect)
- I haven’t thought of a good way to report to external hosts why the dependency installer aborted a file download. A CLI warning doesn’t seem like a good idea
- Backwards compatibility: old versions of installers would start downloading the externally hosted files. Perhaps the PyPI-hosted files should be listed first?
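The fall-back and checksum bullets above can be sketched roughly as follows. This is only an illustration under assumptions: `install_from_links`, `fetch` and the threshold value are hypothetical, `fetch` stands in for whatever HTTP client the installer uses, and a real installer would probably abort a slow download part-way rather than measure it afterwards.

```python
import hashlib


class DownloadFailed(Exception):
    """Raised when no link yields an acceptable file."""


def install_from_links(links, expected_sha256, fetch, max_seconds_per_byte=1e-3):
    """Try each link in order and return (url, data) for the first good one.

    `fetch(url)` is a hypothetical callable returning
    (data, elapsed_seconds, content_length) or raising OSError.
    """
    for url in links:
        try:
            data, elapsed, content_length = fetch(url)
        except OSError:
            continue  # connection failed: try the next link
        if not content_length:
            continue  # content-length must be present
        if elapsed / content_length > max_seconds_per_byte:
            continue  # too slow once normalised by size
        if hashlib.sha256(data).hexdigest() != expected_sha256:
            continue  # checksum mismatch: do not trust this copy
        return url, data
    raise DownloadFailed("no usable link for this file")


# Made-up demo data: the external host is unreachable, the PyPI copy works.
PAYLOAD = b"demo wheel bytes"
DIGEST = hashlib.sha256(PAYLOAD).hexdigest()
EXTERNAL = "https://files.example.org/demo-1.0.tar.gz"
FALLBACK = "https://pypi.example/packages/demo-1.0.tar.gz"


def fake_fetch(url):
    if url == EXTERNAL:
        raise ConnectionError("external host unreachable")
    return PAYLOAD, 0.001, len(PAYLOAD)


chosen_url, chosen_data = install_from_links([EXTERNAL, FALLBACK], DIGEST, fake_fetch)
```

The same loop also covers a tampered external file: the checksum comparison fails and the installer silently moves on to the PyPI-hosted copy.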
Previous discussion on external hosting:
- External hosting linked to via PyPI - Packaging - Discussions on Python.org
- What to do about GPUs? (and the built distributions that support them) - Packaging - Discussions on Python.org
- PEP 438, pip and --allow-external (was: “pip: cdecimal an externally hosted file and may be unreliable” from python-dev) - Distutils-SIG - python.org
- PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting - Distutils-SIG - python.org
- new PEP submission: transitioning to hosting release files on PYPI - Python-ideas - python.org
- PEP 470 - Once More, with Feeling - Distutils-SIG - python.org
- PEP470, backward compat is a … - Distutils-SIG - python.org
Note: link ~ URL ~ internet reference ~ anchor HTML element