This post was inspired by a discussion on GitHub about why CadQuery (a Python package) isn’t available through PyPI.
I’ll just quote the explanation they gave:
We’d all love to see PyPI-hosted packages for CadQuery. No one sane is opposed to that. Unfortunately, the core issues here are all on PyPI’s side: e.g.,
- Distributing large binary payloads as separate downloads
- API to get dependencies without full download
Without significant movement from the official PyPI community on those issues (which seems unlikely, because there’s no consensus as to what the high-level solutions to those issues even are), it’s unclear whether CadQuery can meaningfully do anything here.
Full comment here.
Well, that’s unfortunate. Apparently one big reason why large binaries can’t be distributed through PyPI is that hosting is expensive: according to the comment linked above, PyPI spends about 1.5 million USD per year on hosting alone.
So I feel like two of PyPI’s challenges could be solved. Those challenges are:
- Spending 1.5 million USD per year on hosting (and growing), which is not sustainable regardless of the large binary distribution discussion.
- Not allowing large binaries to be distributed with Python packages, because bandwidth is limited and expensive.
So…
Why not build torrent support into PyPI as one of the repo download channels? That is: keep PyPI the way it is (including the hosting, the hard-coded repo URLs and the “normal” way of downloading), but add the option to find packages through torrent indexes and download them from within pip. So instead of (just) hosting packages, PyPI could also host magnet links, torrents and/or links to other indexes/repositories of torrents that contain packages.
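To make "hosting magnet links" a bit more concrete, here is a minimal sketch of how a magnet link for a single package file could be derived, assuming a classic (v1) single-file torrent. The function names `bencode` and `magnet_for_file`, the example filename and the piece size are my own choices for illustration; nothing like this exists in pip or PyPI today.

```python
import hashlib
from urllib.parse import quote


def bencode(value):
    """Encode ints, bytes, str, lists and dicts in BitTorrent's bencode format."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys must be byte strings, emitted in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def magnet_for_file(name, data, piece_length=262144):
    """Build a magnet link for one file (hypothetical helper, v1 torrents only)."""
    # "pieces" is the concatenation of the SHA-1 digest of every piece.
    pieces = b"".join(
        hashlib.sha1(data[start:start + piece_length]).digest()
        for start in range(0, len(data), piece_length)
    )
    info = {
        "name": name,
        "length": len(data),
        "piece length": piece_length,
        "pieces": pieces,
    }
    # The info-hash identifies the torrent: SHA-1 of the bencoded info dict.
    info_hash = hashlib.sha1(bencode(info)).hexdigest()
    return f"magnet:?xt=urn:btih:{info_hash}&dn={quote(name)}"


if __name__ == "__main__":
    # Stand-in bytes; a real wheel would be read from disk.
    print(magnet_for_file("example_pkg-1.0-py3-none-any.whl", b"not a real wheel"))
```

The link is fully determined by the file's contents, name and piece size, so an index would only need to store a short string per file rather than the file itself.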
A torrent channel like this would allow:
– Users of pip to experience no difference in how pip is used, except that old, obscure or large data packages are more likely to still be seeded by someone somewhere. Downloading them through a torrent would require no extra user interaction.
– PyPI to start saving bandwidth costs, because little by little more and more packages will be downloaded through the torrent network instead of 100% from PyPI servers.
– Volunteers who know how to seed a torrent to start hosting Python packages, just because they’re nice like that. Anybody could do that from any device. Right now the barrier to entry for hosting Python packages is far higher than for seeding a torrent, unless I’m missing something.
– Creators/maintainers of packages to seed torrents for their own packages, including packages with huge binaries such as CadQuery.
– Businesses with spare bandwidth to support Python by simply seeding package torrents.
– conda to die?
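One obvious worry with volunteer seeders is trust. PyPI already publishes a SHA-256 digest for every file it hosts, so a client could in principle accept bytes from anyone and still check them against the index. Below is a rough sketch of that "try the torrent, fall back to PyPI" flow. The function `fetch_verified` and the channel callables are hypothetical stand-ins I made up, not real pip internals.

```python
import hashlib


def fetch_verified(filename, expected_sha256, channels):
    """Try each channel in order; return the first payload whose hash matches.

    `channels` is a list of (name, fetch) pairs, where fetch(filename)
    returns bytes. Both are hypothetical placeholders for real downloaders.
    """
    for channel_name, fetch in channels:
        try:
            data = fetch(filename)
        except Exception:
            # No seeders, network error, etc.: move on to the next channel.
            continue
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return channel_name, data
        # Hash mismatch: the payload is corrupt or tampered with, so skip it.
    raise RuntimeError(f"no channel delivered a valid copy of {filename}")


if __name__ == "__main__":
    payload = b"pretend this is a wheel"
    trusted_hash = hashlib.sha256(payload).hexdigest()

    def torrent_channel(filename):
        # Simulates a swarm that returns bad data.
        return b"tampered bytes"

    def pypi_channel(filename):
        # Simulates the normal PyPI download.
        return payload

    channels = [("torrent", torrent_channel), ("pypi", pypi_channel)]
    print(fetch_verified("example_pkg-1.0-py3-none-any.whl", trusted_hash, channels)[0])
```

In this sketch the torrent only replaces the bandwidth, not the source of truth: the hash still comes from the index, and PyPI stays in the list as the last resort.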
Disclaimer: I have almost no knowledge of how package management/distribution or PyPI works. Just firing off ideas here and trying to learn why this could (not) work.