Something that was surfaced in the [discussion around deletions](https://discuss.python.org/t/stop-allowing-deleting-things-from-pypi/17227/28?u=dstufft) was a concern that the quota system on PyPI, as it is currently implemented, is causing a less than ideal experience for both authors and users of PyPI. I've also gone back and read previous discussions or posts like [What to do about GPUs? (and the built distributions that support them)](https://discuss.python.org/t/what-to-do-about-gpus-and-the-built-distributions-that-support-them/7125?u=dstufft).
The problems from the maintainer side that I have seen surfaced:
- Projects are being forced to delete older releases in order to make room for newer releases, though thankfully it is largely pre-releases currently [^1].
- Projects are avoiding uploading wheels for a variety of platforms because they're worried about hitting the quota limits, having to ask for a quota increase, and not knowing whether they'd actually get the increase [^2].
- When they do ask for a quota increase, it can take weeks or even months for the maintainer to get a reply, blocking their ability to do releases [^3].
Just to make sure that everyone is on the same page, the background of how file hosting/quotas has evolved on PyPI is roughly:
- Originally PyPI did not support file uploads at all, nor was it intended to be used as a software repository for tools to consume.
- At some point setuptools was written, which started finding files to fetch from PyPI through a variety of mechanisms.
- At some other point (not sure if before or after the last one), PyPI added the ability to host files on PyPI, and as a basic sanity check the web server fronting it (Apache, I think, at the time) had a default limit on the total request body size (as most servers do). Over time this eventually got increased to 60M, effectively limiting files on PyPI to no more than 60M in size.
- At some point PEP 470 removed external file hosting from PyPI, which meant that in order to have a good experience with `pip install ...` by default, projects needed to upload their files to PyPI unless they wanted to require their users to configure an additional repository.
- As part of the migration to Warehouse, we switched from having a web server fronting Warehouse that buffered the entire request body to one that let Warehouse itself handle pulling those bytes off the wire. With no buffering in front of it, Warehouse itself was responsible for setting limits, and it originally just hard coded the same 60M limit that PyPI already had.
- In https://github.com/pypi/warehouse/issues/346, Richard noted that we were starting to get requests for larger file sizes for some projects, and https://github.com/pypi/warehouse/pull/655 implemented the ability to change that 60M limit on a per project basis.
- In https://github.com/pypi/warehouse/issues/4288 it was surfaced that PyPI's on disk size was at that point larger than 2TB, but we didn't have a great mechanism to show what projects were involved in that, so https://github.com/pypi/warehouse/pull/4469 [^4] added a ``/stats/`` route that showed the top N packages and how much storage they consume (see the sketch after this list).
- In https://github.com/pypi/warehouse/issues/7446 the idea of limiting the total size of a project was proposed and implemented in https://github.com/pypi/warehouse/pull/8128 and https://github.com/pypi/warehouse/pull/8129.
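As an aside, that ``/stats/`` route is still around, and it can also be queried programmatically by asking for JSON. A minimal sketch (I'm not promising anything about the exact shape of the response):

```python
# Fetch PyPI's /stats/ data as JSON instead of HTML by setting the Accept
# header. The response includes the top projects by storage used; this just
# dumps it rather than assuming specific keys.
import json
import urllib.request

req = urllib.request.Request(
    "https://pypi.org/stats/",
    headers={"Accept": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    stats = json.load(resp)

print(json.dumps(stats, indent=2))
```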
That brings us to where we are today.
I don't have really good information on how large PyPI has grown over time, other than that we're currently at 12TB, in 2018 we were at "> 2TB", and it was mentioned in a [comment](https://github.com/pypa/pypi-support/issues/50#issuecomment-553495968) on Nov 13, 2019 that PyPI was at 6.5TB. The per project quotas were implemented in 2020.
Picking 10GB as our default project quota in PyPI was done with this comment:
> I grandfathered in all existing projects with Project.total_size >= 10GB. I set their limits to roughly twice their current size, minus ~20%, rounded to the nearest 10GB. My thought is that PyPI's total size is roughly doubling every year, and that the rate of growth of these should probably fall under that curve.
>
> I wouldn't expect any of the projects on https://pypi.org/stats to request total size increases in the next ~1 year. I think we can give them file size increases liberally though.
At the time, there were 73 total projects grandfathered in at >= 10GB.
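To make that formula a bit more concrete, here's a rough sketch of how I read it; this is just my interpretation of the comment, not the actual code that set those limits:

```python
def grandfathered_limit_gb(current_size_gb: float) -> int:
    """My reading of the quoted formula: roughly twice the current size,
    minus ~20%, rounded to the nearest 10GB."""
    doubled = current_size_gb * 2
    reduced = doubled * 0.8           # "minus ~20%"
    return round(reduced / 10) * 10   # rounded to the nearest 10GB

# e.g. a project sitting at 30GB would have ended up with roughly a 50GB limit:
# 30 * 2 = 60, 60 - 20% = 48, rounded to the nearest 10GB -> 50
print(grandfathered_limit_gb(30))  # 50
```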
Currently our process for people to ask for increased limits is to have them post a ticket on https://github.com/pypa/pypi-support, and one of the PyPI team will come around and look into it.
I went ahead and did some looking at those requests, and what I found was:
- The oldest request in that repo goes back to Nov 13, 2019 asking for an increased file size limit [^5].
- There are a total of 383 requests in that time period, which averages out to a limit request roughly every 2.5 days since the first one.
- Limit requests are split pretty evenly between requests for an increased file size limit and requests for an increased project size limit, with roughly a 10% tilt towards project size.
- It appears that out of those 383 requests, 369 have been closed. Of those 369, 274 (about 75%) were accepted, 11 (about 3%) were denied, and in 19 (about 5%) the user was guided towards alternative strategies to reduce their file size. The remaining ones were generally just closed due to no response to follow-up questions [^6].
- Of the 11 that were denied, most of them were denied because the user was hosting large data files (including java jars, etc) in the project.
- Of the 19 that were guided towards alternative strategies, it was largely split between:
- Breaking the project up into sub projects, each getting its own limit [^7].
  - Side loading large data files through some other mechanism (e.g. a ``download()`` method; see the sketch after this list).
  - Removing files from the wheels (tests, docs, etc), trying different compilation strategies, or simply paying attention to their file size, which caused them to notice something they could adjust to reduce it.
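For the side loading case, the usual shape of the fix looks roughly like the sketch below; the package name, URL, and cache location are all hypothetical, this is just the general pattern of fetching the large file on first use instead of shipping it inside the wheel:

```python
# Hypothetical "download on first use" helper: the large data file lives on
# some external host instead of inside the wheel, and gets cached locally.
import os
import urllib.request
from pathlib import Path

DATA_URL = "https://example.com/mypackage/big-model-v1.bin"  # hypothetical URL
CACHE_DIR = Path(os.environ.get("MYPACKAGE_CACHE", Path.home() / ".cache" / "mypackage"))


def download(force: bool = False) -> Path:
    """Fetch the large data file on demand and return the local path."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    target = CACHE_DIR / "big-model-v1.bin"
    if force or not target.exists():
        urllib.request.urlretrieve(DATA_URL, target)
    return target
```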
That's a lot of information there, but ultimately the questions for this issue are:
- Is the quota system providing value?
- Is our process for requesting an increase providing value?
- Is there anything that we can change to reduce the friction?
[^1]: This kind of flies in the face of how we typically expect PyPI to be used, as a stable archive of artifacts with deletions being rare.
[^2]: This directly hurts the consumers of Python packages, as they lose out on the ability to install from wheels on those platforms.
[^3]: Obviously this is due to the fact that PyPI has no staff available to process these requests, relying instead on volunteers being able/willing to do the pretty tedious work of going through the issues.
[^4]: This was ultimately reverted, then reworked, then had more changes to it over the years, but this was the initial PR to add it.
[^5]: Since per project limits weren't added until 2020, that should mean that all of our project quota requests ended up here.
[^6]: Categorizing this was kind of lossy; I had to go through all of those issues manually and skim them, so there very well might be some miscategorizations in my tally.
[^7]: This feels kind of like approving the limit in spirit? If a project wants a single 20GB limit, that doesn't feel materially different to me than splitting the project into two, with two 10GB limits.