Python download stats for March 2019

Periodically, I am allowed access to the CDN logs of python.org to collect some download statistics. (For hopefully obvious reasons, these are not publicly available because of personally identifiable information, and nobody has set up an automated process to transfer and filter them into a public dataset.)

In general, these are not particularly informative, since python.org is the primary source of Python downloads only for Windows users, but for assessing the overall scale of Python usage they are one useful input. So here are a few summary tables that I extracted from the logs:

Category        Sum of Hits   % of Hits   % of runtime downloads
Windows          23,122,064      59.37%                   81.37%
Windows Dep       9,785,972      25.13%                      n/a
Sources           4,023,290      10.33%                   14.16%
macOS             1,269,540       3.26%                    4.47%
Docs                495,378       1.27%                      n/a
Sigs                251,525       0.65%                      n/a
Grand Total      38,947,769        100%                     100%

These have been corrected for a couple of CI systems that I deemed broken (for example, there’s one system that downloads Python 3.5.4 for macOS every few seconds, and a version of Chef that downloads Python 2.7.10 for Windows unusually frequently). The last column shows percentages of only the runtime rows (Windows, Sources and macOS), excluding downloads that don’t represent “give me a Python runtime”.
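
The two percentage columns use different denominators: all hits for "% of Hits", and only the runtime categories for the last column. A minimal Python sketch of that arithmetic, using the hit counts from the table; the function and variable names are illustrative, not the script that produced the table:

```python
from __future__ import annotations

# Hit counts from the March 2019 summary table.
HITS = {
    "Windows": 23_122_064,
    "Windows Dep": 9_785_972,
    "Sources": 4_023_290,
    "macOS": 1_269_540,
    "Docs": 495_378,
    "Sigs": 251_525,
}

# Categories that represent "give me a Python runtime".
RUNTIME = {"Windows", "Sources", "macOS"}


def shares(hits: dict[str, int], runtime: set[str]) -> dict[str, tuple[float, float | None]]:
    """Return (% of all hits, % of runtime hits or None) for each category."""
    total = sum(hits.values())
    runtime_total = sum(n for cat, n in hits.items() if cat in runtime)
    return {
        cat: (
            100 * n / total,
            # Non-runtime rows get no share of the runtime denominator.
            100 * n / runtime_total if cat in runtime else None,
        )
        for cat, n in hits.items()
    }


if __name__ == "__main__":
    for cat, (pct_all, pct_rt) in shares(HITS, RUNTIME).items():
        rt = f"{pct_rt:.2f}%" if pct_rt is not None else "n/a"
        print(f"{cat:<12} {pct_all:6.2f}% {rt:>7}")
```

Rounded to two decimals, this reproduces the table, e.g. 59.37% and 81.37% for Windows.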

The categories:

  • Windows is a download of the main installer (either the .exe or older .msi) or the embeddable package (nuget package downloads are visible here)
  • Windows deps are optional MSIs downloaded by the installer (debug symbols, etc.)
  • Sources are any of the source packages
  • macOS are any of the .dmg or .pkg files
  • Docs are any of the documentation files (primarily the Windows .chm files)
  • Sigs are any of the .asc files
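
As a rough illustration, the category rules above could be expressed as a filename classifier. This is a hypothetical sketch: the example URL paths, the list of source-archive extensions, and the assumption that the installer's optional MSIs live in per-architecture subdirectories (amd64/, win32/) are assumptions, not details of the processing that produced the tables.

```python
from __future__ import annotations

from pathlib import PurePosixPath

SOURCE_SUFFIXES = (".tgz", ".tar.xz", ".tar.bz2")  # assumed source archive formats


def classify(url_path: str) -> str | None:
    """Map a download URL path to a summary category, or None if it fits none.

    Hypothetical rules reconstructed from the category descriptions only.
    """
    path = PurePosixPath(url_path)
    name = path.name.lower()
    if name.endswith(".asc"):  # checked first, so foo.exe.asc counts as a Sig
        return "Sigs"
    if name.endswith(".chm"):  # Windows help files; other doc formats ignored here
        return "Docs"
    if name.endswith((".dmg", ".pkg")):
        return "macOS"
    if name.endswith(SOURCE_SUFFIXES):
        return "Sources"
    if name.endswith(".msi") and path.parent.name in {"amd64", "win32"}:
        # Assumption: optional component MSIs sit in per-architecture subdirectories.
        return "Windows Dep"
    if name.endswith((".exe", ".msi")):  # main installers (.exe, or older .msi)
        return "Windows"
    if name.endswith(".zip") and "embed" in name:  # embeddable package
        return "Windows"
    return None
```

The order of the checks matters: signatures are tested before installers so that a `.exe.asc` file is not counted as a Windows download.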

If instead of pivoting by operating system I switch to Python versions (based on the version number directory in the URL), filter to the Windows/Sources/macOS categories used above, and keep only versions with at least 100k downloads, we get this:

Version       Sum of Hits   % of Hits
3.7.2          11,605,352      43.64%
3.7.3           2,904,509      10.92%
3.6.7           1,863,695       7.01%
2.7.16          1,826,445       6.87%
3.6.6           1,110,973       4.18%
3.6.8           1,080,209       4.06%
2.7.15            817,422       3.07%
3.7.0             671,576       2.53%
3.6.5             374,122       1.41%
3.6.0             350,711       1.32%
3.7.1             338,291       1.27%
3.6.4             329,948       1.24%
2.7.13            304,177       1.14%
2.7.11            279,775       1.05%
2.7.14            277,178       1.04%
3.5.0             252,586       0.95%
3.5.2             251,011       0.94%
3.5.4             227,611       0.86%
3.5.3             214,622       0.81%
2.7               207,678       0.78%
3.6.3             207,612       0.78%
2.7.12            172,630       0.65%
3.6.2             157,494       0.59%
3.8.0             156,111       0.59%
3.6.1             142,205       0.53%
3.5.1             139,020       0.52%
2.7.10            113,903       0.43%
2.7.3             113,220       0.43%
3.5.6             102,990       0.39%
Grand Total    26,593,076        100%

So the good news is that the latest releases are getting the majority of the downloads (bearing in mind that 3.7.3 was released in the last week of March).
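
The version pivot can be sketched in a few lines, assuming request paths of the form /ftp/python/<version>/<file> that have already been filtered to the Windows/Sources/macOS categories. The regex, the function name and the input format are illustrative assumptions. The percentages are relative to the total of the versions that pass the threshold, which is presumably why the Grand Total in the version table (26,593,076) is smaller than the roughly 28.4 million runtime downloads in the first table.

```python
from __future__ import annotations

import re
from collections import Counter
from collections.abc import Iterable

# Assumed layout: /ftp/python/<version>/<file>, e.g. /ftp/python/3.7.2/python-3.7.2.exe
VERSION_DIR = re.compile(r"^/ftp/python/(\d+\.\d+(?:\.\d+)?)/")


def hits_by_version(paths: Iterable[str], min_hits: int = 100_000) -> dict[str, tuple[int, float]]:
    """Count hits per version directory, keep versions with >= min_hits,
    and return {version: (hits, % of kept hits)} sorted by hits, descending."""
    counts: Counter[str] = Counter()
    for path in paths:
        match = VERSION_DIR.match(path)
        if match:
            counts[match.group(1)] += 1
    kept = {v: n for v, n in counts.items() if n >= min_hits}
    total = sum(kept.values())
    if total == 0:
        return {}
    ranked = sorted(kept.items(), key=lambda item: item[1], reverse=True)
    return {v: (n, 100 * n / total) for v, n in ranked}
```

Paths that don't sit under a version directory (such as index pages) are simply not counted.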


As a last validation step to see how often we were getting repeated requests from the same source (e.g. CI systems), I bucketed unique IP addresses by request count. This table shows the number of unique IPs in each range of request counts, counting requests with 200-, 300- and 400-series HTTP responses. So approx. 2 million unique IPs made 10 or fewer requests during March (requests seem to come in pairs, but I didn’t figure out why).

Requests per IP   Unique IPs
1-10               2,039,388
11-20                404,657
21-30                162,932
31-40                 66,358
41-50                 41,392
51-60                 24,482
61-70                 15,670
71-80                 11,088
81-90                  8,198
> 90                  58,600
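
A sketch of the bucketing, assuming the input is simply one client IP string per request. The ranges mirror the table (1-10, 11-20, ..., > 90); the function name and parameters are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable


def bucket_ips(client_ips: Iterable[str], width: int = 10, cap: int = 90) -> dict[str, int]:
    """Return {request-count range: number of unique IPs}.

    Ranges are 1-10, 11-20, ..., up to cap, then "> cap" for everything above.
    """
    requests_per_ip = Counter(client_ips)  # one entry per unique IP
    buckets: Counter[str] = Counter()
    for n in requests_per_ip.values():
        if n > cap:
            label = f"> {cap}"
        else:
            upper = -(-n // width) * width  # round up to the next multiple of width
            label = f"{upper - width + 1}-{upper}"
        buckets[label] += 1
    return dict(buckets)
```

Each IP lands in exactly one range, so the ranges are disjoint rather than cumulative.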

The top 20 IP addresses here accounted for 4.64 million requests. A quick sanity check makes it seem like they are distributed across file types and responses (many 300 and 400 responses are included in the request count) and are more likely to be spiders than actual users. Dropping them from the download counts didn’t have a noticeable impact.
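
Checking whether the busiest clients distort the counts amounts to removing every request from the top N IPs and recomputing. A minimal sketch, assuming (client_ip, url_path) pairs as input; the function name and record shape are hypothetical:

```python
from __future__ import annotations

from collections import Counter


def drop_top_ips(records: list[tuple[str, str]], n: int = 20) -> list[tuple[str, str]]:
    """Remove every request made by the n IPs with the most requests.

    records are (client_ip, url_path) pairs.
    """
    per_ip = Counter(ip for ip, _ in records)
    busiest = {ip for ip, _ in per_ip.most_common(n)}
    return [(ip, path) for ip, path in records if ip not in busiest]
```

The filtered records can then be fed back through the same per-category or per-version counting to compare against the original totals.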


These are the most interesting results as far as I’m concerned. If anyone has suggestions for things they’d like me to look at, please let me know, though I’ve already gotten rid of the original reports and revoked access to the logs, so I may not be able to answer them from my filtered data.


Thanks for sharing this. Great to see bug fix versions getting more adoption. It will be interesting to see the count for 2.x after its EoL in 2020. Windows also has a new distribution channel in the Windows Store, and at some point it would be good to see how people are trying it out.


I actually have significantly more detailed analytics on the Windows Store version, including crashes and actual usage data (number of minutes, etc.). But it’s hard to extract and synthesize that into anything useful right now. Let me pick out a few highlights:

Over the last 30 days, there have been 13.2k “acquisitions” (people clicking “Get it free”) and 5k actual installs. The US and China are the major markets and over 50% of users who gave their age were younger than 25.

We’re getting approximately 600 new users each week, but after a month only 10-15% of those are still using it.

Most crashes are coming from third-party extension modules, unsurprisingly :slight_smile:


Good to hear. Maybe at some point, once it’s known to be stable enough to be supported, the Windows Store link could be added to python.org/downloads to drive adoption. It also depends on how easy it will be to help users when things go wrong, given that the current installers leave log files at known locations.

Thanks for pushing it forward through the 3.7.x releases :slight_smile:


This has already been added, so it’s coming soon :slight_smile: The worst of the issues were resolved in 3.7.3, but I want to leave it as “use with caution” for the rest of 3.7.

For the most part, things don’t “go wrong”. Behind the scenes, Windows is just extracting a ZIP file, and it has already validated all the prerequisites. So anything that doesn’t work is in our code (or, more likely, in our assumptions about the environment) rather than in the installation itself. The kinds of failures we see with the traditional installers simply don’t exist.


From the same data, here are subtotals for each x.y Python version:

Version       Sum of Hits   % of Hits
3.7.x          15,519,728      58.36%
3.6.x           5,616,969      21.12%
2.7.x           4,112,428      15.46%
3.5.x           1,187,840       4.47%
3.8.x             156,111       0.59%
Grand Total    26,593,076        100%
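
The x.y subtotals are a straightforward regrouping of the per-version table. A sketch that reproduces them from the per-version hit counts (the function name is illustrative; the numbers are copied from the per-version table):

```python
from __future__ import annotations

from collections import defaultdict

# Per-version hits from the per-version table (versions with >= 100k hits).
VERSION_HITS = {
    "3.7.2": 11_605_352, "3.7.3": 2_904_509, "3.6.7": 1_863_695,
    "2.7.16": 1_826_445, "3.6.6": 1_110_973, "3.6.8": 1_080_209,
    "2.7.15": 817_422, "3.7.0": 671_576, "3.6.5": 374_122,
    "3.6.0": 350_711, "3.7.1": 338_291, "3.6.4": 329_948,
    "2.7.13": 304_177, "2.7.11": 279_775, "2.7.14": 277_178,
    "3.5.0": 252_586, "3.5.2": 251_011, "3.5.4": 227_611,
    "3.5.3": 214_622, "2.7": 207_678, "3.6.3": 207_612,
    "2.7.12": 172_630, "3.6.2": 157_494, "3.8.0": 156_111,
    "3.6.1": 142_205, "3.5.1": 139_020, "2.7.10": 113_903,
    "2.7.3": 113_220, "3.5.6": 102_990,
}


def minor_subtotals(hits: dict[str, int]) -> dict[str, int]:
    """Sum hits per major.minor series, e.g. "3.7.2" and "3.7.3" -> "3.7.x"."""
    totals: defaultdict[str, int] = defaultdict(int)
    for version, n in hits.items():
        major, minor = version.split(".")[:2]  # "2.7" has no micro part; still works
        totals[f"{major}.{minor}.x"] += n
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    subtotals = minor_subtotals(VERSION_HITS)
    grand_total = sum(subtotals.values())
    for series, n in subtotals.items():
        print(f"{series:<6} {n:>12,} {100 * n / grand_total:6.2f}%")
```

The five subtotals add back up to the same grand total of 26,593,076.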

Thanks for posting. At least on Windows, the 3.x transition is basically done.

Can this system and any like it be blocked so we are not paying for the bandwidth?

Our bandwidth is donated and that system is a blip probably in our overall downloads.


@hugovk: I like your download statistics for each x.y Python version!

Are those statistics available somewhere for (a) the running last 30 days or (b) for January 2020 or (c) for all of 2019?

@steve.dower I like this data a lot! I might have missed it, but for which timeframe is it? Is it possible to get this for the last 30 days (running) on a website? Or to get it for all of January 2020?

Guessing it’s for the month before whenever I posted it. I did the analysis manually; automating it is not something I have time for, but you could propose it to the PSF to get funding to pay someone.

My stats were using the data Steve shared.

Are those statistics available somewhere for (a) the running last 30 days or (b) for January 2020 or (c) for all of 2019?

Not for the python.org download stats, but download stats for packages on pypi.org are:

a) yes
b) yes
c) yes (but you need to pay, or use the free quota, to go beyond the last 6 months)

https://pypistats.org/packages/__all__ has charts covering the last 6 months.

That site also has an API, and I made a command-line client for it. For example:

$ pip install -U pypistats
...
$ pypistats python_minor __all__ --start-date 2020-01-13
category percent downloads
2.7 34.43% 1,444,671,160
3.6 26.07% 1,093,923,501
3.7 23.81% 998,942,830
3.5 8.50% 356,753,461
3.8 3.26% 136,918,089
2.6 1.51% 63,559,345
null 1.32% 55,366,210
3.4 1.07% 44,716,725
3.9 0.01% 572,262
3.3 0.00% 71,529
3.2 0.00% 4,969
2.4 0.00% 560
2.8 0.00% 152
2.5 0.00% 100
3.1 0.00% 76
3.0 0.00% 3
Total 4,195,500,972

Date range: 2020-01-13 - 2020-02-12

$ pypistats python_minor __all__ --last-month
category percent downloads
2.7 37.00% 1,479,189,213
3.6 25.69% 1,027,020,179
3.7 22.51% 899,664,628
3.5 8.25% 329,799,889
3.8 2.75% 109,931,029
null 1.42% 56,817,109
2.6 1.23% 48,970,481
3.4 1.14% 45,527,500
3.9 0.01% 473,160
3.3 0.00% 74,501
3.2 0.00% 4,253
2.4 0.00% 285
2.8 0.00% 177
3.1 0.00% 41
3.0 0.00% 5
2.5 0.00% 4
4.0 0.00% 1
Total 3,997,472,455

Date range: 2020-01-01 - 2020-01-31

$ pypistats python_minor __all__
category percent downloads
2.7 41.31% 9,004,447,022
3.6 24.72% 5,387,271,581
3.7 19.40% 4,228,431,473
3.5 8.83% 1,924,906,894
3.4 1.68% 365,374,109
3.8 1.42% 309,993,271
2.6 1.36% 297,151,432
null 1.27% 276,260,564
3.9 0.00% 1,067,163
3.3 0.00% 413,969
3.2 0.00% 36,813
2.4 0.00% 5,175
2.8 0.00% 4,199
2.5 0.00% 1,307
3.1 0.00% 621
3.10 0.00% 22
3.0 0.00% 11
2.1 0.00% 2
4.0 0.00% 1
Total 21,795,365,629

Date range: 2019-08-16 - 2020-02-12

All this data comes from BigQuery, where you can make other types of query. You need an API key, and you get a free monthly quota. pypinfo can be helpful for making queries.

Finally, I’ve plotted some historical data in “Python version share over time, 6” on DEV Community.


And if anyone is wondering why Python 4.0 and 3.10 are showing up in those stats, Anthony Sottile has a testing project that looks for bugs arising from the switch to a 2-digit minor version number (3.10), and also did some related experiments to see which projects break simply from changing the version number to 4.0 without making any other changes.

And Anthony’s flake8-2020 plugin can help find bugs that will occur in Python 3.10 or 4.0.
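
To make the 3.10 and 4.0 problems concrete: code that slices sys.version sees "3.1" on Python 3.10, string comparison orders "3.10" before "3.9", and a check like `version_info[0] == 3` fails on 4.0. Comparing sys.version_info as a tuple of integers avoids all three. A small sketch; the helper names are hypothetical, and version values are passed in so the behaviour can be shown on any interpreter:

```python
from __future__ import annotations

import sys


def minor_via_slice(version: str) -> str:
    # Fragile: like sys.version[:3], assumes a one-digit minor version.
    return version[:3]


def at_least_39_via_string(version: str) -> bool:
    # Fragile: string comparison is lexicographic, so "3.10" < "3.9".
    return version >= "3.9"


def at_least_39_via_major_minor(version_info: tuple[int, ...]) -> bool:
    # Fragile: returns False on 4.0 because the major version is not 3.
    return version_info[0] == 3 and version_info[1] >= 9


def at_least_39_via_tuple(version_info: tuple[int, ...]) -> bool:
    # Robust: tuples of ints compare element by element, numerically.
    return tuple(version_info[:2]) >= (3, 9)


if __name__ == "__main__":
    print(at_least_39_via_tuple(sys.version_info))
```

In real code the robust form is simply `sys.version_info >= (3, 9)`.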


It will be interesting to see how many CI runs go red the day the 3.9 maintenance branch gets forked and master becomes 3.10a0 (that’s only a couple of months away now).

Adding flake8-2020 to CIs would be good prep.

I’ve been fixing some of these in the most downloaded packages.

And not many projects’ CIs test on pre-release Python versions, but it’ll be especially important to do so for 3.10.

The first thing that might break is the CI YAML config itself. You need to put quotes around "3.10"; otherwise it’ll be interpreted as the float 3.1!

For example:

matrix:
  include:
    - python: 3.1
      env: comment="3.1 as float"
    - python: 3.10
      env: comment="3.10 as float"
    - python: "3.1"
      env: comment="3.1 as string"
    - python: "3.10"
      env: comment="3.10 as string"
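
Why the quotes matter: a YAML loader resolves an unquoted 3.10 as a floating-point number, and as a float, 3.10 and 3.1 are the same value. The sketch below is a deliberately simplified model of that resolution (the function name is hypothetical, it only handles plain N.M scalars, and it is not a YAML parser); Python’s own float() shows the same collapse:

```python
import re


def resolved_python_version(scalar: str, quoted: bool) -> str:
    """Simplified model of what a CI config's `python:` value becomes.

    Not a YAML parser: it only models the case that matters here, where an
    unquoted N.M scalar is resolved as a float, losing trailing zeros.
    """
    if not quoted and re.fullmatch(r"\d+\.\d+", scalar):
        return str(float(scalar))
    return scalar


if __name__ == "__main__":
    for scalar, quoted in [("3.1", False), ("3.10", False), ("3.1", True), ("3.10", True)]:
        print(scalar, "quoted" if quoted else "unquoted", "->", resolved_python_version(scalar, quoted))
```

Only the quoted "3.10" survives intact, which matches the two "as string" entries in the matrix.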