Python download stats for March 2019

Periodically, I am allowed access to the CDN logs of to collect some download statistics. (For hopefully obvious reasons, these are not publicly available because of personally identifiable information, and nobody has set up an automated process to transfer and filter them into a public dataset.)

In general, these are not particularly informative, since is only the primary source of Python downloads for Windows users, but in terms of assessing the overall scale of Python usage it is one useful input. So here are a few summary tables that I extracted from the logs:

Category Sum of Hits % of Hits % of runtime downloads
Windows 23122064 59.37% 81.37%
Windows Dep 9785972 25.13% -
Sources 4023290 10.33% 14.16%
macOS 1269540 3.26% 4.47%
Docs 495378 1.27% -
Sigs 251525 0.65% -
Grand Total 38947769 100% 100%

These numbers have been corrected for a couple of CI systems that I deemed broken (for example, one system downloads Python 3.5.4 for macOS every few seconds, and a version of Chef downloads Python 2.7.10 for Windows unusually frequently). The last column shows each category's share of the runtime downloads only (the Windows, Sources and macOS rows), excluding downloads that don’t represent “give me a Python runtime”.
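As a quick check of the two percentage columns: "% of Hits" divides each row by the grand total, while "% of runtime downloads" divides only the Windows, Sources and macOS rows by their own sum. A minimal sketch using the hit counts from the table; the `shares` helper is illustrative only, not part of whatever tooling produced the table.

```python
# Hit counts per category, taken from the table above.
HITS = {
    "Windows": 23122064,
    "Windows Dep": 9785972,
    "Sources": 4023290,
    "macOS": 1269540,
    "Docs": 495378,
    "Sigs": 251525,
}

# Categories that represent "give me a Python runtime".
RUNTIME = ("Windows", "Sources", "macOS")


def shares(counts, keys=None):
    """Percentage of each selected category, relative to the selected total."""
    keys = list(counts) if keys is None else list(keys)
    total = sum(counts[k] for k in keys)
    return {k: round(100 * counts[k] / total, 2) for k in keys}


print(shares(HITS))           # "% of Hits" column
print(shares(HITS, RUNTIME))  # "% of runtime downloads" column
```

Restricting the denominator to the runtime rows is what pushes Windows from roughly 59% of all hits to roughly 81% of runtime downloads.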

The categories:

  • Windows is a download of the main installer (either the .exe or the older .msi) or the embeddable package (NuGet package downloads are visible here)
  • Windows deps are optional MSIs downloaded by the installer (debug symbols, etc.)
  • Sources are any of the source packages
  • macOS are any of the .dmg or .pkg files
  • Docs are any of the documentation files (primarily the Windows .chm files)
  • Sigs are any of the .asc files
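A sketch of how a request path might be mapped to these categories. It assumes a URL layout of the form `/ftp/python/<version>/<file>`, and it assumes that the dependency MSIs fetched by the installer live in a per-architecture subdirectory (such as `amd64/`) while older main-installer MSIs sit directly in the version directory. The `classify` function and the extension lists are hypothetical; the actual filtering may well have differed.

```python
import re
from typing import Optional

# Assumed layout: /ftp/python/<version>/<file>, or
# /ftp/python/<version>/<arch>/<name>.msi for installer dependencies.
_VERSION_DIR = re.compile(r"^/ftp/python/(\d+(?:\.\d+)*)/(.+)$")


def classify(path: str) -> Optional[str]:
    """Map a download path to one of the categories above (hypothetical)."""
    m = _VERSION_DIR.match(path)
    if not m:
        return None
    rest = m.group(2).lower()
    # Check signatures first so "foo.tgz.asc" is not counted as a source download.
    if rest.endswith(".asc"):
        return "Sigs"
    if rest.endswith(".chm"):
        return "Docs"
    if rest.endswith((".dmg", ".pkg")):
        return "macOS"
    if rest.endswith((".tgz", ".tar.xz", ".tar.bz2")):
        return "Sources"
    if rest.endswith(".msi") and "/" in rest:
        # An MSI inside a subdirectory is assumed to be an installer dependency.
        return "Windows Dep"
    if rest.endswith((".exe", ".msi")) or "embed" in rest:
        return "Windows"
    return None
```

Ordering matters here: the `.asc` check must come before the source-archive check, and the subdirectory test is what separates dependency MSIs from top-level installer MSIs under the stated assumption.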

If, instead of pivoting by category, I pivot by Python version (based on the version-number directory in the URL), keep only the Windows, Sources and macOS categories used above, and include only versions with at least 100k downloads, we get this:

Version Sum of Hits % of Hits
3.7.2 11605352 43.64%
3.7.3 2904509 10.92%
3.6.7 1863695 7.01%
2.7.16 1826445 6.87%
3.6.6 1110973 4.18%
3.6.8 1080209 4.06%
2.7.15 817422 3.07%
3.7.0 671576 2.53%
3.6.5 374122 1.41%
3.6.0 350711 1.32%
3.7.1 338291 1.27%
3.6.4 329948 1.24%
2.7.13 304177 1.14%
2.7.11 279775 1.05%
2.7.14 277178 1.04%
3.5.0 252586 0.95%
3.5.2 251011 0.94%
3.5.4 227611 0.86%
3.5.3 214622 0.81%
2.7 207678 0.78%
3.6.3 207612 0.78%
2.7.12 172630 0.65%
3.6.2 157494 0.59%
3.8.0 156111 0.59%
3.6.1 142205 0.53%
3.5.1 139020 0.52%
2.7.10 113903 0.43%
2.7.3 113220 0.43%
3.5.6 102990 0.39%
Grand Total 26593076 100%

So the good news is that the latest releases are getting the majority of the downloads (bearing in mind that 3.7.3 was released in the last week of March).
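The version pivot can be sketched the same way: sum hits per version directory, keep only the runtime categories, drop versions below the 100k threshold, and compute percentages against the total of the versions that remain (in the table, the grand total equals the sum of the listed versions). The `pivot_by_version` function and its `(version, category, hits)` input rows are assumptions for illustration.

```python
from collections import Counter

# Categories kept for the version pivot.
RUNTIME = {"Windows", "Sources", "macOS"}


def pivot_by_version(rows, min_hits=100_000):
    """rows: iterable of (version, category, hits) tuples.

    Returns (version, hits, percent) tuples sorted by hits, descending,
    where percent is relative to the versions that pass the threshold.
    """
    totals = Counter()
    for version, category, hits in rows:
        if category in RUNTIME:
            totals[version] += hits
    kept = {v: n for v, n in totals.items() if n >= min_hits}
    grand = sum(kept.values())
    ordered = sorted(kept.items(), key=lambda kv: kv[1], reverse=True)
    return [(v, n, round(100 * n / grand, 2)) for v, n in ordered]
```

Because the threshold is applied before the percentages are computed, the percentages describe shares among popular versions rather than among all downloads.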

Finally, as a last validation step to see how often we were getting repeated requests from the same source (e.g. CI systems), I bucketed unique IP addresses by request count. This table shows the number of unique IPs in each request-count range, counting 2xx, 3xx and 4xx HTTP responses. So approximately 2 million unique IPs made 10 or fewer requests during March (requests seem to come in pairs, but I didn’t figure out why).

Requests per IP Unique IPs
1-10 2039388
11-20 404657
21-30 162932
31-40 66358
41-50 41392
51-60 24482
61-70 15670
71-80 11088
81-90 8198
> 90 58600
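Bucketing like this can be sketched in a few lines: count requests per client IP, then count how many IPs fall into each range of ten requests, with everything above 90 in one open-ended bucket. This assumes the log has already been reduced to one client IP per request; `bucket_ips` is a hypothetical name.

```python
import math
from collections import Counter


def bucket_ips(client_ips, width=10, top=90):
    """Count IPs per request-count range (1-10, 11-20, ..., and > top)."""
    per_ip = Counter(client_ips)  # requests made by each IP
    buckets = Counter()
    for n in per_ip.values():
        if n > top:
            buckets[f"> {top}"] += 1
        else:
            upper = math.ceil(n / width) * width
            buckets[f"{upper - width + 1}-{upper}"] += 1
    return buckets
```

Each IP lands in exactly one bucket, so the counts are not cumulative.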

The top 20 IP addresses accounted for 4.64 million requests. A quick sanity check suggests their requests are spread across file types and response codes (many 3xx and 4xx responses are included in the request count), so they are more likely to be spiders than actual users. Dropping them from the download counts didn’t have a noticeable impact.
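Checking whether the busiest clients matter can be sketched as: find the top N IPs by request count, total their requests, and recount categories without them. This assumes the requests are available as `(client_ip, category)` pairs; both helper names are hypothetical.

```python
from collections import Counter


def drop_busiest(requests, n=20):
    """Split (client_ip, category) pairs on the n busiest client IPs.

    Returns the number of requests made by those IPs and the list of
    requests from every other IP.
    """
    per_ip = Counter(ip for ip, _ in requests)
    busiest = {ip for ip, _ in per_ip.most_common(n)}
    busiest_total = sum(per_ip[ip] for ip in busiest)
    remaining = [(ip, cat) for ip, cat in requests if ip not in busiest]
    return busiest_total, remaining


def category_counts(requests):
    """Hits per category, for comparing before and after the drop."""
    return Counter(cat for _, cat in requests)
```

Comparing `category_counts` on the full and reduced lists (or their percentage shares) shows whether removing those clients shifts the breakdown.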

These are the most interesting results as far as I’m concerned. If anyone has suggestions for things they’d like me to look at, please let me know, though I’ve already deleted the original reports and revoked my access to the logs, so I may only be able to answer from my filtered data sets.


Thanks for sharing this. It’s great to see bug-fix versions getting more adoption. It will be interesting to see the count for 2.x after its 2020 EoL. Windows also has a new distribution channel in the Windows Store, and at some point it would be good to see how people are trying it out.


I actually have significantly more detailed analytics on the Windows Store version, including crashes and actual usage data (number of minutes, etc.). But it’s hard to extract and synthesize that into anything useful right now. Let me pick out a few highlights:

Over the last 30 days, there have been 13.2k “acquisitions” (people clicking “Get it free”) and 5k actual installs. The US and China are the major markets and over 50% of users who gave their age were younger than 25.

We’re getting approximately 600 new users each week, but after a month only 10-15% of those are still using it.

Most crashes are coming from third-party extension modules. Unsurprisingly :slight_smile:


Good to hear. Maybe at some point, once it’s known to be stable enough to be supported, the Windows Store link could be added to to drive adoption. It also depends on how easy it will be to help users when things go wrong; the current installers leave log files at specified locations.

Thanks for pushing it forward through the 3.7.x releases :slight_smile:


This has already been added, so it’s coming soon :slight_smile: The worst of the issues were resolved in 3.7.3, but I want to leave it as “use with caution” for the rest of 3.7.

For the most part, things don’t “go wrong”. Behind the scenes, Windows is just extracting a ZIP file, and it has already validated all the prerequisites. So anything that doesn’t work is more likely a problem in our code (or, even more likely, in our assumptions about the environment) than in the installation itself. The kinds of failures we see with the traditional installers simply don’t exist.


From the same data, here are the subtotals for each x.y Python version:

Version Sum of Hits % of Hits
3.7.x 15,519,728 58.36%
3.6.x 5,616,969 21.12%
2.7.x 4,112,428 15.46%
3.5.x 1,187,840 4.47%
3.8.x 156,111 0.59%
Grand Total 26,593,076 100%
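These subtotals can be reproduced from the per-version table by grouping on the first two components of each version string. With that grouping, the bare `2.7` directory folds into `2.7.x`, and the resulting sums match the table above. A minimal sketch; `subtotal_by_minor` is a hypothetical helper.

```python
from collections import Counter

# Per-version hits from the version table earlier in the thread.
PER_VERSION = {
    "3.7.2": 11605352, "3.7.3": 2904509, "3.6.7": 1863695,
    "2.7.16": 1826445, "3.6.6": 1110973, "3.6.8": 1080209,
    "2.7.15": 817422, "3.7.0": 671576, "3.6.5": 374122,
    "3.6.0": 350711, "3.7.1": 338291, "3.6.4": 329948,
    "2.7.13": 304177, "2.7.11": 279775, "2.7.14": 277178,
    "3.5.0": 252586, "3.5.2": 251011, "3.5.4": 227611,
    "3.5.3": 214622, "2.7": 207678, "3.6.3": 207612,
    "2.7.12": 172630, "3.6.2": 157494, "3.8.0": 156111,
    "3.6.1": 142205, "3.5.1": 139020, "2.7.10": 113903,
    "2.7.3": 113220, "3.5.6": 102990,
}


def subtotal_by_minor(counts):
    """Sum hits per x.y series, e.g. "3.7.2" and "3.7.3" into "3.7.x"."""
    subtotals = Counter()
    for version, hits in counts.items():
        series = ".".join(version.split(".")[:2]) + ".x"
        subtotals[series] += hits
    return subtotals


for series, hits in subtotal_by_minor(PER_VERSION).most_common():
    print(f"{series} {hits:,}")
```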

Thanks for posting. At least on Windows, the 3.x transition is basically done.

Can this system and any like it be blocked so we are not paying for the bandwidth?

Our bandwidth is donated and that system is a blip probably in our overall downloads.
