Python download stats for May 2020

steve.dower · June 17, 2020, 12:01am

As in “Python download stats for March 2019”, I recently did some analysis of the python.org download logs. I used better tooling this time, so I have some more interesting numbers, and I’m also happy to run more queries if people have ideas (though all I have access to is essentially the URL).

I filtered out a few obvious scrapers, but it made little difference. They showed up with a high absolute number of hits, but evenly distributed across every single file on the server. The badly behaved downloaders from last year seem to be gone.

Category	Downloads	Users	Downloads/user
Windows	15,999,056	3,941,325	4.06
Source	5,297,204	832,034	6.37
macOS	1,022,698	309,514	3.30
Sig	770,146	58,952	13.06
Docs	430,758	63,257	6.81
Other	18,984	1,208	15.72
RPM	16,076	573	28.06
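The Downloads/user column is simply downloads divided by unique users. Below is a minimal sketch of that aggregation. It assumes each log record has already been reduced to a (category, client identifier) pair. The post doesn’t say how users were identified, so `client_id` and the helper name `summarize` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def summarize(records: Iterable[Tuple[str, Hashable]]) -> Dict[str, Tuple[int, int, float]]:
    """Return {category: (downloads, unique users, downloads per user)}.

    `records` holds one (category, client_id) pair per download. What
    `client_id` actually is (for example a hashed IP address) is an assumption.
    """
    downloads: Counter = Counter()
    users: Dict[str, set] = defaultdict(set)
    for category, client_id in records:
        downloads[category] += 1        # every hit counts as one download
        users[category].add(client_id)  # repeat hits from one client count once
    return {
        cat: (downloads[cat], len(users[cat]), round(downloads[cat] / len(users[cat]), 2))
        for cat in downloads
    }


# The ratio column is just downloads / users, rounded to two places:
print(round(15_999_056 / 3_941_325, 2))  # 4.06 (Windows row)
```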

Definitions:

Windows: any of the .exe, .msi or .zip packages, not counting sub-parts of an install (so the only .msi files counted are essentially the 2.7 installers)
macOS: any .dmg or .pkg file
Source: any of the archives, except those containing docs
Docs: the archives containing docs
Sig: any .gpg, .asc or .md5 file
RPM: some old .rpm files that are still floating around
Other: anything else
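The definitions above can be read as an ordered set of filename rules. Below is a minimal sketch. It assumes URL paths shaped like `/ftp/python/3.8.3/<file>`. It also assumes that installer sub-parts can be recognised by an architecture subdirectory. `classify` and `SUBPART_DIRS` are hypothetical, not the tooling used for these numbers.

```python
from pathlib import PurePosixPath
from typing import Optional

SIG_EXTS = (".gpg", ".asc", ".md5")
MACOS_EXTS = (".dmg", ".pkg")
WINDOWS_EXTS = (".exe", ".msi", ".zip")
SOURCE_EXTS = (".tgz", ".tar.gz", ".tar.xz", ".tar.bz2")
DOCS_EXTS = SOURCE_EXTS + (".zip",)
# Assumption: per-component installer MSIs live in architecture subdirectories.
SUBPART_DIRS = {"amd64", "win32", "arm64"}


def classify(url_path: str) -> Optional[str]:
    """Map a download URL path to a category, or None for installer sub-parts."""
    path = PurePosixPath(url_path)
    name = path.name.lower()
    # Signatures first, so "Python-3.8.3.tgz.asc" isn't counted as Source.
    if name.endswith(SIG_EXTS):
        return "Sig"
    if name.endswith(".rpm"):
        return "RPM"
    if name.endswith(MACOS_EXTS):
        return "macOS"
    # Docs archives before Windows, since some of them are .zip files.
    if name.endswith(DOCS_EXTS) and "docs" in name:
        return "Docs"
    if name.endswith(WINDOWS_EXTS):
        if path.parent.name.lower() in SUBPART_DIRS:
            return None  # a piece of a larger install, not a user-facing download
        return "Windows"
    if name.endswith(SOURCE_EXTS):
        return "Source"
    return "Other"
```

The order of the checks does the work here. For example, a docs .zip has to be caught before the generic Windows .zip rule.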

Operating Systems

As with last year, this is heavily biased towards Windows, as python.org is the primary source of Python for most Windows users. This year, though, the Microsoft Store is an alternative: according to my dashboard, there were 226,272 Store downloads in the same time period. That’s quite small, about 1.4% of python.org’s Windows downloads.
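The 1.4% figure matches Store downloads divided by python.org’s Windows downloads only. Here is a quick check with the numbers from the post. It also shows that counting the Store in the denominator barely moves the figure.

```python
# Figures from the post: python.org Windows downloads vs. Microsoft Store downloads.
python_org_windows = 15_999_056
microsoft_store = 226_272

store_vs_python_org = microsoft_store / python_org_windows
store_vs_all_windows = microsoft_store / (python_org_windows + microsoft_store)

print(f"{store_vs_python_org:.2%}")   # 1.41%
print(f"{store_vs_all_windows:.2%}")  # 1.39%
```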

Versions

This table counts all downloads relating to a particular version (based on the directory name).

Version	Downloads	Users
3.9	193,347	32,902
3.8	10,173,285	2,945,632
3.7	6,449,794	1,611,958
3.6	2,207,577	475,685
2.7	2,121,831	469,376
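Grouping by directory name amounts to pulling the major.minor version out of the path. Below is a minimal sketch, assuming the `/ftp/python/<version>/` layout. It also assumes that pre-releases such as 3.9.0b3 sit in their final release’s directory, so they count towards 3.9. `version_of` is a hypothetical helper.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# A directory named like "3.8.3" or "3.9" (assumed layout).
VERSION_DIR = re.compile(r"^(\d+)\.(\d+)(?:\.\d+)?$")


def version_of(url_path: str) -> Optional[str]:
    """Return "major.minor" from the first version-like directory, else None."""
    for part in PurePosixPath(url_path).parts:
        match = VERSION_DIR.match(part)
        if match:
            return f"{match.group(1)}.{match.group(2)}"
    return None
```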

Hope this is informative or interesting to people. I’m happy to take questions or requests for other pivots.

root-11 · June 22, 2020, 10:32am

Hi Steve,
I think the Windows bias will vanish by adding:

The 20M downloads that go via anaconda? I’m sure @teoliphant can provide the stats.
The 44M downloads that go via Linux / apt? I’m not sure python.org ever sees these.
Others? All the server downloads go via Kubernetes / Docker / …?

steve.dower · June 22, 2020, 4:20pm

Sure, that’s the point: we need other data. This isn’t an attempt to encompass all the data, though; someone has to publish the raw measurements (along with their biases) so that the meta-analysis can be done.

Though last I heard, Anaconda is just as biased towards Windows, so that may not “balance” it that much. Data from the Linux distros would be very interesting though!

EpicWink · June 22, 2020, 10:27pm

Maybe you should start being evil and put the Python build in the header of pip’s request to PyPI.

steve.dower · June 23, 2020, 8:31am

It is, in the user agent string at least. That’s a lot more data, but if you pay for BigQuery you can run the numbers against it.

Unfortunately, it’ll be much harder to separate “bot” installs (CI, deployments) from actual users, even if you have source IP addresses, so all it’s really going to tell you is “who has the biggest clusters and the least caching”.
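To illustrate what “in the user agent string” means: as far as I know, pip sends a user agent of the form `pip/<version> <JSON>`. The JSON includes the interpreter version under a `python` key and, in recent pip releases, a `ci` flag. Below is a sketch of extracting those two fields. It assumes that format, so check it against the pip versions you care about. `parse_pip_user_agent` is a hypothetical helper. As noted above, the CI flag can’t fully separate bot installs from people, because it only catches environments pip recognises.

```python
import json
from typing import Optional, Tuple


def parse_pip_user_agent(user_agent: str) -> Optional[Tuple[str, bool]]:
    """Return (Python major.minor, looks-like-CI flag) from a pip user agent, or None."""
    # Assumed shape: "pip/<version> <JSON>", where the JSON has a "python" key.
    prefix, _, payload = user_agent.partition(" ")
    if not prefix.startswith("pip/") or not payload:
        return None
    try:
        data = json.loads(payload)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(data, dict) or not isinstance(data.get("python"), str):
        return None
    major_minor = ".".join(data["python"].split(".")[:2])
    # Assumption: "ci" is true only when pip recognises a CI environment,
    # so False here does NOT prove a human ran the install.
    return major_minor, data.get("ci") is True
```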

encukou · June 23, 2020, 2:13pm

Can’t speak for all distros, but for Fedora:

we don’t have this data; we don’t track what users install.
the package manager uses Python, so all installations of Fedora have Python installed (except some super-minimal container images). That’s 3.7 for Fedora 31, 3.8 for f32, 3.9 for f33.
would creating a VM or a container with Python count as installing Python, anyway? I install Python into a chroot(-ish environment) several times a day when I do packaging.

steve.dower · June 23, 2020, 3:26pm

Yeah, this is a tough question, because it depends on what you’re trying to find out, as well as what the person is actually doing.

For most questions, the number of unique users who have installed any version(s) is a pretty good indicator of “how many people will notice %CHANGE%”. But counting unique machines doesn’t reflect users: it counts one person installing the same thing on 1000 machines as 1000 instead of 1, while a single Docker install won’t reflect the 1000 users who end up using it. Maybe they average out? But how can we ever know?

It’s a tough area to try and openly share data, but I tried. (Note that I never asked or suggested that anyone else should release their data, only that if people want it then I’m not the one who can provide it.)

shyam_swaroop · July 11, 2022, 12:12am

Is it possible to get these numbers for 2022? Are the download logs publicly available?

I am creating a new Python library and trying to figure out which minimum version of Python I should target. These numbers will play a pivotal role in decisions such as whether to use threading or rely on asyncio.

steve.dower · July 14, 2022, 4:03pm

The updated numbers are in another thread; I ran them in March. Because the download logs include personally identifiable information, we don’t share them publicly.

Your library should probably support all active versions listed on the Download Python page at python.org, and also the next version (3.11, currently in beta). Depending on your target audience, your users might need some older versions as well. There’s no need to support anything already end-of-life.

shyam_swaroop · July 14, 2022, 7:28pm

Hi Steve,

Thanks for pointing me to the resource. It seems I should support 3.7 and above.
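Having picked 3.7 as the floor, the usual way to declare it is in packaging metadata. With setuptools that is `python_requires=">=3.7"`; in pyproject.toml it is `requires-python = ">=3.7"`. Either way, pip won’t install the package on older interpreters. A runtime guard is an optional extra. Below is a minimal sketch; `MIN_PYTHON` and `check_python` are hypothetical names.

```python
import sys

# Minimum supported version chosen in this thread; raise it as versions reach end-of-life.
MIN_PYTHON = (3, 7)


def check_python(version_info=None):
    """Raise RuntimeError if the given (default: running) interpreter is too old."""
    current = tuple(version_info or sys.version_info)[:2]
    if current < MIN_PYTHON:
        raise RuntimeError(
            "this library requires Python {}.{}+, found {}.{}".format(
                MIN_PYTHON[0], MIN_PYTHON[1], current[0], current[1]
            )
        )


check_python()  # e.g. call once from the package's __init__.py
```

The guard only helps if the older interpreter can parse the module at all. That is why it uses `.format()` rather than an f-string, which would be a syntax error before 3.6.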