Is there a way of getting more detailed PyPI statistics?

What I’m specifically interested in is ARM vs x86 vs x64 (for Windows and Linux), and musl vs manylinux.

Essentially I’m trying to make a judgement on which platforms get a fully compiled wheel vs Stable ABI wheel vs pure Python non-compiled wheel. pypistats.org (and similar sites) don’t seem to give this level of detail, but maybe I’m missing something.

If the information is publicly available, it’s via the BigQuery Dataset.

pypinfo has a good how-to guide for getting set up with BigQuery. It feels scarier than it really is because Google requires you to enable billing and put your bank details in, but there’s a fairly wide free tier, so it’s easy to avoid actually paying anything.

The database contains raw download URLs, which you can parse to get wheel platform tags. It also records the libc variant and version, plus the architecture, of whatever downloaded each wheel, so it should have what you need.
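For anyone who would rather query the dataset directly than go through pypinfo, here is a rough sketch of the kind of query I mean. It only builds the SQL string; actually running it needs a BigQuery client and credentials, which are left out. The table and column names (`bigquery-public-data.pypi.file_downloads`, `details.system.name`, `details.cpu`, `details.distro.libc.lib`, `file.project`) are what I understand the public schema to be, so check them against the dataset before relying on this. The `build_platform_query` helper is just a name made up for this example.

```python
import re

# Assumed name of the public PyPI downloads table; verify before use.
TABLE = "bigquery-public-data.pypi.file_downloads"


def build_platform_query(project: str, days: int = 30) -> str:
    """Return SQL counting downloads per (system, cpu, libc) for one project."""
    # Only accept normalized project names so the value is safe to inline.
    if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", project):
        raise ValueError(f"unexpected project name: {project!r}")
    if days < 1:
        raise ValueError("days must be at least 1")
    return f"""
SELECT
  details.system.name AS system_name,
  details.cpu AS cpu,
  details.distro.libc.lib AS libc,
  COUNT(*) AS download_count
FROM `{TABLE}`
WHERE file.project = '{project}'
  AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL {days} DAY)
GROUP BY system_name, cpu, libc
ORDER BY download_count DESC
""".strip()


if __name__ == "__main__":
    print(build_platform_query("cython"))
```

Keeping the date filter narrow matters here, since the amount of data scanned is what counts against the free tier.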

Thanks a lot - pypinfo is just about simple enough for a poor innocent C/C++ programmer who doesn’t understand databases.

FWIW, here’s what I’ve learned about Cython downloads over the last month:

| system_name | cpu     | download_count |
| ----------- | ------- | -------------- |
| Linux       | x86_64  |     50,027,762 |
| Linux       | aarch64 |      8,905,060 |
| Windows     | AMD64   |      2,691,242 |
| Darwin      | arm64   |      1,089,903 |
| Darwin      | x86_64  |        376,277 |
| FreeBSD     | amd64   |        317,051 |
| Linux       | armv7l  |        261,729 |
| Linux       | i686    |         94,708 |
| Linux       | ppc64le |         64,604 |
| Windows     | ARM64   |         62,672 |
| Total       |         |     63,891,008 |

I didn’t manage to get exact numbers for musl vs manylinux, but musl looks to be about 30x smaller.
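One way to get a firmer musl vs manylinux split would be to classify the downloaded wheel filenames by their platform tag, and the same parsing also separates Stable ABI wheels from pure Python ones. Below is a minimal sketch using only the standard library. It assumes the standard wheel filename layout (`name-version[-build]-python-abi-platform.whl`); the function names and the example filenames are made up for illustration, and it has not been run against the real download data.

```python
def wheel_tags(filename: str) -> tuple[list[str], list[str]]:
    """Return (abi_tags, platform_tags) parsed from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # name-version-python-abi-platform, with an optional build tag.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    # Compressed tag sets are joined with "." inside one component.
    return parts[-2].split("."), parts[-1].split(".")


def libc_family(filename: str) -> str:
    """Classify a wheel as 'musllinux', 'manylinux', or 'other'."""
    _, platforms = wheel_tags(filename)
    if any(tag.startswith("musllinux") for tag in platforms):
        return "musllinux"
    if any(tag.startswith("manylinux") for tag in platforms):
        return "manylinux"
    return "other"


def wheel_kind(filename: str) -> str:
    """Classify a wheel as 'pure', 'stable-abi', or 'version-specific'."""
    abis, platforms = wheel_tags(filename)
    if platforms == ["any"]:
        return "pure"
    if "abi3" in abis:
        return "stable-abi"
    return "version-specific"


if __name__ == "__main__":
    for name in (
        "example_pkg-1.0-cp311-cp311-musllinux_1_1_x86_64.whl",
        "example_pkg-1.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
        "example_pkg-1.0-py3-none-any.whl",
    ):
        print(name, libc_family(name), wheel_kind(name))
```

Counting by `libc_family` over the filenames would give the musl vs manylinux ratio directly, rather than inferring it from the libc reported by the installer.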

Those numbers will be pretty much entirely CI downloads. That does not make them irrelevant but they are not necessarily reflective of what you might think of as “users”.

Very true about it mostly being CI users.

From our point of view, the purpose of providing a binary wheel is to provide an optimized version which will run faster and save CPU time on average (a Stable ABI wheel achieves the same thing but is a little less optimized). So nothing is going to stop working as a result of this, but a few things may get downgraded in a “speed vs space on PyPI” tradeoff.

From that point of view I don’t think there’s much difference between CI users and “real” users.

Personally, I was surprised how common Linux aarch64 was, a little surprised how quickly macOS x86_64 has gone away, relieved that no-one’s really using 32-bit x86, and a little surprised that Windows ARM64 wasn’t higher.
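To put rough percentages on those impressions, the counts from the table above can be turned into shares of the total. This is just arithmetic on the numbers already posted; the `share` helper is a name made up for the example.

```python
# Download counts copied from the table above (Cython, last month).
downloads = {
    ("Linux", "x86_64"): 50_027_762,
    ("Linux", "aarch64"): 8_905_060,
    ("Windows", "AMD64"): 2_691_242,
    ("Darwin", "arm64"): 1_089_903,
    ("Darwin", "x86_64"): 376_277,
    ("FreeBSD", "amd64"): 317_051,
    ("Linux", "armv7l"): 261_729,
    ("Linux", "i686"): 94_708,
    ("Linux", "ppc64le"): 64_604,
    ("Windows", "ARM64"): 62_672,
}

total = sum(downloads.values())


def share(system: str, cpu: str) -> float:
    """Percentage of all listed downloads for one (system, cpu) pair."""
    return 100 * downloads[(system, cpu)] / total


if __name__ == "__main__":
    # Print each platform with its share, largest first.
    for (system, cpu), count in sorted(
        downloads.items(), key=lambda item: item[1], reverse=True
    ):
        print(f"{system:8} {cpu:8} {count:>12,} {share(system, cpu):6.2f}%")
    print(f"{'Total':17} {total:>12,}")
```

On these numbers Linux x86_64 is roughly 78% of the listed downloads and Linux aarch64 about 14%, while Windows ARM64 is around 0.1%.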

CI users will download the binary once, run it once, and then throw it away, which does change the calculations for some things. I’m not sure how to quantify all the different costs here, but as an example, if a CI user had to build Cython from source, they would probably burn more CPU time building Cython itself than using it to do anything, so they would be better off using a pure Python version of Cython.