Ok, so see New PyPI Statistics - Simple API Requests for the background, but as of last night we’re now recording accesses to `/simple/{project}/`
into a different BigQuery table, which will let us query that data independently.
Just as a reminder, the practical differences here are:
- The latest version and many older versions of pip do not cache these pages, and the versions that do cache them only cache them for a maximum of 10 minutes.
  - This means that `pip install -U requests && pip install -U requests` will show as two rows in this table (unless you’re using one of the pip versions that caches these pages and you don’t put an 11 minute sleep in between).
  - This also means that `python2 -m pip install ... && python3 -m pip install ...` will correctly register as two events.
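To make the caching point above concrete, here is a rough sketch of how many rows a series of installs would produce. It is only a model of the behaviour described in this post, not pip's actual caching code: `count_recorded_requests` and its parameters are made-up names, and the timestamps are arbitrary.

```python
from datetime import datetime, timedelta
from typing import Optional


def count_recorded_requests(install_times, cache_ttl: Optional[timedelta]) -> int:
    """Count installs that would actually hit /simple/{project}/ on the server.

    cache_ttl=None models a pip that does not cache the page at all.
    (Hypothetical helper, for illustration only.)
    """
    recorded = 0
    cached_at = None  # when the page was last fetched from the server
    for t in sorted(install_times):
        fresh = (
            cache_ttl is not None
            and cached_at is not None
            and t - cached_at < cache_ttl
        )
        if not fresh:
            # Cache miss: the request reaches the server and becomes a row.
            recorded += 1
            cached_at = t
    return recorded


start = datetime(2024, 1, 1, 12, 0)  # arbitrary starting point
back_to_back = [start, start + timedelta(seconds=30)]
spaced_out = [start, start + timedelta(minutes=11)]
ten_minutes = timedelta(minutes=10)

print(count_recorded_requests(back_to_back, None))         # non-caching pip: 2
print(count_recorded_requests(back_to_back, ten_minutes))  # caching pip: 1
print(count_recorded_requests(spaced_out, ten_minutes))    # caching pip, 11 minute gap: 2
```

So the only case that collapses into a single row is a caching pip running the same install twice inside the 10 minute window.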
And the caveats that remain the same:
- If someone is using a mirror, local or otherwise, we have no insight into their data.
- We’re tracking HTTP requests, not users. It’s possible that a single large entity could be responsible for 90% of the traffic and we’d have no way of knowing.
- This also means we don’t know why someone is installing the project; it could be CI or it could be real users installing software to use.
This data is very tentative right now, since we have less than 24 hours of it. However, the current share of Py3 events is 33%, which is in line with the numbers based on download events. That seems to suggest that the caveats that got eliminated were not skewing the data (it is possible, of course, that the remaining caveats are skewing it in some way!).
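For anyone wondering what "% of Py3 events" means in practice, it is just Python 3 rows divided by all rows. A minimal sketch, assuming each event carries a `python_major` field (a hypothetical name; the real BigQuery schema isn't shown in this post) and using a made-up three-row sample rather than real data:

```python
from collections import Counter


def py3_share(events) -> float:
    """Return the percentage of events that came from Python 3.

    `python_major` is an assumed field name, not the real table schema.
    """
    counts = Counter(e["python_major"] for e in events)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no events yet, avoid dividing by zero
    return 100.0 * counts[3] / total


# Made-up sample purely for illustration, not actual PyPI data.
sample = [{"python_major": 2}, {"python_major": 2}, {"python_major": 3}]
print(f"{py3_share(sample):.0f}% Py3")
```

Note that this counts requests, not users, so all of the caveats above still apply to whatever number comes out.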