Metrics for quantifying Python 2 usage

Splitting this out from Packaging and Python 2, since that’s essentially the only question we haven’t settled in that thread.

What metrics should we use to quantify usage of Python 2 within the Python community (with a specific focus on figuring out when to drop support for it from Python Packaging tooling)?

We know that PyPI downloads can easily be skewed by a single group of systems, or by one person deciding to skew them one day (don’t do that). :slight_smile:

WRT pip specifically and the topic of dropping Python 2 support: if users are on an old version of pip w/ Python 2, it doesn’t matter whether we maintain Python 2 support in the latest pip. Based on this logic, I figured it might help to see what the numbers look like for downloads from PyPI, by Python version, using pip >= 10.0, over the past 10 weeks. Here they are:

| Python Version | Downloads |
|---|---|
| 2.7 | 1635783600 |
| 3.6 | 1287708367 |
| 3.7 | 611274139 |
| 3.5 | 434399441 |
| 3.4 | 41893671 |
| 3.8 | 1902828 |
| 3.9 | 59118 |
| 3.3 | 21658 |
| 2.8 | 200 |
| 2.7rc1 | 1 |
| 3.10 | 1 |

I don’t know why there’s 1 download, from the future, with 3.10. :slight_smile:

Python 2.7 accounts for ~40.76% of the downloads (the most for any single Python version) made with pip versions released in the past 2 years.
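As a quick sanity check on that share, here is a minimal sketch that recomputes the percentages from the totals listed above. The dictionary just restates the table; the variable names are mine and not part of any tooling.

```python
# Download totals by Python version, copied from the table above
# (pip >= 10.0, past 10 weeks).
downloads = {
    "2.7": 1635783600,
    "3.6": 1287708367,
    "3.7": 611274139,
    "3.5": 434399441,
    "3.4": 41893671,
    "3.8": 1902828,
    "3.9": 59118,
    "3.3": 21658,
    "2.8": 200,
    "2.7rc1": 1,
    "3.10": 1,
}

total = sum(downloads.values())

# Share of each version, as a percentage of all downloads in the table.
shares = {version: 100 * count / total for version, count in downloads.items()}

for version, share in shares.items():
    print(f"{version:>6} {downloads[version]:>13,} {share:6.2f}%")
```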


I compiled these numbers from week-wise data that I queried from BigQuery. The query is:

SELECT
  STRFTIME_UTC_USEC(timestamp, "%Y-%W") AS year_week,
  details.installer.version,
  REGEXP_EXTRACT(details.python, r"^([^\.]+\.[^\.]+)") AS python_version,
  COUNT(*) AS count
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    DATE_ADD(CURRENT_TIMESTAMP(), -11, "week"),
    DATE_ADD(CURRENT_TIMESTAMP(), -1, "week")
  )
WHERE
  details.installer.name = "pip" AND
  details.installer.version LIKE "1_.%"
GROUP BY
  year_week,
  details.installer.version,
  python_version
ORDER BY
  year_week DESC,
  count DESC

The result I got is at: https://pastebin.com/pUJvysHq.
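To make the two string filters in the query concrete, here is a rough Python translation of them. This is only a sketch: the helper names `python_version` and `is_recent_pip` are made up for illustration, and Python's `re` module stands in for BigQuery's regex engine, which should behave the same for a pattern this simple.

```python
import re

# Same pattern as the REGEXP_EXTRACT call: everything up to (but not
# including) the second dot.
PYTHON_VERSION_RE = re.compile(r"^([^\.]+\.[^\.]+)")

# Rough regex translation of LIKE "1_.%": a "1", any single character,
# a literal dot, then anything.
PIP_VERSION_RE = re.compile(r"^1.\..*$", re.DOTALL)


def python_version(full_version):
    """Return the major.minor prefix, or None if the pattern does not match."""
    match = PYTHON_VERSION_RE.match(full_version)
    return match.group(1) if match else None


def is_recent_pip(pip_version):
    """True for versions the LIKE filter keeps (pip 10.x through 19.x)."""
    return PIP_VERSION_RE.match(pip_version) is not None


print(python_version("2.7.15"))    # 2.7
print(python_version("3.10.0a1"))  # 3.10
print(python_version("2.7rc1"))    # 2.7rc1 (no second dot to stop at)
print(is_recent_pip("19.3.1"))     # True
print(is_recent_pip("6.1.1"))      # False
```

This also explains the odd `2.7rc1` row: with no second dot in the version string, the whole string is captured.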


@methane (super) helpfully figured out that AWS had a script that was downgrading to an old version (6.1.1, via <7.0.0) of pip, which was skewing our download numbers. That’s what prompted this idea.

Same numbers with percentages:

| Python Version | Downloads | Percent |
|---|---|---|
| 2.7 | 1 635 783 600 | 40.76% |
| 3.6 | 1 287 708 367 | 32.09% |
| 3.7 | 611 274 139 | 15.23% |
| 3.5 | 434 399 441 | 10.82% |
| 3.4 | 41 893 671 | 1.04% |
| 3.8 | 1 902 828 | 0.05% |
| 3.9 | 59 118 | 0.00% |
| 3.3 | 21 658 | 0.00% |
| 2.8 | 200 | 0.00% |
| 2.7rc1 | 1 | 0.00% |
| 3.10 | 1 | 0.00% |
| Total | 4 013 043 024 | 100% |

Even excluding awscli-cwlogs, the number of awscli downloads is still huge.

AWS provides a bundled installer for awscli.
https://docs.aws.amazon.com/cli/latest/userguide/install-bundle.html

I think the bundled installer should be recommended over pip install.


Yes. That said, it’s not just excluding cwlogs; it’s excluding everyone who isn’t on a new-ish pip version. There would be no point in keeping Python 2 support in the latest pip if very few users of the latest pip were on Python 2.

That’s likely not the case with Python 2 today, though, and these numbers are skewed just like all our other download numbers (we might just have a heterogeneous cluster somewhere downloading 500 000 000 Python 2 packages, but there’s no way for us to know).
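To get a feel for how much a single source like that could move the headline number, here is a small sketch using the totals from earlier in the thread. The 500 000 000 figure is purely the hypothetical above, not something measured.

```python
# Totals from the tables earlier in the thread.
py27_downloads = 1_635_783_600
total_downloads = 4_013_043_024

# Hypothetical: one cluster accounts for this many Python 2.7 downloads.
cluster_downloads = 500_000_000

observed_share = 100 * py27_downloads / total_downloads

# Remove the cluster from both the 2.7 count and the overall total.
adjusted_share = 100 * (py27_downloads - cluster_downloads) / (
    total_downloads - cluster_downloads
)

print(f"observed: {observed_share:.2f}%")
print(f"without the cluster: {adjusted_share:.2f}%")
```

Under that assumption, the Python 2.7 share would drop from 40.76% to about 32.33%, which is why a single download count is hard to trust on its own.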

Nonetheless, I don’t think we’d stick with just one metric anyway, so let’s see if anyone else has ideas on how we could get better numbers.