Packaging and Python 2

I’m not entirely convinced that PyPI download metrics are the best gauge of language version popularity. Commercial users typically download from internal mirrors, Linux distros download once from PyPI, and I suspect that other redistributors have similar download patterns. Those all skew usage metrics since PyPI knows nothing about redistributor popularity. Maybe it’s the best we have, but I would caution against putting too much faith in those numbers.

We’re not really looking at general “language popularity” here. Suppose a million people are using Python 3 on Debian and installing those modules using apt-get, and 1,000 people are using Python 2.7 with pip to talk to PyPI (but zero people are using Python 3 for that). Then the fact that Python 3 is more “popular” in that hypothetical doesn’t matter at all to pip, because those Python 3 users are not pip’s users. Likewise with other redistributors like Conda.

Those numbers do matter for the build backends, like setuptools.

The internal mirror thing does affect it, but I suspect not to the degree that people expect. I have had numerous commercial companies (large and small, including F100s) more or less admit to me they’re pulling straight from PyPI. I doubt that it’s going to change the resulting numbers by huge amounts.

Agreed. However, there are significant factors that probably skew PyPI download numbers - pip’s cache, universal wheels, etc. So while I agree that the specific factors @barry mentioned may be less important for determining pip’s usage profile, I still maintain that even though download numbers are the best we have, they should be treated with caution.

We do ultimately have to make a decision, though - even if we had no usable data to base it on, that would remain true.

What force any decision we make actually has remains a question, too. We can say we support Python 2, but without volunteer effort to deliver that support, it makes little difference in practice… :man_shrugging:

FWIW, I started working on recording metrics for requests to /simple/*/ as well as file downloads. That should change some of the “issues” (I hesitate to call them full-fledged issues, since they’re just constraints, not real issues).

  • Pip’s caching is less of an issue. Some versions of pip do cache access to /simple/*/, but the latest version does not, and the ones that did only cached for 10 minutes. So tox usage may be affected in some cases, but the general case isn’t, and nothing is affected on the latest pip.
    • This is true for python2 -m pip install ... && python3 -m pip install ... usage on the latest pip.
  • pip install --upgrade will be tracked, even if the latest version is installed.
  • Mirrors, local or otherwise, are still completely ignored.
  • Because we’re tracking hits to /simple/*/ we’re not tracking actual users but API requests. It’s entirely possible that 100% of the usage is one person sitting there spamming pip install --upgrade in a loop.

It’s actually common to install packages using pip in a Conda environment, if those are not packaged in Anaconda or if the version on PyPI is newer.

@pzwang might have more detailed numbers, but IIRC Python 3 is already dominant in Conda usage.

Sure, and those people will be reflected in PyPI’s statistics then.

Heh, you’re right indeed.

Don’t PyPy and other VM users also use pip and PyPI?

Yea. PyPI is mostly an opaque file host, so PyPy and others can use it as well for packages that support it.

I believe PyPy is intending to support 2.7 for a very long time? Probably Jython as well? Questions around when and how packaging related tooling could drop 2.7 support should at least look beyond CPython for transition planning purposes when the tooling gets used with those VMs.

The numbers we’re looking at don’t differentiate between 2.7 CPython and 2.7 PyPy (well, they can, but the query I used doesn’t).

I’ve only just found this discussion, but yes, PyPy has no Py2 EOL plans currently (and I can say for us $WORK-wise, we have no real plans to move to PyPy3 until it is a bit more mature, and until we run extensive benchmarking showing performance is close to equal, so we’re certainly at least a year or two away).

Obviously those of us who use it are still using pip today, so continuing py2 support is certainly in my interests, but I obviously understand the desire to drop it from a maintenance perspective…

Here are some Python 2 vs 3 stats from Anaconda package servers.

Over last year:

  • Fraction of users that downloaded Python 3 pkgs rose from 78% to 87%
  • Fraction that downloaded Python 2 pkgs fell from 28% to 18%

In terms of Python 3 sub-versions:

  • Python 3.5: 9% (falling gradually)
  • Python 3.6: 58% (falling moderately)
  • Python 3.7: 31% (rising quickly, 8% gain/month)

Here is a web page with interactive graphs showing the above data and more: http://pwang.io/report_2018_10.html

Upon reflection, I should issue some caveats about this data:

  1. Our method of inferring unique users is not perfect, but it’s as good as we can do while observing things like GDPR and general data-privacy hygiene
  2. Anaconda’s user base skews towards data science & machine learning use cases, more than e.g. embedded systems or scripting within 3rd party apps (although conda is used as the pkg manager in some apps). Thus, our users on average will tend to be more students, individual practitioners, and data science teams with less legacy code base dragging them towards Python 2.

OK, so see “New PyPI Statistics - Simple API Requests”, but as of last night we’re now recording accesses to /simple/{project}/ in a separate BigQuery table, which will let us query that data independently.

Just as a reminder the practical differences here are:

  • The latest version and many older versions of pip do not cache these pages, and the versions that do cache them only cached them for a maximum of 10 minutes.
  • pip install -U requests && pip install -U requests will show as two rows in this table (unless you’re using one of the aforementioned pip versions and you don’t put an 11-minute sleep in there).
    • This also means that python2 -m pip install ... && python3 -m pip install ... will correctly register as two events.
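
To make the caching point concrete, here is a minimal sketch of how a 10-minute client-side cache changes the number of requests the server sees. This is not pip’s actual cache logic: the function name, the single-project model, and the exact expiry rule are assumptions for illustration only.

```python
CACHE_TTL_SECONDS = 10 * 60  # the 10-minute maximum mentioned above


def requests_seen_by_server(request_times, client_caches_index):
    """Count how many index-page requests actually reach the server.

    request_times: ascending timestamps (in seconds) at which a client
    asked for the same /simple/{project}/ page.
    client_caches_index: True for a client that caches the page for up to
    CACHE_TTL_SECONDS, False for a client that never caches it.
    """
    if not client_caches_index:
        # Every request hits the server and becomes a row in the table.
        return len(request_times)

    seen = 0
    cached_at = None
    for t in request_times:
        # Assumed expiry rule: the cached copy is reused until it is
        # at least CACHE_TTL_SECONDS old.
        if cached_at is None or t - cached_at >= CACHE_TTL_SECONDS:
            seen += 1
            cached_at = t
    return seen
```

With two requests five seconds apart, a non-caching client produces two rows and a caching client produces one. Spacing the two requests 11 minutes apart yields two rows either way.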

And the caveats that remain the same:

  • If someone is using a mirror, local or otherwise, we have no insight into their data.
  • We’re tracking HTTP requests, not users. It’s possible that a single large entity could be responsible for 90% of the traffic and we’d have no way of knowing.
    • This also means we don’t know why someone is installing the project, it could be CI or it could be real users installing software to use.

This data is very tentative right now, since we have less than 24 hours of data. However, the current % of Py3 events is 33%, which is in line with the numbers based on download events. That seems to suggest that the caveats that got eliminated are not skewing the data (it is possible, of course, that the remaining caveats are skewing it in some way!).
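
For anyone who wants to reproduce a number like that from exported rows, a rough sketch of the calculation follows. It assumes each event has already been reduced to a Python version string such as "2.7" or "3.6"; the function name and the input shape are hypothetical and do not reflect the actual BigQuery schema.

```python
from collections import Counter


def python3_share(event_versions):
    """Return the fraction of events that came from Python 3.

    event_versions: iterable of version strings like "2.7" or "3.6".
    Events with an empty or missing version are ignored. Returns None
    when there are no Python 2 or Python 3 events to compare.
    """
    majors = Counter(v.split(".")[0] for v in event_versions if v)
    total = majors["2"] + majors["3"]
    if total == 0:
        return None
    return majors["3"] / total
```

Note that this counts events, not users, so the caveats listed above apply to the result unchanged.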

Even just for this point, it’s worth the effort. Thanks for doing this!

As another data point, the 2018 Python developer survey results were just released, which include self-reported adoption rates for Python 3.

Going through the thread again now, I’ll try to summarize things.

Here’s a bunch of things I think we agree on:

  • What dropping support for Python 2 would mean
    • for pip: not being able to install/uninstall to environments on Python 2.
      • we could implement the ability to modify environments pip isn’t running in, which is a good idea for a lot more reasons than just Py2/3 code (comment)
    • for setuptools: not being able to build packages when run on Python 2.
  • pip, setuptools will be among the last packages to drop Python 2 support in the ecosystem
    • We don’t want to be maintaining support for the full tail of Python 2 packages.

Looking at what we discussed about the 2 broadest questions raised here:

i) How to decide when to drop support?

  1. Use metrics from PyPI (file downloads and requests to the simple API) to determine when pip support is dropped.
    • Question: What %age do we drop support at?
  2. Drop it on a certain date.

ISTM that we will stick with 1 here.

ii) Who would do the work of maintaining support till support is dropped?

This has been the most debated topic here and the one I don’t think we reached consensus on.

  1. PyPA members (i.e. volunteers).
    • Cost of maintaining Python 2 support.
      • “community support” – i.e. maintainers won’t do anything for Py2-only issues, unless someone files a PR (or support is dropped).
  2. Vendors
    • LTS branch
    • Factors affecting whether they’d fund/contribute work

Other things that were mentioned that I think are worth surfacing here:

  • Python 3 packages becoming the default output of build systems (comment)
    • “At some point, package maintainers will start taking for granted that python 3 is the only python.”
    • “[snip] PyPI works best remaining relatively neutral about the files it’s hosting.”
  • pip should learn to better “identify” CI contexts, and add this info to the User Agent (comment)
  • (more?)
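
On the CI identification point, one possible shape for this is sketched below. It is only an illustration: the list of environment variables is an assumption (many CI services set CI, but coverage is not complete), and the " ci/true" suffix is a made-up format, not what pip sends.

```python
def looks_like_ci(environ):
    """Guess whether we are running under CI from environment variables.

    environ: a mapping such as os.environ. The variable names below are
    an assumed, incomplete list used purely for illustration.
    """
    ci_variables = ("CI", "BUILD_BUILDID", "JENKINS_URL")
    return any(environ.get(name) for name in ci_variables)


def user_agent(base, environ):
    """Append a hypothetical CI marker to a base User-Agent string."""
    suffix = " ci/true" if looks_like_ci(environ) else ""
    return base + suffix
```

Passing the environment in as a plain mapping keeps the check easy to test. A heuristic like this will miss some CI systems, so it would narrow the "CI or real users" caveat, not remove it.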

As an aside, I might have missed an important point somewhere here so please feel free to point it out.


setuptools has not been directly discussed much. The trade-offs there differ from pip’s, but I expect it will end up following whatever overall strategy we decide upon.

Specifically to pip, what are the consequences if it simply starts pushing releases with requires-python >= 3.0 at some point? pip is already in a somewhat usable state, and as long as packaging specs keep backward compatibility (I believe they need to anyway?), old versions of pip would still continue to do what they need to do for people stuck on Python 2. This would create a similar effect to Python 2.7 feature freeze, creating incentives for people to migrate.

@dstufft covered that in his comment here (and follow ups): Packaging and Python 2

I agree with his assessment that this would amount to bifurcating the user base if we do it too early, and that it adds a burden to any new specifications we write.

If we decide that we don’t care about Python 2 when writing new specifications, it would limit adoption of said specifications.

Doing a python_requires change is how pip would eventually drop support.
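
As a rough illustration of why that works, here is a simplified sketch of how an installer could pick a release based on each release’s Requires-Python value. It is not pip’s resolver: it only understands a missing value or a single ">=X.Y" clause, and the release numbers in the usage note are hypothetical.

```python
def parse_min_python(requires_python):
    """Turn None or a '>=X.Y' string into a minimum version tuple.

    Simplification: real Requires-Python values can be full version
    specifiers; anything other than a single '>=' clause is rejected.
    """
    if requires_python is None:
        return (0,)
    if not requires_python.startswith(">="):
        raise ValueError("only '>=X.Y' is handled in this sketch")
    return tuple(int(part) for part in requires_python[2:].strip().split("."))


def newest_installable(releases, running_python):
    """Return the newest release whose minimum Python is satisfied.

    releases: list of (version_tuple, requires_python) pairs.
    running_python: tuple such as (2, 7) or (3, 7).
    Returns None if no release is installable.
    """
    candidates = [
        version
        for version, requires_python in releases
        if running_python >= parse_min_python(requires_python)
    ]
    return max(candidates) if candidates else None
```

Given releases [((1, 0), None), ((2, 0), ">=3.0")], a client running on (2, 7) would keep getting (1, 0), while a client on (3, 7) would get (2, 0). This only holds for installers that understand the metadata.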
