Packaging and Python 2

(Peter Wang) #81

Upon reflection, I should issue some caveats about this data:

  1. Our method of inferring unique users is not perfect, but it’s as good as we can do while observing things like GDPR and general data-privacy hygiene.
  2. Anaconda’s user base skews towards data science & machine learning use cases, more than e.g. embedded systems or scripting within 3rd party apps (although conda is used as the pkg manager in some apps). Thus, on average our users tend to be students, individual practitioners, and data science teams, with less of a legacy code base dragging them towards Python 2.

(Donald Stufft) #82

Ok, see New PyPI Statistics - Simple API Requests for details: as of last night we’re now recording accesses to /simple/{project}/ in a separate BigQuery table, which will let us query that data independently.

Just as a reminder the practical differences here are:

  • The latest version and many older versions of pip do not cache these pages, and the versions that do cache them only cache them for a maximum of 10 minutes.
  • pip install -U requests && pip install -U requests will show as two rows in this table (unless you’re using one of the caching pip versions mentioned above and you don’t put an 11-minute sleep in between).
    • This also means that python2 -m pip install ... && python3 -m pip install ... will correctly register as two events.
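To make the caching caveat concrete, here is a small Python simulation of how a client-side cache with a 10-minute lifetime collapses back-to-back installs into a single simple-API event. It is an illustrative model only, not pip’s actual HTTP cache code; the function name and the exact expiry rule (a cached page counts as expired once 600 seconds have passed) are assumptions.

```python
# Sketch: how many /simple/{project}/ requests actually reach the server.
# Assumption: a 10-minute (600 s) maximum cache lifetime, as described above.
# Timestamps are hypothetical seconds since the first install.

CACHE_TTL_SECONDS = 600  # assumed maximum lifetime of a cached simple page


def count_server_requests(request_times, ttl=CACHE_TTL_SECONDS, caching=True):
    """Return how many page requests hit the server (i.e. become table rows).

    request_times: sorted timestamps at which the installer wants the page.
    caching: False models pip versions that never cache simple pages.
    """
    if not caching:
        return len(request_times)
    server_hits = 0
    cached_at = None  # when the page was last fetched from the server
    for t in request_times:
        if cached_at is None or t - cached_at >= ttl:
            server_hits += 1  # no cached copy or it expired: a new row
            cached_at = t
        # otherwise the cached copy is served and the server sees nothing
    return server_hits


print(count_server_requests([0, 30]))                 # 1
print(count_server_requests([0, 30], caching=False))  # 2
print(count_server_requests([0, 11 * 60]))            # 2
```

Only the first case (a caching pip, installs 30 seconds apart) undercounts; either a non-caching pip or an 11-minute gap produces two rows.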

And the caveats that remain the same:

  • If someone is using a mirror, local or otherwise, we have no insight into their data.
  • We’re tracking HTTP requests, not users. It’s possible that a single large entity could be responsible for 90% of the traffic and we’d have no way of knowing.
    • This also means we don’t know why someone is installing the project: it could be CI, or it could be real users installing software to use.

This data is very tentative right now, since we have less than 24 hours of it. However, the current share of Py3 events is 33%, which is in line with the numbers based on download events. That suggests the caveats we eliminated are not skewing the data (although it is of course possible that the remaining caveats are skewing it in some way!).

(Steve Dower) #83

Even just for this point, it’s worth the effort. Thanks for doing this!

(Barry Warsaw) #84

As another data point, the 2018 Python developer survey results were just released, and they include self-reported adoption rates for Python 3.

(Pradyun Gedam) #85

Going through the thread again now, I’ll try to summarize things.

Here’s a bunch of things I think we agree on:

  • What dropping support for Python 2 would mean
    • for pip: not being able to install/uninstall to environments on Python 2.
      • we could implement the ability to modify environments pip isn’t running in, which is a good idea for a lot more reasons than just Py2/3 code (comment)
    • for setuptools: not being able to build packages when run on Python 2.
  • pip and setuptools will be among the last packages in the ecosystem to drop Python 2 support
    • We don’t want to be maintaining support for the full tail of Python 2 packages.

Looking at what we discussed about the 2 broadest questions raised here:

i) How to decide when to drop support?

  1. Use metrics from PyPI (file downloads and requests to the simple API) to determine when pip support is dropped.
    • Question: at what percentage do we drop support?
  2. Drop it on a certain date.

ISTM that we will stick with 1 here.

ii) Who would do the work of maintaining support till support is dropped?

This has been the most debated topic here and the one I don’t think we reached consensus on.

  1. PyPA members (i.e. volunteers).
    • Cost of maintaining Python 2 support.
      • “community support” – i.e. maintainers won’t do anything for Py2-only issues, unless someone files a PR (or support is dropped).
  2. Vendors
    • LTS branch
    • Factors affecting whether they’d fund/contribute work

Other things that were mentioned that I think are worth surfacing here:

  • Python 3 packages becoming the default output of build systems (comment)
    • “At some point, package maintainers will start taking for granted that python 3 is the only python.”
    • “[snip] PyPI works best remaining relatively neutral about the files it’s hosting.”
  • pip should learn to better “identify” CI contexts, and add this info to the User Agent (comment)
  • (more?)

As an aside, I might have missed an important point somewhere here so please feel free to point it out.

setuptools has not been discussed directly much. The trade-offs there differ from pip’s, but I expect it will follow whatever overall strategy we decide upon.

(Tzu-ping Chung) #86

Specifically for pip, what are the consequences if it simply starts pushing releases with requires-python >= 3.0 at some point? pip is already in a somewhat usable state, and as long as packaging specs keep backward compatibility (I believe they need to anyway?), old versions of pip would continue to do what they need to do for people stuck on Python 2. This would have an effect similar to the Python 2.7 feature freeze, creating incentives for people to migrate.

(Pradyun Gedam) #87

@dstufft covered that in his comment here (and follow-ups): Packaging and Python 2

I agree with his assessment that doing this too early would be similar to bifurcating the user base, and that it adds a burden to any new specifications we write.

If we decide that we don’t care about Python 2 when writing new specifications, it would limit adoption of said specifications.

Doing a python_requires change is how pip would eventually drop support.
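The “old versions keep working” argument rests on installers honouring Requires-Python metadata and falling back to the newest compatible release. A minimal Python sketch of that selection step follows, under a loudly stated simplification: each release carries either no constraint or a single ">=X.Y" specifier, whereas real installers evaluate full PEP 440 specifier sets. The package "example-tool" and its release history are hypothetical.

```python
# Sketch: pick the newest release whose Requires-Python admits the interpreter.
# Simplification (assumption): at most one ">=X.Y" specifier per release.
# The "example-tool" releases below are hypothetical.


def parse_version(text):
    """Turn '2.7.18' or '3.0' into a tuple of ints, zero-padded to 3 parts."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def is_compatible(requires_python, python_version):
    """True if python_version satisfies requires_python (None = no constraint)."""
    if requires_python is None:
        return True
    spec = requires_python.replace(" ", "")
    if not spec.startswith(">="):
        raise ValueError("this sketch only understands a single '>=' specifier")
    return parse_version(python_version) >= parse_version(spec[2:])


def newest_compatible(releases, python_version):
    """Highest release version installable on python_version, or None."""
    candidates = [
        version
        for version, requires_python in releases
        if is_compatible(requires_python, python_version)
    ]
    return max(candidates, key=parse_version, default=None)


# Hypothetical release history: (version, Requires-Python metadata).
EXAMPLE_RELEASES = [
    ("1.0", None),
    ("1.1", ">=2.7"),
    ("2.0", ">=3.0"),
]

print(newest_compatible(EXAMPLE_RELEASES, "2.7.18"))  # 1.1
print(newest_compatible(EXAMPLE_RELEASES, "3.7.3"))   # 2.0
```

A Python 2 user with a Requires-Python-aware installer simply stays on 1.1, while installers that predate Requires-Python support would ignore the metadata and could still pick 2.0.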

(Brett Cannon) #88

As @dstufft pointed out, it means that any new packaging innovations like pyproject.toml will not reach Python 2 packages and that could cause their adoption to drag on.

(Donald Stufft) #89

Yea, specifically it means that any project that wants to support Python 2, even if it is primarily used on Python 3, cannot depend on any new feature in a Py3-only pip (or any other part of the toolchain). A lot of the features we’re adding are the kind where a single project adopting them is cool, but the real power comes when we can move large swathes of projects over to the new feature relatively quickly.

Thus I think that dropping support for Python 2 would effectively prevent a lot of packages from adopting new features until they also drop support for Py2, which would greatly reduce our ability to rely on the network effects of packages adopting our new features.

The flip side is that there is a chance people would follow suit and drop support for Python 2 because pip did, since they want access to new features. However, if we look at Python itself, there were large, impactful features and we did not see a mass exodus from Python 2 to Python 3; instead we saw steady progress (and, depending on which source of statistics you look at, Python 3 has not yet broken 50% on PyPI). I think most people are not going to drastically alter their support for Python 2 because pip dropped it, unless 2 is already near dead and pip is just the final straw.

(Nathaniel J. Smith) #90

There’s an important subtlety here that we should think about. If our main goal is to make sure that packages feel free to use new pip features, then for that purpose we don’t have to worry about packages where their latest version has dropped python 2 support, even if they still get a lot of python 2 downloads.

For example, NumPy’s still getting hundreds of thousands of py2 downloads every day, but its dev branch is py3-only. So simply counting py2 downloads across pypi as a whole isn’t a very accurate metric for how many projects are blocked from adopting new py3-only features, and it’s getting less accurate over time.

(Donald Stufft) #91

I think it’s even subtler than that. If NumPy’s py2 version might still be updated to take advantage of a new packaging feature, then it still matters. It only stops mattering when a package’s Py2 version is never going to be updated again.
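The two refinements above suggest a different metric from raw Py2 download share: count only Py2 downloads of projects that could still ship a Py2-compatible release, either because their latest release supports Py2 or because a Py2 line (such as an LTS branch) is still maintained. The Python sketch below computes that share; the project names, download counts, and both flags are hypothetical inputs, and how to obtain them in practice is left open.

```python
# Sketch: share of Python 2 downloads coming from projects that might still
# publish a Py2-compatible release (and so could want new packaging features
# there). All names and numbers below are hypothetical.
from dataclasses import dataclass


@dataclass
class ProjectStats:
    name: str
    py2_downloads: int          # Python 2 downloads over some period
    latest_supports_py2: bool   # newest release still installs on Python 2
    py2_line_maintained: bool   # a Py2-compatible branch still gets releases


def could_ship_py2_release(project, count_maintained_branches=True):
    """Could this project still publish a Py2-compatible release?"""
    if project.latest_supports_py2:
        return True
    return count_maintained_branches and project.py2_line_maintained


def relevant_py2_share(projects, count_maintained_branches=True):
    """Fraction of all Py2 downloads that come from still-relevant projects."""
    total = sum(p.py2_downloads for p in projects)
    if total == 0:
        return 0.0
    relevant = sum(
        p.py2_downloads
        for p in projects
        if could_ship_py2_release(p, count_maintained_branches)
    )
    return relevant / total


EXAMPLE_PROJECTS = [
    ProjectStats("lts-numeric-lib", 500_000, False, True),
    ProjectStats("frozen-legacy-lib", 300_000, False, False),
    ProjectStats("py2-compatible-lib", 200_000, True, False),
]

# Counting maintained Py2 branches as well as latest releases:
print(f"{relevant_py2_share(EXAMPLE_PROJECTS):.0%}")         # 70%
# Counting only projects whose latest release supports Py2:
print(f"{relevant_py2_share(EXAMPLE_PROJECTS, False):.0%}")  # 20%
```

Raw download counting would treat all 1,000,000 Py2 downloads in this toy data as blocking; the two variants bracket how much of that traffic actually matters for adopting new features.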

(Nathaniel J. Smith) #92

Yeah. I’m not sure how much difference it will make in practice, since it seems unlikely that people will be eager to experiment with new packaging features in LTS releases. Also, in the case of numpy, the py2 LTS goes EOL on Jan 1 2020. But there will certainly be many downloads after that.

(Chris Jerdonek) #93

As an FYI, today I posted a simple PR on pip’s tracker (#6273) using an approach suggested by @njs. The PR adds to pip’s User Agent string whether it looks like pip is running under CI. This addresses pip’s #5499 (“Differentiating organic vs automated installations”). Issue #5499 wasn’t in @ncoghlan’s list of three items in his post.
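The general idea of that PR can be illustrated in a few lines of Python. This is a sketch under stated assumptions, not the code from #6273: it treats the presence of certain environment variables as a CI signal and adds a flag to a JSON payload in the User-Agent. The variable list, the "example-installer" name, the `ci` key, and the payload layout are all assumptions chosen for illustration.

```python
# Sketch: flag likely-CI installs in a User-Agent string.
# Assumptions: the variables below are ones several CI services commonly set;
# the installer name and JSON payload layout are illustrative only.
import json
import os

CI_ENVIRONMENT_VARIABLES = (
    "CI",             # commonly set to "true" by many hosted CI services
    "BUILD_ID",       # commonly set by Jenkins
    "BUILD_BUILDID",  # commonly set by Azure Pipelines
    "PIP_IS_CI",      # explicit opt-in variable (name assumed here)
)


def running_under_ci(environ=None):
    """Heuristic: True if any known CI marker variable is present."""
    env = os.environ if environ is None else environ
    return any(name in env for name in CI_ENVIRONMENT_VARIABLES)


def build_user_agent(version, environ=None):
    """Build a UA string with a compact JSON payload, flagging CI if detected."""
    data = {"installer": {"name": "example-installer", "version": version}}
    if running_under_ci(environ):
        data["ci"] = True  # only added when informative, to keep the UA short
    payload = json.dumps(data, separators=(",", ":"), sort_keys=True)
    return "example-installer/{} {}".format(version, payload)


print(build_user_agent("1.0", {"CI": "true"}))
print(build_user_agent("1.0", {}))
```

This is only a heuristic: provisioning runs that set none of these variables would still look like organic installs.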

(Inada Naoki) #94

I expect most downloads come from provisioning tools (e.g. cloudinit, ansible), not only from CI.
In particular, awscli and its dependencies are downloaded about 1M times per day, roughly 3x as often as pytest.
So I don’t think excluding only CI download makes download stats usable.
(I was sad when I heard Amazon Linux 2 is based on Python 2.7.)

Ranking projects only by downloads from macOS and Windows is worth considering, although it is still far from reflecting real Python usage.
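The macOS/Windows-only ranking suggested above amounts to filtering download rows by operating system before counting. A minimal Python sketch, assuming each row carries a project name and an OS string using the values returned by `platform.system()` ("Darwin", "Windows", "Linux"); the rows and project names are made up for illustration.

```python
# Sketch: rank projects using only downloads from macOS and Windows.
# Assumption: each row is (project, system) with platform.system()-style
# values. The rows below are hypothetical.
from collections import Counter

DESKTOP_SYSTEMS = {"Darwin", "Windows"}


def desktop_ranking(rows, top_n=10):
    """Return (project, count) pairs, most downloaded first, other OSes ignored."""
    counts = Counter(
        project for project, system in rows if system in DESKTOP_SYSTEMS
    )
    return counts.most_common(top_n)


EXAMPLE_ROWS = [
    ("provisioning-cli", "Linux"),
    ("provisioning-cli", "Linux"),
    ("provisioning-cli", "Linux"),
    ("test-runner", "Linux"),
    ("test-runner", "Darwin"),
    ("test-runner", "Windows"),
    ("http-lib", "Windows"),
]

print(desktop_ranking(EXAMPLE_ROWS))  # [('test-runner', 2), ('http-lib', 1)]
```

The Linux-heavy "provisioning-cli" drops out entirely, which is the intended effect, but it also discards genuine Linux desktop and server users, so the result remains a proxy rather than a measure of real usage.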