Upon reflection, I should issue some caveats about this data:
Our method of inferring unique users is not perfect, but it’s as good as we can do while observing things like GDPR and general data-privacy hygiene.
Anaconda’s user base skews towards data science & machine learning use cases, more than e.g. embedded systems or scripting within 3rd party apps (although conda is used as the pkg manager in some apps). Thus, our users on average will tend to be more students, individual practitioners, and data science teams with less legacy code base dragging them towards Python 2.
Ok, so see New PyPI Statistics - Simple API Requests, but as of last night we’re now recording accesses to /simple/{project}/ into a different BigQuery table, which will let us independently query that data.
Just as a reminder the practical differences here are:
The last version and many older versions of pip do not cache these pages, and the versions that do cache them only cached them for a maximum of 10 minutes.
pip install -U requests && pip install -U requests will show as two rows in this table (unless you’re using one of the aforementioned pip versions and you don’t put an 11-minute sleep in there).
This also means that python2 -m pip install ... && python3 -m pip install ... will correctly register as two events.
And the caveats that remain the same:
If someone is using a mirror, local or otherwise, we have no insight into their data.
We’re tracking HTTP requests, not users. It’s possible that a single large entity could be responsible for 90% of the traffic and we’d have no way of knowing.
This also means we don’t know why someone is installing the project, it could be CI or it could be real users installing software to use.
This data is very tentative right now, since we have less than 24 hours of data. However, the current % of Py3 events is 33%, which is in line with the download-event-based numbers. That seems to suggest that the caveats that got eliminated are not causing skewed data (it is possible, of course, that the remaining caveats are skewing it in some way!).
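To make the "% of Py3 events" figure concrete, here is a minimal sketch of how such a share could be computed from per-request rows. The row shape (a `python` field holding the interpreter version) and the sample rows are assumptions made up for illustration; they are not the real BigQuery schema or real data.

```python
from collections import Counter


def py3_share(events):
    """Return the fraction of events whose interpreter major version is 3.

    Each event is a dict with a "python" key such as "2.7.15" or "3.7.2"
    (a hypothetical row shape, not the actual table schema). Events with
    no recorded interpreter version are ignored.
    """
    majors = Counter(
        event["python"].split(".")[0] for event in events if event.get("python")
    )
    total = sum(majors.values())
    return majors["3"] / total if total else 0.0


# Made-up sample rows: every request is its own row, so two installs of
# the same project count twice.
sample_events = [
    {"project": "requests", "python": "2.7.15"},
    {"project": "requests", "python": "3.7.2"},
    {"project": "numpy", "python": "2.7.15"},
]

print(f"{py3_share(sample_events):.0%}")
```

Note that this counts requests, not users, so it inherits all of the caveats listed above.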
Going through the thread again now, I’ll try to summarize things.
Here’s a bunch of things I think we agree on:
What dropping support for Python 2 would mean
for pip: not being able to install/uninstall to environments on Python 2.
we could implement the ability to modify environments pip isn’t running in, which is a good idea for a lot more reasons than just Py2/3 code (comment)
for setuptools: not being able to build packages when run on Python 2.
pip, setuptools will be among the last packages to drop Python 2 support in the ecosystem
We don’t want to be maintaining support for the full tail of Python 2 packages.
Looking at what we discussed about the 2 broadest questions raised here:
i) How to decide when to drop support?
Use metrics from PyPI (file downloads and requests to the simple API) to determine when pip support is dropped.
Question: What %age do we drop support at?
Drop it on a certain date.
ISTM that we will stick with 1 here.
ii) Who would do the work of maintaining support till support is dropped?
This has been the most debated topic here and the one I don’t think we reached consensus on.
PyPA members (i.e. volunteers).
Cost of maintaining Python 2 support.
“community support” – i.e. maintainers won’t do anything for Py2-only issues, unless someone files a PR (or support is dropped).
Vendors
LTS branch
Factors affecting whether they’d fund/contribute work
Other things that were mentioned that I think are worth surfacing here:
Python 3 packages becoming the default output of build systems (comment)
“At some point, package maintainers will start taking for granted that python 3 is the only python.”
“[snip] PyPI works best remaining relatively neutral about the files it’s hosting.”
pip should learn to better “identify” CI contexts, and add this info to the User Agent (comment)
(more?)
As an aside, I might have missed an important point somewhere here so please feel free to point it out.
setuptools has not been directly discussed much. There are different trade-offs there vs pip, but I expect that it’ll be similar to whatever overall strategy we decide upon.
Specifically to pip, what are the consequences if it simply starts pushing releases with requires-python >= 3.0 at some point? pip is already in a somewhat usable state, and as long as packaging specs keep backward compatibility (I believe they need to anyway?), old versions of pip would still continue to do what they need to do for people stuck on Python 2. This would create a similar effect to Python 2.7 feature freeze, creating incentives for people to migrate.
I agree with his assessment, that this would be similar to bifurcating the user base if we do it too early and this does add a burden to any new specifications we write.
If we decide that we don’t care about Python 2 when writing new specifications, it would limit adoption of said specifications.
Doing a python_requires change is how pip would eventually drop support.
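As a rough illustration of why a python_requires change leaves Python 2 users on the last compatible release, here is a simplified sketch of the selection logic. It assumes each release only declares a minimum Python version; real Requires-Python values are full version specifiers, and the project, version numbers, and helper names below are hypothetical.

```python
def parse_version(text):
    """Turn "2.7.15" into (2, 7, 15) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def newest_compatible(releases, running_python):
    """Pick the newest release whose minimum Python is satisfied.

    `releases` is a list of (release_version, minimum_python) pairs.
    This is a simplification: it only handles ">=" style requirements.
    Returns None when no release is installable.
    """
    running = parse_version(running_python)
    candidates = [
        version
        for version, minimum_python in releases
        if running >= parse_version(minimum_python)
    ]
    return max(candidates, key=parse_version, default=None)


# Hypothetical release history for an imaginary project that drops
# Python 2 in its 2.0 release.
releases = [
    ("1.0", "2.7"),
    ("1.1", "2.7"),
    ("2.0", "3.5"),
]

print(newest_compatible(releases, "2.7.15"))
print(newest_compatible(releases, "3.7.2"))
```

Under these assumptions, a Python 2.7 environment keeps resolving to the last 1.x release, while Python 3 environments move on to 2.0.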
As @dstufft pointed out, it means that any new packaging innovations like pyproject.toml will not reach Python 2 packages and that could cause their adoption to drag on.
Yea, specifically it means that any project that wants to support Python 2, even if they’re primarily used for Python 3, cannot depend on any new feature in a Py3 only pip (or any other point of the toolchain). A lot of the features we’re adding are the kinds of features where a single project adopting them is cool, but where the real power comes from is when we can move large swathes of the projects over to the new feature relatively quickly.
Thus I think that dropping support for Python 2 would effectively pin a lot of packages to never being able to update until they also drop support for Py2, which would greatly reduce our ability to rely on the network effects of packages adopting our new features.
The flip side of that is there is a chance that people would follow suit and drop support for Python 2 because pip did, because they want access to new features. However, if we look at Python itself, there were large, impactful features and we did not see a mass exodus from Python 2 to Python 3; instead we saw steady progress (and, depending on which source of statistics you look at, Python 3 has not yet broken 50% on PyPI). I think that most people are not going to drastically alter their support for Python 2 due to pip dropping support, unless 2 is already near dead and pip is just the final straw.
There’s an important subtlety here that we should think about. If our main goal is to make sure that packages feel free to use new pip features, then for that purpose we don’t have to worry about packages where their latest version has dropped python 2 support, even if they still get a lot of python 2 downloads.
For example, NumPy’s still getting hundreds of thousands of py2 downloads every day, but its dev branch is py3-only. So simply counting py2 downloads across pypi as a whole isn’t a very accurate metric for how many projects are blocked from adopting new py3-only features, and it’s getting less accurate over time.
I think it’s subtler than that even. If Numpy’s py2 version might be updated to take advantage of a new packaging feature then it still matters. It’s only when there is a Py2 version of a package that is never going to get updated that it stops mattering.
Yeah. I’m not sure how much difference it will make in practice, since it seems unlikely that people will be eager to experiment with new packaging features in LTS releases. Also, in the case of numpy, the py2 LTS goes EOL on Jan 1 2020. But there will certainly be many downloads after that.
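The distinction in the last few posts (raw py2 downloads vs. py2 downloads for projects that could still adopt new packaging features) can be sketched as follows. The field names and numbers are invented for illustration; `py2_still_updated` stands for "some py2-supporting line of this project might still be updated", which is not something PyPI records directly.

```python
def py2_downloads_that_matter(projects):
    """Sum py2 downloads only for projects whose py2 line may still change.

    Projects whose py2-compatible releases will never be updated cannot be
    blocked from adopting py3-only packaging features, so they are skipped.
    """
    return sum(
        project["py2_downloads"]
        for project in projects
        if project["py2_still_updated"]
    )


# Made-up daily figures for two imaginary projects.
projects = [
    {"name": "project-a", "py2_downloads": 300_000, "py2_still_updated": False},
    {"name": "project-b", "py2_downloads": 50_000, "py2_still_updated": True},
]

raw_total = sum(project["py2_downloads"] for project in projects)
relevant = py2_downloads_that_matter(projects)
print(raw_total, relevant)
```

In this toy data the raw py2 total overstates the "blocked" share several times over, which is the gap being described.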
As an FYI, today I posted a simple PR on pip’s tracker (#6273) using an approach suggested by @njs. The PR adds to pip’s User Agent string whether it looks like pip is running under CI. This addresses pip’s #5499 (“Differentiating organic vs automated installations”). Issue #5499 wasn’t in @ncoghlan’s list of three items in his post.
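For readers unfamiliar with the idea, here is a minimal sketch of environment-variable based CI detection feeding into a User Agent string. The variable list, function names, and the exact User Agent format are assumptions for illustration and are not necessarily what the PR does.

```python
import json
import os

# Environment variables commonly set by CI services. This particular list
# is an assumption for the sketch, not a statement of what pip checks.
CI_VARIABLES = ("CI", "BUILD_ID", "BUILD_BUILDID")


def looks_like_ci(environ=None):
    """Return True if any well-known CI variable is present."""
    environ = os.environ if environ is None else environ
    return any(name in environ for name in CI_VARIABLES)


def user_agent(base, environ=None):
    """Append a small JSON payload recording the CI guess.

    True means "looks like CI"; None (JSON null) means "unknown", since
    the absence of these variables does not prove a human is installing.
    """
    payload = json.dumps({"ci": True if looks_like_ci(environ) else None})
    return f"{base} {payload}"


print(user_agent("pip/x.y", {"CI": "true"}))
print(user_agent("pip/x.y", {}))
```

Reporting "unknown" rather than "false" when nothing is detected keeps the signal honest: provisioning tools and other automation would not set these variables either.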
I expect most downloads are from provisioning tools (e.g. cloudinit, ansible), not only from CI.
Especially, awscli and its dependencies are downloaded about 1M/day. That is about 3x that of pytest.
So I don’t think excluding only CI downloads makes download stats usable.
(I was sad when I heard Amazon Linux 2 is based on Python 2.7.)
Ranking only by downloads from macOS and Windows is worth considering, although it is still far from real Python usage.
To bring it into actionable tasks for pip, here’s what I think we have:
We should update the message we’re printing on Python 2.7 currently, to add a link to pip’s documentation, in the next release. That can include a proper description of our final decision here.
We will drop support for Python 2.7 from pip, when usage falls below a threshold we are comfortable dropping support at. (we can discuss that separately)
Regarding the “big” question of who will maintain Python 2 support, as far as I can tell, there have been 2 proposed approaches:
PyPA members doing the maintenance work, the Python 2 CI will be kept green and maintainers will fall back to “external support” for resolving any future Python 2-only issues.
Python 2 support on mainline is maintained not by PyPA members, but by a vendor’s team who will make sure that pip’s master is kept in a working state for 2.7, and take responsibility/credit for the same.
Something interesting that was brought up is that urllib3 & requests are more or less holding onto Python 2 support until pip doesn’t need it any more. Should we just hang in waiting until pip reaches the usage threshold to end support? Can/should we drop support ahead of pip (since pip vendors us anyway)?
I think in general pip’s POV has been we generally hope that our dependencies will keep around support for the things we support, but that it’s not mandatory. If need be pip will either cope somehow (patching, bundling two versions, sticking with the last known version, something) or it will force the issue.
This question is still unresolved. I don’t feel any maintainer here prefers the answer to this being “the volunteers currently maintaining these packages”. (Do correct me if I’m wrong!)
What are our alternatives and how do we move forward on this?
I’m not sure if prefers is the word I would use, but I do think it’s the only workable solution. Anything else I could come up with either doesn’t actually solve the problem for why we would want to keep supporting 2.7 or it’s so cumbersome that it’s really just dropping support for 2.7 without coming out and saying it.
I’m also of the opinion that keeping 2.7 support isn’t much additional effort and, last I looked, it still represents the majority of our users. That might have changed though, but I’m not at my computer to pull up the numbers.