Packaging and Python 2

As long as the packaging ecosystem is maintained by volunteers, I think the last word should go to whoever maintains each tool or package.

Going forward, maintaining Python 2 compatibility should become a vendor-supported activity. Python 2 users should look to companies such as Anaconda, Red Hat, or Microsoft if they want continued compatibility.

3 Likes

I don’t think that vendor support is going to work for packaging tools, because of the large network effects at play here.

An open source project can just use an unsupported version of Python 2.7 to test their package with, and can utilize backports to get a large chunk of new Python features even while keeping 2.7 compatibility. Unfortunately the same isn’t true for the packaging tool chain: you can’t extend pip 19 by installing additional libraries from PyPI; you’re stuck with whatever features existed in pip 19 if you want to continue to support 2.7.

Thus I don’t think being aggressive in dropping support for Python 2.7 actually saves us any work. At best it just shifts the work around, making it harder to design new standards and features so that they can co-exist with whatever already exists in pip 19, and possibly forcing us to compromise future designs.

Ultimately, I think it will bifurcate the user base, and will cause Python packaging improvements to stall and remain stalled for years.

4 Likes

I don’t know what Red Hat will fund (despite working there), but it wouldn’t surprise me if we keep things working with 2.7, with any patches freely available :)
We’ll be happy to coordinate, and I’ll be happy to set something up if there’s interest. However, RH can’t do testing and care for Ubuntu/Windows/Mac. Is anyone interested in doing that, either in the PyPA space or in some LTS fork?

Based on some of the replies, I realised I should probably clarify what I had in mind when I mentioned the idea of an LTS branch for pip (since I somewhat agree with @dstufft regarding the risks of delaying adoption of new packaging features if there turn out to be a non-trivial number of projects still publishing hybrid packages to PyPI, but also think there’s a significant cost in demanding that all potential new contributors learn how to write hybrid Python 2/3 code, rather than just letting them write for Python 3).

Specifically, I am envisioning something akin to what we’ve had for the last several years with the CPython 2.7 branch: the branch is still open to contributions, but the majority of CPython contributors simply don’t need to care that it exists. Instead, the only folks that need to care are those that are either working on Python 2.7 specific bug reports, or else are working on backporting fixes from the main Python 3 development branch.

So in a “pip LTS branch” model, I’d see there being an LTS branch in the main repo that kept Python 2.7 in its CI matrix, while the main line of development became Python 3.x only.

If there were a change where it was absolutely critical at an ecosystem level that Py2.7 users be able to benefit from the enhancement in order for the change to be effective at all, then that change would be made to the LTS branch in addition to the main line of development (if that means that 2020 nominally ends up having more than 12 months in it, at least as far as the LTS version numbers are concerned, so be it).

Personally, I expect such changes to be relatively rare once the PEP 517/518 implementations settle down - the main cases where I could see us really wanting a Py2.7-compatible implementation are if funding is forthcoming for an actual implementation of package signing in the PEP 458/480 model, and for any enhancements that get made to the platform compatibility tagging model.

I’m not a fan of the LTS branch model, largely because I think it is either pointless or significantly more work. Either the delta between master and the LTS branch is small, in which case the branch isn’t really buying you much besides some minor niceties, or the delta becomes large, in which case backporting changes becomes a nightmare (I think it took weeks to backport the SSL improvements from 3.x to 2.x, for instance, most of which was reconciling differences between the two branches). I think the latter problem will be a lot worse for pip than it was for CPython, largely because while CPython has a lot of modules which are largely independent other than using each other’s public APIs, pip’s code base weaves around itself and doesn’t have nearly the same clear boundaries that Python itself has between modules (although maybe other tools have a better story here?).

One of the typical goals of dropping 2.x support (being able to do a bunch of compat code cleanup and start utilizing 3.x only features) actively makes the nightmare case far more likely.

The main benefit of an LTS branch model is that it lets you continue new feature work that doesn’t get backported, without having to deal with 2.x support in that branch. However, I don’t feel like that is a very big benefit given the cost in one case (or the pointlessness in the other).

Ultimately I think that an LTS branch is likely going to be just dropping support for 2.x, but in a way where we pretend we haven’t. I think that we need to either just be upfront and drop support or we need to keep support for installing/packaging in the “primary” tools (I don’t care if someone makes a new build backend or something that is py3 only) in some fashion as I described above.

Having worked on major projects that are Py3 only and major projects that straddle the line between the two, I honestly don’t feel that straddling the line is particularly difficult. There’s nothing major in Python 3 that is going to benefit pip (and likely other projects) other than the ability to write clean async code for concurrency. Everything else I can think of in Python 3 is either backported (via PyPI) or isn’t really that big of a deal.

As far as new contributors go, I suspect that the “hairy-ness” of the code bases and the debt we’re still paying down from the era of gridlock is a far larger barrier to entry than needing to deal with the syntax to straddle the boundaries.

All that being said, I don’t think that we should keep 2.x support for as long as anybody is using it, anywhere, ever. All I think is that we shouldn’t look at the dates at all (except in that we shouldn’t drop support before Python itself does), but rather let usage inform our timeline. That doesn’t mean we need to target < 5% usage like pip has historically done; maybe < 20% is the right answer (as a random number pulled out of my ass). I think that the current 60% number (as per the PyPI statistics) and a projected 50% number in 2020 is a bad place to draw the line.

It might be useful to explore this avenue – extending pip so it can install for an interpreter it is not running on. This enables the codebase to be Python 3 only while providing a way for Python 2 users to stay on the newer releases.

I remember that someone had posted a PoC for this somewhere in pip’s tracker once (I can’t seem to find it right now; on mobile)

This would also enable installing into a venv that doesn’t have its own pip in it, which would significantly improve venv creation time (I recently benchmarked venv creation on Windows at about 0.5s with --without-pip and 40s otherwise, so I’m very much in favour of this for a number of reasons).
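For anyone who wants to reproduce that comparison, the stdlib venv module exposes the same switch programmatically. A quick sketch of my own, using only the stdlib (exact timings will vary by machine; the with_pip=True path runs ensurepip, which is what dominates creation time):

    # Quick timing sketch: venv creation with and without bootstrapping pip.
    import tempfile
    import time
    import venv

    for with_pip in (False, True):
        target = tempfile.mkdtemp()
        start = time.monotonic()
        venv.create(target, with_pip=with_pip)  # with_pip=True runs ensurepip
        print("with_pip=%s: %.1fs" % (with_pip, time.monotonic() - start))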

1 Like

Yea, like @steve.dower said, there’s a lot of benefits to doing this (allowing people to only have one pip on their machine is a big one).

I think for pip specifically it wouldn’t be super hard. Going off memory, I think we inspect the currently running Python for the following (see the sketch after this list):

  • Basic computer/Python traits for determining compatibility tags, user agent, etc.
    • Should be pretty easy to do in a subprocess and return to a host process.
  • Compiling the .py(c|o) files.
    • Again, easy to do in a subprocess, Python already has a command to do it (python -m compileall).
  • Determining paths to install files to.
    • There’s a lot of legacy code here where we’re monkeypatching (IIRC) and using distutils to determine the path instead of just using sysconfig and computing it ourselves. It would be useful to figure out why, and whether we can switch to just computing it ourselves given values from sysconfig. If we can compute it ourselves, then this is pretty trivial, but it gets more complex if we have to keep the logic that we currently have.
  • Determining what versions of a project are already installed.
    • I suspect this is the hardest bit of code, because we’re currently using pkg_resources, and that doesn’t have a “target” mode. Probably the best route here is to get something into packaging that can enumerate the installed packages and other information we need (including drafting any standards we might need), and make that a targeted API instead of an “only the current environment” API.

I think that’s everything?
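To make that concrete, here’s a minimal sketch of how a Python 3 host process could interrogate a target interpreter over a subprocess. This is my illustration, not pip’s actual code: the probe script, the function names, and the JSON hand-off are all assumptions, and only stdlib APIs are used. The probe itself has to stay 2.7-compatible, since it runs under whatever interpreter we’re installing for.

    # Sketch only: interrogating a *target* interpreter from a Python 3 host.
    import json
    import subprocess

    PROBE = """
    import json, sys, sysconfig
    try:
        import pkg_resources  # may be absent on a bare interpreter
        installed = sorted((d.project_name, d.version)
                           for d in pkg_resources.working_set)
    except ImportError:
        installed = []
    print(json.dumps({
        "version": list(sys.version_info[:3]),
        "platform": sys.platform,
        "paths": sysconfig.get_paths(),  # purelib, platlib, scripts, ...
        "installed": installed,
    }))
    """

    def probe(python):
        """Run the probe under the target interpreter and parse the result."""
        out = subprocess.check_output([python, "-c", PROBE])
        return json.loads(out.decode("utf-8"))

    def compile_bytecode(python, directory):
        """Byte-compile installed .py files with the *target* interpreter."""
        subprocess.check_call([python, "-m", "compileall", "-q", directory])

    if __name__ == "__main__":
        info = probe("python2.7")  # hypothetical: any interpreter on PATH
        print(info["version"], info["paths"]["purelib"])

Computing compatibility tags and the user agent would need more fields than this, but the shape is the same: run a small script under the target interpreter and hand structured data back to the host.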

Of course that would solve the problem for pip but not for build backends, particularly setuptools, since that’s the backend that the bulk of (all? I forget if flit supports 2.7 or not) 2.7-using projects are going to be using.

2 Likes

(all? I forget if flit supports 2.7 or not)

From the Flit docs:

Flit requires Python 3 and therefore needs to be installed using the Python 3 version of pip.
Python 2 modules can be distributed using Flit, but need to be importable on Python 3 without errors.

This would be an immensely useful feature in many ways. It’s something I’ve considered for a long time, but never had the free time to work on. I’m a strong +1 on this, both as an independent feature and as an approach to maintaining Python 2 support without needing to ensure pip can run under Python 2. And yes, I know, “install Python 3 to allow you to install packages in Python 2” is suboptimal for users.

Let’s get specific here. Who, precisely, will support pip on Python 2 going forward? There’s a lot of talk about the amount of work needed to support Python 2, but it’s not at all clear to me who will do that work. I’m on record as saying that I won’t, for example.

As things stand, the work of Python 2 support is a hidden overhead on all support activities. You can’t avoid it - if you work on a PR and the 2.7 CI fails, it needs fixing, even if 3.x is all working fine. The significant advantage of a LTS support branch is, as @ncoghlan pointed out, that it separates the concerns. People interested in maintaining Python 2 can work on the LTS branch, people not interested can ignore it and work just on master. If it’s a lot of effort to maintain the LTS branch, that means that Python 2 support is a lot of effort. If it’s trivial, Python 2 support is trivial. No problem - either way, it should be the people willing to provide Python 2 support that pay that cost.

The massive disadvantage here is that this arrangement splits the (already small) pip maintainer base. It adds overheads that we don’t have the resources to support. But in effect that’s just exposing a problem that we already have - we wouldn’t be talking about dropping Python 2 support if it wasn’t a drain on resources that we didn’t want to carry.

I’d like to see some commitment from the Python 2 using community to provide maintenance resource for pip under Python 2. That doesn’t need to wait till 2020, it could happen now. PRs for Python 2 issues, help in looking at how we set up and maintain an LTS branch, etc. Here, of course “the Python 2 using community” probably means “the enterprise distributions” - because getting commitment from grass roots users isn’t really sustainable or practical (although if one or two individuals come along and say they’ll specifically cover Python 2 support for a few years, that might work).

So there are a few parts of this.

One is that the code base, as it stands, works on Python 2.7. So no real additional effort is required to make what already works there continue to work, particularly since no new releases of CPython 2.7 will be coming out past 2020, so we’re effectively targeting a fixed point that won’t be changing out from underneath us.

Another part is new incoming bug reports that only affect 2.7 code bases. This is, IMO, the bulk of the effort of supporting a particular runtime where the code is already working. As I mentioned earlier, I’m perfectly happy declaring 2.7 in “community support” or something, meaning that if a 2.7-only bug report lands, we either close it or tag it as community support, and we basically just ignore it until the community opens a PR or we finally fully drop support for 2.7, in which case we close it.

The last part is the part you mentioned: any new PRs will have to continue to make sure they run on 2.7. For this, I don’t think there is any way around the fact that people writing those PRs are going to have to make sure they run on Python 2.7. Typically that isn’t very hard, and largely means either locating backport libraries on PyPI for Python 3 features or avoiding a few language-level changes.

This is not really true, because it’s not necessarily Python 2 that’s causing the pain, but simply the decisions made on the Python 3 branch.

For instance, right now we can’t use asyncio or trio. The fact that we have to keep maintenance going prevents us from making decisions that make it hard or impossible to continue Python 2 support. But in the hypothetical where the master branch switched to asyncio/trio/etc., effectively all future patches written for pip’s master branch would have to be rewritten from scratch for the non-async Python 2 code base.
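To illustrate with a toy example of my own (nothing from pip’s code base): native coroutines are a hard syntax error on 2.7, so code like the following can’t be patched for Python 2, only rewritten around:

    # Python 3.5+ only: 'async def' and 'await' won't even parse on 2.7.
    import asyncio

    async def fetch_one(url):
        await asyncio.sleep(0)  # stand-in for real network I/O
        return url

    async def fetch_many(urls):
        # Run all the fetches concurrently on the event loop.
        return await asyncio.gather(*(fetch_one(u) for u in urls))

    print(asyncio.run(fetch_many(["https://pypi.org"])))  # asyncio.run: 3.7+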

Now maybe you’ll say “well we’ll be careful not to introduce anything that can’t be backported to Python 2 if someone puts in the effort”. Which, first of all, sounds wholly impossible to me, because until we’re running the test suites on Python 2 we don’t really know what Python 3-isms we’re adding and how they will affect Python 2. But beyond even that, just the fact that you have two disparate sources means that backporting becomes harder the longer the two sources are split. CPython managed to get away with it largely because (A) it wasn’t backporting features (which we would need to enable), so the backported patches were generally pretty small, and (B) CPython doesn’t allow broad cleanup pull requests, while doing exactly that is one of the stated goals of dropping 2.x support. The more of a delta the two branches accumulate just from code churn, the more difficult it is to maintain the backport branch - not because supporting Python 2.7 is hard, but because of the nature of split branches.

I do not think it is meaningfully possible for us to maintain an LTS branch that gets anything more than select bug fixes pulled back to it. Anything else is effectively a fork, IMO, and would mean divergent development streams where patches effectively have to be rewritten to move between them.

In either case, an LTS branch does not mean that “if Python 2 support is trivial, then an LTS branch is trivial”, because IMO the bulk of the effort of maintaining Python 2 in that model comes directly from the model itself, not from continuing to support Python 2.

I basically don’t care about this at all, because I don’t think the LTS branch model is in any way workable for the actual problems we would have dropping support for 2.7. If we keep 2.7 support in mainline, then there’s no significant porting effort needed, and I don’t think we need to care about fixing 2.7-only bugs in master. The only real additional work in my proposed model comes from ensuring that any new changes to the code base don’t break 2.7 support, and there’s not much that an external contributor can do to contribute there. It’s not reasonable to hold up every PR and say that it can’t merge until Red Hat or someone comes along and adds 2.7 support to it.

Ultimately, I don’t think dropping support even saves us real effort; it just shifts effort around and makes other activities a lot harder. Keeping new code changes working on 2.7 is not very hard, but the knock-on effects of dropping support for 2.7 while it’s still the largest source of traffic will likely be far more work.

1 Like

To be clear here, I don’t care about making it optimal for users. They’re on a legacy runtime at that point. As long as it’s “reasonably possible” without hamstringing our ability to continue to improve packaging while most of PyPI still supports 2.7, then I think that’s absolutely fine.

That was actually what I was getting from Paul’s statements: that an LTS is best-effort by the community to keep functioning, but it’s essentially freezing the features of, e.g., the last 18.n.n release with Python 2.7 support, with future work going straight to master. So yes, they would diverge, and that would be expected.

Longer-term support would seem to be providing a way to point pip at an interpreter and have it install into the proper location. Black, for instance, is already doing this by saying it only works in Python 3.6 or newer but can format Python 2 code. It would help do away with the whole “pip versus python -m pip” issue. I also selfishly like that idea as having a tool like pip switch to that model would mean it wouldn’t be nuts for the Python extension for VS Code to require Python 3 to be installed. :grin:

Feature freezing is the problem though, because any project that wants to maintain support for installing on 2.7 is not going to be able to use any of the new features of packaging. Let’s pretend for instance that PEP 517/518 hadn’t landed until after we dropped support for 2.7. Suddenly 100% of projects that want to continue to support 2.7 could no longer use 517/518 even if they wanted to.

We’re effectively going to be creating the same “straddle the compatibility line” situation between 2.x and 3.x that Python had, except with packaging it’ll be “pip 19 vs pip 20+” - and it’ll be worse, because you won’t be able to easily backport things to augment the older version like we could with Python (e.g. you can install pathlib from PyPI for 2.x, and just continue to use it from the stdlib on 3.x).
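For reference, that straddling pattern is a one-time import dance per module (a minimal sketch; pathlib2 is the usual name of the 2.x backport on PyPI):

    # Straddling 2.x and 3.x with a PyPI backport: prefer the stdlib module,
    # and fall back to the backport package on Python 2.
    try:
        from pathlib import Path   # stdlib on Python 3.4+
    except ImportError:
        from pathlib2 import Path  # 'pip install pathlib2' on Python 2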

Maintaining random bug fixes for 2.7 is the absolute least interesting or useful thing to do for supporting 2.7.

1 Like

Yep.

Sure, but we have that problem now simply due to people not upgrading pip as quickly. I’m not suggesting that any LTS release would be supported indefinitely, but maybe for the rest of 2020.

Otherwise I say don’t worry about it and let the people who are going to get paid to support CPython 2.7 also support pip starting in 2020.

Yes-ish. But upgrading pip is oftentimes easy or trivial. For example, every new virtual environment created by virtualenv automatically gets the latest pip upon creation. If I remember correctly, whenever we release a new pip it becomes the most downloaded version within 1-2 days (of course the long tail takes longer, but the long tail is almost entirely pinned to whatever distros ship).

Moving from Python 2 to Python 3 is non-trivial.

One of the lessons learned in packaging from before is that tying new features to the language runtime means they get little to no adoption. That makes it much harder for us to evolve the ecosystem, because we rely heavily on making small improvements, getting feedback, adjusting, and then building on it.

That’s only virtualenv; venv doesn’t get this.

Yep, this stuff is hard, especially when no one is being paid to pull along the legacy stuff. Basically I’m just trying to advocate for whatever makes your lives easier.

To be completely clear, I think volunteer community contributors (even to packaging tools) should drop Python 2.7 support in 2020.

The scenario Donald paints as a “nightmare” (where the main line of development diverges and backports become painful enough that nobody will do them for free) is exactly the outcome I want: the LTS branch effectively becomes frozen in time not by policy, but by the grossly out-of-whack effort/reward ratio.

At the moment, we’re creating a situation where it’s effectively impossible for redistributors like Red Hat to derive business value from “We keep pip working on Python 2.7” because pip’s contribution process is set up to demand that labour be provided by the contributors that only personally care about Python 3.

I’ve written about this before in the context of the upstream community dropping Python 2.6 support: https://www.curiousefficiency.org/posts/2015/04/stop-supporting-python26.html

Stop demanding that volunteers subsidise large enterprises, and instead leave deliberate openings where vendors can not only step in and do the work, but also be in a position where they’re able to explicitly take credit for that work.

3 Likes

I feel like we’re talking at odds here.

I don’t think there’s anything to be done to “keep pip working on 2.7”. I think basically every pip ever released works on 2.7 and still works, with little to no patching required. Hell, just for kicks I went and installed the oldest known pip there is, pip 0.2, from PyPI, and other than having to set --index-url to point at HTTPS it was able to completely successfully install requests, bpython, cryptography, and all of their dependencies. If a 10-year-old pip (from 2 years before 2.7 was even released!) still functions perfectly fine on 2.7, I’m guessing pip 19 is going to continue to work just fine without any (major?) effort.

What I’m worried about isn’t “does pip still work on 2.7”, but the network effects of effectively mandating that everyone who wants to continue to support 2.7 use only the features available in the packaging toolchain circa the end of 2019. If we add a new feature to a Python 3-only pip 20, then a project that wants to maintain compatibility with 2.7 is not going to be able to use that feature unless it is carefully designed to be a progressive enhancement, compatible with the entire toolchain circa the end of 2019. I think it’s going to cause us to expend even more effort in making standards, and likely result in compromises that make our standards worse, not better. I think it’s going to put us into “distutils2” territory, where we’re making improvements that most people can’t or won’t use because it would force them to drop 2.7 compatibility.

By leading the way, we’re effectively telling all of the other open source projects that they must either drop 2.7, adopt an LTS model themselves, or ignore any improvement we make until they do one of the above.

I think not only would it not save us any effort (it would just shift the effort from writing new PRs to designing standards), but it would hamstring future work on Python’s packaging in ways we likely wouldn’t break free of for years.

I don’t care at all about trying to maintain some frozen in time snapshot of pip that works on 2.7. That’s basically zero effort anyways. The only thing that matters is whether or not 2.7 gets new features backported to it, and I don’t think Red Hat (or any of these distros) care at all about the network effects of 2.7 support and I don’t think they’re going to invest in backporting features, particularly when the “harm” in not doing so is going to affect a packaging toolchain other than the one they actually want people to use.