Idea: a common organization to keep orphaned projects alive

[TL;DR: I’m wondering if we (as in a group of volunteers) should form an organization dedicated to taking over orphaned projects and keeping them alive until they find a new dedicated maintainer.]

I think the way we handle orphaned packages in the Python ecosystem is suboptimal right now. This is especially problematic for packages that serve as dependencies for many other projects — html5lib is one recent example. I’d like to discuss three specific problems:

  1. People tend to notice that a project is orphaned pretty late — usually when things break.
  2. People with stakes in using the project don’t tend to be interested in becoming maintainers of the orphaned package.
  3. All things considered, people usually need to resolve the problem fast, so they go for short-term solutions that result in technical debt.

First, I’ll expand on these points.

Noticing orphaned projects

How can we tell that a project is orphaned? For the scope of this discussion, let’s go for being abandoned in the strictest sense: there is a problem requiring the maintainers’ attention, and the maintainers cannot be contacted. But first, people need to have a reason to try to contact them.

So usually the first step is that something starts affecting other packages — say, a new version of a dependency of the orphaned project breaks it in a way that affects consumers, or the consumers are trying to add support for a new Python version, but the orphaned project doesn’t work. Then you look at the issue tracker — perhaps it’s been reported already, a patch has been submitted and the maintainers didn’t respond to it for quite a while. Or perhaps it hasn’t been reported yet, so you need to report it, perhaps make a patch and wait.

Good news is, downstream packagers are often in a better position to notice these problems early. We often unpin dependencies, run upstream test suites, test packages with new Python versions early — so there’s a good chance that if something goes amiss, we will know early.

Unfortunately, we aren’t always in a position to deal with that. Given that the ratio is usually something like one unpaid downstream maintainer for perhaps hundreds of upstreams, some of them working on their packages full-time, we can only handle so much. Let’s say that in a week, I get a dozen or more new CI reports that the test suites in the most recent versions of some Python packages are failing. In my experience, perhaps nine out of ten such reports are false positives: test cases that are flaky, or make overly fine assumptions, or perhaps that elusive pytest crash that I can only reproduce in one out of a thousand test runs, yet CI hits surprisingly often… In the end, it might be already fixed in git, or perhaps will be fixed for the next release, so I don’t really have the time or energy to spend on it. But I’m digressing.

With things like html5lib, we have a real chance of knowing early, because we hit these bugs and they block us. We file bugs, submit patches, and we backport these patches to our repositories. Unfortunately, even when we know that something is amiss, we tend to assume we’ve done all we could.
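One cheap signal that could prompt a closer look is how long ago a project last published anything. The sketch below is only a heuristic, not a test for "orphaned" in the strict sense used above: a quiet project may simply be finished. It assumes the input dict has the shape of a PyPI JSON API response (a "releases" mapping whose file entries carry "upload_time_iso_8601"); the function name and the sample data are made up for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def days_since_last_upload(metadata: dict, now: datetime) -> Optional[int]:
    """Return whole days since the newest file upload, or None if no files.

    `metadata` is assumed to look like a PyPI JSON API response for one
    project: "releases" maps each version to a list of file entries.
    """
    uploads = [
        # Older Pythons reject a trailing "Z", so spell out the UTC offset.
        datetime.fromisoformat(entry["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in metadata.get("releases", {}).values()
        for entry in files
    ]
    if not uploads:
        return None
    return (now - max(uploads)).days


# Hand-written sample data, not real PyPI output.
sample = {
    "releases": {
        "1.0": [{"upload_time_iso_8601": "2019-01-10T12:00:00.000000Z"}],
        "1.1": [{"upload_time_iso_8601": "2020-06-22T08:30:00.000000Z"}],
    }
}
now = datetime(2024, 6, 22, 8, 30, tzinfo=timezone.utc)
age = days_since_last_upload(sample, now)
print(age)
```

A number like this says nothing about whether the maintainers can be reached; at most it helps decide which failing packages deserve an attempt to contact them.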

No interest in taking projects over

Unfortunately, even when it’s pretty clear that a project is abandoned, people simply aren’t interested in taking it over.

For me, as a downstream packager, the reason is simple. I don’t see myself as a good choice of maintainer for a project such as html5lib that I personally don’t know or don’t use in any of my projects. I’ve noticed that it’s abandoned because some package in Gentoo needed it, and this caused issues for us, so I went out searching for a solution. It’s one thing to report a bug and submit a patch, or to take a patch that some other person submitted. It’s an entirely different thing for an arbitrary downstream maintainer to claim the package.

Just to be clear, I’m not saying I’m opposed to putting in the necessary minimal effort to keep it alive (though I guess I regret it whenever I say something like that). It’s just that it feels like I’d be one of the many downstream maintainers arbitrarily deciding I deserve to own this particular package I have barely any clue about.

The problem is, there doesn’t seem to be any interest in taking projects over by people who actually have stakes in them either. Over time, I have asked a few people who have forked the dependencies of their projects to keep them alive if they’re interested in maintaining them going forward — and the general answer was “no”. Some of them indicated they’ll be open to accepting pull requests on their forks, but not taking over the original package to do the same thing.

Short-term solutions

So when projects are orphaned, and other packages are affected, people are looking for short-term solutions to fix the problem. The html5lib case is perhaps the most instructive: what really happens is that people start forking the original package, independently, to apply patches.

What we end up with then are:

  • the original orphaned package on PyPI, broken
  • patched versions of the original package distributed by various downstreams
  • forks of the original package using the same package name (which means, say, installing html5lib-modern overwrites html5lib with the “patched” version, and installing html5lib overwrites html5lib-modern with the broken version — and Python package managers are entirely happy with the resulting dependency graph)
  • forks of the original package using a different package name
  • packages vendoring a patched version of the original package
  • completely new packages meant to replace the original package

For me, as a downstream packager, this is a serious concern. What used to be a single orphaned package that we could track and patch is now perhaps half a dozen different packages with different patches, and most of them are effectively unmaintained anyway. On top of that, some of them cannot be installed simultaneously (since they use the same package names). Though again, we are in an advantageous position, because we can at least patch that out to some degree: in Gentoo we only have the one html5lib package that we originally patched, and we patch everything else to use it.

My idea: an organization to take care of orphaned packages

So here’s my idea: I’d like to propose forming an organization that is specifically dedicated to taking care of orphaned packages. I volunteer to join it, and I think some other downstream packagers may be interested as well.

We generally already hit problems with packages early, report them and suspect packages of being orphaned. We also already have to patch them downstream to resolve issues. However, rather than repeating that work independently, we could instead collectively keep these packages alive for the time being.

That said, I don’t see it as equivalent to a “dedicated maintainer”. Rather, it’s a stop-gap solution: a repository for shared patches and a group of people who file PEP 541 requests and keep publishing fixed versions until either the original project is revived or a new dedicated maintainer is found, which is when we hand the PyPI project over.

What are your thoughts?

This sort of already exists as Jazzband (on GitHub), which is run by @jezdez (and possibly others), I believe?

There’s also “New project: compat-fork”.

From what I understand, Jazzband aims to be a “permanent” umbrella for projects, rather than something specifically focused on proactively taking care of abandoned packages.

Oh, nice, thanks. I was searching for “abandoned” and “orphaned”, so this didn’t come up. I suppose that makes things easier.

Yes, my compat-fork project sounds similar to what you propose; I explicitly aim to avoid “heavy” maintenance like new features—just the minimum to keep things going. You’re welcome to join the project as a co-maintainer if you’re interested.

I don’t see how to apply, so I guess you have to add me. I’m mgorny on GitHub. Thanks!
