Python core development dynamics

Hello,

This isn’t meant to spark a large discussion, but I’d just like to bring a few facts forward. It seems established that Python is still increasing in popularity (which is almost surprising given how popular it already is in many fields). However, another question is whether Python core development is becoming more or less active.

On a quantitative basis, it’s easy enough to get an idea. Commit graphs both at Github and OpenHub indicate that repository activity has been decreasing since ~2010. The Github transition has not stopped, much less reversed, that trend (yet?). Neither has core-mentorship.

If we want to talk qualitatively, obviously it’s a bit harder. What I chose to do is peruse a single month (September 2018) of repository checkins. Since I could not examine every checkin in detail, I went by their titles. The answer seems to be that Python is mostly in maintenance mode (small bugfixes and extremely minor improvements). One important improvement stands out, though: the conversion of zipimport to a pure Python implementation.

Interestingly, the author of the aforementioned zipimport rewrite, Serhiy, expressed concerns about Discourse lacking an NNTP gateway. Along with Victor, he has been the most active core developer for the last few years (not to mention that he often tackles difficult topics and that his work is generally high quality). Given our lack of attractiveness for new core developers, it would be good, at least, not to risk losing high-profile developers simply because we forgot to ensure they were ok with game-changing decisions.

Regards

Antoine.

1 Like

Obviously this is all a balancing act. I think a key thing to remember is this is still an experiment and currently time boxed to about 3 months. If after this time people overall still want to stick with mailing lists then I expect we will just go back to how things were, but I think we need to at least give this a try to attempt to address some shortcomings in mailing lists (which I will enumerate elsewhere, where discussions about this experiment are going on).

1 Like

Right, the experiment is fine with me. I just share the concerns of people who complain about the timing.

While the overall trend is down since 2010, I’d also suggest that the period around 2008-2010 might have been somewhat of an outlier. Python 3 was brand new, and a lot of things were changing all at once. There was also a lot of activity generated by having two different lines of development in 2.x and 3.x, which would have effectively doubled the number of commits (since each commit to 3.x’s HEAD would also come with a commit to 2.x’s HEAD).

In addition, Guido also took his “permanent vacation” on July 12, so I would suspect everyone is being more conservative in the kinds of changes they’re making to Python than they otherwise would be, since there’s a general feeling of being rudderless while we wait for a new decision making process to be decided on.

Ultimately though, Python is a mature language and I think that means that the bulk of the activity on it is going to be maintenance. A ship this big isn’t a nimble thing, and every change has the possibility to break something so it’s good to be generally conservative. In addition I think (well I hope?) that our packaging tools are steadily improving, so people feel less need to get every feature they’d like from the language into the language, and are happier depending on something from PyPI instead.

It’s also important to remember this isn’t an either-or choice. Maybe the mailing list is fine for python-committers, because it’s a controlled environment with a smaller group of participants. The reputation and moderation features of Discourse are less important there.

But maybe the size and open nature of python-ideas or even python-dev means that those features become more important. So maybe Discourse is right for those forums.

Yes, it means there will be multiple ways of communicating about Python, but oh well, that’s already the case.

3 Likes

Yea, I made a similar point about the size and nature of python-committers making the problems that Discourse solves less of an issue for python-committers than for larger, mainly public lists. I think we’re at a sort of unique point in time where we have an otherwise small list that is likely to get a fair amount of traffic, which makes it nice for a trial run without causing nearly the amount of disruption that, say, trying it out on python-ideas would.

That being said, while it doesn’t have to be an either/or choice, there are benefits to trying to move as many of our discussions as possible here, mainly around the ability to move discussions between categories. If someone starts a thread in python-dev that’s better off being on committers, and they’re both on Discourse, we can simply move the thread. That’s less likely to be a specific thing we do with python-committers because it’s a controlled list of people who are generally better aware of the right “forum” to discuss things in, but for other smaller lists it will still apply.

Of course there’s a related problem that putting too many disparate categories onto a single Discourse instance might also have its own issues. For instance, maybe mixing python-list and python-dev/-ideas on the same instance ends up still being too much and makes things like “Latest” and “Top” not usable. I know that Rust has two Discourse instances, one for development-related topics and one for the user community, so if we like Discourse, that split is something to keep in mind too as we (hypothetically) scale out the use of Discourse.

1 Like

There’s definitely a balancing act between contributors (long-time to newer, email-centric to chat-preferring). One size will never fit all. We already have many communication channels, especially if you take into account communication from other Python projects beyond CPython which many of us work on (Jupyter, SciPy, NumPy, CircuitPython, etc.).

There are multiple goals as we look at communications methods:

  • ability to moderate, pause, or lock discussions that veer off topic
  • reduce the friction for contributors in their CPython workflow
  • reduce the bottleneck and inaction that results from many messages that do not move topics forward constructively
  • enable meaningful discussion of technical issues, tradeoffs and priorities

At the end of the day, we will all need to compromise in some way if we wish to reach the macro-goal: “Growing and sustaining Python as a high quality language”.

I would love to see more use of “Yes, and perhaps…”, “Good point, maybe if we add…”, “I hear what you are saying… and there may be a solution that meets both of our goals” in our discussions instead of “No.”, “That’s a bad idea”, “That won’t work”. Rarely do these discussions or decisions need to be “all or nothing”.

2 Likes

I’ve muted the categories that I’m not interested in, and new posts seem to not be showing in those views (the ones that were already “tracking” from before my setting change are still there). So I doubt this will be a problem if we can help people find those settings.

(Also, Ctrl+V to paste images directly into the reply gets a big +1 from me.)

Muting a whole category is a bit all-or-nothing, though. In email, I subscribe to python-list, but regularly don’t bother looking at it unless I have free time. That’s why I like gmail’s “list XXX - NN unread” UI - it lets me see what lists I can deal with based on the amount of time I have available.

And yes, Ctrl-V to paste images (and the whole formatted text thing) is a huge plus, one that it’s easy to ignore on a “well, obviously you get that with a forum but don’t with a mailing list” basis. When we hit python-list, it might be interesting to see the effect there, with people pasting screenshots of their editor rather than copyable code snippets, though :slight_smile:

… and being able to edit your post - it’s so nice that I could fix a typo just now!

There are some other options next to the “Muted” setting in Steve’s screenshot… IIUC, they’re:

  • Email me a copy of every post in this category
  • Don’t notify me for posts in this category, but do show me an “unread count” when I look
  • Email me whenever someone starts a new “topic” (thread), and if I’m interested I’ll subscribe manually
  • Mute this category entirely

Thanks. Email options are pointless IMO - I don’t want emails reminding me to check Discourse, that defeats the object (for me - I was happy with mailing lists, but if we’re switching to Discourse I want to read it in Discourse, not get email pings telling me I need to read Discourse).

That leaves “show me an unread count when I look”, and I don’t understand what that means. I’ll have to experiment with it. But honestly I don’t expect to need it. I either subscribe to a category (= list), in which case I participate on an equal basis with any other category, or I don’t subscribe at all. So I’m not likely to actually want to mute or suppress categories I’m subscribed to.

Well, I tried multiple times to warn that Python core development is slowing down and the number of active core developers is decreasing. To me the trend is now obvious. For example, you can compare the growth in the number of open bugs or open pull requests to the decrease in new core developers per year. More work with fewer (active) people.

I’m working on mentoring to get new core developers aboard, but it’s a long and slow process :slight_smile:

2 Likes

@pitrou @vstinner While I haven’t pulled stats, I believe that Python core development competes with other projects for contributors. If the contributor experience has too much friction (delays of 2+ weeks in responses and merges), a contributor will often move on to a project that has better responsiveness.

While I believe mentoring is important, I think that our current workflow should sort open PRs into age buckets for greater visibility, in the hope of closing or merging them more quickly (open 1-2 weeks, 2-4 weeks, < 3 months, < 6 months, < 18 months, > 18 months, assuming an approximate release cycle). Ideally the same for issues, but PR velocity is likely more important initially since these are the ones that impact people who have taken the time to open a PR.
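As a rough illustration of that bucketing idea, here is a minimal sketch. It assumes we already have the date each open PR was created; the cut-offs in days and the labels are illustrative approximations of the ranges above, and `bucket_for` and `summarize` are hypothetical helper names, not part of any existing CPython tooling.

```python
from collections import Counter
from datetime import date

# Upper bounds in days for each bucket. These cut-offs and labels are
# assumptions that loosely follow the ranges suggested in the post.
BUCKETS = [
    (14, "under 2 weeks"),
    (28, "2-4 weeks"),
    (90, "1-3 months"),
    (180, "3-6 months"),
    (540, "6-18 months"),
]
OLDEST = "over 18 months"


def bucket_for(opened: date, today: date) -> str:
    """Return the label of the age bucket for a PR opened on `opened`."""
    age_days = (today - opened).days
    for upper, label in BUCKETS:
        if age_days < upper:
            return label
    return OLDEST


def summarize(opened_dates, today):
    """Count open PRs per age bucket."""
    return Counter(bucket_for(d, today) for d in opened_dates)


if __name__ == "__main__":
    # Made-up creation dates, purely for demonstration.
    today = date(2018, 10, 1)
    sample = [date(2018, 9, 25), date(2018, 9, 10),
              date(2018, 5, 1), date(2016, 1, 1)]
    for label, count in summarize(sample, today).items():
        print(f"{label}: {count}")
```

A report like this could be regenerated periodically so that the oldest buckets stay visible instead of sinking out of sight.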

4 Likes

I have created some statistics of the commit rate over time on the CPython repository.

[Image: commits]

[Image: hist]
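For anyone who wants to reproduce this kind of statistic, here is a minimal sketch of counting commits per day. It is not necessarily how the plots above were produced. It assumes the input is one `YYYY-MM-DD` date per commit, for example the output of `git log --date=short --format=%ad`; `commits_per_day` is a hypothetical helper name.

```python
from collections import Counter
from datetime import date, timedelta


def commits_per_day(date_lines):
    """Map every day between the first and last commit to its commit count.

    Days without any commit are included with a count of 0, so the
    result can be plotted or smoothed as a continuous series.
    """
    counts = Counter(
        date.fromisoformat(line.strip())
        for line in date_lines
        if line.strip()
    )
    if not counts:
        return {}
    day, last = min(counts), max(counts)
    series = {}
    while day <= last:
        series[day] = counts.get(day, 0)
        day += timedelta(days=1)
    return series


if __name__ == "__main__":
    # Made-up dates, purely for demonstration.
    sample = ["2018-09-01", "2018-09-01", "2018-09-03"]
    for day, n in commits_per_day(sample).items():
        print(day.isoformat(), n)
```

Filling in the zero-commit days matters: without them, quiet periods would simply vanish from the series and the rate would look higher than it is.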

3 Likes

Here is the commit rate annotated:

2 Likes

Wow! Thanks for those graphs @pablogsal. If you’re willing to play more with graphing I would suggest two additional statistics:

  • number of commits per release (this probably implies ignoring the bugfix branches)
  • number of commits per day, but smoothed using e.g. a 1-week or 1-month moving average, to make the overall trends more visible

2 Likes
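The smoothing suggested above can be sketched as a trailing moving average over a list of daily commit counts. This is a minimal sketch under the assumption that the counts are already ordered by day with no gaps; `moving_average` is a hypothetical helper name, and a window of 7 or 30 would correspond to the 1-week or 1-month averages mentioned.

```python
from collections import deque


def moving_average(values, window):
    """Trailing moving average over `values`.

    For the first `window - 1` points, the average is taken over the
    points seen so far rather than over a full window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)
    total = 0.0
    smoothed = []
    for value in values:
        if len(recent) == window:
            # The oldest value is about to be pushed out of the window.
            total -= recent[0]
        recent.append(value)
        total += value
        smoothed.append(total / len(recent))
    return smoothed


if __name__ == "__main__":
    # Made-up daily counts, purely for demonstration.
    daily = [1, 2, 3, 4, 5]
    print(moving_average(daily, 3))
```

A trailing window lags behind the data by about half its length; a centered window would avoid that at the cost of leaving the most recent days unsmoothed.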

Wonderful graphs, Pablo!

I feel obliged to point out that the apparently low recent commit rate is possibly due to the new GitHub-based workflow, which may have led to more changes being “squashed” into a single commit rather than merged in as-is. Therefore, ISTM that comparing the commit rate before and after the adoption of the GitHub workflow is an apples vs. oranges comparison.

2 Likes

Before Github, we had a patch-based workflow, so a patch was refined offline until it was finally accepted and committed as a single changeset.

What could have changed with Github is that pre-merge CI reduces the probability of introducing CI regressions in the repository, so fewer fixup commits.

But the commit rate graph doesn’t show a break in the overall trend around the Github transition, so I think the trend simply carried over.

1 Like

Here are the requested plots:

Maybe we can repeat these experiments taking into account the diff in lines ((added lines - removed lines)/day).
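A minimal sketch of that metric, assuming the input text comes from something like `git log --numstat --date=short --format=%ad`, where each commit is a date line followed by tab-separated `added`, `removed`, `path` lines. `net_lines_per_day` is a hypothetical helper name, and file paths containing tabs are not handled.

```python
from collections import defaultdict


def net_lines_per_day(log_text):
    """Sum (added - removed) lines per day from numstat-style log text."""
    net = defaultdict(int)
    current_day = None
    for line in log_text.splitlines():
        parts = line.split("\t")
        if len(parts) == 3:
            added, removed, _path = parts
            # Binary files are reported as "-" for both counts; skip them.
            if current_day is not None and added.isdigit() and removed.isdigit():
                net[current_day] += int(added) - int(removed)
        elif line.strip():
            # Any other non-blank line is taken to be a commit's date.
            current_day = line.strip()
    return dict(net)


if __name__ == "__main__":
    # Made-up log excerpt, purely for demonstration.
    sample = (
        "2018-09-02\n\n"
        "10\t2\tLib/a.py\n"
        "-\t-\tDoc/img.png\n"
        "2018-09-01\n\n"
        "0\t300\tModules/old.c\n"
        "5\t5\tLib/b.py\n"
    )
    for day, delta in net_lines_per_day(sample).items():
        print(day, delta)
```

Note that a single large removal can swing a day strongly negative, so the result needs to be read alongside the commit counts rather than on its own.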

2 Likes

I expect it to be more difficult to interpret (sometimes we remove a lot of code due to some cleanup or module / platform removal decision). But it may be insightful nevertheless.