Collecting more feedback about contributing to Python

Have you tried contributing to the development of Python itself, or considered doing so? I’d like to hear your thoughts and experiences! I’m collecting such information to guide work during the upcoming core-dev sprint on making contribution easier and friendlier.

You can reach out publicly or privately. I’ll keep private stories to myself, only mentioning specific relevant points from them without mentioning who sent them.

Public stories will be added to the dedicated repo, which already includes many stories I have collected previously.

For more info on the core dev sprint, see the dedicated information website.

Tal, how would you like to hear thoughts?
Replies here? Github issues? A Google form?

For public stories, replies here work well. You can also send me direct private messages here.

You’re also welcome to create a PR for the contribution stories repo if you prefer.

Hi Tal,

I don’t know if I’ve missed the sprint, but I’d like to give you my
experiences.

TL;DR:

I am a junior core dev who has been absent from active development (not
precisely by choice) for three or four years now. For reasons that will
be explained, I have to effectively learn (or relearn) all the processes
from (not quite from) scratch. In my experience, the barriers to being
able to contribute are much higher than they were when I wrote my first
patch for CPython.

(Or maybe I’m just too old and not sufficiently motivated to climb
that hill.)

Hi Steven, you haven’t missed the sprint, it begins next week.

I’d love to hear more details about your experiences, including specifically the barriers which seem higher today.

I can chip in on this one just a tad:

I once submitted a pull request against CPython, and some Windows build step failed.
But I was not allowed to see the logs for that step. So what could I do?

On one hand, automation and restrictive gating save developer time, but they are tough on newcomers: it’s hard to understand what’s going on and what could be wrong. For example, I still have no clue what all those different bots do or how to use them.

I think there’s also a vicious cycle when it comes to new versions.
(Sorry if I’m just too dumb to have figured it out on my own, I’m not asking for help)

  • Python 3.9 was in beta or rc phase
  • I wanted to test my contribution using user code
  • I’ve tested on Mac, and wanted to test on Linux, ergo Docker
  • User code needed a dependency, dep uses CI flow
  • CI was set up for 3.8 but not 3.9, because there were no official (smth build chain)
  • Pypa response was that ABI was not yet stable, so they would not provide (smth)

Having been essentially told “no” by multiple individuals with authority, I gave up on the idea.

Dima, thanks for sharing that experience!

Would you mind mentioning the specific PR so that I could get a better understanding?

Also, since you’ve posted this publicly, would it be okay if I added this to the contributor stories repo?

Sure, go ahead and add it. The pull request was https://github.com/python/cpython/pull/19402
Today, if I try to view the failed test run, I get “The logs for this run have expired and are no longer available.”, which I guess is another problem, considering how long the entire PR/review process often takes!

What happened to the rest of my post???

I had a long and detailed explanation following the “TL;DR” and it’s
just disappeared. I wondered why Tal asked for more detail. I thought to
myself “How much more detail do you want?!?” but now I understand.

Grrrr. I shall try resending it.

Discuss seems to mangle emails something shocking. Deleting quoted text,
inconsistently mangling linebreaks, and now deleting almost my entire
post.

A thought comes to mind: I set off the longer discussion from the TL;DR
with a row of hyphens. Everything below the hyphens was deleted. Time
for an experiment: the next non-blank line will be five hyphens,
followed by a line of text. Let’s see if it is deleted.

Hi Tal,

I wondered why you asked for more detail. Here’s the detail which I
initially sent, but Discuss ate. (This time I won’t offset it with a row
of hyphens.)

When I first began contributing to CPython, the process was very
simple, simple enough that somebody with no professional programming
experience could get started:

  • I already had Python installed, including the .py source files.

  • I made a backup copy of the .py file, and edited the original.

  • If I messed up, I could revert to the backup.

  • Once I was happy with my edits, and the Python tests still passed, I
    looked up how to use diff from the command line to create a patch
    file, and uploaded that to b.p.o.

The hardest part was remembering which order the files should go for
diff: is it diff original changed or diff changed original?
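For reference, the usual order is the original file first and the changed file second, so that added lines show up with a leading `+`. A minimal sketch, assuming a POSIX shell with `diff` available; the file names and contents are made up for the demonstration:

```shell
# Work in a throwaway directory; file names here are hypothetical.
demo=$(mktemp -d)
cd "$demo"

# The untouched backup copy and the edited file.
printf 'greeting = "hello"\n' > example.py.orig
printf 'greeting = "hello, world"\n' > example.py

# Old file first, new file second. diff exits with status 1 when the
# files differ, so "|| true" keeps a strict shell from stopping here.
diff -u example.py.orig example.py > example.patch || true

# Removed lines start with "-", added lines with "+".
cat example.patch
```

Swapping the two arguments would produce a patch that undoes the change instead of applying it.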

Now of course I appreciate that this was not so simple for the core
devs. They have to review the diff, apply it, confirm that the tests
still pass, etc. But as a contributor, the process was about as easy as
it is possible to get. The barrier to entry was close to zero for anyone
on a Linux system, as I was.

(I guess it may have been a little higher for Windows users, if they did
not have diff available.)

Since then, development moved to hg, then to git. Each change has led
to a significant increase in complexity, something which full-time
programmers may not even realise since they may be so familiar with the
process that they don’t have to think twice about it.

I am not a professional programmer. I have, occasionally, been paid to
program, but not for some years now and even then only as a very small
part of my duties. Shifting from “just submit a patch” to “use hg” was a
big jump in complexity for me, but I could understand the model and get
it to work.

I wasn’t an expert, but I was able to push through changes to the master
repo without seriously breaking anything, and after PEP 450 (statistics)
was accepted, I was given core dev permissions.

Just as I was getting comfortable with hg, two major things happened:

  • development moved to git;
  • and I got ill with a serious auto-immune disease.

Between the two, I lost all momentum. To make things harder for me,
Github stopped supporting my OS and browser and due to financial
difficulties I wasn’t able to upgrade my system until recently, so for
three or four years I was effectively incapable of contributing even if
I had the time and inclination.

Because of my long absence from making active contributions, I have
forgotten everything I knew about the process and have to relearn from
(not quite from) scratch.

I understand that CPython is now big enough and complex enough that we
cannot realistically go back to “just upload a diff”.

Even if I could upload diffs to b.p.o. for someone else to deal with, I
don’t want to be That Guy who won’t follow the standard process. I want
to contribute, and I want to pull my weight when I do so, not make more
work for others.

I expect that most contributors to CPython are professional or full-time
programmers who know git well enough that there are no significant
barriers for them. But for the rest of us, the amount of stuff you have
to do before you can contribute your first line of Python code seems to
be a lot bigger now than when I started. I’m not sure this is an
accurate or complete list:

  • install Python
  • install git
  • create a github account
  • create a b.p.o. account
  • set up a ssh key
  • configure github to accept it
  • fork the CPython repo
  • download your fork to your PC
  • fork your fork
  • make your changes
  • ensure the tests still pass
  • write a What’s New entry
  • update the docs
  • push your changes to your local repo
  • push them from your local repo to github
  • make a PR
  • make a b.p.o. issue
  • link the PR and the issue
  • sign a contributor agreement
  • wait for review
  • get the changes accepted

Have I missed anything?

Github is not as user-friendly for beginners as some git experts seem to
think. For example, I have a fork of the CPython repo dating back to the
initial change-over from hg. Last week I spent an hour trying to find
some way to update my fork to be up to date with the current version
before giving up. I’m sure that I will solve the problem, it’s a matter
of getting sufficiently motivated to put aside the time and energy
(another hour? two? five minutes? no way of knowing). But it’s another
barrier to getting productive.

Thank you for reading.

You may be aware of it already, but I periodically find myself referring to the devguide’s git boot camp, which covers this case:

Scenario:

You forked the CPython repository some time ago.
Time passes.
There have been new commits made in the upstream CPython repository.
Your forked CPython repository is no longer up to date.
You now want to update your forked CPython repository to be the same as the upstream CPython repository.

Solution:

git checkout master
git pull upstream master
git push origin master
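The quoted commands assume that a remote named `upstream` already points at the main CPython repository and that `origin` is your own fork. If the fork was cloned without an `upstream` remote, a minimal sketch of adding one might look like this. The throwaway repository is only there so the commands can be tried safely; in an existing clone, only the two `git remote` lines would be needed.

```shell
# Throwaway repository for the demonstration (an assumption of this
# sketch; in practice, run the "git remote" lines inside your clone).
demo=$(mktemp -d)
git init --quiet "$demo"
cd "$demo"

# Register the main CPython repository under the conventional name "upstream".
git remote add upstream https://github.com/python/cpython.git

# List the configured remotes to confirm that "upstream" is present.
git remote -v
```

With the remote in place, `git pull upstream master` has something to pull from.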

As someone who only really learned git (beyond trivial purposes) within the last year or so, I find the devguide’s git boot camp fantastic for some of the more involved actions that I always seem to forget. I’ve actually referred to it a few times outside of CPython development, and have guided others towards it for learning how to use git when contributing to open source projects.

Having fixed a few documentation issues, one pain point for me has always been not being able to add the “skip issue” and “skip news” labels myself. Most documentation fixes don’t need an issue or a news entry, but without these labels the GitHub CI fails.

I really hope Python can give contributors permission to add these labels themselves, at least people who have contributed before. We should trust people more.

To help us understand, could you elaborate on what specifically made this a pain point? Did you simply not like having the CI show a red X and failure status?

I ask because when it’s just the missing issue and news on doc fixes, those will always be ignored by core devs reviewing the issue, and the required labels added before merging. So those “failures” are not actually meaningful for doc-only changes.

There are several drawbacks.

  1. When picking a PR to review, reviewers cannot tell real failures from false ones like those mentioned above without clicking into the PR and seeing what it’s about. If there were no false positives, reviewers could just pick the PRs that have passed all CI checks.
  2. The first problem is made worse since non-core contributors are not able to add the “doc-only” label.
  3. Failed CI checks frustrate contributors, especially since doc fixes are more likely to come from new contributors.
  4. Needing to add the “skip-news” label is extra overhead for reviewers.

Meanwhile, I don’t see why giving contributors the permission to add these two specific labels is a bad idea. First we should believe that most people will do the right thing, and second, the PR is gonna be reviewed anyway, so even if there’s an inappropriate label (which should be rare), it can be corrected by the reviewers.

Hope that explains my reasoning.

Is it possible to allow adding just those specific labels for all contributors? AFAIK, GitHub doesn’t provide that level of granularity when it comes to adding labels to issues and PRs [1]. From my understanding, it’s pretty much all or nothing; e.g. Triagers can add all labels.

It’s not an issue though because auto-merge requires core approval and is not misused (the GitHub Python Triage team is also pretty limited in size, with only 5 non-core dev members at the moment). But I don’t think we could reasonably allow any contributor to add all labels, as it would seem to require giving “Triage” access to all contributors. In addition to some labels being problematic for open access, there’s other permissions like “Close, reopen, and assign all issues and pull requests” which could be disruptive if anyone could close PRs (besides self-authored, of course).


[1] - See “Apply labels” under Repository roles for an organization - GitHub Docs.

You’re right about labels. I do wish GitHub allowed fine-grained control over labels, or let people suggest which label to apply.

I find that it is possible to set labels with an issue template.

Might be something to consider in the future if we migrate issues to GitHub.
