New and one-time contributors

aldwinaldwin · June 13, 2019, 10:29am

Making my first patch to a Python library locally was very satisfying. After discovering the cpython git repository, I saw that the library hadn’t been touched in 6 years. Feeling comfortable that the patch wouldn’t break other code, I thought it was time to contribute this small stupid feature. A roller-coaster of feelings started.

Over the course of many days, I was reading the dev-guide, trying out anything that wouldn’t bother the python developers, constantly changing the few lines of code to conform to the PEPs, other bpo’s and other people’s code, and I even prepared a todo-list for the big moment of adding the bpo and pr.

I reviewed and double-checked everything over and over again just before hitting the submit buttons and the git push, and then went through the submit steps in a rush of adrenaline. Of course, the todo-list was missing a lot of steps that I had overlooked in the huge dev-guide. But ok. With not too much hope, it was added to the 6987 bpo’s and 945 pr’s. Now patience. But to my surprise, there was a fast response from Karthikeyan Singaravelan with very good instructions. Thank you. That gave hope again.

Now for weeks I’ve been following IRC, mailing lists, discuss, bugs and git, thinking maybe I could contribute something more. But the list of bpo’s is so big, and the list of pr’s is growing too. It gives a feeling that small features, adjustments and documentation fixes first get triaged, but then get forgotten. Giving a small poke felt wrong as I didn’t want to disturb. The fun is probably in working on future features deep in the core, which I understand, and other features are much more important, bigger and more needed.

Probably a large share of the bpo/pr’s can get a closed status with an ‘out of date’/‘duplicate’/‘won’t fix’ resolution. What if, each week, each core developer finds 5 to simply set to closed status: ones that don’t need work, are outdated, have become irrelevant, or whose contributor didn’t sign the Contributor Agreement? 75 core devs x 5 closes is 375 a week. Within 3 months it would be much more breathable to go through the bpo/pr’s.
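
To put rough numbers on this, here is a back-of-the-envelope sketch. It only uses the figures from this post (75 core devs, 5 closes each per week, 6987 bpo’s and 945 pr’s). It assumes that "3 months" is about 13 weeks and, unrealistically, that nothing new gets opened in the meantime, so treat it as an upper bound on the progress rather than a forecast.

```python
# Back-of-the-envelope estimate; all inputs are the figures quoted in the post.
CORE_DEVS = 75
CLOSES_PER_DEV_PER_WEEK = 5
OPEN_BPO = 6987
OPEN_PRS = 945
WEEKS = 13  # assumption: "3 months" is roughly 13 weeks


def remaining_after(weeks, backlog, closes_per_week):
    """Open items left after `weeks`, assuming nothing new is opened."""
    return max(backlog - weeks * closes_per_week, 0)


closes_per_week = CORE_DEVS * CLOSES_PER_DEV_PER_WEEK
backlog = OPEN_BPO + OPEN_PRS
left = remaining_after(WEEKS, backlog, closes_per_week)

print(f"closes per week: {closes_per_week}")
print(f"backlog now: {backlog}, after {WEEKS} weeks: {left}")
```

Under those assumptions the combined backlog would drop from 7932 to 3057 open items.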

Still, this experience is great. It is a good way to bump into libraries and ideas to try out. I even tried out cpython and Cython for the first time, and it won’t be the last time. I’m even thinking of adding some PR’s about the dev-guide issues I ran into. But as said, I don’t want to pollute the long list much more for now.

I wish I had more knowledge and time to help, but I need many more years of following all these forums and mailing lists, and of reading much more code, to feel comfortable.

xtreak · June 13, 2019, 2:06pm

Hi @aldwinaldwin,

Thanks for taking the time to contribute back. I hope someone gets to the cmd.py PR you have raised: bpo-37030: hide undocumented commands in cmd module (Pull Request #13536 in python/cpython on GitHub).

The devguide is available on GitHub at python/devguide (The Python developer's guide) in case you want to propose a change. There is also the Core Workflow category on Discussions on Python.org for workflow-related questions.

Since it’s all unpaid volunteer time, it’s hard for everyone to close out 5 issues a week. The beta release got a lot of PRs merged, bringing the open PR count below 1000. Also, some of the issues don’t get a straightforward resolution since it depends on agreement during the discussion, the module maintainer’s availability, language design, etc. Any attribute that is added and documented as part of a stable release has to be supported until EoL for that release, which is 5 years. Removing a public API requires a deprecation period of one stable release before it can be removed in the next cycle, so any addition would remain in use for the next 10 years across the two cycles and has to be supported for that long. So even if a change is small and touches a part of the standard library untouched for many years, it requires more attention, and someone has to volunteer to commit. This is not to imply that modules don’t accept improvements, but that the rate of change/acceptance for a module without a maintainer can be low.

aldwinaldwin · June 14, 2019, 7:12am

@xtreak
Again you pointed me to more interesting information (pep581), discussions (triage team) and workflows. The workflow is good, but so much to be aware of and old discussions to catch up with.

What I meant, is to change issues to a close status without any development. They could be always reopened again by the creators if they still want it. Situations like these:

‘open’ since 2008: https://bugs.python.org/issue2628
=> 4000 issues didn’t have any activity in the last 15 months, and around 5200 had none in the last 6 months. Will they ever get attention again?

‘awaiting changes’: User posted a pr but didn’t come back for the changes like:
https://github.com/python/cpython/pull/1210 : awaiting changes since Oct 2017. (multiple pr’s of this user louisom it seems)
=> difficult to check how many of the ‘awaiting changes’ don’t have activity. It seems brettcannon went through many of them already.

So, suggestion:
=> if there is no activity for 3 months, have a bot send a message like brettcannon does (“Try to help move older pull requests forward, …”) and set the status to ‘awaiting changes little more’.
=> if there is no activity for another 3 months, set the status to closed due to no activity, with a message that they are always welcome to reopen.
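
The two steps above could be sketched roughly as follows. This is only an illustration of the decision rule, not an existing bot: the function name `stale_action`, the action labels and the approximation of "3 months" as 90 days are all assumptions made for the example.

```python
from datetime import date, timedelta

# Assumption: "3 months" is approximated as 90 days.
REMINDER_AFTER = timedelta(days=90)
CLOSE_AFTER = timedelta(days=180)


def stale_action(last_activity, today, reminded):
    """Decide what a hypothetical bot would do with one issue or PR.

    last_activity: date of the last human activity (a bot reminder does not count)
    today:         date on which the bot runs
    reminded:      True if the reminder message was already sent
    """
    idle = today - last_activity
    if reminded and idle >= CLOSE_AFTER:
        return "close"   # closed due to no activity, welcome to reopen
    if not reminded and idle >= REMINDER_AFTER:
        return "remind"  # send the message, set 'awaiting changes little more'
    return "none"


today = date(2019, 6, 14)
print(stale_action(date(2019, 6, 1), today, reminded=False))
print(stale_action(date(2019, 3, 1), today, reminded=False))
print(stale_action(date(2017, 10, 1), today, reminded=True))
```

Note that in this sketch an old item that was never reminded gets the reminder first, so nothing is closed without a warning.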

Result? One time/new contributors would stay little more alert and get triggered faster to make the needed changes or restart their discussion if they are really interested in a fix or feature. Also bringing the bugs list to actually relevant 2000 recent issues would be nicer to go through to find ‘easy’ issues and improve the result of the ‘random issue’ function.

steven.daprano · June 15, 2019, 4:57am

‘open’ since 2008: Issue 2628: ftplib Persistent data connection - Python tracker

You know what would be helpful for an eleven year old patch like
that?

For somebody to check that the patch still applies to the current
version without breaking anything, that the tests are sufficient, write
a “What’s New” blurb, etc. If it doesn’t apply, to patch the patch or
write a new patch. You don’t have to have commit privileges to
(unofficially) review a patch or to bring it up to date.

Of course a core dev still has to give it a final check before
committing, but unprivileged users can help restart stalled tasks.

[…]

So, suggestion:
=> if there is no activity for 3 months, have a bot send a message like brettcannon does (“Try to help move older pull requests forward, …”) and set the status to ‘awaiting changes little more’.
=> if there is no activity for another 3 months, set the status to closed due to no activity, with a message that they are always welcome to reopen.

Result? One time/new contributors would stay little more alert and get
triggered faster to make the needed changes or restart their
discussion if they are really interested in a fix or feature.

Often the bottleneck is the core devs, not the new contributor. In my
opinion, closing tickets written by new contributors because the core
devs don’t have time to review them is going to discourage new
contributors even more badly than having a ticket stay dormant.

taleinat · August 2, 2019, 12:48pm

@steven.daprano, I think that your wording might be interpreted as a bit rude or condescending, though I’m sure you didn’t mean it that way. I suggest that we be extra-careful replying to contributors sharing their hardships in trying to get started, since it can be hard to share these stories and feelings openly, and we need to show that we welcome them without criticism.

I agree that doing so for the many outdated patches on bugs.python.org (a.k.a. “bpo”) would be a great help. We haven’t been telling this to most would-be contributors, though. As part of the process of improvement, we’ll collect many suggestions and decide on how to begin improving, and this could be part of that.

At this point, though, I’m purposefully just asking to hear stories from many people who tried to contribute. This is because I already know that my perspective as a core dev is too different. Also, many people have very different experiences, and we could learn much more from a large number of stories than from just a few.

So, please, let’s try not to propose solutions quite yet; we’ll come to that stage soon.

I agree. Mass closing of issues is usually perceived badly for various reasons, and IMO we should avoid it. On the other hand, to my understanding, this isn’t what @aldwinaldwin proposed:

What if, each week, each core developer finds 5 to simply set to closed status: ones that don’t need work, are outdated, have become irrelevant, or whose contributor didn’t sign the Contributor Agreement?

ISTM that going through old issues (regardless of their authors) and closing those that are truly no longer relevant would be a clear improvement.

aeros · August 8, 2019, 12:01am

As a more recent contributor, I’ve found this to be incredibly important. I don’t know if others have had the same experience, but as someone who has admired the Python development community long before my first PR, the words and tone of core developers are very impactful to me.

The vast majority of my interactions have been great, especially with you, Terry, Carol, Mariatta, and Brett (and others, but you all have stood out for me personally). Taking the time to respond to my PRs and feedback has meant a lot to me and provided me with a lot of motivation to further contribute.

I wouldn’t consider any of my interactions with core devs to have been negative. But, from my perspective at least, I have sometimes felt that not everyone is equally concerned about prioritizing positive interactions.

In the majority of cases, it’s not so much what is said, but what isn’t said. I have no issue at all with being rejected, but being ignored is quite difficult. I’ve had a few times where I’ve left feedback on PRs and issues, and it was closed or merged without being regarded. If there was something incorrect and I receive criticism (even a one sentence reply), I learn something from the experience. But if my submission (PR, feedback, issue, etc) is entirely disregarded, it essentially feels like I wasted my time.

Also, I’m sure this has been said many times, but a simple “thanks” to PR authors and those who provided feedback goes a long way. It may take additional time from the core devs to take the time to do so, but I can guarantee the returns are worthwhile. Especially to newer contributors, their first impression can make a massive difference in their motivation to further contribute. All of the times you’ve said it have certainly made a difference to me (:

Edit: I wanted to mention that I completely understand that in some cases, it’s not possible for the core devs to reply to every single bit of feedback. But I definitely think that at least responding in some way, even if it’s just a simple thanks, thumbs up, or brief constructive response, makes a huge impact and should certainly be a priority when possible.

pitrou · August 8, 2019, 10:42am

Would you care to share an example? Otherwise it’s a bit difficult to judge concretely what you’re talking about (e.g. what kind of feedback it is).

pitrou · August 8, 2019, 10:44am

Hmm, but how can you know this self-selected sample of respondents will be unbiased? For all we know, perhaps the majority of contributors has a different experience than the one that will stand out from your study.

jdemeyer · August 8, 2019, 11:16am

My first impression with CPython development was not as positive as I hoped. I’ve been a long-time contributor to SageMath and to a lesser extent also Cython. So I’ve always been interested mostly in the C API and I had experience reading and understanding the CPython source code to debug obscure bugs.

At some point, a Cython upgrade broke something in SageMath. After some debugging, this turned out to be a CPython bug so I opened bpo-25750 and added a patch (this was late 2015, before github). It’s really an obvious bug once you look at the code and the fix was really simple, just a few lines of code.

Nothing much happened to that bug report until the beginning of 2017, when Victor Stinner left a few replies, but then it died down again. In March 2018, I opened PR 6118 (my first PR for CPython) with the same patch that I had posted before on bpo. Berker Peksag asked for a testcase (a very reasonable request) so I added one. After a few months of discussions, especially about the testcase, I removed the testcase again and created PR 9084 for just the testcase. Eventually everything got merged, but it took almost 3 years for a really simple and obvious bugfix.

taleinat · August 8, 2019, 12:28pm

True, but I can’t possibly think of a way to get an unbiased sample.

I’ve only received about 10 such stories so far, and I don’t have a real hope of getting more than a few dozen stories. With such a small sample, there really is no way of avoiding bias.

But there is much we can do even with a small, biased set of stories. For example, it should be easy to recognize themes that come up so often that they are obviously worth our attention. Another example would be to consider elements of stories that are alarming or worrisome even if they come up only once or twice.

taleinat · August 8, 2019, 12:32pm

I’ve heard of quite a few such stories, and suspect that they are incredibly common. It’s very useful to get this level of detail about a specific incident, clearly describing the timeline and the relevant discussions, people, bpo issues and GitHub PRs.

Many thanks for sharing, Jeroen!

aeros · August 8, 2019, 5:09pm

My more recent instances of this happening have been from PR review comments being disregarded (PR is merged or closed without a response to any of my comments, and never marked as resolved). If it would be helpful, I can PM someone specific instances with the details, but I wanted to avoid calling anyone out publicly. My positive experiences have definitely outweighed those cases, but if any of those instances happened to have been my first PR review, I would’ve been significantly less motivated to continue doing so.

uranusjr · August 8, 2019, 5:54pm

My first meaningful contribution (IIRC) was not as smooth as I’d like either. I hit a bug while working with msilib, which led me to a 10 year old open ticket (at the time). The issue already had a patch attached, and core developers seemed to like it, but felt it should also address an (only loosely related IMO) memory issue. I took the patch and created a PR, and also addressed the memory issue. Nothing. Finally Steve Dower discovered the PR and merged it another two years later.

Of course my example is only another data point, and there’s no guarantee this is actually how things go for the majority of contributors. But I’d venture to guess that, at least for lesser-used modules, it is usual for patches to require disproportionately long waiting times, since they receive little attention in the first place, and generally require more back and forth since the code is viewed much less than that of widely used modules.

aldwinaldwin · August 12, 2019, 4:57am

Adding a positive note to this:

Taking the last 3000 commits on master (just over a year of commits), there are 524 different authors, of which 321 have only 1 commit and 122 got 2-4 commits through. The 81 others, with 5 commits or more, are probably more seasoned, familiar contributors or core developers. In other words, 600+ out of 3000 commits were from some 440 small contributors, so they don’t get ignored at all.

Of those 440, many are probably new contributors, because going down the different branches, there is a big increase of contributors:

(branch: number of authors => {distribution of commits per author} > start of the period covered by the last 3000 commits, up to now)

master: 524 => {‘1’: 321 , ‘2-4’: 122 , ‘5-10’: 38, ‘>10’: 43} >2018-07-24
3.8: 482 => {‘1’: 298, ‘2-4’: 109, ‘5-10’: 35, ‘>10’: 40} >2018-07-04
3.7: 202 => {‘1’: 126, ‘2-4’: 37, ‘5-10’: 19, ‘>10’: 20} >2017-10-04
3.6: 137 => {‘1’: 69, ‘2-4’: 25, ‘5-10’: 16, ‘>10’: 27} >2016-10-28
3.5: 93 => {‘1’: 30, ‘2-4’: 23, ‘5-10’: 8, ‘>10’: 32} >2016-01-10
2.7: 158 => {‘1’: 79, ‘2-4’: 36, ‘5-10’: 13, ‘>10’: 30} >2014-04-06
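
As a quick check of the numbers in this table, here is a small sketch that recomputes the totals and the share of single-commit authors per branch. The dictionary is copied from the table; the helper names are made up for the example, and the 2-to-4 commit range for the ‘2-4’ bucket is used to bound how many commits the small contributors account for.

```python
# Author counts per branch, copied from the table above.
BRANCHES = {
    "master": {"1": 321, "2-4": 122, "5-10": 38, ">10": 43},
    "3.8": {"1": 298, "2-4": 109, "5-10": 35, ">10": 40},
    "3.7": {"1": 126, "2-4": 37, "5-10": 19, ">10": 20},
    "3.6": {"1": 69, "2-4": 25, "5-10": 16, ">10": 27},
    "3.5": {"1": 30, "2-4": 23, "5-10": 8, ">10": 32},
    "2.7": {"1": 79, "2-4": 36, "5-10": 13, ">10": 30},
}


def one_commit_share(buckets):
    """Fraction of authors on a branch who have exactly one commit."""
    return buckets["1"] / sum(buckets.values())


def small_contributor_commit_range(buckets):
    """(min, max) commits from authors with 1 to 4 commits."""
    low = buckets["1"] + 2 * buckets["2-4"]
    high = buckets["1"] + 4 * buckets["2-4"]
    return low, high


for branch, buckets in BRANCHES.items():
    total = sum(buckets.values())
    share = one_commit_share(buckets)
    print(f"{branch:>6}: {total:3d} authors, {share:.0%} with a single commit")

print(small_contributor_commit_range(BRANCHES["master"]))
```

For master this gives between 565 and 809 commits from the 443 authors with fewer than 5 commits, which is in line with the "600+ out of 3000" estimate.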

Side note for new contributors: mentioned on another post by @aeros were some interesting links to read ( written by @vstinner ) while you wait for a PR to get merged (or rejected):

pitrou · August 12, 2019, 9:53am

Indeed, one thing that was expected of Github was to ease contributing small changes or fixes. Also, the different workflow versus our previous Mercurial setup can easily explain the huge increase of number of contributors as measured by git stats (most of the time, when we were using Mercurial, patches were committed with the core developer’s identity, and the patch author was mentioned in the commit message).
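
To illustrate why git stats undercount contributors from the Mercurial era, here is a rough sketch that recovers the credited author from a commit message. It assumes the credit is written as "Patch by <name>" at the end of a line, which is only one common wording; the function name, the sample message and the issue number in it are made up for the example.

```python
import re

# Assumption: the credit appears as "Patch by <name>" at the end of a line.
# Other wordings ("Thanks to ...", several names, text after the name on the
# same line) are not handled by this sketch.
PATCH_BY = re.compile(r"\bpatch by\s+(.+?)\.?\s*$", re.IGNORECASE | re.MULTILINE)


def credited_author(committer, message):
    """Return the patch author named in the message, else the committer."""
    match = PATCH_BY.search(message)
    return match.group(1) if match else committer


# Hypothetical commit message, not taken from the real history.
message = "Issue #1234: Fix foo.\nPatch by Jane Doe."
print(credited_author("Some Core Dev", message))
print(credited_author("Some Core Dev", "Fix typo in docs."))
```

Counting authors this way on the older branches would probably narrow the gap with the GitHub-era numbers, though I haven’t measured by how much.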

Another question, though, is whether first-time contributors are retained or they simply stop contributing.

pitrou · August 12, 2019, 10:03am

I’ve been on both sides of such issues (as a contributor needing feedback but mostly as a core developer).

In my experience, the problem is that no core developer feels concerned enough about the issue to take the time to push it to completion. It can be compounded by the fact that the affected code doesn’t have an active maintainer (msilib was historically maintained by Martin von Löwis, a very well-respected and once prolific core developer who sadly has been missing from Python core development for several years now). Other concerns can mingle (most of us are volunteers and, with limited free time or varying motivation, we can tend to prioritize aggressively - I know it’s what I do nowadays).

Often, such lingering patches are finally pushed by a core developer that’s motivated and dedicated enough to drain the queue of issues, regardless of their topic. But that role is not always filled by someone.

vstinner · August 12, 2019, 11:01am

I don’t know much about good practices on Windows, but I’m not sure that MSI is still the preferred way to distribute software on Windows. I deprecated bdist_wininst in Python 3.8:
https://docs.python.org/dev/distutils/builtdist.html#creating-windows-installers

Maybe it’s time to deprecate the msilib module of the stdlib?
https://docs.python.org/dev/library/msilib.html

uranusjr · August 12, 2019, 8:13pm

msilib is indeed on PEP 594’s deprecation list.

aeros · August 15, 2019, 6:36am

I can’t speak for every first-time contributor on GitHub, but after I had my first PR merged into CPython, I’ve continued to contribute. My first PR was opened a couple of months ago in June. Since then, I’ve had several other merged PRs and have done a number of PR reviews.

Personally, I think that the platform can make it easier for first time contributors to get started, and GitHub’s workflow was quite easy to learn the basics of. However, I suspect that retaining contributors has far more to do with their personal motivations rather than the platform.

Some of it might be significantly influenced by how well their first PR went. If they didn’t receive a response from anyone until several months after it was opened, I suspect the chances of that author attempting to open more PRs in the future would be significantly diminished.

pitrou · August 15, 2019, 8:43am

And you’re also actively posting on mailing-lists. Which is all good

Agreed. My aim was to moderate the “positive note” from @aldwinaldwin. If we have many people making first-time contributions but never coming back, then perhaps it’s not doing the project a lot of good (especially as on-boarding those first-time contributors spends experienced contributors’ time).

You’re certainly right. Unfortunately we can’t really change that until we have more active core developers (which probably requires recruiting more core developers).