https://github.com/python/ is now using a new CLA bot

Docs community is the only active group.

Whoever wants to drive this forward.
I don’t remember anyone agreeing to ask Van. I’d call “ask a lawyer” a next step rather than an assigned action item.
And yes, now that the CLA is easier to sign, it’s probably an unnecessary step.

This was sometimes done in the past for trivial changes that also skipped the issue and a NEWS entry, most commonly fixes for typos, grammar, outdated URLs, and such.

I don’t think we need this sort of loophole anymore as signing the CLA is fully automated now and takes less than a minute.

This is less of a technical issue and more a process one. Signing of CLAs is often not easy for employees of companies.

Regarding the change in how CLAs are processed:

This should definitely be signed off by the PSF board, since it touches upon the very core of what the PSF is meant for - to protect the IP in Python.

In particular, the contributor identification using just the Github account name (and the associated email address) may be too vague to fulfill the legal requirements.

FYI: I’ve pinged the board about this.

TL;DR: making it easy for contributors to sign CLAs dilutes the value of those CLAs significantly.

As someone who (until recently) handled CLAs for employees of an enterprise organization, I’ll echo the comments made by @malemburg. Employees of that organization do not have permission to sign a CLA of the form that the PSF uses themselves; it has to be signed by one of a small number of people in the company with that permission. This is not unusual for large organizations.

Using a GitHub-account-driven signing workflow makes this extremely difficult to handle, and also leads to contributors signing without realizing that they don’t have permission to do so… in the long term, this harms the PSF because they don’t actually receive the licenses to the contributions that they believe they have received.

In addition, automatically approving all email addresses in a GitHub user’s profile as authorized under the signed CLA is quite concerning to me, and could easily lead to a requirement that employees of such organizations must have separate GitHub accounts for their personal and work activities. Many people do not want to do this, for good reasons (and technically it is against the GitHub ToS although that’s clearly not enforced).

CLAs are extremely difficult to handle correctly, where ‘correct’ means “in a way which provides high confidence to the receiving organization that they have obtained the permissions they require.” In my experience as an organizational facilitator for such things for 9+ years, I’d bet that less than a handful of receiving organizations actually do it ‘correctly’, and those that do are seen as ‘difficult to work with’ as a result.

I have spoken on this topic a number of times and collaborated with many people in the OSS community to try to find better ways to handle these things, and would be happy to volunteer some time to assist the PSF in this if that is desired.

Thanks for jumping on this! We control what the CLA signing process looks like so I can definitely tweak it if I get instructed on what should change.

Fundamentally I don’t think it’s the ease of signing a CLA that dilutes its value, it’s rather the lack of information. As long as we have everything we need, I don’t see a difference between:

and:
[image: Screen Shot 2022-04-20 at 15.27.46]

The new CLA checking process is in fact more thorough than the previous one. The logic is as follows:

  • every pull request consists of commits made by humans or scripts identified by email addresses;
  • if any of those emails is unknown to us, we ask for the CLA to be signed for it;
  • to successfully click on the “Sign in with GitHub to agree” button, that GitHub user needs to have the missing email address(es) listed among their verified emails on GitHub.
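
The three steps above can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the function names, the in-memory set of signed emails, and the lowercase normalization are all assumptions made for the example.

```python
def missing_cla_emails(commit_emails, signed_emails):
    """Return commit emails with no CLA on record, in first-seen order.

    `signed_emails` is a hypothetical set of lowercase addresses that
    already have a signed CLA; the real storage is not described here.
    """
    seen = set()
    missing = []
    for email in commit_emails:
        key = email.strip().lower()  # assumption: compare case-insensitively
        if key in signed_emails or key in seen:
            continue
        seen.add(key)
        missing.append(key)
    return missing


def can_sign(missing, verified_emails):
    """A user may agree only if every missing email is one of their
    verified GitHub emails (passed in here as a plain list)."""
    verified = {e.strip().lower() for e in verified_emails}
    return all(email in verified for email in missing)


if __name__ == "__main__":
    signed = {"alice@example.com"}
    commits = ["Alice@example.com", "bob@corp.example", "bob@corp.example"]
    todo = missing_cla_emails(commits, signed)
    print(todo)
    print(can_sign(todo, ["bob@corp.example"]))
```

Note that nothing in this flow checks whether the address in a commit is truthful; it only checks that the person signing controls that address on GitHub.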

GitHub trusts those emails as well. They’re verified so that they can send password reset links and other sensitive notifications to those email addresses. I think it’s reasonable to trust this flow.

On bigcorp employees

Now, are there any holes here? Yes, we assume the emails listed in commits are truthful but we can’t verify it unless we force everybody to use GPG-signed commits which is unrealistic. More importantly, I don’t think the CLA process – whether the previous one or the current one – is meant to be secure against malicious actors.

I mean, instead of carefully forging emails in commits to circumvent the CLA check, that corporate employee could create a new Gmail and open a new account on GitHub to anonymize themselves. Maybe some do that already, hard to say. In the old CLA process, the form would take any input you gave it. It was enough to create a new Gmail, a new BPO account, and sign the form as John Smith. So again, I don’t think we’re here to police malicious actors. We’re really not equipped for that.

I agree with you that employees of large corps can now easily click through an invalid CLA without thinking. But they could do so before just as well, only it took a real person to verify the form and update their profile. If we want to ensure corporate employees think twice before clicking the button, it should be easy for us to add an additional screen to click through that directs the person to the old-style form if they self-identify as an employee who doesn’t own the IP they produce. Maybe we want something like what TensorFlow’s CONTRIBUTING.md says:

Interestingly, Google’s CLA system is also based on emails and allows for fully automated signing.

IANAL

If anybody here lets me know of any tweaks to the new process that are required, I’m happy to implement them. But if you mention blockchains or smart contracts, I quit.

The only ones who can advise us on what is acceptable for CLAs to the PSF are the PSF board, who would presumably seek advice from or defer the question to PSF legal counsel (Van or otherwise).

What about NFTs? :joy:

Right, and surely, the PSF board and legal counsel must have reviewed and signed off on the newly revised process for it to have been officially implemented and acting on their behalf?

Yes, indeed, all of this is a risk-management process and the risks are identified and evaluated by the board and their legal counsel. My presence here is primarily just to ensure that some of the not-as-noticeable risks are noticed, and to try to ensure that the process put in place doesn’t put undue friction in front of corporate contributors.

Please consider that corporate contributions, when done properly, will almost always need some sort of ‘exception workflow’ where the person operating the GitHub account which owns the PRs cannot sign the CLA which covers the contributions that include their corporate email address. With the BPO workflow, that was done by getting a copy of the CLA, hand-signing it, and then submitting a scanned copy of the signed document. It’s ugly and slow, but when the authorized signer is not the PR submitter (or any of the authors of content in the branch being submitted), some process must allow for it.

There are a number of issues that have been raised, either directly or tangentially, including:

Q1: Can we have an effective CLA based on a GH username rather than an email or a signature?
A1: Yes, we can. The applicable law (for the US, which is the relevant jurisdiction here) is that electronic signatures can be used for these purposes. An electronic signature is very broadly defined to include any “symbols or other data in digital form attached to an electronically transmitted document as verification of the sender’s intent to sign the document.” So a GH username can be used to authenticate a document if it is the intent of the person for their username to do so. However, we should update our CLA text to make the intent explicit.

Q2: Do we have to have the person’s legal name and address?
A2: No, the law does not require the person’s legal name and address. However, it would make it easier to track someone down in case we had an issue. Ultimately this is a risk management question, not a legal question. We may want to have a heightened verification process for anyone who is on the core team.

Now, the really tricky one (as noted by @kpfleming, among others):

Q3: Does using the GH username change the risk profile for Python?
A3: Yes, it does. It makes it more likely that an individual CLA (ICLA) will be used when what we really want is a corporate CLA (CCLA).

Q4: Why does this matter?
A4: This really gets to the core of why we have a CLA - it is mostly (but not entirely) there to protect Python against a situation where someone contributes something to Python that they did not own due to their employment status. Random developers - even high level ones - do not have the authority to out-license the organization’s IP. It must come from someone with authority. In practice that means someone from legal, someone VP or higher, or someone with an express delegation of authority from someone who has authority. The problem is that lawyers and VPs are usually not on GH and do not usually have GH usernames. Thus it makes it significantly more likely that someone who does this via GH is going to not be the right person.

Q5: What can we do?
A5: There is no really good solution, because lawyers. A couple possibilities come to mind, however, that we could probably enact:

  • We could accept only ICLAs via GH usernames; everything else would require an eSigned PDF.
  • We can talk to major Python backers, periodically poll core devs about their employment status, and reach out to get broad CLAs from that targeted set of companies. That would solve the problem for many devs.
  • Google’s solution is that there is an email list that is controlled by someone with authority; anyone signed up to that email list is approved. We might be able to script something similar with email->repo integration.
  • We can disallow email blinding and scan for non-free email addresses. (This seems fraught and likely to cause trouble.)
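
As a rough illustration of the email-list idea in the third bullet, a lookup could be sketched like this. Everything here is hypothetical: the function name, the `approved_lists` mapping, and the assumption that each company's authorized signer maintains a set of lowercase addresses are invented for the example and do not describe Google's system or any existing PSF tooling.

```python
def covered_by_corporate_cla(email, approved_lists):
    """Return the company whose approved list contains `email`, else None.

    `approved_lists` is a hypothetical mapping of company name to the set
    of lowercase addresses approved by someone with signing authority.
    """
    key = email.strip().lower()
    for company, members in approved_lists.items():
        if key in members:
            return company
    return None


if __name__ == "__main__":
    lists = {"BigCorp": {"dev@bigcorp.example"}}
    print(covered_by_corporate_cla("Dev@bigcorp.example", lists))
    print(covered_by_corporate_cla("someone@else.example", lists))
```

The point of the design is that the decision about who is covered stays with the person who has authority, not with the individual contributor clicking a button.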

TL;DR, right there :slight_smile:

I can suggest one possible way to mitigate some of this: if the domain in the email address of the commits is a domain that has been verified on GitHub, there’s a very very strong chance that the GH user making the submission doesn’t have permission to contribute it solely under an ICLA.

There would certainly be false positives, but it would be a strong signal to the submitter that they might need to ensure that they understand the copyright relationship of their employment.
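
A minimal sketch of that signal, assuming the set of verified domains is already available from somewhere (how to obtain it is not covered here, and `verified_domains` and the function name are hypothetical):

```python
def emails_needing_employer_check(commit_emails, verified_domains):
    """Return commit emails whose domain is in `verified_domains`.

    A hit is only a hint to show the submitter a warning; it does not
    prove the contribution is owned by an employer.
    """
    flagged = []
    for email in commit_emails:
        _, _, domain = email.rpartition("@")  # text after the last "@"
        if domain.lower() in verified_domains:
            flagged.append(email)
    return flagged


if __name__ == "__main__":
    emails = ["dev@bigcorp.example", "me@freemail.example"]
    print(emails_needing_employer_check(emails, {"bigcorp.example"}))
```

Because a hit would only trigger a warning rather than block the signature, false positives cost the submitter an extra click rather than a rejected PR.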

What does verified in GitHub mean?

I believe that Kevin is referring to this: Verifying or approving a domain for your enterprise - GitHub Docs

It means the owner of the domain name has demonstrated ownership of it by putting special DNS records in place.

I’m not sure about this, since I have a number of GH-verified domains associated with different GitHub orgs I’m part of that we have GHP websites deployed to (Spyder-IDE.org, Star-Fleet.tours, Gerlach.CAM, UAH-AMS.club, rSpaceX.com, etc.), at least some of which are used for emails (including my commit and GH emails) and none of which are associated with employment or employment contracts. I’d be concerned about false positives under this approach.

Is there a way to check if someone has signed the CLA?

Context: PEP 545 needs documentation translation coordinators to have a CLA signed, but they may not have pushed to the cpython repo.

Not at the moment. I will be adding that this week.

But the same CLA bot runs on many projects in the python/ organization and they share a database, IOW you only need to sign once.

Thanks for the fast reply! Ping me when I can test it: someone wants to translate the doc to Ukrainian, and I’d like to check if he has signed the CLA before creating the repo for him in the python organization.

Oh while I’m at it: Is there a way for him to sign the CLA without opening a PR?
