Modernizing the Contributor License Agreement (CLA)

The CLA topic has come up in several recent discussions:

We have been using the current CLA since 2004, when Larry Rosen created it. Since then, the world has moved on, and today we have additional or changed requirements compared to those days.

I would like to kick off a process to come up with an updated CLA which implements our needs for protecting the Python IP, improves the CLA user experience, and addresses new requirements which are not yet covered by the current CLA.

Here’s a collection of requirements. This is expected to be adapted throughout the discussion and extended as necessary.

Requirements:

  • R1. The CLA needs to enable the PSF to defend the IP rights in software/documentation distributed by the PSF, which includes the contribution, in court. [1]

  • R2. The person or company signing the CLA should keep the copyright in the contribution, but permit the PSF to use the contribution and relicense the contribution under an open source license. [2]

  • R3. The initial license, by which the contributor licenses the contribution to the PSF, needs to include a patent clause which makes sure that the PSF can distribute software incorporating the contribution without limitations imposed by patents owned by the contributor. [3]

  • R4. The CLA needs to make a clear distinction between individual contributions and corporate ones. [4]

  • R5. The CLA has to include a provision allowing the PSF to store and maintain personally identifiable information (PII) for the purpose of tracking the IP rights of contributions to the PSF. [5]

  • R6. The CLA signing process has to be legally sound and allow identifying the contributor for the purpose of tracking IP rights. [6]

  • R7. The term contribution should cover any submission made by the contributor to the PSF project, requiring an explicit opt-out statement for submissions which do not fall under the CLA. [7]

  • R8. The CLA should be applicable to any PSF software/documentation which the PSF may want to distribute or make publicly available. [8]

  • R9. The CLA should include wording stating that it covers all contributions submitted on or after the date of approval by the contributor and that it supersedes any previously signed PSF CLAs for those contributions. [9]

Once we have collected all needed requirements, we can then approach the PSF Legal Counsel to draft up a new CLA and the PSF board to have it approved.

Please comment away :slight_smile:

Resources:

Footnotes:

[1]: Since the PSF is a US non-profit, this mainly refers to US law. However, it would be beneficial to make this sound under international law as well, in order for the PSF to be able to defend IP rights in other parts of the world too.

[2]: The current CLA already covers this requirement.

[3]: The Apache License v2 (AL2) covers this requirement. Inclusion of 3rd party software under other licenses can still be decided upon by the PSF/Steering Council, bypassing the CLAs, including e.g. software which is MIT or BSD licensed (those licenses don’t include a patent license).

[4]: The IP rights of employee contributions are often owned by the company employing them. Accordingly, a CLA signed by such an employee would not be valid.

[5]: In order to track contributions, the PSF has to maintain records on these. Several jurisdictions now require explicit consent to such storage (e.g. the GDPR in the EU), so this new requirement should be addressed as well.

[6]: We are currently using a GitHub bot for CLA signing. In some circumstances, this does not permit the PSF to keep records which allow contacting the contributor or identifying the person by other means. We need to either make sure that the PSF is willing to take the risk of not being able to contact contributors, or change the logic so that only GitHub users with valid email addresses can sign the CLA form.

[7]: The current requirement to mark contributions is rarely met in practice: this is usually only done for larger contributions of e.g. new modules and subsystems, if at all. This opt-out mechanism will simplify contributing to a PSF project. It’s the standard approach taken by the AL2. By using the AL2 as the initial license we are probably inheriting this feature, but the CLA should make this explicit. The wording about having to add notices to all contributions can then be removed from the current CLA.

[8]: The current CLA already appears to cover this requirement, but it would be better to include documentation explicitly as a possible contribution (it’s currently limited to software).

[9]: The current CLA does not explicitly mention the effective date or whether it applies to past or only future contributions.

Edit History:

  • Changed from inline footnotes to manual ones, so that footnotes stay visible.
  • Added additional comment to footnote [7].
  • Added new requirement R9 to clarify the effective date of the CLA and its applicability.
  • Added note referring to inclusion of 3rd party software under non-AL2 licenses.
  • Added CLA used by the CLA Bot to the resources.

Your list of requirements seems sound, but based on my experience on the other side of this equation (corporate and personal contributions), I think you’ll need to be satisfied with only partially meeting some of them. Meeting all of them 100% is possible, but the resulting process and document would be so onerous that contributions would suffer as a result.

For R2 and R4, please consider that a corporate CLA often cannot be signed by the person making the contributions (or if they do so, it would be unenforceable as they did not have permission to bind the company’s IP to the terms of the CLA), and so both the document and the process need to accommodate the CLA process/document and the contribution process/content being handled at separate times by separate people. From the corporate side, once the CLA has been signed by an authorized party, the CLA manager at the company (who will likely be a different person from the signer) will want to be able to add and remove authorized contributors without having to endure the signing process again.

This presentation is nearing six years old but I think it’s still relevant and accurate: FOSDEM 2017 - Make your Corporate CLA easy to use, please!

This severely limits the licenses usable for contributing to Python. It also means that third-party MIT- or BSD-licensed code cannot be reused (vendored) in contributions to Python, unless the contribution is made by the original author (assuming that code even has a single author).

I’m sure we already have cases of MIT- or BSD-licensed code incorporated in the Python source base, by the way, so this doesn’t seem retroactively actionable either.

What does an “opt-out statement” look like?

That’s a valid point.

Note that we are only discussing the contributor agreements, though. Inclusion of 3rd party software under other licenses can still be decided upon by the PSF/Steering Council, bypassing the CLAs.

Quoting from the Apache License v2:

For the purposes of this definition, “submitted” means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as “Not a Contribution.”

Good point.

I wonder how we could make this process as simple as possible. Perhaps the CLA bot could check whether the GitHub user is registered under the org account of a signing party?

Adding extra complexity to manage possible contributors on the org accounts seems out of scope, if we want to keep things simple for both sides.

Yes, while I was at Bloomberg we helped to implement that in the CLA Assistant tool, which was successfully deployed by a number of large companies. The rule was that if the contribution branch was hosted in a repository owned by a GitHub org which was marked as “Corporate CLA signed” in the CLA Assistant database, it was accepted, regardless of the identity(ies) of the contributors involved. This completely eliminated the need to add and remove persons from the CLA database, as the org could manage authorization by granting (or removing) permission to write to branches in repositories they already control.

This sounds like a workable solution for R2 and R4.

So for companies, the CLA process would require two steps:

  1. The company officials sign the CLA with the PSF. Their GitHub org account is registered with the CLA bot.

  2. The company employees contributing code automatically get the CLA signed flag set in PRs, when the bot sees that the org account is registered under the company CLA, so they don’t have to go through any approval steps.

That’s possible, although identifying ‘company employees’ is somewhat challenging; that’s why we chose to use the location of the source branch/repository instead.

Sorry if this is a dumb question, but what happens to (a) people who’ve signed the existing CLA, and (b) contributions made under the existing CLA? Does this supersede? Is it going to run in parallel, with newer contributors signing the newer CLA?

Ok, so I misunderstood your suggested process.

Binding to both repo and branch sounds a bit too strict, though. For CPython we use release branches and feature branches. If a company could register only one branch in their repo, it would not be possible to contribute code to e.g. feature branches and the main branch with the same CLA registration. Was that intended at Bloomberg?

Wouldn’t it be easier to simply register the company’s repo fork for the CLA and regard any PRs submitted via this fork as falling under the CLA?

Not a dumb question at all.

Existing CLAs would, of course, remain valid. However, the PSF or SC may ask for renewal of the CLAs under the new terms in certain cases, e.g. when a person’s email address is no longer working, or when a company CLA has to be renewed due to acquisitions or mergers.

Likewise, people may ask to sign the new CLA to replace the previous version, e.g. to register new email addresses or GitHub accounts.

Perhaps we should add a requirement saying that when signing the new CLA, this supersedes previously signed CLAs and will be used for any new contributions – without making those older signed CLAs invalid.

Another good point you raised is clearly stating to which contributions the CLA applies. The current CLA does not indicate whether it also applies to previously made contributions. I know that we did have a CLA form in the past (back in the days when we started using CLAs), which explicitly included wording to also cover past contributions.

It’s probably best to only cover any future contributions, starting with the effective date of the CLA.

How about this extra requirement:

  • R9: The CLA should include wording stating that it covers all contributions submitted on or after the date of approval by the contributor and that it supersedes any previously signed PSF CLAs for those contributions.
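
To make the intended effect of R9 concrete, here is a minimal sketch of how a record-keeping tool might decide which signed CLA governs a given contribution. It is only one reading of the wording above, not legal advice: the function name `governing_cla` and the `(version, approval_date)` record shape are hypothetical and do not come from any existing PSF tooling.

```python
from datetime import date

def governing_cla(signed_clas, submitted_on):
    """Return the CLA version that covers a contribution, or None.

    signed_clas: iterable of (version, approval_date) tuples for one
    contributor (hypothetical record shape, assumed for illustration).
    submitted_on: date the contribution was submitted.
    """
    # A CLA covers contributions submitted on or after its approval date.
    applicable = [(version, approved) for version, approved in signed_clas
                  if approved <= submitted_on]
    if not applicable:
        return None
    # The most recently approved CLA supersedes older ones for this contribution;
    # older CLAs still govern contributions made before the newer approval date.
    return max(applicable, key=lambda item: item[1])[0]

clas = [("2004 CLA", date(2010, 5, 1)), ("new CLA", date(2024, 1, 15))]
print(governing_cla(clas, date(2015, 3, 2)))
print(governing_cla(clas, date(2024, 1, 15)))
```

Under this reading, a contribution from 2015 stays under the older CLA, while anything submitted on or after the approval date of the new CLA falls under the new one.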

Thanks for the clarifications. I’ll be watching the thread with interest.

I’ve added a note about this to the footnotes of R3.

I also added comments to the other points which were raised and added R9.

Sorry if I led you in the wrong direction: there is no registration of repositories or branches. There is only registration of the organization itself.

The way it works is that when CLA Assistant reviews a PR, it obtains the URL of the branch that is the source of the PR. It then removes the branch name and repository name, and extracts the organization name (the ‘username’ part which indicates the owner of the repository). If that organization name is marked as “active CLA” in the CLA Assistant database, then the PR is considered approved under the CLA terms. The repository name and branch name are not part of the analysis.
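
As a rough illustration of the lookup described above, the following sketch extracts the owner from a source-branch URL and checks it against a set of registered organizations. This is not the actual CLA Assistant code: the URL layout `https://github.com/<owner>/<repo>/tree/<branch>`, the function names, and the `ACTIVE_CLA_OWNERS` set are assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the database of organizations marked "active CLA".
ACTIVE_CLA_OWNERS = {"example-corp"}

def owner_from_branch_url(branch_url):
    """Return the owner segment of a source-branch URL.

    Assumes a URL shaped like https://github.com/<owner>/<repo>/tree/<branch>;
    the repository and branch segments are ignored.
    """
    segments = [s for s in urlparse(branch_url).path.split("/") if s]
    if not segments:
        raise ValueError(f"no owner segment in {branch_url!r}")
    return segments[0]

def pr_covered_by_cla(branch_url, active_owners=ACTIVE_CLA_OWNERS):
    """True if the PR's source branch lives under a registered owner."""
    # GitHub account names are case-insensitive, so compare in lower case.
    owner = owner_from_branch_url(branch_url).lower()
    return owner in {name.lower() for name in active_owners}

print(pr_covered_by_cla("https://github.com/example-corp/cpython/tree/fix-typo"))
print(pr_covered_by_cla("https://github.com/someuser/cpython/tree/fix-typo"))
```

Note that neither the repository name nor the branch name influences the result, which matches the description: only the owner of the source repository matters.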

This could be used for non-corporate CLAs as well if desired, because it relies on the location of the contributed code and not on the name of the person who opened the PR. For individual PRs those will almost certainly be the same, of course, but it’s not required for them to be the same.

Ok, so instead of looking up org membership, the bot simply uses the org name embedded into the PR source URL. That sounds like a workable and even more flexible solution.

@ambv: Would such logic be possible with the current bot implementation? (I’m not sure whether you based this off of the bot @kpfleming is referring to.)

I would like to register that the Jython project considers itself a PSF project (has done for most of its lifetime), since it gets some practical support and is based almost entirely on PSF IP. We ask contributors to sign the CLA and would like to use the same process (and CLA bot if possible) with the least additional friction possible. A process that only works in the Python organization on GitHub is a step backwards.

This seems to assume that the contribution is made from an account owned by the employer. Or I may misunderstand still.

In a pattern I have seen, the employee creates a personal GitHub account but performs work covered by the employer’s claim. (It is paid work, or the employer is just that imperious.) This evades your automation, as I understand it, so there is still a risk needing vigilance on both sides.

It does not require that the ‘username’ (the account submitting the PR) be owned by the employer. It does require that the branch location (source branch of the PR) be in a repository owned by the employer.

In my experience after putting this in place it worked quite well, as it centralized contributions for projects (multiple employees working on the same project, sometimes on the same branches, did so in a single fork repository), and it allowed the employer to easily control the list of GitHub accounts (usernames) permitted to submit PRs covered by the employer’s CLA (by controlling ‘write/push’ access to that fork repository).

In a situation where an employee also makes contributions to the same project on a personal basis, this model also works well because in that situation the PR source branch would be in the employee’s fork repository, not in the employer’s fork repository. In that case the CLA checker would check to ensure that the individual had signed the CLA.
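
Putting the corporate and personal cases together, the check could look roughly like the sketch below. It is an illustration under stated assumptions rather than the real checker: the `PullRequest` fields, the `cla_status` function, and the two sets of registered names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PullRequest:
    author: str      # account that opened the PR
    head_owner: str  # owner of the repository holding the PR's source branch

def cla_status(pr, corporate_owners, individual_signers):
    """Classify a PR using the source-branch location first, then the author.

    corporate_owners and individual_signers are sets of lower-case account
    names (hypothetical stand-ins for the CLA database).
    """
    # Branch hosted in an employer-owned fork: the corporate CLA applies,
    # regardless of which account opened the PR.
    if pr.head_owner.lower() in corporate_owners:
        return "covered: corporate CLA"
    # Otherwise fall back to checking that the individual has signed.
    if pr.author.lower() in individual_signers:
        return "covered: individual CLA"
    return "not covered"

corporate = {"example-corp"}
individuals = {"alice"}
print(cla_status(PullRequest("alice", "example-corp"), corporate, individuals))
print(cla_status(PullRequest("alice", "alice"), corporate, individuals))
print(cla_status(PullRequest("bob", "bob"), corporate, individuals))
```

The same person can therefore contribute under either agreement, depending only on which fork the source branch lives in.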
