The author/maintainer distinction problem (and PEP 621)

Giving this some thought, I think @bernatgabor actually has a good point here.

From what I’ve seen on PyPI, the most common use for these fields is to provide some sort of “contact” information (like an info@ address or a mailing list address), because PyPI does not otherwise surface any contact information or provide a way to contact maintainers. Even the example (setuptools) does this, providing distutils-sig@python.org as an email address.

Furthermore, calling this “maintainers” or “authors” has often confused new users, since it doesn’t actually have any effect on who can “maintain” or “author” the project on PyPI.

Perhaps this should just become contact instead?

I think that depends on the opinion of the PyPI maintainers and whether they want to update PyPI to reflect this. :wink:

Renaming it to contact totally fits my worldview of what the Author/Maintainers fields are traditionally used for, so I’m game for that more neutral wording while still populating Author in the core metadata.

I’m not sure about real-world usage (how could I know? I’m an undergrad student with a bunch of projects nobody uses), but I think author has more to do with copyright contacts (e.g. if someone, including the current maintainers, wants to ask for a relicense) and maintainer is for support/maintenance/patching contacts.

Giving this some more thought, I don’t think Author/Author-Email/Maintainer/Maintainer-Email map well to something like “Contact” (even if that is their primary use), and I think like @pganssle suggests, we probably need a new metadata version to remedy this instead.

Perhaps this should just become contact instead?

My instinctive reaction is “let’s not try to add yet-another word to the vocabulary” here.

Taking a few minutes to think about it, I do think that “authors” is a MUCH better name than “contact”, since:

  • it corresponds directly to an existing field(s), by name.
  • users are already familiar with using this name, for indicating this type of information.
  • it is also used in other packaging ecosystems, i.e. consistency with other tooling in this space (this matters for the same reasons that the PEP says we use dependencies instead of a different word there)
  • it doesn’t add yet another word to the already confusing metadata vocabulary for “specifying relevant people” (which we all agree needs a cleanup).

I think that the direction we’re taking with this PEP, of keeping a single “authors” field for specifying people is a good one. Notably, this doesn’t lock us out of doing something else later, if we don’t want to go down the “single field, up for interpretation what being in this field means” path.

I’m pretty certain that we’d want to keep the “authors” list regardless of how we go though. Not doing so would be… odd at best. Nearly every Python Packaging tool and tooling in other ecosystems has this field (with this exact name or “author”). I think if we’re deviating from “what everyone in this space is doing”, we should have a decent well-rounded line of reasoning on how we’re different. I don’t see one mentioned here, for using “contact”/“maintainers”/something-else instead of “authors”, so if there’s one, please state it explicitly.

I think there is a lot of nuance and detail to discuss here if we start going into the weeds of “who” should be specified in a project’s metadata and what that should look like, and that would be a can of worms for a separate, dedicated PEP IMO.

The current choice in the PEP is compatible with basically all the directions we might want to take for this metadata declaration in the future (reminder: basically all tooling uses authors) and is (IMO) good enough to move forward until someone finds the time+energy for that other PEP.

If folks really feel we should solve the “authors/maintainers” problem fully in this PEP, I strongly suggest stating so and explicitly requesting that the moderators split out the discussion related to “specifying people in package metadata” into a separate, dedicated thread.

Otherwise, that discussion will take over this thread and we won’t be able to easily discuss anything else about the PEP.

With regards to authors / maintainers, I think that the major problem here is that the {Author,Maintainer}{,-Email} core metadata fields are not sufficiently expressive in their current form to handle what we’re trying to enable in PEP 621, but modifying the core metadata standards is an explicit non-goal of PEP 621.

As currently proposed, we’re changing the meaning of the Author field, effectively deprecating the Maintainer field, and I guess adding some semi-standardized way of specifying maintainer metadata. I think that repurposing an existing field for this purpose is going to be a problem in the long run, because you’ll get some packages where the Author and Maintainer fields mean one thing, and others where the Author and Maintainer fields mean another thing, with nothing in the generated metadata to indicate which meaning they are using.

I suggest the following course of action:

  1. In this PEP, re-design the author / maintainer / author-email / maintainer-email specification to hew more closely to the meaning of the fields as they currently exist¹, not as we would design them if we were designing it today. Seek provisional acceptance of the PEP in the same way that PEP 517 and PEP 518 are not yet finalized.
  2. Write a new PEP to deprecate the {Author,Maintainer}{,-Email} fields in favor of something designed from the beginning to avoid the author/maintainer confusion (and possibly to give richer metadata about the authors).
  3. Update the PEP 621 standard afterwards to include the new field, then finalize it.

The benefit to doing it this way is that the metadata remains clean and consistent, but this PEP isn’t blocked on fixing “how to specify authorship”. The downside to this is that we’d be including a field in PEP 621 that we fully intend to deprecate, which is not off to a great start.

¹I’m thinking we may want to put the authorship stuff into its own separate table, so that it doesn’t pollute the top-level [project] namespace with a bunch of cruft.

Would something like the following work?

contributors = [
  {role = "Author", name = "Me"},
  {role = "Author", email = "@"},
  {role = "Maintainer", name = "Myself"},
  {name = "I"},
  {role = "Designer", name = "Them"},
  {role = "Patron", name = "Us"},
]

While two roles, Author and Maintainer, are well known and already in use, further roles could be defined separately in the future (akin to this). If role is not specified, the default could be set to Author. If role is neither of those two, then the item could be safely ignored until a new PEP decides what should happen with such contacts.
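
To make the proposed handling concrete, here is a minimal Python sketch of those rules. It assumes the contributors table has already been parsed into a list of dicts (the shape a TOML parser would return); the names `recognized_contributors`, `KNOWN_ROLES` and `DEFAULT_ROLE` are hypothetical and do not come from any existing tool.

```python
# Hypothetical sketch of the role handling described above; not an existing API.
KNOWN_ROLES = {"Author", "Maintainer"}
DEFAULT_ROLE = "Author"  # assumed default when no role is given


def recognized_contributors(contributors):
    """Return only entries with a known role, filling in the default role."""
    result = []
    for entry in contributors:
        role = entry.get("role", DEFAULT_ROLE)
        if role not in KNOWN_ROLES:
            # Unknown roles are ignored until a future PEP defines them.
            continue
        result.append({**entry, "role": role})
    return result


contributors = [
    {"role": "Author", "name": "Me"},
    {"role": "Maintainer", "name": "Myself"},
    {"name": "I"},
    {"role": "Designer", "name": "Them"},
    {"role": "Patron", "name": "Us"},
]

for entry in recognized_contributors(contributors):
    print(entry["role"], entry["name"])
```

With this input, the Designer and Patron entries are skipped and the entry without a role is treated as an Author.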

Beyond email, it could also be extended to contain maybe URL, physical addresses, etc.

I’m suggesting a deprecation in the core metadata spec, and a separate PEP (and thus a separate discussion, per @pradyunsg’s suggestion) for deciding how to specify project members.

A stated goal of this PEP is that it should only standardize things that already have a lower-level specification, and there’s nothing for “role” to map on to. The point of my post is that I do not think we can fix the problem with the core metadata spec by designing a clever mapping between TOML and core metadata (particularly since the current proposal implicitly repurposes one of the existing fields).

I agree with prioritizing Authors. Credit of authorship is a very important part of open source.

In past metadata discussions we had talked about a nuanced list of dicts that would state exactly what kind of blame each listed author or maintainer has for the project. It winds up being too complicated.

We had also suggested that our use of the email format for Author-Email/Maintainer-Email could technically allow an RFC 2822 address-list with multiple entries: Name <email@example.org>, Another <email2@example.com>
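
As an illustration of what reading such an address-list could look like, here is a small sketch using the standard library's `email.utils.getaddresses`. The header value is a placeholder, and nothing here implies that existing tools parse the field this way.

```python
from email.utils import getaddresses

# Example header value; the names and addresses are placeholders.
header_value = "Name <email@example.org>, Another <email2@example.com>"

# getaddresses() takes a list of header values and returns (name, address) pairs.
parsed = getaddresses([header_value])
for name, address in parsed:
    print(name, address)
```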

I agree that this is not going to get you the static metadata of your dreams, i.e. dependency trees from sdists instead of wheels.

Yea, this works well IMO.

My suggestion: we add a maintainers field for this, that maps just like authors does to Author/Author-Email. This doesn’t lock us out of making either of these words into tables later, and we can drop one-or-the-other without much pain. :slight_smile:

So how would you want to change the PEP then? Add an equivalent maintainers field to mirror authors but keep the proposed semantics otherwise as-is?
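
For reference, a minimal Python sketch of what a maintainers field that mirrors authors might map to. The mapping rules are an assumption for illustration (name-only entries go to Author/Maintainer, entries with an email go to Author-email/Maintainer-email), and `people_to_headers` is a hypothetical helper, not an existing API.

```python
from email.utils import formataddr


def people_to_headers(people, kind):
    """Map a list of {name, email} tables to core metadata headers.

    `kind` is "Author" or "Maintainer". Assumed rules (hypothetical):
    entries with an email go to `<kind>-email`, name-only entries to `<kind>`.
    """
    names = []
    addresses = []
    for person in people:
        name = person.get("name", "")
        email = person.get("email")
        if email:
            # formataddr() yields "Name <email>", or just the email if name is empty.
            addresses.append(formataddr((name, email)))
        elif name:
            names.append(name)
    headers = {}
    if names:
        headers[kind] = ", ".join(names)
    if addresses:
        headers[f"{kind}-email"] = ", ".join(addresses)
    return headers


# Placeholder people; none of these are real contacts.
authors = [{"name": "Me", "email": "me@example.org"}]
maintainers = [{"name": "Myself"}, {"email": "team@example.org"}]

metadata = {}
metadata.update(people_to_headers(authors, "Author"))
metadata.update(people_to_headers(maintainers, "Maintainer"))
for key, value in metadata.items():
    print(f"{key}: {value}")
```

Because both fields go through the same function, dropping or restructuring one of them later would not affect the other.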

While I value the sentiment, I find it very hard to draw the line where authorship starts and ends. In a world where people tend to move on to other projects, and other people take ownership of those projects, when does one become an author? Is it when they have modified x percent of the code? Is it whoever created the first released version? For example, @pganssle has been the maintainer of python-dateutil for something like 5 years now and has probably modified a lot of the code by now; is he an author? Is someone still an author if they have not been around for 5 years, and much of the code they initially contributed is no longer in use?

What happens with this information? What tools currently read these fields (Author, Maintainer), what do they do with them?

They seem to be displayed on PyPI, for example python-dateutil:

There seem to be two kinds of maintainers (the second one relates to the PyPI users registered to the project, I assume). Author is mentioned first (is that intentional, or alphabetical order?).

What are the other current use cases? What tools read and use these fields?
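
On the reading side, core metadata files use an email-header-style format, so one way a tool might read these fields is with the standard library's email parser. This is only a sketch: the METADATA fragment below is made up, and it says nothing about what any particular tool actually does with the values.

```python
from email import message_from_string

# A made-up METADATA fragment; the project and people are placeholders.
METADATA = """\
Metadata-Version: 2.1
Name: example-project
Version: 1.0
Author: Me
Maintainer: Myself
Maintainer-email: team@example.org
"""

message = message_from_string(METADATA)
for field in ("Author", "Author-email", "Maintainer", "Maintainer-email"):
    # Headers that are absent from the file come back as None.
    print(field, "->", message[field])
```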


This PEP in its current state seems to accept multiple authors, but the core metadata seems to be able to handle only one. Is that right?


In the meantime, I would like to reframe my previous suggestion a bit, as I am not sure I expressed my intent clearly. I will go off into some “maybe in the future” ideas at first, and then try to tie it back to the current PEP at the end of this message…

So for this PEP, as others have already expressed, there is no need to try and give a more concrete definition to these fields (Author, Maintainer). They are in the current core metadata specification, so they need to be in PEP 621 one way or another.

For a potential future revision of core metadata, I think I would advocate for more of a free-form list of credits, in the same vein as the list of URLs. Each project would be free to choose a label (a role: obviously Author and Maintainer, which could be backwards-compatible, but also new ones such as Contributor, Designer, Patron) and a name / contact pair (email, URL, or physical address) for each of the people and organizations they want to credit.

I won’t go into what it would look like in the core metadata, but in pyproject.toml, I propose it could look like the following:

credits = [
{role = "Author", name = "Me", contact="@"},
{role = "Author", contact = "https://localhost/"},
{role = "Maintainer", name = "Myself"},
{role = "Designer", name = "Them", contact = "That house in that town"},
{role = "Patron", name = "Us"},
{name = "I"},  # default role could be "Contributor"
]

A list of recommended roles (with semantics) could be curated by PyPA (PyPI) in the same vein as the trove classifiers or the work being done in warehouse to specify the labels for the project URLs.

Some here (@dustin, @brettcannon, maybe more) are in favor of framing this field as a list of contacts (not credits). I believe it could accommodate this as well (each project could choose what is more meaningful to them and their users).

I see that there might be some potential redundancy between this and the Project-URL fields though, that might need to be addressed.

From my point of view, if I were to want to contact someone about a project found on PyPI I believe I would look at the list of URLs first, and the list of authors, and maintainers second.


And to get back to the current PEP

Today, with the current core metadata specification, by ignoring everything but the first Author and Maintainer, parsing such a table would be equivalent to:

credits = [
{role = "Author", name = "Me", contact="@"},
{role = "Maintainer", name = "Myself"},
]

And could map to this in the current core metadata:

Author-email: "Me" <@>
Maintainer: Myself

So it is quite a simple proposition in the end and very close to the current proposition. There would be no deprecation of Maintainer, no need to change the current core metadata, but still would offer enough room to extend later if we want to (we don’t have to) according to some of the wishes expressed in this discussion.
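
A rough Python sketch of that reduction, for illustration only. It assumes the credits table is already parsed into a list of dicts and that contact holds an email address (a URL or physical address would need different handling); `credits_to_core_metadata` is a hypothetical name, and the addresses are placeholders.

```python
def credits_to_core_metadata(credits_table):
    """Keep only the first Author and first Maintainer, as core metadata lines."""
    lines = []
    seen = set()
    for entry in credits_table:
        role = entry.get("role")
        if role not in ("Author", "Maintainer") or role in seen:
            # Other roles, and repeated Author/Maintainer entries, are ignored.
            continue
        seen.add(role)
        name = entry.get("name")
        contact = entry.get("contact")  # assumed to be an email address
        if name and contact:
            lines.append(f'{role}-email: "{name}" <{contact}>')
        elif contact:
            lines.append(f"{role}-email: {contact}")
        elif name:
            lines.append(f"{role}: {name}")
    return lines


credits_table = [
    {"role": "Author", "name": "Me", "contact": "me@example.org"},
    {"role": "Author", "contact": "https://localhost/"},
    {"role": "Maintainer", "name": "Myself"},
    {"role": "Patron", "name": "Us"},
    {"name": "I"},
]

for line in credits_to_core_metadata(credits_table):
    print(line)
```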

PEP 621 could potentially mention that other roles can be added past the first Author and Maintainer, but advise against doing so, as there is nothing to back them up yet (not in the package core metadata, or on PyPI at least).

If you have contributed then you are an author. It is good that we continue to have maintainer. The end user just wants to know who to email and doesn’t need fine distinctions between maintainership or authorship.

Yup. I think that sounds fair.

We should also remove the “This use of the field is a slight deviation” paragraph, and add a note to the Rejected Ideas section about this. :slight_smile:

Oh sure, I wasn’t trying to suggest that we couldn’t accommodate such a suggestion. My argument is such a distinction isn’t necessary in structured metadata. If you want to thank someone but not list them as a contact then say so in the README. I don’t think this requires any tools to read such data. (If you want to see who has contributed to a project then look at the commit history.)

I have opened https://github.com/python/peps/pull/1485 to put maintainers back in. Usual caveat that I am waiting on co-author approval before I merge it.

I think this is a great point. The current metadata conflates “who to contact” and “who to give credit to”.

IMO it is confusing to end users that there may be multiple “avenues” of contact, and that some of them (like Author-Email) might actually be invalid (e.g. in the case that the Author still wants/deserves credit, but is no longer involved in the project).

This change has been committed.