PEP 639, Round 2: Improving license clarity with better package metadata

PEP 639, describing the proposed new License-Expression and License-File metadata fields, was recently thoroughly rewritten to implement the major additions and changes agreed upon in the previous thread, add the corresponding PEP 621 project source metadata keys, and address a number of ancillary issues that were exposed during the process.

Furthermore, simple and advanced examples, user advice, migration guidance, a terminology section, and more were added; many other sections were expanded; the many alternative/rejected ideas raised over the course of its development were thoroughly documented; much out-of-date material and many references were updated; it was reformatted to the modern metadata PEP structure; and considerable copyediting and cleanup was done.

Following primarily editorial review and revision on python/peps#2164, much helpful feedback, and several rounds of discussion, it was decided by all involved to merge the changes as proposed for now and continue the discussion of them further in a more appropriate venue, namely this Discourse thread.

Please see the Open Issues section of the PEP for the primary open issues, and please read the Rejected Ideas section for summaries of the many previous suggestions and past iterations of this PEP. Many of these, if not discussed at length previously, were approaches that I too initially thought were the most obvious and implemented in a draft of the PEP, only later discovering the intractable issues with them (the most notable example being using the existing license key for the License-Expression field).

While I’ve tried my best to fairly and thoroughly analyze and document them (and in some cases, like the license key, initially preferred and fully implemented them before later backing everything out), I of course welcome your feedback on them, and would appreciate any ideas on how to address the concerns raised, flaws in my analysis, or unrecognized benefits of such approaches. I would also be particularly grateful for input from tool maintainers on the practicalities of implementing the spec and how we can better support that.

Thanks again to everyone for your input and support thus far, and I look forward to continuing to work with you all to get this in a state we are all happy with. And of course, all credit for the original PEP is due to @pombredanne , without whom none of this would be possible!

Thanks for doing this!

I’m a strong -1 on deprecating the license key in the pyproject.toml file. It is a much more convenient name. I think, instead of adding the license-expression key, we should use the gap that PEP 621 intentionally left.

The approach this PEP now takes, adding distinct license-expression and license-files keys and simply deprecating the whole license key, avoids all the issues identified above, and results in a much clearer and cleaner design overall

Yea, except I disagree that this is “simply” giving us a cleaner design. No, it’s leaking metadata format details to the user — something that was intentionally avoided in PEP 621 (see authors key, or optional dependencies table, or scripts table). We shouldn’t need the user to care/know what metadata field things are stored in, or even what form they’re stored in.

If the license expression was made the string value of the license key, as reserved by PEP 621, it would be slightly shorter for users to type and more obviously the preferred approach.

Agreed. :slight_smile:

However, it is far less obvious that it is a license expression at all, to authors and those viewing the files, and this lack of clarity, explicitness, ambiguity and potential for user confusion is exactly what this PEP seeks to avoid

I disagree. This key will usually have a value like MIT rather than an expression. And, even when it does have an expression (MIT OR Apache or whatever), it will be immediately clear that it is supposed to be an expression. And, when it’s not, the validation on the build/publish tooling will catch issues. And it absolutely does not need to be “in the face” that this is an SPDX expression. Heck, that would probably be a bit more intimidating. :slight_smile:

all to save a few characters over other approaches

I think you’re misjudging how important short and clear names are. :slight_smile:

By putting it as license-expression instead of license, you’re making the form we want people to adopt more clunky.

it is not possible to separately mark the [project] keys corresponding to the License and License-Expression metadata fields as dynamic.

Why is it not sufficient to have it explicitly note that if license is specified as a string in the pyproject.toml file, then License in METADATA can be dynamically generated?

Heck, why do we even want to generate a License field dynamically if an expression is generated? I’m inclined to suggest that having a License field and a License-Expression field should be an error when parsing metadata. And that tools should present warnings instead of automatically converting legacy declarations.
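To make the stricter rule suggested above concrete, here is a minimal sketch. It assumes the core metadata has already been parsed into a plain dict of field name to value; `check_license_fields` and `MetadataError` are hypothetical names for illustration, not part of any existing tool or of the PEP.

```python
import warnings


class MetadataError(ValueError):
    """Raised when core metadata carries conflicting license fields."""


def check_license_fields(metadata: dict) -> None:
    """Reject metadata that declares both the legacy and the new field."""
    has_legacy = bool(metadata.get("License"))
    has_expression = bool(metadata.get("License-Expression"))
    if has_legacy and has_expression:
        raise MetadataError(
            "License and License-Expression are both set; use only one"
        )
    if has_legacy:
        # Warn rather than silently converting the legacy declaration.
        warnings.warn(
            "License is a legacy field; consider License-Expression",
            DeprecationWarning,
            stacklevel=2,
        )
```

Under this reading, a tool never has to decide how to reconcile two values: either field alone is accepted, and the combination is simply an error.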

Thanks for your detailed feedback!

I appreciate you taking the time to carefully consider the points I put forth and address them. Overall, while I still lean on the side of the current approach, I’m a lot less convinced than before it is objectively required, and more a matter of judgement, if we dispense with some of what I’d previously taken as necessary requirements as included in the previous version of the PEP and/or requested here (backfilling License, allowing legacy conversion, etc).

To be frank, some of it comes down to sunk cost fallacy of having spent hours and hours and hours of my volunteer time and effort designing and implementing the previous approach, exhaustively analyzing the implications, redesigning it with the current approach, reimplementing that, writing up the justification and then defending it at length. Some of it comes down to personal pride, given the implication that despite seemingly overwhelming thought, research, analysis and effort, I was still wrong and wasted my time like an idiot, heh. And some of it is the real cost of having to spend many more hours carefully rereading, rethinking and rewriting all the many places that this will affect, at least without being convinced it is the best overall approach. But that’s just how it goes…after all, I already re-wrote this specific element three times trying to get it right, what’s one more, eh :smirk:

I agree that mirroring the core metadata field names and structure is far from vital, and as we’ve both mentioned (I on the PR and you here), PEP 621 sets a clear precedent departing from that. I do think that it is a good thing, not a bad thing, to maintain consistency between the two, if there aren’t more compelling reasons involved one way or another, but I think we both agree that there are (we just disagree about which those are :grinning_face_with_smiling_eyes: ).

In any case, what I was referencing here (and what I tried to make more clear with my rewrite) was not that the design was cleaner due to paralleling the core metadata spec, but rather due to cleanly separating the two different keys with two different sets of syntax, semantics, fields and purposes, both with respect to dynamic and relative to a confusingly identically named key in other existing config formats.

The much more critical issue here, which is as yet unaddressed, is that we would be using the license key to mean a totally different thing, syntactically and semantically, than it means in all the other existing tool-specific config formats. Validation or not, this is just asking for author and reader confusion and frustration. The footgun may have an interlocking safety, but it’s still a footgun :laughing:

The other point in favor that I’d made before, which doesn’t seem to be in the current draft, was that it automatically made License and License-Expression mutually exclusive without tools actually having to check for it and the spec having to specify it.

I understand your point here, even if I don’t personally see it entirely the same way, and it’s certainly a matter of judgement. I actually thought more people would favor explicitness here and oppose the previously-implemented approach on those grounds, which was one of the factors that eventually led me to shy away from it.

I do want to point out that a license expression is a fully inclusive superset of a single identifier; the latter is a perfectly valid example of the former. I’m much less concerned about invalid identifiers, since tooling following the spec will catch it, and more with making explicit to authors and readers what the field represents at a glance (and that they are allowed to use a full license expression, not just a single identifier) without having to carefully dig through the spec or succumb to the temptation to guess based on the content. But maybe this isn’t so big an issue in practice as I’d feared.

Short? Sure. Clear? I’d argue less so, since without reading the spec license could be anything—free text, an identifier, an expression, full license text, a license file path, etc—while license-expression makes it clear it is exactly that, an expression.

I do agree that’s a pretty good reason, one I cited originally. I personally still don’t think it outweighs the issues above and below, but I can understand why you feel strongly about it.

Actually, the PEP as currently written explicitly proposes allowing exactly that :grinning_face_with_smiling_eyes: , and includes a justification accordingly. I mention this because some previously still raised an issue with it, to make clear the implications, but this is not presently a blocking issue.

Because @brettcannon requested it, it seemed reasonable at the time, I included it in the original proposal and no one objected to it :smirk: At present, copying the verbatim value of license-expression to License is allowed for backward compatibility, but not required or explicitly recommended, and if done License and License-Expression MUST match exactly or it’s an error. I’m not opposed to disallowing it completely if needed and left it as an open issue, though as of now it doesn’t really affect using the license key for both the License and License-Expression so long as it isn’t required to be dynamic to do so.
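As a rough illustration of the backfill rule described above (the verbatim copy is optional, and any existing License value must match exactly), a tool that opts in might do something like the following. The function name `backfill_license` and the dict representation of the metadata are assumptions made for this sketch only.

```python
def backfill_license(metadata: dict) -> dict:
    """Return a copy with License mirrored from License-Expression."""
    result = dict(metadata)
    expression = result.get("License-Expression")
    if expression is None:
        return result  # nothing to backfill from
    legacy = result.get("License")
    if legacy is not None and legacy != expression:
        # The two fields may only coexist if they are identical.
        raise ValueError("License must match License-Expression exactly")
    result["License"] = expression
    return result
```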

The PEP as originally proposed and agreed had detailed guidance on converting legacy metadata to aid adoption, with relatively tight restrictions on when this was allowed and requiring warnings, etc., and I didn’t modify that too much aside from the various related changes, and some needed refinements and further tightening. This does modestly reduce the ease of potential adoption, but this wouldn’t necessarily need to be done during build (which dynamic wouldn’t affect), and realistically most projects using this probably aren’t using the license key in the PEP 621 metadata anyway as opposed to tool-specific formats (which are mostly unaffected by this).

I’m very strongly in favour of using license in pyproject.toml, and frankly I don’t understand any of the arguments being made here.

For example, I’m not at all clear why the license data would ever need to be dynamic (beyond the technical point that it’s defined in something like setup.py rather than pyproject.toml, and will get copied from there into the metadata), and I wasn’t able to navigate the PEP to quickly find a clear explanation of that. And I’m not clear why it matters anyway. Sorry if I missed it, but for me, the license should definitely be a (notionally, if not technically) static metadata entry.

That makes it a backward-compatibility issue, not an impossibility. And I’m not convinced it’s impossible to deal with. Certainly, the field might be an expression or legacy free text, but is “if it parses as an expression, it’s an expression, otherwise it’s free text” that bad? Or “it’s an expression if it’s metadata version 2.3, it’s free text in earlier versions”? Or even “expressions must be parenthesised, otherwise it’s free text”, although I don’t like that much as I expect things like “MIT” which work as both expression and free text are common.
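The first heuristic above ("if it parses as an expression, it’s an expression, otherwise it’s free text") could be sketched roughly as follows. This is a deliberately simplified, syntax-only check written for illustration: it does not consult the SPDX license list, it is not the reference implementation, and `looks_like_expression` is a hypothetical name.

```python
import re

# An identifier (optionally followed by "+") or a parenthesis.
_TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z0-9.\-]+\+?)")
_OPERATORS = {"AND", "OR", "WITH"}


def _tokenize(text):
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            return None  # character not allowed in an expression
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class _Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def identifier(self):
        token = self.take()
        return token is not None and token not in _OPERATORS | {"(", ")"}

    def atom(self):
        if self.peek() == "(":
            self.take()
            return self.expression() and self.take() == ")"
        if not self.identifier():
            return False
        if self.peek() == "WITH":  # license WITH exception
            self.take()
            return self.identifier()
        return True

    def chain(self, operand, operator):
        if not operand():
            return False
        while self.peek() == operator:
            self.take()
            if not operand():
                return False
        return True

    def and_expr(self):
        return self.chain(self.atom, "AND")

    def expression(self):  # OR binds more loosely than AND
        return self.chain(self.and_expr, "OR")


def looks_like_expression(text):
    """True if `text` is syntactically shaped like a license expression."""
    tokens = _tokenize(text)
    if not tokens:
        return False
    parser = _Parser(tokens)
    return bool(parser.expression()) and parser.pos == len(tokens)
```

Note that a bare value such as MIT passes this check, which is exactly the overlap between expression and free text mentioned above, while something like "MIT license" does not.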

And before anyone says “but it’s licensing, and getting it wrong could have legal implications” or whatever, my point here is that the people who are most likely to get it wrong are probably also the people least likely to care. And I care more about the project author’s opinion than the licensing specialists, personally. To extend your analogy, gun enthusiasts know enough to be safe, and people who just hang the footgun on the wall as a decoration won’t get hurt by it :wink:

I’m looking at it from a user perspective, and in that context this is clearly false. The license spec is the project license. That should be it. As a project author who doesn’t really care about licensing, I want one field to think about and that’s it.

There’s enough bad feeling about licensing debates where people are harassed about their licensing choice (or their unwillingness to “care enough” about their license choice) that I want Python to actively support people taking a “bare minimum” attitude to licensing, even if that means that we have to do extra work to retain the “obvious” keyword across this change.

And I appreciate that you took the time to put them forward clearly!

Hehe, this was definitely not what I was implying. I appreciate the amount of effort you’ve clearly put into this, and I’m sure a lot of us around here do. Plus, it’s not about right/wrong here but a choice of tradeoffs – and convincing each other what they should be. :wink:

I appreciate the frankness! :slight_smile:

Well… Here’s a sound-bitey response: I think we should prioritize matching user expectations, instead of trying to match the underlying metadata format.

This was a point that went through a bunch of discussion when PEP 621 was being authored and was up for discussion. The final rationale for picking something like dependencies was to match user expectations from other ecosystems. Let’s apply that reasoning here as well?

Would users be familiar with license as an SPDX expression? From the survey of other language tooling in the PEP, it looks like there is certainly a lot of prior art for using license as SPDX expressions.

Are there any ecosystems that use license-expression as a name for the user to specify the license of their project? I can’t really find any.

If it helps make the case, I was one of the two people who proposed what the final structure for the license key in PEP 621 ended up being and this was one of the main reasons I didn’t want us to use bare text format for anything other than SPDX – because users have existing expectations for what that means. :slight_smile:

I would perhaps like to see the requirement for build tools to validate the license expression lifted. This is a heavylift operation that more lightweight backends will be reluctant to support. I would personally pivot the PEP away from Core Metadata and leave how the license field is interpreted to package indexes. That would also resolve issues like, differences in SPDX parsing[1] and updates to the SPDX license catalogue that might not be propagated to backends “cleanly” (e.g. backends might have to cut a new release to bump the license catalogue meta-package’s version so that existing users will be upgraded). OTOH, relying on the package index as a means to verify that license expressions are valid might be undesirable, especially considering that PyPI doesn’t have an officially-supported metadata API. But it might just be too late to make drastic changes like I’m proposing and I fully appreciate the “sunk cost” of what is a very polished PEP :slight_smile:


  1. Lightweight backends might not treat this as a license expression at all, for instance, and might not support conjunctions. ↩︎

PyPI does have metadata validation which is what this PEP augments. I don’t see how the lack of a gimme-metadata-of-an-already-uploaded-artifact API is related to validation of the artifact when it’s being uploaded.

The PEP says:

I don’t see how this requires the build tool to perform validation – they should, but they’re also permitted to not.

As I understand it, this would also eliminate the primary benefit of this PEP – clear machine-parsable declaration of the package license.

How would this be different from the existing free-form License field in core metadata?

It’s not directly relevant, but it’s tied to the point I’m making below.

For the license expression field’s validity to be ensured someone has to validate it. And if your package isn’t uploaded to a package index then it might or might not be valid depending on whether the backend performs this optional (in theory) validation step. This is a special (and I would think undesirable) condition in Core Metadata because no other field is allowed to maybe be valid that comes out of a backend. This is different from e.g. rejecting a package on upload for including a classifier which the package index does not support, because CM does not contain a list of supported classifiers.

Trivially, because it would be validated by the package index, which is something the package index can’t realistically do for a “grandfathered” License field.

As an aside, for Rust’s Cargo, the PEP says:

  • Rust Cargo [11] specifies the use of an SPDX license expression (v2.1) in the license field. It also supports an alternative expression syntax using slash-separated SPDX license identifiers, and there is also a license_file field. The crates.io package registry [25] requires that either license or license_file fields are set when uploading a package.

But this is incorrect as I understand it. Cargo’s documentation says:

The license field contains the name of the software license that the package is released under. The license-file field contains the path to a file containing the text of the license (relative to this Cargo.toml ).

crates.io interprets the license field as an SPDX 2.1 license expression.

My reading is that Cargo doesn’t care if this is a valid SPDX expression but crates.io does.

The slash separator is also deprecated.

I’m not sure how I could have made it much clearer, sorry. I describe in detail the multiple potential scenarios under which either the project key corresponding to the License field or the License-Expression field would be required to be dynamic, but not the other, as well as the other problems and ambiguities this creates, right under the Rejected Idea in question, and also directly link the primary specification section that describes this, which has been present well before I revised the PEP.

For convenience, here’s the relevant excerpt of the rejected idea that specifically discusses this:

Most importantly, it still means that per PEP 621, it is not possible to
separately mark the [project] keys corresponding to the License and
License-Expression metadata fields as dynamic. This, in turn, still
renders specifying metadata following that standard incompatible with
conversion of legacy metadata, as specified in this PEP’s
Converting legacy metadata section, as PEP 621 strictly prohibits the
license key from being both present (to define the existing value of
the License field, or the path to a license file, and thus able to be
converted), and specified as dynamic (which would allow tools to
use the generated value for the License-Expression field).

For the same reasons, this would make it impossible to back-fill the
License field from the License-Expression field as this PEP
currently allows (without making an exception from strict
dynamic behavior in this case), as again, marking license as dynamic
would mean it cannot be specified in the project table at all.

Furthermore, this would mean existing project source metadata specifying
license as dynamic would be ambiguous, as it would be impossible for
tools to statically determine if they are intended to conform to previous
metadata versions specifying License, or this version specifying
License-Expression. Tools would have no way of determining which field,
if either, might be filled in the resulting distribution’s core metadata.
By contrast, the present approach makes clear what the author intended,
allows tools to unambiguously determine which field(s) may be dynamically
inserted, and ensures backward compatibility such that current project
source metadata do not unknowingly specify both the old and the new field
as dynamic, and instead must do so explicitly per PEP 621’s intent.
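The PEP 621 restriction the excerpt relies on (a key may not be both given statically and listed in dynamic) can be sketched as below, with the [project] table represented as a plain dict. `check_dynamic` and the two example tables are illustrative assumptions, not taken from the PEP.

```python
def check_dynamic(project: dict) -> None:
    """Raise if a key is both given statically and listed in `dynamic`."""
    dynamic = set(project.get("dynamic", []))
    clashes = sorted(key for key in dynamic if key in project)
    if clashes:
        raise ValueError(
            "keys both static and dynamic: " + ", ".join(clashes)
        )


# Single shared key: a static legacy value and a dynamically generated
# expression cannot be declared together.
shared = {
    "name": "demo",
    "license": {"text": "MIT License"},
    "dynamic": ["license"],
}

# Separate keys, as in the current draft: the legacy value stays static
# while the expression is explicitly marked dynamic.
separate = {
    "name": "demo",
    "license": {"text": "MIT License"},
    "dynamic": ["license-expression"],
}
```

Here `check_dynamic(separate)` passes while `check_dynamic(shared)` raises, which is the asymmetry the excerpt describes.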

All that said, per @pradyunsg and my previous discussion above, none of these are as hard blockers as I originally assumed; the first point still stands but is not as critical in practice as I initially framed it to be, the second point is resolved by language I included in the present version of the spec and the third point is still an issue in theory but not likely to matter as much in practice given the practical reality of current PEP 621 usage.

I agree, which I clarify above (though to note, the previous presumed hard blocker was the dynamic field handling, not the key name confusion); it’s just a point of likely user confusion when switching between all existing tools and PEP 621 keys, which use the same key with different syntax and semantics (and metadata field mapping).

It’s not clear if you’re referring to the license key or the License field, which are two very different things for the purposes of this PEP and a critical distinction I take great pains to make and define clear and consistent terminology around (in fact, the very one that caused serious issues when interpreting PEP 621’s intent around the very point of most crucial interest here). However, either way, I think we might not be on the same page here. Per the present iteration of this PEP, as discussed at great length and agreed by all on the previous thread, the License field is not re-used for the new license expressions, so there is no backward compat impact there.

The potential issue is with the license PEP 621 key, specifically its interaction with the dynamic key and potential user confusion around its name being the same as the very different license key in other tool metadata formats, but that actual value itself is unambiguous to tooling because it was reserved for this purpose by PEP 621, as @pradyunsg makes a repeated point of. All the suggestions proposed here are well out of line with the consensus on the previous issue and what is implemented on the PEP, and are entirely orthogonal to the specific issues addressed.

Like gun enthusiasts who behave irresponsibly and ignore basic safety rules with where they point their firearm rather than treating it with the respect it deserves, as a potentially deadly weapon if not handled carefully, irresponsible package authors are if anything the potential perpetrators, not the victims here, while the latter are those downrange that the package author might not realize they are affecting. As they are the creators of their packages, and typically of most if not all of the code, the issue of licensing matters (legally and practically) least to them, whereas it does pose an issue for downrange (er, downstream) users, particularly those who are the most safe, careful and respectful about always ensuring they have permission before using, modifying and distributing others’ code.

They are welcome to use their gun as they please in their own private rural farm where it won’t hurt anyone, and teach their kids how to use it, but once they go out into the world, share their gun and use it with others, that’s where the problems for everyone arise if they don’t follow the basic rules before they shoot. And sometimes, if others contribute to their code, their very bullets ricochet back to hit them if they aren’t careful.

In any case, this is all a bit of an aside, as an aim of this PEP is to make it as simple, clear and obvious as possible for authors to add basic licensing metadata and be done with it, without dealing with different fields, ambiguities, custom classifiers, manually specifying license files, etc. And this PEP includes a simple step by step guide aimed directly at this very class of users, telling them the simplest possible approach of how to add a license and be done with it, and not have to ever worry again about it, in the least practical time.

Of course, as it also makes clear, authors are under no obligation to use it at all, but if they do use it, this PEP ensures they understand and use it accurately and unambiguously. Otherwise, it is little or no better than not using it at all, and quite possibly worse, and this PEP has ultimately failed in its most fundamental goal as established from the outset.

No, that is false :smirk: , because again, in the context of a TOML key specifying a project metadata field, “the project license” has no clear meaning, just like saying “pass the day of week to the function” has no clear meaning—do you mean the string day of week? In what form? The day number of week, as an int? A datetime object? An enum? A custom class? etc. In this case, as already mentioned before, this could mean a variety of different things (two of which are already included as options under the existing PEP 621 license field, with a third reserved). The goal of this PEP is to make adding license metadata as clear and obvious as possible, without users having to guess what they should put there.

This PEP goes to great lengths to support package authors who want to spend the minimum practical time specifying their license choice and then never needing to worry about it again (or be bothered by people asking them), by concisely describing, in plain, friendly language, step by step, exactly the minimum that such authors need to do in the most common, straightforward situations. Furthermore, it includes carefully tailored guidance for tools on how to minimize user mistakes and frustration, requires they provide helpful error messages for common cases without leaving users guessing, and includes detailed provisions for how tools can convert legacy to new license metadata automatically, and backfill legacy metadata from new metadata.

Again, if authors prefer not to specify a license, or actively refuse, this PEP makes very clear that it in no way requires them to do so at all, at any point, and explains in plain language what this means for them and their users. Of course, if users do choose to use the new gun in this PEP, it has a safety trigger that will not go off if pointed at someone, but they are by no means obligated to use it at all, and they can continue shooting away in their urban backyards with their current ones, if they so choose. This PEP, of course, cannot ethically encourage irresponsible behavior, and makes the consequences of their choices clear, but by no means does it take them away from them and seeks to make it as straightforward and painless as possible for them to make responsible ones, if they so desire.

Oh I know :wink: , I didn’t mean to imply that you had implied that, only that I, in my less entirely rational moments, had thought that way. And yes, very true! Based on your feedback thus far, we’ve managed to get past the seeming blockers and boil things down to just those key tradeoffs, on which we don’t entirely agree but where we each understand, can see and respect the validity of the other’s point of view.

You do make a good point there, and one I did take note of previously. One meta-issue, I think, is that I put too much emphasis at the very beginning on consistency between the core metadata field names and the PEP 621 key names, and continued to bring it up again when it was mentioned, when in reality that’s mostly a red herring.

The real crux of the issue, as I see it, is user expectation from other Python packaging tool formats, which essentially all use the license field to mean something quite different from SPDX expressions. It was good that it was reserved in PEP 621, no doubt, to avoid this ambiguity, but the real question is whether we believe the potential confusion from other non-Python ecosystems using license for SPDX identifiers, expressions or something relatively similar, and the project table using license-expression (and reserving license), is greater than that from essentially all existing Python packaging tools using license to mean something that is usually quite far from an SPDX expression.

In the very long term, if those custom tooling formats go away and leave only PEP 621, then this problem eventually disappears and consistency with other ecosystems becomes unambiguously more important (a point brought up by @ofek on the previous thread). However, I’m not sure if optimizing for that hopefully not so distant, but realistically far-off future is worth the obvious confusion now, where the vast majority of users encountering this key will be more familiar with other Python packaging tool formats than the other ecosystems that use SPDX.

This is something I’ve considered and brought up myself before, and a topic on which I’d certainly appreciate feedback from packaging tool authors. The license-expressions full-featured, production-ready reference implementation by the original author of this PEP, @pombredanne , is equipped to handle this for packaging tools as needed. However, as @pradyunsg aptly mentions, the specification does allow build tools to not validate license expressions, if they have sufficient reasons not to do so.

At least as phrased here, this would be a major pivot away from this PEP as it has been construed since the beginning. If we don’t define the field content at the core metadata level, I’m not sure there is a clear justification for adding a brand new field in the first place if we’re not going to specify its syntax or semantics as any different from the existing license field.

On this point, the PEP specifically accounts for this by mandating a particular SPDX expression syntax version and SPDX license list version, while allowing later versions to be used by tools so long as they maintain backwards compatibility with the indicated version, such that existing packages will not suddenly have invalid metadata, and packaging tools required to be updated, if a new incompatible SPDX version were to come out. This would require a new PEP and a new metadata version, which would be a rare occurrence. Given the list and the syntax has been fairly stable for some time and evolves slowly these days, with essentially all used licenses accounted for, the situations in which this would matter would be rather small to begin with. See this rejected issue for more details and links.

This would be done at upload time. Also, as this PEP specifies, publishing tools (twine) should also perform this validation, so there would still be a backstop for build tools that lack license expression validation. Hypothetically, we could defer the SHOULD (or MUST, even) to here, if requiring build tools to do so were a serious burden.

Well, correct me if I’m wrong, but is this not true of the description (long_description/readme field) that PyPI only validates on upload, too, and twine optionally can?

Neither does this PEP maintain a list of supported identifiers; that is done by the specified SPDX license list version.

(Picking a somewhat random, but relevant, part of your post to quote)

OK. There’s a lot of history on this discussion, going back over a very extended period. And the PEP itself is very long, compared to the average PEP. I haven’t been following the discussions at all, and I frankly don’t have the time to give the PEP itself the sort of detailed reading that it clearly needs. In addition, I’m very much a “licensing skeptic” in the sense that I don’t have much patience for debates over complex details of license interactions and similar, so my point of view is probably too aggressively “keep it simple”.

I think the realistic thing for me to do at this point is to acknowledge that I can’t keep up with the discussion here, and leave it to others to discuss and decide. As a result, I’d like someone else to volunteer to be PEP delegate for this PEP. Such a volunteer needs to be “accepted by the other PyPA core reviewers, the lead PyPI maintainer and the default PEP-Delegate for package distribution metadata PEPs” according to our process, but hopefully that won’t be an issue. If we don’t get a volunteer, I’m not sure what will happen as the process isn’t clear, but I assume the PEP will need to go direct to the SC.

I’m happy to stay on as PEP sponsor, and advise on process-related issues in that capacity.


Uh oh, @pf_moore please don’t go just yet, I want consensus on this to be over soon b/c it’s the only thing blocking Hatch 1rc0 :slightly_smiling_face:

It seems to me that the main issue here is the UX, particularly the desire for the license key to persist as the sole interface. So:

  1. a string value for license will map to the new License-Expression core metadata field
  2. a table value for license will:
    • deprecate the file and text keys
    • add the paths and globs keys which populate to the new License-File core metadata field
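
A minimal sketch of that mapping, assuming the `license` value has already been parsed from `pyproject.toml` into a Python string or dict. The function name, the `project_files` argument, and the use of `fnmatchcase` for glob matching are illustrative assumptions, not anything the PEP specifies:

```python
from fnmatch import fnmatchcase


def license_to_core_metadata(license_value, project_files):
    """Return (core metadata fields, notes) for a [project] license value."""
    metadata, notes = {}, []
    if isinstance(license_value, str):
        # 1. A string value maps to License-Expression.
        metadata["License-Expression"] = license_value
        return metadata, notes
    # 2. A table value: file/text are deprecated, paths/globs feed License-File.
    for key in ("file", "text"):
        if key in license_value:
            notes.append(f"license.{key} is deprecated")
    if "paths" in license_value:
        metadata["License-File"] = list(license_value["paths"])
    elif "globs" in license_value:
        metadata["License-File"] = sorted(
            name
            for name in project_files
            if any(fnmatchcase(name, pattern) for pattern in license_value["globs"])
        )
    return metadata, notes


files = ["LICENSE.txt", "README.md", "src/pkg/__init__.py"]
print(license_to_core_metadata("MIT", files))
print(license_to_core_metadata({"globs": ["LICEN[CS]E*"]}, files))
```

In this shape a single `license` value yields either an expression or a set of license files, never both.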

@CAM-Gerlach Are you comfortable making those changes? Quite literally only this (comparatively small) section would need to be modified https://www.python.org/dev/peps/pep-0639/#project-source-metadata. Everything else AFAICT that you and @pombredanne have written remains the same: License core metadata field is deprecated, etc.

Sorry. But having a different PEP delegate for this PEP shouldn’t make it harder to get it accepted. On the contrary, I hope it’ll make it easier as I won’t be repeatedly asking for clarifications and going over points that have already been debated and discussed.

Hey Paul, I regret saying that and I’m really sorry if that came off as rude or dismissive; it was absolutely not kind or thoughtful of me and I apologize. I’d stayed up well past 7 am in my timezone responding to messages on this, which was a poor decision on my part, and my mind and judgement weren’t where they should have been when replying to the feedback you’d taken the time to offer. And I also hope I didn’t take the gun enthusiast analogy too far; it just seemed to offer a lot of promise as a metaphor for illustrating my point of view as well.

To note, as this PEP tries to make clear in the user scenarios section (added primarily in response to your concerns at the start of the last thread, actually), and as noted before in the “How to teach this” section, the simple cases get simpler with this PEP (just drop in a license file, enter the license short ID and you’re done; no need to look up the right classifiers, figure out what to put in the license field, manually enter the license file path, etc). On the other hand, the complex cases weren’t even possible to express before, but are quite common among larger popular packages. Naturally, given their complexity and importance, the PEP spends most of its time explaining and specifying them clearly.

Also, for what it’s worth, I didn’t get the time to reply to your thoughtful post on the PR yesterday, but @pradyunsg offered his help reducing the PR’s verbosity. Furthermore, I actually discovered, tested and confirmed a more aggressive strategy for shortening the PEP: moving most of the less critical sections you mentioned last night, and others (Appendix 2, Appendix 3, the user scenarios, maybe the advanced example, and perhaps the rejected ideas or all but a summary/the most critical ones), out of the PEP entirely, to separate linked reST documents in the PEP subfolder, which is now possible thanks to the new build system that will become the default in PEP 676. This will cut the PEP’s total length by well over half, on top of the reductions @pradyunsg and I achieve elsewhere.

It’s naturally sad to hear you’re considering stepping down, but I know it must have been a difficult decision and not one you’ve taken lightly, and I hope my own careless words and actions haven’t played a significant part in it. It would certainly be greatly preferable to have a PEP delegate rather than the SC, since the former would presumably be someone with packaging expertise who can actively follow and engage in the discussion, and who can give feedback on what’s needed to reach a state at which the PEP can be approved, as opposed to just having to submit it blind to the SC, cross our fingers and hope we don’t need to start all over with it after the years of effort and so many people’s contributions (if I understand the process correctly). Isn’t @brettcannon an SC member and involved with the packaging community? What about @dstufft ? Any other suggestions?

Thanks, I continue to very much appreciate your great support and advice there and elsewhere, in providing many of the major suggestions and recommendations that led to the PEP as it is today.

That sounds very neat!

Did you happen to see my question here? PEP 639, Round 2: Improving license clarity with better package metadata - #14 by ofek

I really think that’s the only critique being expressed by anyone i.e. that the cost/benefit analysis described in https://www.python.org/dev/peps/pep-0639/#source-metadata-license-key does not justify the rejection of the idea.


Not at all. Your comments were perfectly fine. I’m really grateful that you picked up this PEP and breathed new life into it, and I’m seriously impressed at the amount of work you put into it. My response wasn’t based on any frustration with what you’d said, but simply from a realisation that the PEP needs more time than I currently have available, and I don’t want to do it a disservice.

Far from it. And to be clear, I’m not thinking of stepping down as PEP delegate in general, simply looking for someone else for this PEP.

In fact, I think it would be healthier for the community in general if we had more variety in PEP delegates - the defined process is that any PyPA core reviewer can volunteer to be a PEP delegate, and I’m simply the fallback if no-one volunteers. But in reality, we very rarely get anyone volunteering, and as a result, I’ve ended up covering most PEPs.

Like you, I’m not at all sure that letting the decision pass to the SC is ideal, for all the reasons you mention, and I’d probably still step in if we can’t get a volunteer. But I hope we do (I won’t offer any suggestions - to be honest I feel it needs to be a personal choice and I don’t want to put anyone under any pressure).


Sorry, I was replying but had to step away for a bit to help my roommate (I removed my post saying this above)

I agree that it seems to be the general consensus, my own feelings aside, that the license PEP 621 key should be retained and used for the License-Expression field, and per the discussion with @pradyunsg , the concerns with that specific approach as suggested on the PEP are not as serious as initially thought, while the reasons for it are more compelling. While personally I still lean in favor of making it a separate key, it now ultimately comes down to a matter of preference, and that approach is indeed workable with some compromises. Given that clearly seems to be the way things are going, to expedite things I’m fine with going ahead and proposing a revert to that approach, and implementing it once we sign off.

That said, the specific approach suggested here is not actually the "license expression = string value of license" one being advocated, as I understand it, by @pradyunsg , @pf_moore and others, and is not workable for all the much stronger reasons mentioned in the Add expressions and files subkeys to table section (which, since discussion had largely moved on from that approach, has not been updated as recently as the following one to reflect the more salient reasons uncovered), as opposed to the less serious ones that @pradyunsg and others responded to in the Define license expression as string value section.

On top of that, it has a much more fatal flaw: it makes it completely impossible to specify both license files and a license expression, as required by the larger and more complex projects (including the relatively moderate-complexity one cited in the example, setuptools), without adding yet another subkey to license and ditching the simple string value, which is what PEP 621 and people here are advocating for. This means license would have at least five separate subkeys, with even more complex mutual exclusivity: paths cannot be used with globs; neither paths nor globs can be used with file (and maybe not text either?); file and text cannot be used together per PEP 621; and expression (or whatever we call it) can be used with one of paths or globs but not text (and not file either?). As you can see, this gets complex very fast, all of which is avoided by just making license a simple string value as envisioned in PEP 621, no more, and moving license-files to a separate key.
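
To illustrate how quickly those rules pile up, here is a rough sketch of the check a tool would need for such a five-subkey table. The subkey names follow the ones discussed above (with `expression` as a placeholder name), `conflicting_subkeys` is a hypothetical helper, and the pairs marked with a question mark above are deliberately left unenforced since they are open questions:

```python
from itertools import combinations

# Pairs of subkeys stated above as not combinable.
EXCLUSIVE_PAIRS = {
    frozenset({"paths", "globs"}),
    frozenset({"paths", "file"}),
    frozenset({"globs", "file"}),
    frozenset({"file", "text"}),
    frozenset({"expression", "text"}),
}

# Pairs raised above only as open questions; NOT enforced in this sketch.
UNDECIDED_PAIRS = {
    frozenset({"paths", "text"}),
    frozenset({"globs", "text"}),
    frozenset({"expression", "file"}),
}


def conflicting_subkeys(license_table):
    """Return the pairs of present subkeys that may not be used together."""
    present = sorted(license_table)
    return [
        pair
        for pair in combinations(present, 2)
        if frozenset(pair) in EXCLUSIVE_PAIRS
    ]


print(conflicting_subkeys({"expression": "MIT", "globs": ["LICENSE*"]}))
print(conflicting_subkeys({"paths": ["LICENSE"], "globs": ["LICENSE*"], "file": "LICENSE"}))
```

Of the ten possible pairs among five subkeys, this leaves five forbidden, three undecided, and only two (`expression` with `paths`, and `expression` with `globs`) clearly allowed.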

Ah, my sweet summer child, if only it were that simple :grinning_face_with_smiling_eyes: that approach would require undoing, at minimum, pretty much everything done in eb2e8740 (which changed the PEP from something very similar to your approach, to the current form), including a full rewrite of the whole PEP 621 section (which is 200 lines long). On top of that, a number of items added after this commit would need major rewrites, including the examples, the user scenarios, the backward compat section, parts of the how to teach this section, much of the conversion specification, and certain elements of the abstract, future PEPs, and a smattering of usages elsewhere, plus redoing many of the rejected ideas. To note, license-files is used 42 times through the PEP, and license-expression 28, while `license` (with backticks) is used a further 42, not to mention their subkeys and the other implications.

The approach I propose, which is more in line with this discussion, by contrast still requires significant changes but is substantially less disruptive, and simplifies the PEP overall rather than adding another layer of complex interactions. I will lay it out in a followup post to keep it short, focused and easy to follow, so it doesn’t get lost here.


Given the discussion here, to expedite the process and align the PEP more closely with the general consensus and the expectations in PEP 621 as people are requesting, I propose essentially doing what @pradyunsg and others have urged:

  • Remove the separate license-expression key
  • Make the flat string value of the license key map to the License-Expression metadata value, as reserved by PEP 621, and deprecate the table subkeys
  • Update the Converting legacy metadata guidance accordingly, to reflect that legacy license.text metadata cannot be automatically converted during build if it is specified statically in [project], and that tools can warn about it instead
  • Keep license-files as it is
  • Simplify and update the rest of the PEP accordingly
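
The resulting shape can be sketched roughly as follows, assuming (hypothetically) that the `[project]` table has already been parsed into a dict and that `license-files` has already been resolved to a plain list of paths; `project_to_core_metadata` is an illustrative name, not an existing API:

```python
def project_to_core_metadata(project):
    """Return (core metadata fields, warnings) for a parsed [project] table."""
    metadata, warnings = {}, []
    license_value = project.get("license")
    if isinstance(license_value, str):
        # Flat string value: treated as the license expression.
        metadata["License-Expression"] = license_value
    elif isinstance(license_value, dict):
        # Legacy table form: deprecated, and its contents are not converted
        # to an expression automatically, so this sketch only warns.
        warnings.append(
            "license table subkeys are deprecated; use a license expression string"
        )
    # license-files stays a separate key, independent of license.
    license_files = project.get("license-files", [])
    if license_files:
        metadata["License-File"] = list(license_files)
    return metadata, warnings


print(project_to_core_metadata({
    "license": "MIT OR Apache-2.0",
    "license-files": ["LICENSE-MIT", "LICENSE-APACHE"],
}))
```

Since the two keys are handled independently, a project can give an expression, license files, or both, with no mutual-exclusivity rules between them.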

In addition, to reduce the total length of the PEP by over half, I propose:

  • Moving the user scenarios and rejected ideas (over 50% of the current body text) to the appendix
  • Once PEP 676 is implemented, moving all appendices except the basic example (over 2/3rds the total length of the PEP) to separate supplementary file(s) in the PEP-639 directory, linked from the PEP; this would make the PEP itself far more focused and manageable while still preserving those resources for posterity for those who need them
  • Merging python/peps#2155, which will reduce the rendered length by another five full pages by eliminating the redundant 100-link references section (which I already converted to inline links)
  • Eliding the PyPA glossary portions cited in the Terminology section and replacing them with links, and trying to move at least some of those terms of general interest that aren’t there already to the main PyPA glossary instead
  • Further reducing verbosity and excessive verbiage through the rest of the PEP with the kind assistance of @pradyunsg