Increasing pip's/PyPI's metadata strictness (summit followup)

At the PyCon packaging minisummit, maintainers discussed:

Pip and PyPI already know, or will soon know, of cases where package metadata is incorrect; how strictly should metadata correctness be enforced?
There’s general agreement that strictness should be increased; the open questions are how quickly and to what extent.

I turned those notes into packaging-problems issue #264 to provide a tracking issue covering the various TODOs necessary to plumb this through the parts of the toolchain. I’m sure I missed stuff; please add things, then I’ll turn bullets into checkboxes and we can make more concentrated progress.

(people mentioned: @pganssle @dstufft @dustin @EWDurbin @crwilcox @ncoghlan @pradyunsg)

If people are looking for opinions, mine are:

  • As strict as possible
  • As soon as possible

:smile:

Agree with @brettcannon - strict and soon.

Also, it would be helpful to have versioning_scheme and changelog_url fields in package metadata and on PyPI. PyPI should have clearer standards on what needs to be populated before a package is listed.

A consistent framework to store and access this info would create a market signal / expectation that all of this is necessary info…

@surfaceowl is changelog_url really necessary? You can already add arbitrary URLs to the side panel on PyPI as project_urls. What would this be used for?

@agronholm – we already have a number of optional fields on core metadata that are helpful for downstream developers - I think changelogs would be similarly useful.

The main interest is to reduce the friction and work needed to reliably understand the summary of material changes between releases of Python packages. Today, everyone does this differently, if at all.

Not everyone uses or reliably adheres to semantic versioning. Most package owners don’t publish a changelog at all. Python itself and some package owners publish changelogs; some, like pip and Django, call them release notes; some have their summaries generated by Sphinx; others point you to commit history, which creates a lot of unneeded work if there are no material changes. This problem is compounded when a dev has to update a large number of packages in a project.

Standard metadata for changelogs would:

  1. make it easier to find breaking and major changes
  2. save time when there are only minor changes, giving devs an easy way to skip digging for info
  3. create a common pathway to find the info, which is also a timesaver
  4. signal that changelogs are important, improving consistency in the Python ecosystem.

If we rely only on arbitrary URLs, there will be no market signal to devs that changelogs are valuable, and the URLs will not be populated. That seems to be the case today, as many authors don’t add arbitrary URLs for changelogs, even though they can.

Using a standard name in the package description, but enabling devs to link to whatever they want (e.g. releases or changelogs) has the benefit of not forcing a particular style/naming choice.
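
To illustrate the friction, here is a rough sketch of what a tool has to do today to find a changelog link among a project's `Project-URL` metadata entries. It reads the entries with the standard library's `importlib.metadata` and then guesses at the label. The `CHANGELOG_LABELS` tuple and the helper names are my own assumptions for illustration, not part of any spec; a standard field name would replace the guessing with a single lookup.

```python
from importlib.metadata import PackageNotFoundError, metadata

# Hypothetical list of labels authors might use; no standard defines these.
CHANGELOG_LABELS = ("changelog", "changes", "release notes", "history")


def parse_project_urls(entries):
    """Split 'Label, URL' strings (Project-URL fields) into (label, url) pairs."""
    pairs = []
    for entry in entries:
        # Split on the first comma only, since URLs may contain commas.
        label, sep, url = entry.partition(",")
        if sep:
            pairs.append((label.strip(), url.strip()))
    return pairs


def find_changelog_url(entries):
    """Return the first URL whose label looks like a changelog, else None."""
    for label, url in parse_project_urls(entries):
        normalized = label.lower().replace("-", " ").replace("_", " ")
        if normalized in CHANGELOG_LABELS:
            return url
    return None


def changelog_url_for(dist_name):
    """Look up an installed distribution and guess its changelog URL."""
    try:
        entries = metadata(dist_name).get_all("Project-URL") or []
    except PackageNotFoundError:
        return None
    return find_changelog_url(entries)
```

Calling `changelog_url_for("some-package")` only returns something if the author happened to pick one of the guessed labels, which is exactly the inconsistency I'd like to remove.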

I think that there are two topics intertwined in your reply: the changelog URL and changelog metadata. Are you proposing standardized changelogs in addition to the changelog_url field in packaging metadata? If so, are there other packaging ecosystems doing the same? Which ones?

Also, my 2 cents on the topic: as strict and as soon as possible.