Hello all!
I’m opening this as a follow-up to “PSA: PyPI now supports project archival” and “Adding a mechanism to deprecate a published project”.
Context
Python packaging has three conceptual sets of “lifecycle” states:
1. There are classifiers for development status, e.g. Development Status :: 7 - Inactive for an inactive project. These are defined in project metadata, meaning they’re defined per-file, not per-release or per-project, and have no effect on resolution/upload semantics/etc.
2. There is “yanking” as defined in PEP 592. Yanking is a soft-deletion mechanism that tells installers not to install a version/file unless that version/file has been precisely locked to.
3. PyPI itself has project statuses, which have both user- and admin-states. For example, PyPI administrators can “quarantine” a project, which has the effect of marking all project releases/files as yanked until further action. Separately, project maintainers can “archive” a project, which does not affect resolution in any way but disables new uploads to the project and signals (currently only in the Web UI) that the project is no longer active.
In addition to these states, there are “facts” about a hosted project that are defined by how that project was uploaded or otherwise processed that can’t be encoded in the project metadata at upload time. For example, PyPI knows whether a given file was uploaded via Trusted Publishing, but the index APIs have no way to signal that state.
(1) and (2) are both currently represented in the index APIs (both HTML and JSON), but (3) isn’t. I think we should expose (3) too!
Proposal
The index APIs should have additional fields (or a single, composite field) that allow the index itself to express a project’s status markers.
From a review of PEP 503 and PEP 691 and their living PyPA counterparts, I believe that the “meta” component of the index responses is a good candidate for storing this information. From the living spec:
- All JSON responses will have a meta key, which contains information related to the response itself, rather than the content of the response.
Ref: Simple Repository API - JSON Serialization
My interpretation of that is that any metadata the index knows about a project is “information related to the response itself,” rather than “content of the response.”
Furthermore, because project statuses and similar are at the project level rather than the release/file levels, it would be confusing and duplicative to express them at the latter levels (which is pretty much all the Index APIs specify).
Concretely, this is roughly what I envision, in both JSON and HTML index forms:
JSON:
{
  "meta": {
    "api-version": "1.3",
    "project-markers": {
      "project-status": "archived",
      "x-trusted-publisher": true
    }
  },
  "name": "holygrail",
  "files": []
}
HTML:
<head>
<meta name="pypi:repository-version" content="1.3">
<meta name="pypi:project-markers:project-status" content="archived">
<meta name="pypi:project-markers:x-trusted-publisher" content="true">
<title>Links for holygrail</title>
</head>
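To make the two layouts concrete, here is a minimal sketch of how a client might read the markers from either form. Nothing here is an existing API: the function names, the `MARKER_PREFIX` constant, and the `pypi:project-markers:` meta-name prefix are assumptions taken from the examples in this post.

```python
import json
from html.parser import HTMLParser
from typing import Any, Dict

# Assumed prefix, taken from the proposed HTML layout (not a published spec).
MARKER_PREFIX = "pypi:project-markers:"


def markers_from_json(body: str) -> Dict[str, Any]:
    """Return the proposed project-markers mapping, or {} if it is absent."""
    meta = json.loads(body).get("meta", {})
    return dict(meta.get("project-markers", {}))


class _MarkerParser(HTMLParser):
    """Collects <meta> tags whose name starts with MARKER_PREFIX."""

    def __init__(self) -> None:
        super().__init__()
        self.markers: Dict[str, Any] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name") or ""
        if name.startswith(MARKER_PREFIX):
            # HTML attribute values are always strings (e.g. "true", not True).
            self.markers[name[len(MARKER_PREFIX):]] = attr_map.get("content")


def markers_from_html(body: str) -> Dict[str, Any]:
    """Return the markers found in an HTML index page, or {} if none."""
    parser = _MarkerParser()
    parser.feed(body)
    return parser.markers
```

One thing this surfaces: the JSON form can carry typed values (booleans), while the HTML form only carries strings, so a spec would probably need to say how the two map onto each other.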
Or in prose:
- Both index API formats gain a new meta namespace, project-markers (please suggest a better name!).
- project-markers is a key-value mapping of project status identifiers to values. An empty mapping or missing mapping has no semantics.
- There are two kinds of project status markers:
  - Markers that begin with x- are particular to the index that serves them. Their semantics are defined by that index, and mirrors SHOULD NOT copy those markers unless sensible in that mirror’s context. I’ve given x-trusted-publisher as an example: an index (like PyPI) may wish to set that to indicate that a project has one or more Trusted Publishers registered to it, and mirrors may or may not wish to re-mirror that state (maybe they do for policy reasons, or maybe they don’t to avoid implying that the mirror itself supports Trusted Publishing).
  - All other markers have well-defined meanings and values that will be both specified in the PEP and kept up to date in the subsequent living PyPA spec. To start, I propose only a single marker: project-status, which will start with only two possible values, archived | quarantined, corresponding to the current lifecycle states known to PyPI.
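As a rough illustration of the two marker kinds, a mirror or client could separate them like this. This is only a sketch under the rules described above; `split_markers`, `markers_for_mirror`, and `project_status` are hypothetical helper names, and treating an unrecognized project-status value as "no status" is my assumption, not something the proposal specifies.

```python
from typing import Any, Collection, Dict, Mapping, Optional, Tuple

# The two project-status values proposed in this post.
KNOWN_STATUSES = frozenset({"archived", "quarantined"})


def split_markers(markers: Mapping[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate standardized markers from index-specific (x- prefixed) ones."""
    standard = {k: v for k, v in markers.items() if not k.startswith("x-")}
    index_specific = {k: v for k, v in markers.items() if k.startswith("x-")}
    return standard, index_specific


def markers_for_mirror(
    markers: Mapping[str, Any], keep: Collection[str] = ()
) -> Dict[str, Any]:
    """Drop x- markers unless the mirror explicitly opts in to re-serving them."""
    standard, index_specific = split_markers(markers)
    kept = {k: v for k, v in index_specific.items() if k in keep}
    return {**standard, **kept}


def project_status(markers: Mapping[str, Any]) -> Optional[str]:
    """Return project-status if it is a known value, else None (assumed policy)."""
    status = markers.get("project-status")
    if isinstance(status, str) and status in KNOWN_STATUSES:
        return status
    return None
```

For example, `markers_for_mirror(m)` would strip `x-trusted-publisher` by default, while `markers_for_mirror(m, keep={"x-trusted-publisher"})` models a mirror that decides re-serving it is sensible in its context.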
Implications
Project status markers have no direct downstream implications for installing clients: unlike yanking, the presence of a marker itself does not affect resolution.
The goal with placing these markers in the Index APIs is to allow installing clients to eventually (if they so choose) support user control/visibility over project states. For example, a user may want their installation step halted entirely if a project becomes quarantined, or may want a warning report containing the list of archived/inactive projects that they depend on.
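Here is a sketch of what that kind of opt-in user control could look like on the client side. It is purely illustrative: `audit_statuses`, `ProjectStatusError`, and the default `fail_on`/`warn_on` sets are hypothetical, and no installer implements this today as far as this proposal is concerned.

```python
from typing import Any, Collection, List, Mapping


class ProjectStatusError(RuntimeError):
    """Raised when a project's status is one the user chose to fail on."""


def audit_statuses(
    projects: Mapping[str, Mapping[str, Any]],
    fail_on: Collection[str] = frozenset({"quarantined"}),
    warn_on: Collection[str] = frozenset({"archived"}),
) -> List[str]:
    """Check each project's markers against a user-chosen policy.

    `projects` maps project names to their project-markers mappings.
    Raises ProjectStatusError for any status in `fail_on`; returns the
    sorted names of projects whose status is in `warn_on`.
    """
    flagged: List[str] = []
    for name, markers in sorted(projects.items()):
        status = markers.get("project-status")
        if not isinstance(status, str):
            continue  # a missing marker has no semantics
        if status in fail_on:
            raise ProjectStatusError(f"{name} is {status}")
        if status in warn_on:
            flagged.append(name)
    return flagged
```

Both sets are parameters so that the default behavior stays in the user's hands; passing empty sets reproduces today's behavior, where markers do not affect installation at all.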
Over time, another implication of this feature is that the Development Status classifier namespace becomes less useful. This is already somewhat true because of its inclusion in the metadata (meaning that projects have to do a new release to change it, and that it’s tied to the release/file cycle rather than top-level project state). However, one potential outcome is that these classifiers could be deprecated and removed over time.
Alternatives considered
An alternative to exposing these states in the Index APIs is to expose them in non-standard APIs instead, e.g. PyPI’s pre-existing JSON APIs. This would allow adoption by installers and other index clients that don’t use the standardized APIs, but would hamper adoption by those that choose to stick with the standard ones (and IMO we should be encouraging standard use as much as possible!).
Open questions
- Does the layout proposed above make sense? Is it too verbose?
- Is this proposal too open-ended? In particular, are x- markers a bad idea?
CC @miketheman @dustin @sethmlarson as parties who I know/suspect will be interested in particular