PEP 714: Rename dist-info-metadata in the Simple API

There are more details on the pip and Warehouse issue trackers, but to summarize:

We’ve begun rolling out PEP 658 on PyPI; however, we introduced a bug in the PEP 691 JSON representation on PyPI where, instead of the actual key of dist-info-metadata, PyPI is emitting it under data-dist-info-metadata. When we went to fix this, we noticed that pip breaks when dist-info-metadata is specified in the JSON output, because it assumes an HTML-style, string-only API (so it looks for the string “true”, or the link hash string to parse, rather than a real boolean or dict of hashes).
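To make the mismatch concrete, here is a minimal sketch of the two value shapes. This is not pip's actual code. It assumes, per PEP 658, that the HTML attribute carries either the string `true` or a `<hashname>=<hashvalue>` string, and, per PEP 691, that the JSON key carries a boolean or a dict of hashes. The helper name `normalize_metadata_value` is hypothetical.

```python
from typing import Dict, Optional, Union


# Hypothetical helper (not taken from pip): folds both representations into
# a dict of hashes (possibly empty), or None when no metadata file exists.
def normalize_metadata_value(
    value: Union[None, bool, str, Dict[str, str]],
) -> Optional[Dict[str, str]]:
    # JSON style: key absent, or an explicit false.
    if value is None or value is False:
        return None
    # JSON style: metadata exists, but no hash was provided.
    if value is True:
        return {}
    # JSON style: a dict mapping hash names to hex digests.
    if isinstance(value, dict):
        return dict(value)
    # HTML style: the literal string "true" means "exists, no hash".
    if value == "true":
        return {}
    # HTML style: "<hashname>=<hashvalue>".
    name, sep, digest = value.partition("=")
    if not sep:
        raise ValueError(f"unrecognized metadata value: {value!r}")
    return {name: digest}
```

A client that only handles the two string branches is fine against HTML, but fails as soon as a JSON response hands it a real boolean or dict, which is the class of bug described above.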

As best as I’ve been able to tell, this bug dates back to pip 22.3 at a minimum, so roughly 8 months.

This has put us into an uncomfortable position where we have a bug on PyPI that we can’t fix because it breaks pip, but the versions of pip it breaks are widely deployed and are already included in many places such as ensurepip, Linux distributions, etc.

Roughly speaking I think we have 3 real options [1] on moving forward.

  1. Delay fixing the bug in PyPI (and effectively delay PEP 658 support) until a fixed version of pip has been released, and released “long enough” that we feel comfortable breaking them. This would be a hard break where you could not even upgrade pip with pip, and would have to use something like get-pip.py to fix it.
  2. Basically do (1), but allow fixing it sooner by special casing pip in PyPI to not emit PEP 658 metadata so that a pip install --upgrade pip works (but only that command, if you upgrade anything else with pip it won’t work).
  3. Amend PEP 691 so that we use a different key name for PEP 658 support, so that old versions of pip will ignore it, and new, fixed versions of pip can be adjusted to use the new key name.

Of the three options, I think the third one is the only one that is really palatable. A hard break to a widely deployed version of pip, even with the ability to upgrade that (2) gives us, is basically always going to cause a lot of pain and strife in our user base. That means we likely wouldn’t be able to fix the bug, and thus take advantage of PEP 658, for 5-10+ years.

The downside of (3) is basically that it requires more churn for both clients and servers that have already implemented PEP 691 + PEP 658. I suspect that the number of people who have implemented both is pretty low; at this time I’m only aware of pip, PyPI, and proxpi [2], so there’s not a lot of churn to be worried about.

We could also have (3) recommend clients support both keys if we’re really worried about it, because churn is hardest to deal with in clients: there are a lot more people using clients than there are operating repositories.

I plan to write a PEP to propose (3); it will likely be within the next couple of days, depending on my free time, and Paul has already graciously agreed to be the PEP-Delegate for it. That is, unless people feel strongly that we don’t need a PEP to fix it and should just amend PEP 691 (and PEP 658).

Concretely I plan to propose that we:

  1. Rename dist-info-metadata in the PEP 691 JSON output to core-metadata.
  2. Rename data-dist-info-metadata in the HTML output to data-core-metadata.
  3. Say that clients MAY support the legacy dist-info-metadata key as an alias for core-metadata in the JSON and HTML outputs (prefixed with data- for HTML).

I suspect this will be entirely uncontroversial and won’t generate much discussion, but I want to just throw something up now so people are aware, and to give people a chance to weigh in if they feel strongly about the solution here before I write the PEP.


  1. Some people have suggested using User-Agent to return different results; however, that doesn’t work at all for PyPI, as it would greatly reduce our CDN caching effectiveness and likely make it impossible for us to keep up with the number of requests that would hit our backend servers. ↩︎

  2. I think? ↩︎

+1 from me on the proposed approach.

On point (3) we should probably also say that clients MUST ignore the dist-info-metadata key if the core-metadata key is present (to avoid ambiguity in broken cases where the server serves both, but they have different values :slightly_frowning_face:)

Do we want to say anything about servers? Such as servers MUST NOT serve the old key if they are serving the new one (and hence are following the new spec)? That would make the previous point sort of unnecessary, but I think it’s still worth having.

Have the pip developers ever discussed what “long enough” is in this situation?

And mousebender (depending on your definition of “implemented”), which would mean any of its dependents [1].

But I thought pip chokes on that appropriate key with valid values, so how would that work? Or would pip choose to not follow this recommendation?


  1. If Network Dependents · brettcannon/mousebender · GitHub is correct, then it’s essentially one project. ↩︎

I don’t think we’ve ever discussed what long enough is explicitly, because I don’t think we’ve ever had a hard break like this before that would affect almost all uses of pip+PyPI. The closest I can think of that we’ve ever come was PEP 470, which affected 1.5% of projects hosted on PyPI, or maybe PEP 440, which was a similarly low % of projects hosted on PyPI, but I don’t think we included the % in PEP 440’s text.

I think the answer from the pip developers’ point of view would be “that’s up to the individual repositories to decide when they no longer support a particular version of pip”.

I think the answer from PyPI’s point of view would be “whenever the active users of the broken versions of pip fall below some threshold”, but I don’t think we have explicit policies because it would depend a lot on how bad the breakage is (in this case, really bad) and how bad not fixing it is (in this case… medium? it blocks a really important feature for optimizing resolution, but nobody is insecure or silently broken without it).

My offhand guess is we’d probably be looking at a timeline of years.

And mousebender (depending on your definition of “implemented”), which would mean any of its dependents.

Good to know, thanks!

But I thought pip chokes on that appropriate key with valid values, so how would that work? Or would pip choose to not follow this recommendation?

There’s nothing we can do to fix those old versions of pip; the idea is that new versions of pip would support both keys, so that if there are index servers out there that already correctly implemented PEP 658 + PEP 691, then the new version of pip would work against those older repositories.

I don’t feel strongly about supporting both keys though, it mostly feels pretty easy to add to a client since it’s just checking two keys instead of one and helps prevent some small amount of churn, but I suspect the real answer is it’s really fine either way.
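As a rough illustration of “checking two keys instead of one”, a JSON client could do something like the following. This is only a sketch under the key names proposed above; the function name `read_core_metadata` and the shape of `file_entry` (one element of the `files` list in a PEP 691 response) are assumptions for illustration, not pip's implementation.

```python
from typing import Any, Dict

# Sentinel so that an explicit `false` under the new key is still preferred
# over whatever the legacy key says.
_MISSING = object()


def read_core_metadata(file_entry: Dict[str, Any]) -> Any:
    """Return the PEP 658 metadata value for one JSON file entry, or None."""
    value = file_entry.get("core-metadata", _MISSING)
    if value is _MISSING:
        # Fall back to the legacy key only when the new key is absent.
        value = file_entry.get("dist-info-metadata", _MISSING)
    return None if value is _MISSING else value
```

The returned value is still either a boolean or a dict of hashes, so the caller has to handle both shapes.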

I’d say we should go with (3) here, because that seems like the least problematic variant of this change.

proxpi implements PEP 658 as both client and repository. It would take me on the order of minutes to update, test and release proxpi using the new key, and then it’s a matter of users updating.


Would the updated key mean a new version of the HTML and JSON repository APIs? If so, the JSON API would be v1.3: would I have to implement v1.2 (PEP 700) first?


Is consistency between the HTML and JSON APIs this important? As far as I know, pip has no issues with the metadata attribute on the HTML API, so there’s an option to not update it. I suppose it means special-casing that attribute on clients which combine the processing of the two formats (I think pip does this).

Would the updated key mean a new version of the HTML and JSON repository APIs? If so, the JSON API would be v1.3: would I have to implement v1.2 (PEP 700) first?

I’m not revving the version because I don’t think it’s important in this case, and I would rather not have to make people implement the interim versions to fix this bug.

Is consistency between the HTML and JSON APIs this important? As far as I know, pip has no issues with the metadata attribute on the HTML API, so there’s an option to not update it. I suppose it means special-casing that attribute on clients which combine the processing of the two formats (I think pip does this).

I don’t think it’s majorly important, but it feels like one of those things that are going to forever be a weird footgun that clients and servers run into where there is just this one key that is completely different between JSON and HTML. I suspect the amount of effort is minimal to switch both keys and the amount of effort is minimal to only change the JSON key, so I’m just prioritizing long term simplicity.

I’ve gone ahead and written the PEP real quick, which has already had a number assigned, so I present to you PEP 714 (full text below as well).

This basically just proposes what I outlined above:

  • Rename the fields to (data-)core-metadata.
  • Allow clients to read from either the new or old names, but say that they MUST prefer the new name if it exists.
  • Allow servers to emit only the new names OR both the new and old names, and say that if they emit both they MUST be equal.
  • Recommend that clients support both and that servers only emit the new ones, but softly say they can emit the old names on HTML safely.
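The server-side rules in the list above could be sketched roughly like this for the HTML representation. This is not Warehouse's code; `metadata_attributes`, `render_anchor`, and the `emit_legacy` flag are hypothetical names, and the attribute value is assumed to already be in the PEP 658 form (`true` or `<hashname>=<hashvalue>`).

```python
from html import escape
from typing import Dict, Optional


def metadata_attributes(value: Optional[str], emit_legacy: bool = False) -> Dict[str, str]:
    """Build the metadata attributes for one file link (hypothetical helper)."""
    if value is None:
        # No metadata file available: emit neither attribute.
        return {}
    attrs = {"data-core-metadata": value}
    if emit_legacy:
        # Reuse the same value so the old and new names can never disagree.
        attrs["data-dist-info-metadata"] = value
    return attrs


def render_anchor(href: str, filename: str, attrs: Dict[str, str]) -> str:
    """Render one <a> element of a Simple API project page."""
    rendered = "".join(
        f' {name}="{escape(val, quote=True)}"' for name, val in attrs.items()
    )
    return f'<a href="{escape(href, quote=True)}"{rendered}>{escape(filename)}</a>'
```

Deriving both attributes from a single value, rather than computing them separately, is one simple way to guarantee the “MUST be equal” rule.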

I think this represents the minimal amount of churn, while keeping the two keys in sync for long term maintainability. It allows both PyPI and pip to fix their respective bugs (and PyPI to do it by implementing this PEP directly) without having to treat it like an emergency. The PEP does call out that we don’t typically update specs because of implementation bugs, but justifies why I believe it makes sense in this case.

I suspect in PyPI we will implement this by just using the new names. We may add the old names to the HTML representation, but I suspect that we won’t bother, since I believe the majority of clients that support PEP 658 and talk to PyPI prefer JSON anyway. If someone felt strongly about it, we could add the HTML form as well.


PEP 714
PEP: 714
Title: Rename dist-info-metadata in the Simple API
Author: Donald Stufft <donald@stufft.io>
PEP-Delegate: Paul Moore <p.f.moore@gmail.com>
Discussions-To: https://discuss.python.org/t/27471
Status: Draft
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 06-Jun-2023
Post-History: `06-Jun-2023 <https://discuss.python.org/t/27471>`__


Abstract
========

This PEP renames the metadata provided by :pep:`658` in both HTML and JSON
formats of the Simple API and provides guidelines for both clients and servers
in how to handle the renaming.


Motivation
==========

:pep:`658` specified a mechanism to host the core metadata files from an
artifact available through the Simple API such that a client could fetch the
metadata and use it without having to download the entire artifact. Later
:pep:`691` was written to add the ability to use JSON rather than HTML on the
Simple API, which included support for the :pep:`658` metadata.

Unfortunately, PyPI did not support :pep:`658` until just
`recently <https://github.com/pypi/warehouse/pull/13649>`__, which released with
a `bug <https://github.com/pypi/warehouse/issues/13705>`__ where the
``dist-info-metadata`` key from :pep:`658` was incorrectly named in the JSON
representation, to be ``data-dist-info-metadata``. However, when
attempting to fix that bug, it was discovered that pip *also* had a
`bug <https://github.com/pypa/pip/issues/12042>`__, where any use of
``dist-info-metadata`` in the JSON representation would cause pip to hard fail
with an exception.

The bug in pip has existed since at least ``v22.3``, which means that it has
been released for approximately 8 months, long enough to have been pulled into
Python releases, downstream Linux releases, baked into containers, virtual
environments, etc.

This puts us in an awkward position of having a bug on PyPI that cannot be fixed
without breaking pip, due to a bug in pip, but that version of pip is old enough
to have been widely deployed. To make matters worse, a version of pip that is
broken in this way cannot install *anything* from PyPI once it fixes its bug,
including installing a new, fixed version of pip.


Rationale
=========

There are 3 main options for a path forward for fixing these bugs:

1. Do not change the spec, fix the bug in pip, wait some amount of time, then
   fix the bug in PyPI, breaking anyone using an unfixed pip such that they
   cannot even install a new pip from PyPI.
2. Do the same as (1), but special case PyPI so it does not emit the :pep:`658`
   metadata for pip, even if it is available. This allows people to upgrade pip
   if they're on a broken version, but nothing else.
3. Change the spec to avoid the key that pip can't handle currently, allowing
   PyPI to emit that key and a new version of pip to be released to take
   advantage of that key.

This PEP chooses (3), but goes a little further and also renames the key in the
HTML representation.

Typically we do not change specs because of bugs that only affect one particular
implementation, unless the spec itself is at fault, which isn't the case here:
the spec is fine and these are just genuine bugs in pip and PyPI.

However, we choose to do this for 4 reasons:

1. Bugs that affect pip and PyPI together represent an outsized amount of impact
   compared to any other client or repository combination.
2. The impact of being broken is that installs do not function, at all, rather
   than degrading gracefully in some way.
3. The feature that is being blocked by these bugs is of large importance to
   the ability to quickly and efficiently resolve dependencies from PyPI with
   pip, and having to delay it for a long period of time while we wait for the
   broken versions of pip to fall out of use would be of detriment to the entire
   ecosystem.
4. The downsides of changing the spec are fairly limited, given that we do not
   believe that support for this is widespread, so it affects only a limited
   number of projects.


Specification
=============

The keywords "**MUST**", "**MUST NOT**", "**REQUIRED**", "**SHALL**",
"**SHALL NOT**", "**SHOULD**", "**SHOULD NOT**", "**RECOMMENDED**", "**MAY**",
and "**OPTIONAL**" in this document are to be interpreted as described in
:rfc:`RFC 2119 <2119>`.


Servers
-------

The :pep:`658` metadata, when used in the HTML representation of the Simple API,
**MUST** be emitted using the attribute name ``data-core-metadata``, with the
supported values remaining the same.

The :pep:`658` metadata, when used in the :pep:`691` JSON representation of the
Simple API, **MUST** be emitted using the key ``core-metadata``, with the
supported values remaining the same.

To support clients that used the previous key names, the HTML representation
**MAY** also be emitted using the ``data-dist-info-metadata`` attribute, and if
it does so it **MUST** match the value of ``data-core-metadata``.



Clients
-------

Clients consuming any of the HTML representations of the Simple API **MUST**
read the :pep:`658` metadata from the key ``data-core-metadata`` if it is
present. They **MAY** optionally use the legacy ``data-dist-info-metadata`` if
it is present but ``data-core-metadata`` is not.

Clients consuming the JSON representation of the Simple API **MUST** read the
:pep:`658` metadata from the key ``core-metadata`` if it is present. They
**MAY** optionally use the legacy ``dist-info-metadata`` key if it is present
but ``core-metadata`` is not.


Backwards Compatibility
=======================

There is a minor compatibility break in this PEP, in that clients that currently
correctly handle the existing metadata keys will not automatically understand
the newer metadata keys, but they should degrade gracefully, and simply act
as if the :pep:`658` metadata does not exist.

Otherwise there should be no compatibility problems with this PEP.


Rejected Ideas
==============

Leave the spec unchanged, and cope with fixing in PyPI and/or pip
-----------------------------------------------------------------

We believe that the improvements brought by :pep:`658` are very important to
improving the performance of resolving dependencies from PyPI, and would like to
be able to deploy it as quickly as we can.

Unfortunately the nature of these bugs is that we cannot deploy them as is
without breaking widely deployed and used versions of pip. The breakages in
this case would be bad enough that affected users would not even be able to
directly upgrade their version of pip to fix it, but would have to manually
fetch pip another way first (e.g. ``get-pip.py``).

This is something that PyPI would be unwilling to do without some way to
mitigate those breakages for those users. Without some reasonable mitigation
strategy, we would have to wait until those versions of pip are no longer in use
on PyPI, which would likely be 5+ years from now.

There are a few possible mitigation strategies that we could use, but we've
rejected them as well.


Mitigation: Special Case pip
++++++++++++++++++++++++++++

The breakages are particularly bad in that it prevents users from even upgrading
pip to get an unbroken version of pip, so a command like
``pip install --upgrade pip`` would fail. We could mitigate this by having PyPI
special case pip itself, so that the JSON endpoint never returns the :pep:`658`
metadata and the above still works.

This PEP rejects this idea because while the simple command that only upgrades
pip would work, if the user included *anything* else in that command to upgrade
then the command would go back to failing, which we consider to be still too
large of a breakage.

Additionally, while this bug happens to be getting exposed right now with PyPI,
it is really a bug that would happen with any :pep:`691` repository that
correctly exposed the :pep:`658` metadata. This would mean that every repository
would have to carry this special case for pip.


Mitigation: Have the server use User-Agent Detection
++++++++++++++++++++++++++++++++++++++++++++++++++++

pip puts its version number into its ``User-Agent``, which means that the server
could detect the version number and serve different responses based on that
version number so that we don't serve the :pep:`658` metadata to versions of pip
that are broken.

This PEP rejects this idea because supporting ``User-Agent`` detection is too
difficult to implement in a reasonable way.

1. On PyPI we rely heavily on caching the Simple API in our CDN. If we varied
   the responses based on ``User-Agent``, then our CDN cache would have an
   explosion of cache keys for the same content, which would make it more likely
   that any particular request would not be cached and fall back to hitting
   our backend servers, which would have to scale much higher to support the
   load.
2. PyPI *could* support the ``User-Agent`` detection idea by mutating the
   ``Accept`` header of the request so that those versions appear to only
   accept the HTML version, allowing us to maintain the CDNs cache keys. This
   doesn't affect any downstream caches of PyPI though, including pip's HTTP
   cache which would possibly have JSON versions cached for those requests and
   we wouldn't emit a ``Vary`` on ``User-Agent`` for them to know that it isn't
   acceptable to share those caches, and adding a ``Vary: User-Agent`` for
   downstream caches would have the same problem as (1), but for downstream
   caches instead of our CDN cache.
3. The pip bug ultimately isn't PyPI specific, it affects any repository that
   implements :pep:`691` and :pep:`658` together. This would mean that
   workarounds that rely on implementation specific fixes have to be replicated
   for each repository that implements both, which may not be easy or possible
   in all cases (static mirrors may not be able to do this ``User-Agent``
   detection for instance).


Only change the JSON key
------------------------

The bug in pip only affects the JSON representation of the Simple API, so we only
*need* to actually change the key in the JSON, and we could leave the existing
HTML keys alone.

This PEP rejects doing that because we believe that in the long term, having
the HTML and JSON key names diverge would make mistakes like this more likely
and make implementing and understanding the spec more confusing.

The main reason that we would want to not change the HTML keys is to not lose
:pep:`658` support in any HTML only clients or repositories that might already
support it. This PEP mitigates that breakage by allowing both clients and
servers to continue to support both keys, with a recommendation of when and
how to do that.


Recommendations
===============

The recommendations in this section, other than this notice itself, are
non-normative, and represent what the PEP authors believe to be the best default
implementation decisions for something implementing this PEP, but it does not
represent any sort of requirement to match these decisions.


Servers
-------

We recommend that servers *only* emit the newer keys, particularly for the JSON
representation of the Simple API since the bug itself only affected JSON.

Servers that wish to support :pep:`658` in clients that use HTML and have it
implemented can safely emit both keys *only* in HTML.

Servers should not emit the old keys in JSON unless they know that no broken
versions of pip will be used to access their server.


Clients
-------

We recommend that clients support both keys, for both HTML and JSON, preferring
the newer key as this PEP requires. This will allow clients to support
repositories that already have correctly implemented :pep:`658` and :pep:`691`
but have not implemented this PEP.


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

Updated the title of this thread as well to reflect the new focus.

And thanks to @EWDurbin, we now have a PR up that implements PEP 714 for Warehouse.

I’ll just consider this a 1.1 feature then (mousebender has versioned TypedDicts).

… or else what? :wink: Should clients not worry about checking and just prefer core-metadata over dist-info-metadata blindly, or are you expecting clients to raise an error if there’s a disagreement?

As in libraries like mousebender shouldn’t backfill dist-info-metadata with the information from core-metadata even though the semantics are identical? I don’t have an opinion either way, but I want to make sure I implement the right thing.

I’m not requiring clients to error (we can if people want?), but just saying that if the core-metadata field exists, then that is the authoritative source for this metadata. If you’d like to error I’m perfectly fine explicitly allowing that in the PEP, but I figured this was a better default?

As in libraries like mousebender shouldn’t backfill dist-info-metadata with the information from core-metadata even though the semantics are identical? I don’t have an opinion either way, but I want to make sure I implement the right thing.

I don’t care what clients name things once they’ve interpreted the API response, I don’t think it even matters if they keep both sources of metadata around for whatever reason [1]. The key idea is that whenever this metadata is interpreted, the core-metadata name should be considered authoritative if it exists.

So in your case, if you want to backfill dist-info-metadata with core-metadata, that’s perfectly fine, you’ve taken on the role of interpreting the metadata for your users, but what API you expose is outside of our concerns.
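For a library that keeps both raw values around, the “authoritative if it exists” rule might look like this. It is a sketch only: `FileDetails` and its fields are hypothetical names, not mousebender's API, and the `disagrees` check is an optional extra that the PEP does not require.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileDetails:
    """Hypothetical container for one file's raw HTML metadata attributes."""

    core_metadata: Optional[str] = None       # raw data-core-metadata value
    dist_info_metadata: Optional[str] = None  # raw data-dist-info-metadata value

    @property
    def metadata(self) -> Optional[str]:
        # The new name is authoritative whenever it is present.
        if self.core_metadata is not None:
            return self.core_metadata
        return self.dist_info_metadata

    def disagrees(self) -> bool:
        # True only when both names are present with different values;
        # a caller may choose to warn or raise, but nothing requires it.
        return (
            self.core_metadata is not None
            and self.dist_info_metadata is not None
            and self.core_metadata != self.dist_info_metadata
        )
```

Keeping the raw values and exposing a single interpreted property leaves the decision about erroring on disagreement to the caller.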


  1. I can imagine a use case where someone is writing a generic API client for the Simple API and wants to provide their users with the power to implement PEP 714, but considers interpreting the metadata beyond de-serializing into some object to be the responsibility of the caller. ↩︎

I definitely think it is. I just wanted to check since the server side of this has a responsibility in regards to this but the client didn’t have anything matching and I wanted to make sure that was on purpose.

Poetry is also in the process of implementing PEP 658, but this change seems fine to me as we haven’t cut over yet.

proxpi v1.2a1 supports PEP 714, but with no real-world testing

I plan to ask for a pronouncement on this PEP on Friday, June 16th. That will be ~10 days since posting, which I believe should be more than adequate for what I think is an entirely uncontroversial change that already has PRs up to implement it in at least PyPI and pip, and which has an alpha release containing it from proxpi.

If anyone has anything to say about this PEP, then I hope they do so, at least to ask for more time, before the 16th :slight_smile:

Hi there! A gentle reminder to ask for a pronouncement on this PEP. :slight_smile:

Whoops, ADHD did me in again.

So yea, @pf_moore I request pronouncement :slight_smile:

… and I hereby formally accept PEP 714. Congratulations, and thanks for putting in the work on this!

I will start work on getting PEP 714 support into pip now, hopefully targeting the 23.2 release (July). The code is ready to go, so this should be easily achievable.

Support PEP 714 · Issue #103 · brettcannon/mousebender · GitHub is tracking this PEP for mousebender.
