Thinking about this some more, I have more serious concerns than the data volumes. (I’m still not happy with the “arbitrary text” nature of the details field, but I guess that ship sailed with the “yanked” field, so I’m willing to let it drop.)
First of all, is per-file the right granularity for this data? In the JSON API, the data is at the release level. The more I think about this, the more it feels to me like a bad fit for the simple API.
The only justification for the addition of this data in the draft PEP is “This PEP adds data which were previously only available through the JSON API, in order to allow more clients which were previously Warehouse specific to support arbitrary standards-compliant indexes”. But that’s an entirely generic statement, and it’s in direct conflict with the FAQ from PEP 700 (which you defer to in this PEP), which says:
“Proposed additions to the simple API will still be considered on their individual merits, and the requirement that the API should be simple and fast for the primary use case of locating files for a project will remain the overriding consideration.”
What are the merits of adding this field specifically to the simple API? Are there consumers which currently use the simple API for most of their data, and fall back to the Warehouse-specific JSON API for vulnerability data? That was the justification for the fields added in PEP 700, and it doesn’t seem to apply here. Also, what indexes other than PyPI are maintaining vulnerability data, and have they expressed a need to expose that data in a standards-compliant way? If the data is only served by PyPI anyway, what’s the urgency for standardising it?
I was deliberately careful when writing PEP 700 to make it clear that there was not a license to add data to the simple API just because it existed in the JSON API. I feel that this proposal undermines that intent, if only by not arguing for the change on its own merits.
In general, I don’t see any advantage to simply moving data out of the JSON API unless there’s a real prospect of retiring that API. And that seems to be a long way off, as it’s going to need PEP 658 to be implemented in Warehouse. So I’d argue that consumers should just continue to use the JSON API until there’s some sign of movement on that item of work.
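
For concreteness, here’s roughly what “continue to use the JSON API” looks like for a consumer. This is only a minimal sketch, not a recommended client: I’m assuming the per-release JSON document exposes a top-level `vulnerabilities` list whose entries carry `id` and `details` keys, and the helper names and the sample payload are made up for illustration. Note that the lookup is keyed on project and version, not on an individual file.

```python
import json
from urllib.request import urlopen

# Per-release JSON API URL on PyPI (Warehouse-specific, not part of the simple API).
PYPI_JSON_URL = "https://pypi.org/pypi/{project}/{version}/json"


def release_url(project, version):
    """Build the JSON API URL for one release of a project."""
    return PYPI_JSON_URL.format(project=project, version=version)


def extract_vulnerabilities(release_doc):
    """Return (id, details) pairs from a parsed release document.

    Assumes a top-level "vulnerabilities" list; a missing key is
    treated as "no known vulnerabilities".
    """
    return [
        (vuln.get("id"), vuln.get("details", ""))
        for vuln in release_doc.get("vulnerabilities", [])
    ]


def fetch_vulnerabilities(project, version):
    """Fetch and extract vulnerability data for one release (needs network)."""
    with urlopen(release_url(project, version)) as response:
        return extract_vulnerabilities(json.load(response))


# Hypothetical payload, trimmed to the keys this sketch reads.
sample_release = {
    "info": {"name": "demo", "version": "1.0"},
    "vulnerabilities": [
        {"id": "EXAMPLE-0001", "details": "Free-form description text."},
    ],
}

print(extract_vulnerabilities(sample_release))
```

The data arrives once per release here, which is the granularity question I raised above: a per-file field in the simple API would repeat the same information for every file in that release.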