Pre-PEP: What would it look like to deprecate PEP 503?

This is something I’ve been thinking about for a while, but has also come up more recently with shifts in how people have been approaching “supply chain” security in an OSS context (with cooldowns, malware advisories, quarantines, etc.).

Questions

  • What would it look like to deprecate PEP 503? Can it even be done?
  • What would the consequences of doing so be?
  • If we were to deprecate and then formally remove PEP 503 (on PyPI), what timeline could we possibly do that on?

(To be clear, these are extremely open questions given how central PEP 503 is at the moment. But I consider it a healthy practice to allow ourselves to periodically rethink/reevaluate foundational assumptions in Python packaging, even if the conclusion is “absolutely no” :slightly_smiling_face:)

Context / status quo

The Python packaging ecosystem has two parallel index representations: HTML via PEP 503, and JSON via PEP 691. These two representations are more or less maintained in tandem in packaging PEPs, but PEP 503 is the de facto baseline index standard because it formalizes the pre-existing (pre-standardization) index layout.

This has two downstream effects:

  • Because the HTML representation predates standardization, a ton (the majority?) of third-party Python packaging indices only use it. In other words, they don’t offer a PEP 691 index at all, and only offer something vaguely resembling the bare minimum of a PEP 503 index.
  • Packaging PEPs that modify the index in some way (like PEP 740, PEP 792, etc.) need to jump through hoops to make their changes fit into the right “shape” for compatibility with the HTML index, versus the more straightforward process of updating the JSON representation (by adding new keys, etc.). Some packaging PEPs (like PEP 700) bypass making changes to the HTML index entirely, under the assumption that consumers of the HTML index don’t need certain information facets. This is done without formally deprecating the HTML index, but effectively bifurcates the two in functionality.

Consequences

The above has negative downstream consequences for the ecosystem:

  • Installers (like pip and uv) are increasingly moving to enable dependency cooldowns, which require the index to provide upload-time metadata. This works cromulently with PEP 700’s amendments to the JSON index, but no similar feature exists for the HTML index.
    • In principle, the HTML index could also be modified to shoehorn upload-time into the representation. However, doing so is unlikely to result in much of a net-positive change for downstreams that use third-party indices, since corporate index hosts haven’t moved much (at all) beyond the PEP 503 baseline. So this might be good to do from a consistency perspective, but won’t necessarily be a huge boon to many teams looking to adopt cooldowns.
  • Similarly, PyPI uses PEP 792 project statuses to signal things like archival and quarantine (e.g. for malware), but clients that consume PyPI indirectly (e.g. through corporate mirrors) lack visibility into those statuses when their hosted index service strips everything away into just a bare PEP 503 form.
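To make the cooldown point concrete, here is a minimal sketch of a client-side filter over a PEP 691 project response that carries the PEP 700 upload-time key. The function names and the choice to skip files that lack a timestamp are assumptions for illustration, not a description of how pip or uv actually behave.

```python
from datetime import datetime, timedelta, timezone


def parse_upload_time(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2025-01-01T00:00:00Z."""
    # Replace the trailing "Z" with an explicit offset so that
    # datetime.fromisoformat also accepts it on older Python versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def files_outside_cooldown(project: dict, cooldown: timedelta, now: datetime) -> list:
    """Return the files that were uploaded at least `cooldown` before `now`."""
    eligible = []
    for file in project.get("files", []):
        uploaded = file.get("upload-time")
        if uploaded is None:
            # Assumption: without upload-time the cooldown cannot be
            # evaluated, so this sketch conservatively skips the file.
            continue
        if now - parse_upload_time(uploaded) >= cooldown:
            eligible.append(file)
    return eligible
```

An index that only serves bare PEP 503 HTML gives a filter like this nothing to work with: every file would fall into the "no upload-time" branch.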

Long term, I think this status quo represents a modernization risk to the ecosystem, similar to one that Python has been through before (with 2->3, wheels, etc.): indices that only support the bare minimum required by PEP 503 effectively function as a source of drag on installers and other clients (e.g. uploading clients like twine), and make it harder to make important ecosystem-wide security and usability enhancements to Python packaging.

Solutions?

The extreme solution is to deprecate PEP 503. Once deprecated (and this would take a long time, almost certainly), a series of interlocking constraints could be untangled:

  • PyPI could stop serving PEP 503 index responses, or only serve them with an explicit Accept: application/vnd.pypi.simple.v1+html request rather than by default. This will force mirrors to make a conscious decision to remain on the HTML index.
  • uv and pip could (again, slowly) deprecate their PEP 503 support, first by warning when an index fails to offer a PEP 691 response and eventually removing support entirely (with an appropriate error).
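As a rough sketch of what the installer side of that could look like: the client asks for JSON first with HTML as a low-priority fallback, then classifies what came back so it can warn on HTML. The function names and the "html-deprecated" label are hypothetical, and nothing here reflects pip's or uv's actual code.

```python
JSON_V1 = "application/vnd.pypi.simple.v1+json"
HTML_V1 = "application/vnd.pypi.simple.v1+html"


def accept_header() -> str:
    """Build an Accept header that prefers JSON but still tolerates HTML."""
    return f"{JSON_V1}, {HTML_V1};q=0.2, text/html;q=0.01"


def classify_response(content_type: str) -> str:
    """Map a response Content-Type onto the action the client would take."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == JSON_V1:
        return "json"
    if media_type in (HTML_V1, "text/html"):
        # Hypothetical deprecation phase: keep working, but warn.
        return "html-deprecated"
    return "unsupported"
```

In the warning phase, "html-deprecated" would still parse the page and emit a message; in the removal phase it would become an error.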

Combined, these would ideally produce the appropriate incentive structure with third-party services to modernize their index representations: the former allows PyPI to gracefully nudge them during a “yellow” deprecation period, followed by a “red” period in which they’ll need to explicitly opt into mirroring the HTML index, followed by similar periods for installers wherein everything will continue to function until the final day of installer-side removal.

That’s the “extreme” solution; less extreme options are possible, but accomplish fewer goals around reducing ecosystem drag. Some less extreme options:

  1. Never deprecate the HTML index, but instead make the Accept: application/vnd.pypi.simple.v1+html header mandatory (again, after a deprecation period) and default to the JSON index on PyPI. This will allow everything to keep chugging along, but would (hopefully) nudge mirrors into making the migration.
  2. Never deprecate PEP 503, but instead encourage installers (pip and uv) to emit more warnings when encountering an HTML index that’s entirely unversioned (missing a pypi:repository-version tag). This would also serve as a nudge, but a software one (and IMO therefore not an especially effective one).
  3. Never deprecate PEP 503, and instead look at “backporting” PEP 700 to the HTML index (to help with use-cases like cooldowns). I think this is potentially worth doing independently anyways, but the outcomes will still primarily be limited to PyPI versus third-party indices.
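For option 2, detecting an unversioned HTML index comes down to looking for the pypi:repository-version meta tag. A minimal sketch using only the standard library follows; the class and function names are made up for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class RepositoryVersionParser(HTMLParser):
    """Record the content of <meta name="pypi:repository-version">, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.version: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == "pypi:repository-version":
            self.version = attributes.get("content")


def repository_version(html: str) -> Optional[str]:
    """Return the declared repository version, or None for an unversioned page."""
    parser = RepositoryVersionParser()
    parser.feed(html)
    return parser.version
```

An installer could warn whenever this returns None, which is the case for things like a bare directory listing.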

There are of course many other options, and I’m positive I’m not thinking of the overwhelming majority of them :slightly_smiling_face:. I’d really appreciate hearing others’ thoughts on both the above and alternatives!

Disclosures

I’m writing this out of personal interest (my track record with writing index PEPs will hopefully evidence this!), but in the interest of transparency I’m disclosing that I work for Astral on both a client (uv) and an index implementation (pyx). My experience on both of those has colored my opinions about the value of this idea, but the idea itself predates any interest my employer may have in the topic.

I believe you know this since you call it out, but PEP 691 explicitly allows not exposing every feature in every serialization format:

Future versions of the API may add things that can only be represented in a subset of the available serializations of that version. All serializations version numbers, within a major version, SHOULD be kept in sync, but the specifics of how a feature serializes into each format may differ, including whether or not that feature is present at all.

The only thing making PEPs “need” to update the HTML is people deciding that a given PEP should do that, but it’s perfectly valid not to do that, and I think it would be reasonable to say that future PEPs should be JSON only unless we have a compelling reason why something needs to be extended to HTML.

I’d suggest that whether or not PyPI continues to serve HTML encoded content is immaterial to the problem you’re trying to solve here. Indexes that are not updating the content that they serve are not likely going to be affected by what PyPI serves, except in the most basic “dumb” mirror (and I’m not aware of any that work like that, since you typically need to rewrite the package URLs anyways).

I think those indexes are basically completely ignoring changes in the index spec, and anything we do there (including backporting things like PEP 700 to support HTML) is likewise going to be ignored.

I think those indexes are enabled to ignore changes in the index spec because thus far all of our changes have been additive and installers have never required any of those features, and I think the only way those indexes can be forced to supply new features is if installers act as a forcing function here.

PEP 691 also allows clients to stop supporting HTML, but thus far no client has been willing to do that :slight_smile:, likely because there are a lot of non PyPI indexes that require HTML and nobody wants to be the one to start breaking them. We could write a PEP that says we want clients to start doing that so no individual client takes the heat, but I suspect users aren’t going to care a ton that a PEP gives them permission to do so; they’re going to see pip/uv/whatever is breaking them and be upset about it.

I don’t particularly have a strong sense of whether we can/should deprecate PEP 503 from PyPI, maintaining support for it is basically zero cost for PyPI, and I don’t think that PyPI supporting it or not effectively changes anything here. I think it’s basically entirely down to what clients support.

Yep, but thanks also for saying it explicitly!

I agree with this mostly, but I do think PyPI ceasing to serve the HTML index (or serving it only under an explicit Accept header) would provide a bit of a forcing function here, since it’d force services that mirror PyPI to at least double-check when their blind mirroring operation begins to fail. Hopefully that would cause them to then re-think and consider switching to a JSON index, but I could just as easily see that backfiring :sweat_smile:

Totally agreed with the rest about installers, though! I think one of the outcomes that would potentially be desirable here is some kind of agreement among the installers to act as a forcing function for modernization, e.g. first with warnings and then maybe something more forceful from there. Doing that is certainly going to be hard and thorny; I think maybe this thread would suffice as me forecasting an interesting discussion topic at the packaging summit and similar.

Are there mirrors that blindly mirror the HTML pages? Bandersnatch and devpi generate their own HTML pages, I’m not sure about the rest though offhand.

Bad choice of words on my part – I didn’t mean blind as in copy the HTML pages wholesale, but as in “fetch the index pages without sending an Accept header,” i.e. relying on the default being HTML. So changing the default to JSON would at least notify them of a change :sweat_smile:

I think before we look at deprecating PEP 503, it would behoove us to consider why so many indexes use it and don’t implement other index features. In my mind the answer is that PEP 503 is really pleasant to set up because it makes it easy to create statically generated indexes with extremely little work, independent of the environment/infrastructure it is being set up on.

There is also very little OSS tooling in existence that statically generates an index with support for new index features. dumb-pypi is the only one I’m aware of, and even that doesn’t implement new features like PEP 792, and a lot of the features it does implement are optional.

PEP 691 alone requires some webserver/CDN rewriting to handle Server-Driven Content Negotiation properly. That makes JSON metadata harder to adopt because it impacts specifics of the infrastructure used to serve your metadata. Suddenly index maintainers need to figure out how to parse request headers and change the response based on that for whatever is sitting in front of their index (webserver, CDN, etc.)
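For a sense of the logic a static host cannot run, here is a minimal sketch of server-driven content negotiation over the Accept header. The fallback to text/html for a missing header or a */* wildcard is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

JSON_V1 = "application/vnd.pypi.simple.v1+json"
HTML_V1 = "application/vnd.pypi.simple.v1+html"
SUPPORTED = (JSON_V1, HTML_V1, "text/html")
DEFAULT = "text/html"  # assumption: what this sketch serves when nothing is asked for


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media type, quality) pairs."""
    preferences = []
    for part in header.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for parameter in pieces[1:]:
            if parameter.startswith("q="):
                try:
                    quality = float(parameter[2:])
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as "not acceptable"
        preferences.append((media_type, quality))
    return preferences


def negotiate(header: str) -> Optional[str]:
    """Pick the media type to serve, or None if nothing acceptable (HTTP 406)."""
    if not header.strip():
        return DEFAULT
    best, best_quality = None, 0.0
    for media_type, quality in parse_accept(header):
        candidate = DEFAULT if media_type == "*/*" else media_type
        if candidate in SUPPORTED and quality > best_quality:
            best, best_quality = candidate, quality
    return best
```

Something equivalent has to live in whatever sits in front of the index (webserver, CDN rule, application code), which is exactly the extra infrastructure work described above.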

So in my mind, people rely on PEP 503 because it is an easy default which is simple to integrate regardless of infrastructure. This indicates to me that we ought to think about how we can make moving off of 503 easier, rather than how to force people off.

Edit with disclaimer: my team manages a 503-only index, pypi.nvidia.com. But I have seen this pattern across a number of companies and hold these views regardless.

Removal of PEP 503 from uv and pip is going to be a problem for the community. A lot of third-party Python index implementations do not support the JSON API:

$ curl -SLsI -H "Accept: application/vnd.pypi.simple.v1+json" https://download.pytorch.org/whl/nightly/cu130/torch/ | grep -i content-type:
content-type: text/html

$ curl -SLsI -H "Accept: application/vnd.pypi.simple.v1+json" https://pypi.org/simple/torch/ | grep -i content-type:
content-type: application/vnd.pypi.simple.v1+json

There aren’t many PyPI servers with JSON API that people can just spin up locally. Pulp just got PEP 691 support a few months ago. DevPI should support PEP 691 (to be confirmed).

dayjob has at least two of these. We could switch if there was a pressing reason to do so, such as tooling dropping support, but there’s significantly more overhead. I can’t say definitively, but my gut is that we would probably just switch from using the simpleindex as a wheelhouse to just downloading the files and then installing as if local. Any of the things like dependency cooldowns are moot for us by the time it’s at the simpleindex, that needs to be used as a delay to pulling the sources we’re building from.

This is a really great point – the fact that PEP 691 is content negotiated is definitely problematic for static hosts, and more or less precludes the kinds of simple deployments that people reasonably favor.

It’s not widely known (and probably not widely supported!), but PEP 691 also has an optional URL parameter negotiation mode, e.g. ?format=application%2Fvnd.pypi.simple.v1%2Bjson.

Ref: Simple repository API - Python Packaging User Guide

So one potentially less disruptive route would be to make that mechanism more normative.

(There’s also a separate URL-specific negotiation mechanism, e.g. /simple/v1+json. That one requires the origin to serve an appropriate content type, but since the formats are shared by URL that could be done statically. Actually, thinking about it more, that might be the most static way to serve PEP 691 JSON indices :sweat_smile:)
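For reference, building a request URL for the query-parameter mode is trivial on the client side. This is only a sketch with a hypothetical helper name and a placeholder host; since the parameter is optional in PEP 691, a server may well ignore it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

JSON_V1 = "application/vnd.pypi.simple.v1+json"


def with_format_parameter(url: str, media_type: str = JSON_V1) -> str:
    """Return `url` with its format query parameter set to `media_type`."""
    parts = urlsplit(url)
    # Keep any unrelated query parameters, but drop an existing "format".
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "format"
    ]
    query.append(("format", media_type))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The hard part is not the client but the origin: something still has to map that query string onto a different response body.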

Same here, some communities I work in stick custom packages or platform-specific builds in a directory on a filesystem and point Apache autoindex at it. Currently pip installs from it just fine as if it were a fully compliant package repository. It doesn’t get much lower-effort than that.

Obviously they’d adapt if necessary, but it would be extra nice not to have to do so.

It looks like pypi.org supports this but neither pip nor uv do as best I can tell. It seems like this is not really much different than server-negotiated responses because you still need to respond with a different file/object/etc. depending on what’s in the URL. That’s not really fully-static friendly.

TIL! Thanks for pointing that out! Unfortunately, it seems that none of uv, pip, nor pypi.org support this as far as I can tell, so it isn’t usable. Maybe the best path forward is to encourage tools to implement this? It’s not clear to me however how tools would select between the three different routes of viewing the index. The advantage of SDCN is that a client can list multiple values in Accept and get back at least one in a single request. With the fully-static approach, there’s no way to negotiate, so I doubt clients will want to do this by default, which makes it harder to adopt.

That’s because PEP 691 (JSON-based Simple API for Python Package Indexes) states:

Supporting this parameter is optional, and clients SHOULD NOT rely on it for interacting with the API.

Could we extend 503 in some way to allow optionally advertising that the server does support one or more (and which) of those urls, allowing tools interested in the full data to get it when available?

I think that’s about as good as we get if we want to keep it static for the simple hosting case and keep this optional for those who don’t need it.

Aha, thanks for pointing that out!

Minor correction to my own post: I checked the path listed in the PEP which is not required, so it might be supported, but the URL isn’t standardized so tools cannot build around this.

I think this seems like a good route forward. It could even be its own well-known path such that the /simple/ route won’t need to be returned if it is large (pypi.org’s Simple index is ~40MB!)

I like the idea of increasing visibility for the JSON index by advertising it on the HTML index, but I worry that effectively we’d still be in the state @dstufft points out (where we can do all this great stuff, but since it’s additive none of the baseline 503 indices out there will adopt it). But it definitely seems useful for the subset that is willing to proactively adopt 691, particularly if we can make it easy with a static URL!

I’m actually surprised this works! I would have expected it to work with --find-links, but the fact that Apache’s autoindex happens to have the right shape for PEP 503 seems like a coincidence that isn’t safe to rely on long term.

I think there’s a subtlety that needs to be called out here - while I’m normally the first person to remind people that “there’s more than one installer”, in this particular case, I don’t think that’s true. The reality is still (as far as I’m aware - I don’t have up to date figures) that for the majority of users, especially large, closed source, or commercial projects, pip is the only installer that really matters. So using pip as a “forcing function” would be essentially no different than desupporting the HTML index altogether.

I don’t think that even pip has the weight to force change on index suppliers - we’ve had occasions before where we’ve been impacted by quirks of commercial index implementations like Artifactory, and we’ve not had much success getting them to fix things. The general result has either been pip working round the quirks, or the issues remaining unresolved.

100% agreed. I think a really good approach here would be to do some outreach to a number of large providers of non-PyPI indexes - I’ve already mentioned Artifactory, there’s the PyTorch indexes, piwheels, I believe Azure offers a (commercial?) Python index service, and probably many others. If we went to them and asked what it would take to get them to implement the JSON index API, I suspect the answers would be very illuminating (and probably pretty discouraging :slightly_frowning_face:).

To put this another way, I think we should, at least initially, be treating this as a people problem, not as a technical problem. We’ve already seen this with PEP 708 (dependency confusion) - getting anybody outside of PyPI to take interest in new index standards is (to a close approximation) impossible. We desperately need to understand why that’s the case, and take steps to address it.

Again, this is a great point. No-one is going to implement Warehouse as their index software, and there’s no toolkit to implement anything lighter weight, until you get down to the very bare bones HTML index, where “just serve a static directory” is by design sufficient.

If more tooling were available, would index suppliers use it and therefore adopt newer, more complex, standards? Maybe. Yet again, the only way to find out is to ask them.

That’s not even a complete PEP 503 index. It’s missing even the (optional) serving of Requires-Python metadata. In reality, it’s barely any better (for pip, at least) than just serving a flat directory of wheels and sdists, and using --find-links.

The problem here is that consumers need to choose which mode to use. Content negotiation has been promoted as the “right way”, so that’s what people use. And so servers that only support one of the other approaches won’t get recognised. I can’t imagine consumers trying content negotiation, if that fails to get JSON trying one or both of the other two methods, and only if they fail, falling back to the HTML response that came from the first try. That’s both complex and time consuming, and won’t help in 99% of cases.

I’m not against moving away from content negotiation if there are clear benefits, but the benefits would have to be significant, as it would be a pretty disruptive change, and doesn’t really address the underlying problem. We’d need a lot of evidence that content negotiation is the key blocker causing index implementors to stick with HTML, and frankly I doubt that’s the case.

Maybe, but I feel this is a side issue. We should focus on the core problem first, which is understanding why people aren’t willing to implement the JSON index spec (and more generally, anything but the bare minimum feature set for an index). We can debate ways of avoiding content negotiation if we establish that’s the key issue here.

You have to lay the directories out in the specific form required by the simple index. So it’s not that surprising - you just implement the index spec via the directory layout rather than in the web server.

The flaw here is that this fails to implement any of the optional features - redirection of unnormalised names, hashes, data-requires-python, or any features beyond PEP 503 (such as separately downloadable metadata). As I said before, and you hinted at here, it’s no better in any practical sense than --find-links with a flat directory.

Thinking some more about this, it feels like something that would be an ideal use of some of the Packaging-WG funds, to hire someone with the right skillset to do such a survey.

One or two data points isn’t statistics, but in the cases where I’ve been involved in the decision making, it’s about simplicity. If we had a static option for serving the expected json data without content negotiation, I’d have to look at everything expected in the full json index spec, but I don’t remember there being anything objectionable there for keeping it simple yet full featured in terms of available metadata if we can drop content negotiation.

I’m also aware of cases that are shaped more like this and less like what happens at my dayjob:

Example in the wild: Directory listing for /pip index is updated by CI when wheels are built. (this one’s done with a jinja template and is hosted on github pages)

Technically that’s not a valid PEP 503 index. The URL Directory listing for /pip/Brotli doesn’t correctly normalise the project name. Also, it’s HTML 4.01, where the PEP requires HTML 5.
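For reference, the normalisation rule from PEP 503 is a one-liner: runs of "-", "_" and "." collapse to a single "-", and the result is lowercased.

```python
import re


def normalize(name: str) -> str:
    """Normalise a project name as described in PEP 503."""
    # Collapse any run of "-", "_" or "." into one "-", then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()
```

So the project page for Brotli would be expected under the normalised name brotli rather than the mixed-case form.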

This is a classic example of the sorts of problems caused by a “be liberal in what you accept” approach. Of course, the underlying issue here is that PEP 503 standardised existing practice, but in doing so tightened a number of constraints in ways that no-one ever followed or enforced.

Should pip reject indices that don’t conform strictly to PEP 503? IMO, it’s not our job to police the standards - we have to deal with the realities of what people use in the real world. We don’t have the resources to act as standards enforcers, even if we wanted to. Which is just another example of why expecting installers (pip) to act as “forcing functions” for change isn’t really practical.

Point of clarification, you only need a webserver if you’re trying to do conneg, which is only required if you’re trying to support both HTML and JSON with the same URL. If all you serve is JSON, then you don’t need conneg (the same as if all you serve is HTML, then you don’t need conneg).

There’s nothing stopping someone from serving only JSON at whatever URL they’re currently serving HTML from, and in fact the ability for a server to switch from HTML only to JSON only without consumers having to care about it is part of the motivation for using conneg to begin with.
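As an illustration of the JSON-only route, here is a minimal sketch that builds a PEP 691 project document ahead of time, so it can be written out as a static file. The function names, the in-memory mapping of filenames to bytes, and the example.invalid-style base URL are assumptions of this sketch; a real host would presumably also need to serve the file with the application/vnd.pypi.simple.v1+json content type.

```python
import hashlib
import json
import re
from typing import Dict


def normalize(name: str) -> str:
    """Normalise a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def build_project_document(name: str, files: Dict[str, bytes], base_url: str) -> dict:
    """Build a PEP 691 project document for the given distribution files.

    `files` maps each filename to its contents (hypothetical input shape;
    a real generator would read these from disk or object storage).
    """
    entries = []
    for filename in sorted(files):
        digest = hashlib.sha256(files[filename]).hexdigest()
        entries.append(
            {
                "filename": filename,
                "url": f"{base_url.rstrip('/')}/{filename}",
                "hashes": {"sha256": digest},
            }
        )
    return {"meta": {"api-version": "1.0"}, "name": normalize(name), "files": entries}


def render(document: dict) -> str:
    """Serialise the document for writing to a static file."""
    return json.dumps(document, indent=2, sort_keys=True)
```

This only covers the PEP 691 baseline. Newer keys such as upload-time would need to be added by the generator, which is where static tooling currently tends to fall short.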
