Pre-PEP: What would it look like to deprecate PEP 503?

So here’s a thought. If someone were to write a tool which took a directory full of sdists and wheels, and built a new directory structure that reflected a JSON index for that data - which could then be served with a simple static file webserver like python -m http.server - would that be a useful tool for people who wanted to adopt the JSON index format?

Of course, this is yet again a technical response to a people problem, so I don’t advise actually doing this unless there’s a demonstrated benefit…
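
As a rough illustration of the tool floated above, here is a minimal sketch that walks a flat directory of sdists and wheels and writes PEP 691 style JSON documents. The function names, the `index.json` layout, and the `files_url` parameter are assumptions made for this example. How a static webserver would map `/simple/<project>/` onto these files, and with which content type, is deliberately left open.

```python
import hashlib
import json
import re
from pathlib import Path
from typing import Optional


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_of(filename: str) -> Optional[str]:
    """Guess the project name from a wheel or sdist filename (simplified)."""
    if filename.endswith(".whl"):
        # Wheel filenames escape "-" in the name, so the first field is the name.
        return normalize(filename.split("-")[0])
    if filename.endswith(".tar.gz"):
        # Assumption: the version part itself contains no "-".
        return normalize(filename[: -len(".tar.gz")].rsplit("-", 1)[0])
    return None


def build_index(src: Path, out: Path, files_url: str) -> dict:
    """Write <out>/index.json and <out>/<project>/index.json for files in src."""
    projects: dict = {}
    for path in sorted(src.iterdir()):
        name = project_of(path.name) if path.is_file() else None
        if name is None:
            continue  # neither an sdist nor a wheel
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        projects.setdefault(name, []).append(
            {
                "filename": path.name,
                "url": files_url + path.name,  # hypothetical hosting location
                "hashes": {"sha256": digest},
            }
        )
    meta = {"api-version": "1.0"}
    out.mkdir(parents=True, exist_ok=True)
    root = {"meta": meta, "projects": [{"name": n} for n in sorted(projects)]}
    (out / "index.json").write_text(json.dumps(root, indent=2))
    for name, files in projects.items():
        (out / name).mkdir(exist_ok=True)
        detail = {"meta": meta, "name": name, "files": files}
        (out / name / "index.json").write_text(json.dumps(detail, indent=2))
    return projects
```

Calling `build_index(Path("dist"), Path("simple"), "https://example.com/files/")` would produce one JSON document per project plus a root listing. The name parsing is intentionally naive and would need hardening for real-world filenames.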

pip has only supported PEP 691 for ~three years, so if you want to be compatible with ~95% of clients, you must support the HTML representation, which means choosing either conneg or not supporting JSON. This will of course change with time, and maybe the answer is “wait a few years and maybe people will generate JSON-only indexes.” I expect that isn’t likely to happen, especially considering there are likely other tools that only speak PEP 503.

Sure, but that’s just time then, yeah? You don’t get to rely on new features of things until enough of your users upgrade to a new enough version of the tooling. Wait until enough time has passed that you can rely on it as a baseline.

The same would be true for any mechanism that we could have added for supporting JSON indexes, conneg made the most sense (and still does IMO– there’s no other option that I can think of that doesn’t require out of band information OR extra HTTP requests).

Also to be clear, many static web servers support conneg just fine– if I recall from the PEP 691 discussion correctly, it was primarily just Amazon S3 (without a CDN) and python -m http.server that didn’t.
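
To make the conneg mechanism concrete, here is a minimal sketch of how a server might pick a serialization from the `Accept` header. The three content types are the ones defined by PEP 691. The function names and the tie-breaking order are assumptions for this example, and partial wildcards such as `application/*` are not handled.

```python
from typing import List, Optional, Tuple

# Serializations this hypothetical server can produce, in its own preference order.
SUPPORTED = [
    "application/vnd.pypi.simple.v1+json",
    "application/vnd.pypi.simple.v1+html",
    "text/html",
]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media type, q-value) pairs."""
    entries = []
    for part in header.split(","):
        media, *params = [p.strip() for p in part.split(";")]
        if not media:
            continue
        q = 1.0  # q defaults to 1 when not given
        for param in params:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media.lower(), q))
    return entries


def choose(header: Optional[str], default: str = "text/html") -> Optional[str]:
    """Return the content type to serve, or None if nothing is acceptable."""
    if not header:
        return default
    prefs = dict(parse_accept(header))

    def quality(media: str) -> float:
        # An exact match wins; otherwise fall back to "*/*" if the client sent it.
        return prefs.get(media, prefs.get("*/*", 0.0))

    best = max(SUPPORTED, key=quality)  # ties resolve to the earliest entry
    return best if quality(best) > 0 else None
```

A client that lists the JSON type with the highest q-value gets JSON, a browser asking for `text/html` gets HTML, and a `None` result would map to a 406 response.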

I also know of another instance where a group went with the HTML index because it could be served via GitHub Pages. Had the JSON API worked out then they would have gone with that. So in that case the fact that application/json doesn’t work was a blocker.

I have run into this thanks to mousebender (on PyPI), e.g. the issue “Repo index page w/ trailing slash in CDATA not supported” (Issue #19 in brettcannon/mousebender on GitHub). So the HTML index is just messy and if we could ever get people to move to the JSON API it will be a good thing long-term. And I suspect making application/json work would help with that.

By “make that mechanism more normative”, do you mean “provide an alternative variant of that mechanism”?

Making it easier to statically host JSON-only data, and ensuring both pip and uv support indices which only provide that data, would help. But it doesn’t solve the issue of getting broad adoption for that format.

For the purposes of this discussion, I think it’s best to assume that we can add a nice static JSON format, which installers can detect as supported and use preferentially. So how would we get index providers to adopt it?

I think before we look at deprecating PEP 503, it would behoove us to consider why so many indexes use it and don’t implement other index features. In my mind the answer is that PEP 503 is really pleasant to set up because it makes it easy to create statically generated indexes with extremely little work, independent of the environment/infrastructure it is being set up on.

Writing from a corporate background this is a key point, indeed.

Consider any mid to large corporate environment:

It is quite easy and straightforward to serve a statically generated HTML page for people to use, with essentially no need to ask anyone for permission. A team can just do it, and almost instantly have a closed-down, custom-managed PyPI for everyone to use. That’s a benefit in multiple dimensions, not least that you get a cooled-down set of package dependencies almost naturally.

In contrast, it is much more difficult in most corporate IT shops, perhaps surprisingly in this day and age, to get a dedicated server set up with some protocol-compliant server software installed. Doing that requires evaluation, and evaluation invites a wide range of stakeholders and concerns. What used to be simple to do (i.e. set up an internally managed pip-usable package index) becomes a major, complex project.

Edit: removed a more general observation that was off-topic and distracted from my actual point. Thanks to @notatallshaw for catching it.

@miraculixx I largely agree with you, but please don’t cast vague aspersions; if you have specific concerns on a different topic, please make a new thread. Everyone volunteering to be here is doing it because they want to make something better, not worse, so please assume good faith.

This thread is exactly that, it’s a feedback gathering exercise, not a decision about how to do something.

FYI, as a pip maintainer I would be -1 on deprecating PEP 503, but most of my reasons have already been articulated here so I didn’t feel the need to contribute.

Edit: P.S @miraculixx clarified that I took their original post in the wrong direction, apologies for the miscommunication.

Thanks for your feedback. Although I do not agree with your reading of my thought, I see how it might come across that way. I too am contributing to make things better, not worse.

I have thus removed it from my actual feedback which I believe adds a specific aspect that has not been mentioned before.

I suspect the simplest reason why so many indexes use only the HTML serialization is that, for the average user, there’s no real compelling feature that they lose if they only support HTML, but if they only support JSON they lose some compatibility with older clients, and supporting both HTML and JSON has some nonzero amount of inherent cost.

To some extent this is just inherent in how we’ve evolved this– serializing the same data in HTML vs JSON really only matters for implementers of packaging tools, not for end users, and most of the features we’ve added since 691 have also been added to the HTML serialization anyways.

We can continue to evolve the API, focusing on the JSON serialization, and presumably if those additional features are good features, each one of them will make the JSON API more inherently compelling for end users, which will drive some of them to want to seek it out explicitly.

Just waiting longer will reduce the impact of only supporting JSON, at some point clients that don’t support JSON indexes will be such a small minority that breaking them isn’t a particularly large breakage– the same as every time a project drops support for some version of Python they break some number of their users as well.

We could try to reduce the cost of supporting both HTML and JSON together, but I’m not sure that I can see a way of actually doing that which isn’t (IMO) worse than the status quo.

The problem becomes that if you want to support HTML and JSON together without conneg, then you need distinct URLs for HTML and for JSON, which means that clients need to know when to use the JSON URL and when to use the HTML URL. Currently clients are configured with something like pip install --index-url https://example.com/simple/ foo, which gets turned into a single HTTP request to https://example.com/simple/foo/ (which might spawn more if there are dependencies or whatever).

Without conneg, we have to keep that URL as the HTML serialization, so how does pip know that it can make a request to a different URL and get JSON back?

I could only think of two options:

  • We implement some sort of discovery protocol.
  • We ask users to inform pip (or whatever tool) that their index supports JSON.

A discovery protocol will slow down installation, because we’ll have to make some additional request to the index to discover its capabilities prior to actually executing the requests we want to make, incurring a round trip network request penalty before we can make any requests to that index. We can try to amortize the cost of this by caching it, but we wouldn’t want to cache it forever because then indexes could never add support for JSON, and caching doesn’t help things like ephemeral hosts, containers, etc.

Asking users to inform pip (or whatever tool) that their index supports JSON relies on the user understanding that the JSON option exists, whether their client version supports it, whether or not their index supports JSON, and also to just care what serialization their index uses (which as mentioned above, I don’t think they will, because of a lack of compelling feature that is locked behind JSON).

Interestingly enough though, asking users to inform their tool that their index supports JSON actually inherently works as part of conneg here. There’s nothing stopping an index from actually hosting effectively two indexes at different URLs, one for HTML and one for JSON, and having users on a new enough version of their tool configure their installer to use the JSON one. That works today [1] and doesn’t require any special option as a consequence of relying on conneg. The problem circles back around to the fact that most users simply don’t care, because there’s no compelling feature that is unlocked by supporting JSON.

To circle back to @woodruffw’s original question: I think the only form of deprecation that actually makes any sense today is that we could explicitly declare that we’re no longer going to be adding new features to the HTML serialization.

PEP 691 allows individual PEPs to decide on a case by case basis whether or not it makes sense to add a feature to the HTML (well it’s more generic than that, and allows that decision for any serialization generically). That puts the onus on individual PEP authors to justify why they shouldn’t add a given feature to the HTML serialization, so most PEPs I suspect will try to find some way of adding it.

Explicitly declaring the HTML serialization as “feature complete” would signal to users that if they want new features, they’ll need to figure out how to get off of the HTML serialization. It would also switch the default for new PEPs: instead of having to justify why a feature should not be added to the HTML serialization, authors would have to justify why a feature is important enough that it must be added to it.

I think anything stronger than that would probably be premature at this point, but my gut instinct is that a PEP like that wouldn’t be unreasonable.

PEP 691 punted on application/json, but we could definitely add it. One thing I’d personally want to look at before we did that though is whether anyone has actually talked to GitHub to see if we could add more content types to their list (we’d presumably have to come up with an extension if we did that).

The problem with application/json is there’s no versioning information, so indexes relying on that would lose the ability to use the content type for versioning– which would mean that we’d have to maintain the versioning information in the JSON body (which we have), but we’d have to either accept breakages or never change the structure enough in a hypothetical v2 that a v1 client using application/json would still be able to determine the version.

That’s probably not particularly hard to do, so not the end of the world– but it would be nice not to have that constraint, and if it’s just GitHub that we’re doing it for, it’d be nice to see if we could just work with GitHub to fix the problem in a more “correct” way.
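
To illustrate the constraint described above, here is a minimal sketch of a client that ignores the content type entirely and relies only on the version recorded in the JSON body. The `meta` / `api-version` keys come from PEP 691. The function name, the exception type, and the policy of rejecting any newer major version are assumptions for this example.

```python
import json

SUPPORTED_MAJOR = 1  # the major API version this hypothetical client understands


class UnsupportedAPIVersion(Exception):
    """Raised when the response body declares a version we cannot handle."""


def load_index(body: str) -> dict:
    """Parse a Simple API JSON body, trusting only meta.api-version."""
    data = json.loads(body)
    meta = data.get("meta") if isinstance(data, dict) else None
    version = meta.get("api-version") if isinstance(meta, dict) else None
    if not isinstance(version, str):
        raise UnsupportedAPIVersion("missing meta.api-version")
    major, _, _minor = version.partition(".")
    try:
        major_number = int(major)
    except ValueError:
        raise UnsupportedAPIVersion(f"unparseable api-version: {version!r}")
    if major_number > SUPPORTED_MAJOR:
        # This only works if a future major version keeps "meta" readable.
        raise UnsupportedAPIVersion(f"api-version {version} is too new")
    return data
```

The check only helps if a hypothetical v2 document still exposes `meta.api-version` in a place a v1 client can find, which is exactly the structural constraint under discussion.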


  1. You could imagine other schemes that don’t require a different base url for HTML and JSON, but instead use some other marker to tell the installer that it’s a JSON-based repository, for instance --index-url https://example.com/simple/ vs --json-index-url https://example.com/simple/. The difference here is immaterial though, and if an installer wanted to offer that today they could do that as well, but either way the user would have to know that a repository supported JSON and would have to know how to tell their client to use the JSON version. ↩︎

I doubt it’s just GitHub. I believe S3 without a CDN is in the same position here (and notably, it’s one of the options people use for this without a way to do conneg)

Not meaning to nerd snipe here, just trying to make sure the information is correct because it’s fiddly.

PEP 503 does not require that the repository handle non-normalized names in the URL. It requires that the repository normalize the names in URLs (so in your example, PEP 503 requires the answer be /pip/brotli/), and it allows (but does not require) repositories to redirect non-normalized names to the normalized names.

In PEP 503, it is the client’s responsibility to normalize the name before making the request (and pip has done that for a long time), which was done ages ago to reduce the number of HTTP requests it takes to talk to PyPI, because originally PyPI (and pip) didn’t do any normalization, and if someone did pip install django when Django’s name on PyPI was Django, pip relied on PyPI redirecting django to Django.

At that time, if /$project/ returned a 404, pip would fall back to requesting / and searching for <a href="/Django/">Django</a>, which is how support for static indexes that couldn’t implement the redirect worked. That page was getting huge though, and we wanted to eliminate the fallback, but couldn’t do that without breaking static servers.

Given pip couldn’t know the “proper” name a priori, but it could know the normalized name, we changed the simple API so that the normalized name was the “correct” name, but allowed the redirect so older versions of pip could still talk to PyPI (since they didn’t normalize prior to constructing the URL).
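
The client-side normalization described above can be sketched as follows. The regular expression is the one given in PEP 503. The `project_url` helper is a hypothetical name used only for this illustration.

```python
import re
from urllib.parse import urljoin


def normalize(name: str) -> str:
    """PEP 503: runs of "-", "_" and "." collapse to "-", then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_url(index_url: str, project: str) -> str:
    """Build the project page URL a client would request from an index."""
    # Ensure the base ends with "/" so urljoin appends instead of replacing.
    base = index_url if index_url.endswith("/") else index_url + "/"
    return urljoin(base, normalize(project) + "/")
```

Because the client normalizes before building the URL, a request for `Django` goes straight to the `django/` page and no redirect or root-page fallback is needed.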

This is a valid issue though :slight_smile:

S3 supports arbitrary content types.

This is an important point, but as I’ve already noted, many (most) of such “easy and straightforward” implementations of an index are actually not PEP-compliant. And worse, they omit support for important optimisations (requires-python, separate metadata).

Strict PEP compliance isn’t a problem because pip (and I imagine uv) don’t enforce the standard strictly. And as I’ve already said, I don’t consider installers to be the right place to do such enforcement. But with regard to sub-optimal performance, I assume (and I’d appreciate your perspective from real-life experience) that such indexes are either small enough that the optimisations aren’t important, or performance isn’t important enough for them to matter. Is that a fair assessment?

Suppose (again, purely for the purposes of gathering information) that installers like pip started requiring things like separate metadata, and requires-python served by the index. What would these organisations do in that case? Less extreme, what would they do if pip emitted noisy warnings if key optimisations like this weren’t possible?

Conversely, why do organisations like this bother laying out an index structure in any case? What makes doing that worthwhile, considering that just serving up a flat directory of wheels and files using --find-links is just as effective, and (slightly) less work to implement? In your experience, is it simply that you didn’t know that was an option, or is there a deeper reason?

If there was a tool that could be used to build an index-compliant static website from a directory full of wheels and sdists, would people use that? It would just add a build step every time a new wheel was published - is even that too much? How annoying would installers need to be in order to make teams accept the extra step?
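
As a sketch of what such a build step might do for the HTML serialization, the following reads Requires-Python out of each wheel and emits it as the `data-requires-python` attribute from PEP 503, alongside a sha256 hash fragment. The function names and the `base_url` parameter are assumptions for this example. Serving separate metadata files is not covered here.

```python
import hashlib
import html
import zipfile
from email.parser import Parser
from pathlib import Path
from typing import Iterable, Optional


def requires_python(wheel: Path) -> Optional[str]:
    """Read Requires-Python from the wheel's top-level *.dist-info/METADATA."""
    with zipfile.ZipFile(wheel) as zf:
        for name in zf.namelist():
            if name.endswith(".dist-info/METADATA") and name.count("/") == 1:
                metadata = Parser().parsestr(zf.read(name).decode("utf-8"))
                return metadata.get("Requires-Python")
    return None


def anchor(path: Path, base_url: str) -> str:
    """Render one file as an <a> element with a hash fragment."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    attrs = f'href="{html.escape(base_url + path.name)}#sha256={digest}"'
    if path.suffix == ".whl":
        spec = requires_python(path)
        if spec:
            # The attribute value must be HTML-escaped (">=" becomes "&gt;=").
            attrs += f' data-requires-python="{html.escape(spec)}"'
    return f"<a {attrs}>{html.escape(path.name)}</a>"


def project_page(files: Iterable[Path], base_url: str) -> str:
    """Render a minimal project page listing the given files."""
    links = "\n".join(anchor(p, base_url) + "<br>" for p in files)
    return f"<!DOCTYPE html>\n<html><body>\n{links}\n</body></html>\n"
```

Run on every publish, a step like this would write `project_page(...)` to `<project>/index.html`, so the index stays fully static while still exposing one of the optimisations mentioned earlier.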

I’m personally of the view that anything that requires setting up a dedicated server to host a local index is going to be a significant loss of usability for many people, so we need to focus on options that allow static publishing of an index. But actually, once you accept the need for some sort of “build step” to prepare the static index, our existing standards are actually pretty good in that regard. So I’d like to explore whether such static publishing could be a realistic compromise (between “trivially easy to throw up an index” and “needs to expose rich enough data to allow tools to do their job effectively”).

(There’s a whole other question about how we persuade larger index providers to do more than the bare minimum. That needs to be explored separately. For now, in this conversation, I’m focusing purely on the ad-hoc users for whom the simplicity of just serving up a simple index is a compelling feature).

I appreciate that these questions are hard to answer (I also worked in a corporate environment, so I know how little extra work counts as “too much to bother with” in these sorts of situations). But if we can get any sort of feel for what would be acceptable, it would be immensely valuable. Otherwise, we end up in a situation where we cannot change anything, because we’ll break such users - and that just leaves us with other problems we can’t deal with (“pip is too slow”, “resolving dependencies takes too long and fails too often”, etc).

lol, I’ll note that the index linked in my quoted comment has now been updated to be compliant.

Which demonstrates that (1) it’s certainly possible to serve a compliant index statically, and (2) it’s easy to not get all the details right if you’re not careful.

My bad :smiley:

❯ curl -I http://dstufft-pep691-demo.s3-website-us-east-1.amazonaws.com/pip/
HTTP/1.1 200 OK
x-amz-id-2: pLdTLO6xnh4rvt0qI1hSS18XYsJxVN/61BaBK2PJPYucGPNbEG7pcKWKFBD8HBKwDEpcaF8f9fYHOafdL0c73uNxXaW4MR04
x-amz-request-id: GWX252XKRZSWMFC8
Date: Tue, 14 Apr 2026 15:27:05 GMT
Last-Modified: Tue, 14 Apr 2026 15:25:05 GMT
ETag: "6554b040ef103f1afd4fd4160384349d"
Content-Type: application/vnd.pypi.simple.v1+json
Content-Length: 144150
Server: AmazonS3

Just to push back on this– is there anyone else besides installers in a position to enforce the standards here?

I doubt anyone who is producing a non-compliant index is going to run some sort of external “compliance checker” against their index, the only thing that matters to them is whether it works with pip/uv/etc or not, so from my perspective clients enforcing the standard is the only way to enforce the standards at all [1].

That being said, I wouldn’t expect tools to go far out of their way to enforce compliance. Pretend for a moment that we said that indexes had to support both HTML and JSON at the same URL, I think it would be unreasonable to expect pip/uv/etc to make a second request to verify compliance with that, because it would drastically negatively affect everyone, including users who were doing things correctly.

On the other hand, I think it would be eminently reasonable for pip/uv/etc to enforce that the HTML serialization is properly using HTML5 [2].

I guess my opinion here is that installers/clients should be enforcing compliance, where it is reasonable that they do so, just like Warehouse enforces compliance where it is reasonable as well [3]


  1. This isn’t really that different from the fact that PyPI/Warehouse is often used as a compliance checker for projects producing wheels or sdists. In many cases we’re validating things that PyPI itself doesn’t actually care about, simply because we’re the only tool that is in a position to do that compliance validation. ↩︎

  2. Though I wouldn’t personally advocate for this, because of backwards compatibility concerns. ↩︎

  3. For instance, it’s not reasonable for Warehouse to validate that a wheel is actually manylinux compliant. ↩︎

I largely agree with this, but really I’d like installers/clients to be even more forgiving (potentially with warnings), because that works out better for everyone as a whole than an installer that rejects something that used to work “because the maintainers felt like it”.

I’d much rather pip be spamming all our logs with “failed to find JSON index, falling back to legacy HTML format” than us pinning to an older version of pip.
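
The forgiving behaviour wished for above could look roughly like this on the client side: parse JSON when the index provides it, otherwise warn and fall back to HTML. This is a hypothetical sketch and not a description of how pip or uv behave. The function and class names are made up for the example, and the warning text is the one quoted above.

```python
import json
import warnings
from html.parser import HTMLParser
from typing import List


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> element in an HTML index page."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.urls.append(href)


def file_urls(content_type: str, body: str) -> List[str]:
    """Return file URLs from a project page, preferring the JSON serialization."""
    media = content_type.split(";")[0].strip().lower()
    if media == "application/vnd.pypi.simple.v1+json":
        return [f["url"] for f in json.loads(body)["files"]]
    if media in ("application/vnd.pypi.simple.v1+html", "text/html"):
        # Keep working, but make the fallback visible in logs.
        warnings.warn("failed to find JSON index, falling back to legacy HTML format")
        collector = _LinkCollector()
        collector.feed(body)
        return collector.urls
    raise ValueError(f"unsupported content type: {content_type!r}")
```

The design choice here is that an HTML-only index never becomes an error, only a noisy warning, so existing environments keep working.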

TIL.

With that said, I think without conneg, this runs into the same adoption delay “issue” for any major version right?

That’s a fair question, and I’d have to answer that I can’t think of anyone.

Your footnote about backward compatibility is the key issue here. Installers pretty much can’t do any enforcement, because if we actually detect any standards violation, we can’t do anything about it without breaking a working environment.

I think my take would be that installers can only enforce compliance on behalf of some other party that takes responsibility for the consequences. Or to put it another way, when they can direct the screaming hordes of angry users at someone else, who will take the heat :slightly_smiling_face:

Right now, “community consensus” is far too vague an idea to take such responsibility. Maybe in the future, the packaging council could… If we wanted to instantly burn out all the PC members in one go :slightly_frowning_face: