I suspect the simplest reason why so many indexes use only the HTML serialization is that, for the average user, there’s no really compelling feature that they lose if they only support HTML, but if they only support JSON they lose some compatibility with older clients, and supporting both HTML and JSON has some non-zero amount of inherent cost.
To some extent this is just inherent in how we’ve evolved this: serializing the same data in HTML vs JSON really only matters for implementers of packaging tools, not for end users, and most of the features we’ve added since PEP 691 have also been added to the HTML serialization anyway.
We can continue to evolve the API, focusing on the JSON serialization, and presumably if those additional features are good features, each one of them will make the JSON API more inherently compelling for end users, which will drive some of them to want to seek it out explicitly.
Just waiting longer will reduce the impact of only supporting JSON. At some point clients that don’t support JSON indexes will be such a small minority that breaking them isn’t a particularly large breakage, the same as every time a project drops support for some version of Python and breaks some number of its users as well.
We could try to reduce the cost of supporting both HTML and JSON together, but I’m not sure that I can see a way of actually doing that which isn’t (IMO) worse than the status quo.
The problem is that if you want to support HTML and JSON together without conneg, then you need distinct URLs for HTML and for JSON, which means that clients need to know when to use the JSON URL and when to use the HTML URL. Currently clients are configured with something like `pip install --index-url https://example.com/simple/ foo`, which gets turned into a single HTTP request to https://example.com/simple/foo/ (which might spawn more if there are dependencies or whatever).
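For contrast, here is a rough sketch of what conneg on that single URL looks like from the index side. The three content types are the ones PEP 691 defines; the helper names (`parse_accept`, `choose_serialization`) are made up for illustration, and the Accept parsing is deliberately simplified (it ignores wildcards, and it falls back to HTML where a real server might choose to return a 406 instead).

```python
from typing import Dict, Optional

# Content types defined by PEP 691 for the Simple API.
JSON_V1 = "application/vnd.pypi.simple.v1+json"
HTML_V1 = "application/vnd.pypi.simple.v1+html"
LEGACY_HTML = "text/html"

# Server preference order, used only to break ties between equal q-values.
SUPPORTED = (JSON_V1, HTML_V1, LEGACY_HTML)


def parse_accept(header: str) -> Dict[str, float]:
    """Parse an Accept header into {media_type: q}.

    Simplified on purpose: wildcards and parameters other than q are ignored.
    """
    prefs: Dict[str, float] = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[media_type] = q
    return prefs


def choose_serialization(accept: Optional[str]) -> str:
    """Pick the response content type for a request to /simple/<project>/."""
    if not accept:
        # Old clients that send no Accept header keep getting HTML.
        return LEGACY_HTML
    prefs = parse_accept(accept)
    candidates = [
        (prefs[t], -SUPPORTED.index(t), t)
        for t in SUPPORTED
        if prefs.get(t, 0.0) > 0.0
    ]
    if not candidates:
        # Assumption for this sketch: fall back to HTML rather than 406.
        return LEGACY_HTML
    return max(candidates)[2]
```

A client that understands JSON lists it with the highest q-value and gets JSON back; a client that only asks for `text/html` (or nothing at all) keeps getting HTML from the same URL, which is why no extra configuration is needed.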
Without conneg, we have to keep that URL as the HTML serialization, so how does pip know that it can make a request to a different URL and get JSON back?
I could only think of two options:
- We implement some sort of discovery protocol.
- We ask users to inform pip (or whatever tool) that their index supports JSON.
A discovery protocol will slow down installation, because we’ll have to make some additional request to the index to discover its capabilities prior to actually executing the requests we want to make, incurring a round trip network request penalty before we can make any requests to that index. We can try to amortize the cost of this by caching it, but we wouldn’t want to cache it forever because then indexes could never add support for JSON, and caching doesn’t help things like ephemeral hosts, containers, etc.
Asking users to inform pip (or whatever tool) that their index supports JSON relies on the user understanding that the JSON option exists, whether their client version supports it, whether or not their index supports JSON, and also on them caring what serialization their index uses (which, as mentioned above, I don’t think they will, because there’s no compelling feature locked behind JSON).
Interestingly enough though, asking users to inform their tool that their index supports JSON actually inherently works as part of conneg here. There’s nothing stopping an index from actually hosting effectively two indexes at different URLs, one for HTML and one for JSON, and having users on a new enough version of their tool configure their installer to use the JSON one. That works today and doesn’t require any special option as a consequence of relying on conneg. The problem circles back around to the fact that most users simply don’t care, because there’s no compelling feature that is unlocked by supporting JSON.
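To make the caching trade-off concrete, here is a minimal sketch of a capability cache with an expiry. Everything in it is hypothetical: no such discovery protocol exists, `CapabilityCache` is a made-up name, and `probe` stands in for whatever extra request a client would have to make.

```python
import time
from typing import Callable, Dict, Tuple


class CapabilityCache:
    """Remembers whether an index supports JSON, but only for a limited time."""

    def __init__(
        self,
        probe: Callable[[str], bool],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._probe = probe      # hypothetical discovery request
        self._ttl = ttl_seconds  # must be finite so indexes can add JSON later
        self._clock = clock
        self._entries: Dict[str, Tuple[bool, float]] = {}

    def supports_json(self, index_url: str) -> bool:
        now = self._clock()
        cached = self._entries.get(index_url)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]
        # Cache miss or expired entry: pay the extra round trip.
        result = self._probe(index_url)
        self._entries[index_url] = (result, now)
        return result
```

The first call for a given index always pays the round trip, and so does the first call after the entry expires. On an ephemeral host or a fresh container the cache starts empty every time, so every run pays it.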
To circle back to @woodruffw’s original question, I think the only form of deprecation that actually makes any sense today is that we could explicitly declare that we’re no longer going to be adding new features to the HTML serialization.
PEP 691 allows individual PEPs to decide on a case by case basis whether or not it makes sense to add a feature to the HTML (well it’s more generic than that, and allows that decision for any serialization generically). That puts the onus on individual PEP authors to justify why they shouldn’t add a given feature to the HTML serialization, so most PEPs I suspect will try to find some way of adding it.
Explicitly declaring the HTML serialization as “feature complete” would provide a signal to users that if they want new features, they’ll need to figure out how to get off of the HTML serialization. It would also switch the default for new PEPs from having to justify why they aren’t adding something to the HTML serialization, to having to justify why some feature is important enough that it must be added to the HTML serialization.
I think anything stronger than that would probably be premature at this point, but my gut instinct is that a PEP like that wouldn’t be unreasonable.
PEP 691 punted on application/json, but we could definitely add it. One thing I’d personally want to look at before we did that though is whether anyone has actually talked to GitHub to see if we could add more content types to their list (we’d presumably have to come up with an extension if we did that).
The problem with application/json is there’s no versioning information, so indexes relying on that would lose the ability to use the content type for versioning. That would mean that we’d have to maintain the versioning information in the JSON body (which we have), but we’d have to either accept breakages or never change the structure in a hypothetical v2 so much that a v1 client using application/json could no longer determine the version.
That’s probably not particularly hard to do, so not the end of the world– but it would be nice not to have that constraint, and if it’s just GitHub that we’re doing it for, it’d be nice to see if we could just work with GitHub to fix the problem in a more “correct” way.
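A rough sketch of that constraint from the client side: with the versioned content type the client knows the major version before it parses anything, while with a bare application/json it has to find `meta.api-version` (the key PEP 691 defines) in the body, so that key could never move. Accepting application/json at all is hypothetical here, and the function names are made up for illustration.

```python
import json
import re
from typing import Optional

_VERSIONED = re.compile(r"^application/vnd\.pypi\.simple\.v(\d+)\+json$")


def major_from_content_type(content_type: str) -> Optional[int]:
    """Return the major version encoded in the content type, if any."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    match = _VERSIONED.match(media_type)
    return int(match.group(1)) if match else None


def major_from_body(body: str) -> int:
    """Return the major version from meta.api-version in the JSON body.

    This only works as long as every future version keeps the key at
    exactly this location.
    """
    document = json.loads(body)
    api_version = document["meta"]["api-version"]
    return int(api_version.split(".", 1)[0])


def detect_major(content_type: str, body: str) -> int:
    major = major_from_content_type(content_type)
    if major is not None:
        # Versioned content type: no need to look inside the body.
        return major
    # Hypothetical application/json path: the body is the only source.
    return major_from_body(body)
```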