PEP 700: Additional Fields for the Simple API for Package Indexes

See Backwards Incompatible Change to PyPI JSON API - #27 by pf_moore.

Edit: The PEP is now available at PEP 700 – Additional Fields for the Simple API for Package Indexes

Thanks for the PEP - just a few questions:

  • Should there be a minor version bump? I think the keys’ optionality should be indicated at the index level somehow but this can come at a later date, considering that the new keys are… optional. I’m not sure how much sense it makes for these keys to be individually optional either. On that point, it would be good if the PEP would elaborate on the uses the new keys would enable - is the PEP related to an optimisation in pip?

  • Should the versions array be ordered and should the order be significant? Suppose I want to grab the latest version available and I don’t wanna pull in an additional dependency (packaging or distlib) for version comparison, there’s no good way to do that with the simple API. I don’t think we’d want people fiddling with versions or distribution filenames in the command line[1] or (worse) thinking that they might be able to sort the files by date to grab the latest version. It’d be good to note that this is out of scope of the PEP if it is indeed out of scope.

  1. The PEP mentions curl and jq but I don’t know if we expect that people should be able to manipulate data in the command line. ↩︎

Good point. If we make it version 1.1, then the keys can simply be made mandatory, with indexes being able to indicate that they don’t provide the keys by saying they provide version 1.0. I thought I recalled someone saying that this didn’t need a version bump, but reading PEP 691, I’m probably wrong.

See the original thread, but basically no, this doesn’t help pip directly. It’s more for people wanting to move off the JSON API, bringing the simple API closer to parity with that. I’ll add some notes to the PEP.

It could be made ordered, with latest first. But that would require indexes to implement sorting of versions, which might be extra work (especially for non-Python implementations, that can’t just depend on packaging). I’m inclined to say ordering is unspecified, and clients who need an order can do something like sorted_versions = sorted(Version(v) for v in versions).

But either way, the PEP should be explicit.

Thanks for the feedback!

After a quick skim of the PEP and this thread I don’t have any major feedback besides:

  • I agree it should be 1.1, we’ve historically been bad about versioning the API but we should try to be better about it.
  • I don’t have strong feelings about whether the version key should be ordered or not, but whether order matters or not should be specified (e.g. is it logically a set or a list). For Warehouse we’ll almost certainly order it in some fashion (and ordering it by interpreting the version probably makes the most sense) just because we need our output to be consistent unless the content changes.
  • I would like it to be made clearer whether an index is allowed to omit versions from the version list that don’t have files uploaded for them. It would be simpler for Warehouse if we are allowed to omit versions that don’t have files associated with them, as currently our query is querying the release_files table and we already access the Release.version key for sorting, so we can implement it entirely in Python, but we can make do if we need to include versions without files associated with them.
  • I’d call out the _ prefix reservation more clearly, I think it’s good to add but it’s easy to miss it currently.

Ordering is certainly more convenient for the client, and if Warehouse ends up ordering the data, I suspect people will rely on that even if we say they shouldn’t. I’d like to hear from other index providers as to whether ordering would be an issue for them, but assuming not, I now think I’m inclined to mandate ordering. I’d go for a sorted list, ordered according to PEP 440 semantics, with the most recent release first.

I don’t have a strong feeling on this, mainly because I’m not clear what semantics we would attach to a release with no files - particularly given that Warehouse no longer allows projects to create releases by any means other than uploading files to them. Do any other indexes even have a concept of a release with no files?

Having said that, the point of this key is to replace the JSON API, which does return empty releases. Does anyone have any intuition (or experience!) to say whether people use that aspect of the JSON API?

In the absence of any arguments to the contrary, I’m inclined to say something along the lines of “if an index records release versions with no associated files, it MAY include those versions in the versions key, but it is not required to do so”.

Agreed, I added it at the last minute because I nearly forgot I’d said I would do it. I’ll make it more obvious.

Warehouse can be in this state even today by uploading a file and deleting the file without deleting the release FWIW.

That wording seems fine to me though, I believe Warehouse would likely omit them unless we had a lot of people asking for it for performance reasons.

If an index would include versions without files, I feel there should be an easy way to filter these out. Would we want to map versions to files, so instead of having a simple string list, we might have:

  "releases": [
    {
      "version": "1.0.0",
      "files": ["abc-1.0.0-py3-none-any.whl"]
    }
  ]

or:

  "versions": {
    "1.0.0": {
      "files": ["abc-1.0.0-py3-none-any.whl"]
    }
  }

Index users would then be able to exclude versions with an empty files list and they would not have to resort to parsing distribution filenames to find files belonging to a release. And at that point we’ll have basically incorporated everything we reasonably can from the PyPI API and with a lot less duplication too :stuck_out_tongue:
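
To make the filtering concrete, here is a minimal sketch assuming the second (mapping) shape suggested above. The `versions` mapping with a per-version `files` list is only the structure proposed in this post, not something the PEP defines, and `versions_with_files` is a hypothetical helper name.

```python
def versions_with_files(project: dict) -> list:
    """Return the versions that list at least one file.

    Assumes the hypothetical shape {"versions": {"<version>": {"files": [...]}}}.
    """
    return [
        version
        for version, info in project.get("versions", {}).items()
        if info.get("files")  # an empty or missing list means "no files"
    ]


# Example response fragment using the proposed (hypothetical) shape.
project = {
    "versions": {
        "1.0.0": {"files": ["abc-1.0.0-py3-none-any.whl"]},
        "1.1.0": {"files": []},
    }
}
print(versions_with_files(project))
```

With this shape a client never has to parse a distribution filename to work out which release a file belongs to.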

I’m a bit confused on some wording in the PEP:

An index MUST provide all of the specified information, for all projects and files, if it provides any of it. In other words, while indexes may choose to only support the base PEP 691, if they choose to support this PEP they must do so completely.

But then each proposed field is listed as optional. I think this is meant to suggest something like, “if you specify any of these fields, you must do so consistently for all projects (and files, if appropriate) hosted by the index.” Otherwise my brain is reading this like, “if you use any of the new fields you must use all the new fields,” which I don’t think is necessary as versions is independent of size and upload_time.

One other bit of feedback is upload_time uses an underscore while everything in PEP 691 uses -.

I think it comes down to whether you view the index as an index of files or an index of releases. Making it optional to list empty releases allows for either and is simplest in terms of a spec. I personally view the index as an index of files, so it makes sense to me to leave out empty releases.

That’s my error, clumsy wording as a result of not making this a version bump. I’ll update the PEP, to basically say that the fields are mandatory. Indexes that don’t want to (or can’t) supply this information, will simply say they only support version 1.0.

That does bring up a question, though. If (when) we introduce version 1.2, will indexes have a way to say that they support whatever’s new in 1.2, but not the fields from 1.1? I’m going to make that someone else’s[1] problem though, and ignore it for now.

My bad, I’ll fix that.

The problem here is that this is the correct view for the Simple Index API, but as discussed in Backwards Incompatible Change to PyPI JSON API, there’s a move towards expecting the simple API to replace the Warehouse JSON API (if only because no-one is working on any other replacement for it…) and so it’s being asked to perform double duty.

I’d personally much prefer to keep the two views (and hence the two APIs) logically separate, but I fear that the JSON and XML-RPC APIs are likely to die of neglect[2], and I would rather a somewhat-compromised simple API over nothing.

  1. Possibly future me… ↩︎

  2. I’m just a user of those APIs, I don’t have the time to contribute to their maintenance. ↩︎

I don’t think providing file size and upload time should be coupled to providing project versions, as they each stand on their own merit. I think they should both be made optional.

Having the files only stored in cloud object storage means getting the size and upload time is trivial (part of the list response), but the filename must be parsed for the version.

Having references to the files stored in a DB, the version may be known, but not the size and upload time.

Making them (or anything) required would also mean bumping the version to 2.0.

One thing I will suggest is to say that if any file has size and upload time, then every file (for the project) should.

JavaScript dates stringify into ISO-format with millisecond precision (ie YYYY-mm-ddTHH:MM:SS.fffZ).

For easier use with JS, I suggest allowing or changing to ms-precision, especially as microsecond precision is unnecessary (and incorrect, really) for upload times.

Our internal index only has to deal with simple versions (of the form X.Y.Z[(a|b|rc)n]), so a simple regular expression parse into a tuple sort is sufficient.

For public indexes, you’ll have to make a decision on what to do with invalid versions (eg rank them earlier).
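
A rough sketch of the regex-and-tuple approach described above, with invalid versions ranked earliest. It only understands versions of the form X.Y.Z[(a|b|rc)n] and is not a PEP 440 implementation; the names `simple_sort_key`, `_SIMPLE` and `_PRE_RANK` are made up for illustration.

```python
import re

# Matches only X.Y.Z with an optional a/b/rc pre-release suffix.
_SIMPLE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+))?$")
_PRE_RANK = {"a": 0, "b": 1, "rc": 2}


def simple_sort_key(version: str) -> tuple:
    """Sort key: invalid versions first, then by release and pre-release."""
    match = _SIMPLE.match(version)
    if match is None:
        # Unparseable versions rank before everything else and are
        # ordered among themselves by their raw string.
        return (0, (), version)
    major, minor, patch, pre, pre_n = match.groups()
    release = (int(major), int(minor), int(patch))
    # A final release sorts after any pre-release of the same X.Y.Z.
    pre_key = (3, 0) if pre is None else (_PRE_RANK[pre], int(pre_n))
    return (1, release + pre_key, "")


versions = ["1.0.0", "1.0.0rc1", "0.9.0", "not-a-version", "1.0.0b2"]
print(sorted(versions, key=simple_sort_key))
```

Whether invalid versions go first, last, or get dropped is exactly the policy decision a public index (or a client) would have to make.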

I suggest adding a reminder for implementers of PEP 700 to set the correct content type for responses.

That’s the fundamental versioning question here, and I’d much rather not couple that question to the matter of these individual fields.

The basic question is whether newer minor versions must be backward compatible with older ones (major version bumps definitely are allowed to break compatibility). So if I get a version 1.0 response, and just change the version number to 1.1, is that now a valid version 1.1 response? If the answer is “yes”, then we have the following consequences:

  1. All fields added in minor versions must be optional.
  2. Clients cannot use the version number to determine what fields to expect, they must always be prepared for fields added in later minor versions to be missing.
  3. We have to decide, and document, for new fields whether “optional” means “can be included for some records but not all” or “can be omitted for all records, or present for all, but that’s it”.

If the answer is “no”, then

  1. Clients can know what fields to expect based on the version number.
  2. Fields are coupled based on what version they were introduced in.
  3. New versions include fields from all older versions (unless we implement some complex scheme where a server can say it supports 1.0, 1.2, 1.3 and 1.5 but no other versions…)

I’m inclined to think that the limitations of saying “no” are too significant, but on the other hand, the implications of “yes” make bumping the version rather pointless (we could just as easily say that 1.0 now allows 3 extra optional fields, and ignore versioning).

I would really appreciate thoughts from others on how we should version the simple index API. In particular @dstufft and the other authors of PEP 691, what are your views? The PEP says “This is intentionally vague, as this PEP believes it is best left up to future PEPs that make any changes to the API to investigate and decide whether or not that change should increment the major or minor version” but as the author of a “future PEP” I don’t feel I have any better insight than you on what’s reasonable here :slightly_frowning_face: And honestly, I don’t really want to be the person who decides this…

That’s fair. I suggest we make the fractional seconds optional, with a maximum of 6 digits precision (this is what the Warehouse JSON API returns at the moment).
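
For what it's worth, optional fractional seconds are cheap to handle on the client side. A minimal sketch, assuming timestamps are UTC and end in a literal `Z` as in the examples in this thread; `parse_upload_time` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_upload_time(value: str) -> datetime:
    """Parse YYYY-mm-ddTHH:MM:SS[.f{1,6}]Z into an aware UTC datetime."""
    # %f accepts between one and six fractional digits when parsing, so
    # millisecond (JS-style) and microsecond precision both work.
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in value else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


print(parse_upload_time("2022-10-21T12:34:56Z"))
print(parse_upload_time("2022-10-21T12:34:56.123Z"))
```

More than six fractional digits raises `ValueError`, which matches the proposed maximum precision.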

Ouch. That’s a very good point, and suggests that leaving versions unordered so that the client can choose how to deal with invalid versions is better.

I just checked, and there’s over 1000 projects on PyPI with invalid versions, and files associated with those versions. So this is absolutely a real issue.

I would think that compatibility in the original PEP refers to client compatibility. Why would you switch your server over to 1.1 without being able to produce 1.1 responses? Put differently, if your 1.0 response is valid as a 1.1 response that means that 1.0 is forward compatible. That’s not the same as 1.1 being backward compatible with 1.0. A 1.1 response type which introduces new, mandatory properties but does not change any of the existing properties can be considered to be backward compatible with 1.0. A client which is able to handle 1.0 responses will be capable of handling 1.1 responses. If the client is going to choke on unknown properties it will do so whether the new properties are optional or not.

Well, the list will probably follow some kind of order (insertion, lexicographic, etc.) so it would be beneficial to have some way to communicate to the user that the order is not significant. It could be something as simple as renaming the property to "version-set".

But you could calculate the size without much issue. As for upload time, I guess either the PEP could suggest a default value or people could simply choose whatever default date they wanted?

I don’t know if that must happen. I think the spec could say, “if you provide size you must provide upload-time, but version is optional on its own”.

Sort of; version 1.0 of the spec has a bunch of optional fields.

It does potentially lead to better typing as you can say the TypedDict relies on the literal value of the version and that influences what the resulting type is.

For me, I don’t know if this proposal is going to be indicative of future additions to the Simple API, or an outlier. And even if it is common, the versioning does give us the flexibility for having required fields, optional fields that must appear together, etc. So I don’t think we need to stress over justifying a version bump by requiring fields versus what we think makes the most sense.

Having slept on it, I realise I was thinking of protocol versions (whose specification has two clients: one on either end of the protocol), for example Docker-Compose YAML or HTTP.

In the case of protocols, it’s important to maintain compatibility with both ends in minor versions.

In the case of this discussion, we’re talking about the version of an API, whose only client is the client requesting from the server. In this case, regardless of whether fields are optional or required, it makes sense to bump the minor version.

I’m not sure I agree. From the client side, the only point in having a version is if checking that version allows the client to make assumptions about the content of the response. If all fields introduced in the new version are optional, what’s the point in the client even checking the version?

With that in mind, I feel there are two options worth considering here:

  1. Bump the version, and make all 3 new fields mandatory in version 1.1.
  2. Leave the version at 1.0, but add 3 new optional fields.

If we don’t reach a consensus here, I’ll probably set up a poll asking people to vote between those two options.

Generally speaking, the idea behind versioning was more that clients could warn end users that their client might not understand every part of the response, and possibly prompt them to upgrade if it’s a newer version of the API.

The thought was minor versions could be used as warnings, and major versions could be used as errors.
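
As a sketch of what that looks like in a client, assuming the PEP 691 JSON layout where the version lives at `meta.api-version`. The supported version constants, the `UnsupportedAPIVersion` exception and `check_api_version` are hypothetical names for illustration, not anything pip actually defines.

```python
import warnings

# The newest API version this (hypothetical) client understands.
SUPPORTED_MAJOR, SUPPORTED_MINOR = 1, 0


class UnsupportedAPIVersion(Exception):
    """Raised when the index speaks a newer major version."""


def check_api_version(response: dict) -> None:
    """Error on a newer major version, warn on a newer minor version."""
    major_s, minor_s = response["meta"]["api-version"].split(".")
    major, minor = int(major_s), int(minor_s)
    if major > SUPPORTED_MAJOR:
        raise UnsupportedAPIVersion(
            f"index uses API version {major}.{minor}; please upgrade"
        )
    if major == SUPPORTED_MAJOR and minor > SUPPORTED_MINOR:
        warnings.warn(
            f"index uses API version {major}.{minor}; "
            "some data may not be understood by this client"
        )


check_api_version({"meta": {"api-version": "1.0"}})  # silent
```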

I’m not sure how that would work in practice? Why would a client ever care if there was extra information that it didn’t know about? After all, the PyPI response already contains _last_serial, which clients won’t know about.

To be clear here, I don’t have a strong opinion on this - I’m happy to follow whatever versioning policy people prefer, I just need to know what that policy is. My (weak) opinion is that either of the two options I stated above work for me, but I don’t like any others.

Here’s a suggestion: from version 1.1 onwards, all responses must contain an "additional-features" field, an array of strings.

When the project-files response’s additional features has "versions", the versions array must be present.

When the project-files response’s additional features has "file-metadata" (to be bikeshedded), each file must have creation-date and size fields.
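
A minimal sketch of how a client or test suite might check a response against this suggestion. Everything here is hypothetical: `additional-features`, `file-metadata`, and `creation-date` are only the names floated in this post, and `feature_problems` is an illustrative helper.

```python
def feature_problems(response: dict) -> list:
    """Return a list of inconsistencies; an empty list means consistent."""
    features = response.get("additional-features", [])
    problems = []
    # "versions" advertised: the versions array must be present.
    if "versions" in features and "versions" not in response:
        problems.append("versions advertised but missing")
    # "file-metadata" advertised: every file needs both fields.
    if "file-metadata" in features:
        for file in response.get("files", []):
            missing = [k for k in ("creation-date", "size") if k not in file]
            if missing:
                name = file.get("filename", "<unnamed>")
                problems.append(name + ": missing " + ", ".join(missing))
    return problems


response = {
    "additional-features": ["versions", "file-metadata"],
    "versions": ["1.0.0"],
    "files": [{"filename": "abc-1.0.0-py3-none-any.whl", "size": 1234}],
}
print(feature_problems(response))
```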

Because extra information may change what a client does, if it knew how to interpret it.

For instance, if we had versioning prior to python-requires being added, we could have softened the issues around people using a too old version of the client, by warning them that the index is using a newer version, without turning it into a hard failure.

If you want the version to be useful for that, then you need to only increase the version when you make a soft-backwards-incompatible change like python-requires. Otherwise the warning becomes noisy and people will just ignore it.

(Though I’m also unconvinced that the minor version is useful enough to be worth worrying about. In my code I ignore the minor version instead of warning about it, because there are better ways to tell users that they need to upgrade their client.)

Not sure I get the point – you don’t need a JSON field to tell you whether another JSON field is present… you can just check whether there’s a "versions" field directly, instead of checking the "additional-features" field.