PEP 592: Support for "Yanked" Files in the Simple Repository API

(Donald Stufft) #1

I’ve just submitted PEP 592, which will implement the ability to mark a file as “yanked” in the simple repository API. Because this adds a new attribute, it will not affect how any current versions of pip interpret the simple repository API. However, it will allow us to solve the problem of trying to mark files as “don’t actually use this” going forward.

I’ve included the PEP body below, and it can be viewed online once the PEP pages sync the latest version.

I’ve also had this PEP move the canonical location of the simple repository API specification to the packaging guide.

Abstract

This PEP proposes adding the ability to mark a particular file download on a simple repository as “yanked”. Yanking a file allows authors to effectively delete a file, without breaking things for people who have pinned to an exact version.

It also changes the canonical source for the simple repository API to the Simple Repository API reference document.

Motivation

Whenever a project detects that a particular release on PyPI might be broken, they will often want to prevent further users from inadvertently using that version. However, the obvious solution of deleting the existing file from a repository will break users who have followed best practices and pinned to a specific version of the project.

This leaves projects in a catch-22 situation where new projects may be pulling down this known broken version, but if they do anything to prevent that they’ll break projects that are already using it.

By allowing the ability to “yank” a file, but still make it available for users who are explicitly asking for it, this allows projects to mitigate the worst of the breakage while still keeping things working for projects who have otherwise worked around or didn’t hit the underlying issues.

One of the main scenarios where this may happen is when dropping support for a particular version of Python. The python-requires metadata allows for dropping support for a version of Python in a way that is not disruptive to users who are still using that Python. However, a common mistake is to either omit or forget to update that bit of metadata. When that mistake has been made, a project really only has three options:

  • Prevent that version from being installed through some mechanism (currently, the only mechanism is by deleting the release entirely).
  • Re-release the version that worked as a higher version number, and then re-release the version that dropped support as an even higher version number with the correct metadata.
  • Do nothing, and document that people using that older Python have to manually exclude that release.

With this PEP, projects can choose the first option, but with a mechanism that is less likely to break the world for people who are currently successfully using said project.

Specification

Links in the simple repository MAY have a data-yanked attribute which may have no value, or may have an arbitrary string as a value. The presence of a data-yanked attribute SHOULD be interpreted as indicating that the file pointed to by this particular link has been “Yanked”, and should not generally be selected by an installer, except under specific scenarios.

The value of the data-yanked attribute, if present, is an arbitrary string that represents the reason for why the file has been yanked. Tools that process the simple repository API MAY surface this string to end users.

The yanked attribute is not immutable once set, and may be rescinded in the future (and once rescinded, may be reset as well). Thus API users MUST be able to cope with a yanked file being “unyanked” (and even yanked again).
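
As a rough illustration of the attribute semantics above, here is a minimal sketch that reads a project page with the standard library's html.parser. The class name, the sample page, and the file names are made up for the example; only the data-yanked attribute itself comes from the PEP.

```python
from html.parser import HTMLParser


class SimplePageParser(HTMLParser):
    """Collect (href, yanked, reason) for each anchor on a project page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # The presence of the attribute marks the file as yanked.
        # HTMLParser reports an attribute without a value as None.
        yanked = "data-yanked" in attr_map
        reason = attr_map.get("data-yanked") or ""
        self.links.append((href, yanked, reason))


# Hypothetical page content, for illustration only.
page = """
<a href="demo-1.0.tar.gz">demo-1.0.tar.gz</a>
<a href="demo-1.1.tar.gz" data-yanked>demo-1.1.tar.gz</a>
<a href="demo-1.2.tar.gz" data-yanked="broken metadata">demo-1.2.tar.gz</a>
"""

parser = SimplePageParser()
parser.feed(page)
for href, yanked, reason in parser.links:
    print(href, yanked, repr(reason))
```

A link with no data-yanked attribute is treated as not yanked, while a bare attribute and an attribute with a reason string both count as yanked.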

Installers

The desirable experience for users is that once a file is yanked, an attempt by a human being to directly install that file fails as if the file had been deleted. However, when a human did that a while ago, and a computer is now just continuing to mechanically follow the original order to install the now-yanked file, then it acts as if the file had not been yanked.

An installer MUST ignore yanked releases, if the selection constraints can be satisfied with a non-yanked version, and MAY refuse to use a yanked release even if it means that the request cannot be satisfied at all. An implementation SHOULD choose a policy that follows the spirit of the intention above, and that prevents “new” dependencies on yanked releases/files.

What this means is left up to the specific installer, to decide how to best fit into the overall usage of their installer. However, there are two suggested approaches to take:

  1. Yanked files are always ignored, unless they are the only file that matches a version specifier that “pins” to an exact version using either == (without any modifiers that make it a range, such as .* ) or === . Matching this version specifier should otherwise be done as per PEP 440 for things like local versions, zero padding, etc.
  2. Yanked files are always ignored, unless they are the only file that matches what a lock file (such as Pipfile.lock or poetry.lock ) specifies to be installed. In this case, a yanked file SHOULD NOT be used when creating or updating a lock file from some input file or command.

Regardless of the specific strategy that an installer chooses for deciding when to install yanked files, an installer SHOULD emit a warning when it does decide to install a yanked file. That warning MAY utilize the value of the data-yanked attribute (if it has a value) to provide more specific feedback to the user about why that file had been yanked.
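
The first suggested approach, together with the warning, could be sketched roughly as follows. This is not how pip or any other installer does it: Candidate, select, and the matches predicate are hypothetical names, and matches stands in for real PEP 440 specifier matching, which is out of scope here.

```python
import warnings
from dataclasses import dataclass


@dataclass
class Candidate:
    version: str
    yanked: bool = False
    yanked_reason: str = ""


def select(candidates, matches, pinned):
    """Return the usable candidates for one requirement.

    `matches` is a caller-supplied predicate standing in for PEP 440
    matching; `pinned` says whether the requirement pins an exact
    version (== without .*, or ===).
    """
    matching = [c for c in candidates if matches(c)]
    active = [c for c in matching if not c.yanked]
    if active:
        # Non-yanked files satisfy the request, so yanked ones are ignored.
        return active
    if not pinned:
        # Only yanked files match and there is no exact pin.
        return []
    # Exact pin and only yanked files match: use them, but warn.
    for c in matching:
        message = f"{c.version} is yanked"
        if c.yanked_reason:
            message += f": {c.yanked_reason}"
        warnings.warn(message)
    return matching


files = [
    Candidate("1.0"),
    Candidate("1.1", yanked=True, yanked_reason="broken metadata"),
]
print([c.version for c in select(files, lambda c: True, pinned=False)])
print([c.version for c in select(files, lambda c: c.version == "1.1", pinned=True)])
```

An installer that takes the lock file approach instead could presumably reuse the same shape, with pinned meaning "this version came from a lock file".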

Mirrors

Mirrors can generally treat yanked files one of two ways:

  1. They may choose to omit them from their simple repository API completely, providing a view over the repository that shows only “active”, unyanked files.
  2. They may choose to include yanked files, and additionally mirror the data-yanked attribute as well.

Mirrors MUST NOT mirror a yanked file without also mirroring the data-yanked attribute for it.
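
The two mirror behaviours can be sketched as below, under the assumption that a mirror holds each link as a plain dict of HTML attributes. The function name and the sample data are hypothetical.

```python
def mirror_links(upstream_links, include_yanked):
    """Return the links a mirror would serve for one project.

    Each link is a dict of HTML attributes such as 'href' and,
    for yanked files, 'data-yanked'.
    """
    mirrored = []
    for link in upstream_links:
        if "data-yanked" in link and not include_yanked:
            # Option 1: present a view with only unyanked files.
            continue
        # Option 2: copy every attribute, so a yanked file never
        # appears on the mirror without its data-yanked attribute.
        mirrored.append(dict(link))
    return mirrored


upstream = [
    {"href": "demo-1.0.tar.gz"},
    {"href": "demo-1.1.tar.gz", "data-yanked": "broken metadata"},
]
print(mirror_links(upstream, include_yanked=False))
print(mirror_links(upstream, include_yanked=True))
```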

Rejected Ideas

A previous, undocumented, version of the simple repository API had version specific pages, like /simple/<project>/<version>/ . If we were to add those back, the yanked files could only appear on those pages and not on the version-less page at all. However this would drastically reduce the cache-ability of the simple API and would directly impact our ability to scale it out to handle all of the incoming traffic.

A previous iteration of this PEP had the data-yanked attribute act as a boolean value. However it was decided that allowing a string both simplified the implementation, and provided additional generalized functionality to allow projects to provide a mechanism to indicate why they were yanking a release.

Another suggestion was to reserve some syntax in the arbitrary string to allow us to evolve the standard in the future if we ever need to. However, given we can add additional attributes in the future, this idea has been rejected, favoring instead to use additional attributes if the need ever arose.

Warehouse/PyPI Implementation Notes

While this PEP implements yanking at the file level, that is largely due to the shape the simple repository API takes, not a specific decision made by this PEP.

In Warehouse, the user experience will be implemented in terms of yanking or unyanking an entire release, rather than as an operation on individual files, which will then be exposed via the API as individual files being yanked.

Other repository implementations may choose to expose this capability in a different way, or not expose it at all.

Journal Handling

Whenever a release has been yanked, an entry will be recorded in the journal using one of the following string patterns:

  • yank release
  • unyank release

In both cases, the standard journal structure will indicate which release of which project has been yanked or unyanked.

Copyright

This document has been placed in the public domain.

(Paul Moore) #2

Maybe clarify that having no data-yanked attribute is interpreted as false (it’s just having the attribute but with no value that’s interpreted as true).

Also spelling: “interpreted” (one “t”)

Presumably (a) only if the requested version doesn’t include a trailing .* (which you show in the examples) and (b) also ===? Also, ==1.1.0 matches 1.1 (by the rule about zero padding) and I assume yanked versions would be accepted in that case too.

I suspect the rules are subtle enough that we’d want the packaging library to provide a reference implementation, rather than having installers just interpret things for themselves. But equally that means that it would be nice to be clearer here, so that we didn’t risk ending up with edge cases being implementation-defined in the packaging library.

“an installer”

(Donald Stufft) #3

Updated the PEP, you can see the full diff at https://github.com/python/peps/pull/1034 but the major changes are:

  • Switch the Delegate to @pf_moore
  • Change the data-yanked attribute from a boolean to a string to allow embedding a reason for why it was yanked.

(Paul Moore) #4

Looks OK to me. I’ll wait a while for any other interested parties to add their comments, though.

Once accepted, I assume the following actions are needed:

  • Implement the flag in Warehouse and add some form of UI for it to PyPI.
  • Support the flag in pip (my preference would be for this to be done in packaging, and for pip to just use that implementation, but I don’t know how plausible that will be in practice).

Anything else? Will tools like pipenv need special action, or do they just pick this up via pip?

(Donald Stufft) #5

Those steps are roughly correct yea.

I don’t know exactly how pipenv uses pip internally so it may or may not need to change, hopefully changing things in pip is enough.

This change would ideally live in packaging, but the repository API access all lives in pip itself still so it’ll require changes there at a minimum. What can live in packaging as it exists right now is logic to determine which specifiers are for an “exact” version or not, so we can toggle on/off yanked files in pip’s finder based on that.
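
To make the “exact” check concrete, here is a deliberately simplified, string-based sketch. It is not the packaging library's API; a real implementation would more likely build on packaging's specifier parsing. The function name is made up, and treating any multi-clause specifier as not exact is an assumption the PEP does not spell out.

```python
def is_exact_pin(specifier: str) -> bool:
    """True if `specifier` is a single == or === clause naming one version."""
    clauses = [c.strip() for c in specifier.split(",") if c.strip()]
    if len(clauses) != 1:
        # Assumption: combined clauses such as "==1.0,<2" are not exact pins.
        return False
    clause = clauses[0]
    # Check === first, since it also starts with ==.
    if clause.startswith("==="):
        return bool(clause[3:].strip())
    if clause.startswith("=="):
        version = clause[2:].strip()
        # A trailing .* turns == into a prefix match, i.e. a range.
        return bool(version) and not version.endswith(".*")
    return False


for spec in ["==1.0", "===1.0", "==1.0.*", ">=1.0", "~=1.0"]:
    print(spec, is_exact_pin(spec))
```

The result of a check like this is what would toggle yanked files on or off in the finder.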

(Paul Ganssle) #6

@dstufft First off, thank you for this PEP, it is going to make the transition from Python 2 to Python 3 much easier, in my opinion.

Two things:

  1. Would it make sense to have some “reserved” syntax for this, in case we need to make future modifications? Something like “Ending the text in data-yanked with text in square brackets is disallowed, as that syntax is reserved for future modifications.” Then if we have some unforeseen need to communicate additional information, we can do so in a backwards-compatible way.

  2. I think it may be worth documenting in the motivation section that this is particularly needed when the way that the package is broken is in metadata telling installers that a package is not suitable for a specific platform (e.g. python_requires), because making new releases to correct the metadata will be ignored by installers on the unsupported platform!

(Nathaniel J. Smith) #7

Strong agree.

(Donald Stufft) #8

I don’t think so, we don’t need to smuggle extra data in that string, if we ever need to communicate more information we can just add another attribute.

I can do that sure.

(Tzu-ping Chung) #9

Pipenv currently simply delegates package discovery to pip, so I think it should be fine. It will likely affect some other tools that implement their own Simple and JSON API clients though (including distlib), so this will need to be done with as much visibility to the community as possible so people can fix stuff before end users notice breakages.

(Donald Stufft) #10

And just to be clear, the possible user breakage here is similar to the python_requires rollout where the absolute worst case scenario is that someone will install a yanked file when they otherwise shouldn’t have. That’s less ideal than not doing that, but it doesn’t introduce a new type of breakage, the unhappy path is basically just acting like this PEP doesn’t exist.

(Chris Jerdonek) #11

Clarifying question: does “yanked” mean to imply it’s permanently yanked, or can a file be “unyanked”?

Also, there’s a typo here (should be “may have no value” I think):

(Donald Stufft) #12

One thing that I’m considering is changing the wording around when an installer should use a yanked version or not. I’m worried that I’m being a bit too pip specific in my wording. I’m looking at the prior art here and Cargo for instance will only install a yanked crate if it’s pinned inside of a Cargo.lock file.

Obviously pip doesn’t have an analogue of the Cargo.lock file, but maybe we should make the wording in the PEP a tad more ambiguous and tell installers that the intent is that no new dependencies can be created against the yanked version, but that existing dependencies continue to work. This would suggest that something like pipenv or poetry would be best suited to only installing it from a lockfile and pip would… I’m not 100% sure, either install it for == and === or perhaps even go one step further and only install it for == or === when coming from a requirements.txt?

I’m kind of torn on this though, because as it stands the PEP is more consistent, but it means that something like pip install yanked-thing==1.0 still works, and I’m not sure that we want it to. What do other people think?

I would lean towards allowing it to be unyanked, and I’ll update the PEP to say that. I think this is a case where we can give the project more control to do what makes sense for their project rather than constraining them.

Yea, I’ll get that fixed in the next update.

(Tzu-ping Chung) #13

I might have missed some content, since I am not sure why this would be problematic.

(Xavier Fernandez) #14

I’d say pip install yanked-thing==1.0 should behave the same way as pip install -r requirement.txt if requirement.txt contains yanked-thing==1.0.

(Chris Jerdonek) #15

Agreed. My instinct is that the rule shouldn’t depend on how it’s invoked (at least for pip). Otherwise, I can see this creating confusion when people are trying to diagnose issues because the behavior will subtly vary. But I haven’t thought deeply about it.

(Nathaniel J. Smith) #16

It sounds like what you want is: humans are not allowed to type new ‘yanked-thing==1.0’ requirements, but if a human typed it a while ago and now it’s just a computer continuing to mechanically follow orders, then that’s allowed.

Unfortunately, I don’t think that’s something we can reliably distinguish. You could have an old unmaintained Dockerfile that says ‘RUN pip install yanked-thing==1.0’, and you could have a requirements.txt that someone typed a few seconds ago.

Maybe the closest you can get would be for pip to print a warning when installing a yanked version? That way if a human did type it, they’ll see it, and if it’s a computer, they’ll ignore it.

(Xavier Fernandez) #17

I like that :slight_smile:

(Donald Stufft) #18

Yea that’s roughly it. The pip case is the hardest case I think. Tools like pipenv, poetry, and even pip-tools have it easier because they can just support yanked files only from lock files. It might be the case that a warning in pip is the best we can do given how pip works.

(Pradyun Gedam) #19

The PEP looks good to me in its current form.

pip printing a warning when using a pinned yanked release sounds right to me.


I think we should take this opportunity to move the simple repository API spec to packaging.python.org, like we’ve been doing for so many others.

(Bernat Gabor) #20

I agree with @pradyunsg on this one. Great work on this everyone!