PEP 592: Support for "Yanked" Files in the Simple Repository API

pf_moore · May 10, 2019, 1:35pm

Looks like pretty much everyone is happy with this.

Do we need to add anything covering the point about warnings? Something like:

The presence of a data-yanked attribute SHOULD be interpreted as indicating that the file pointed to by this particular link has been “Yanked”, and should not generally be selected by an installer, except under specific scenarios. Installers MAY choose to warn the user if a yanked link gets installed in an interactive context.

(the last sentence is new).

@dstufft I’ll leave it to you to decide if that’s worth adding. But either way, I’m happy to sign this off once you confirm there’s no more changes you want to make to the text.

dstufft · May 10, 2019, 1:39pm

I’ve got some more clarifications coming I’m working on. Nothing super earth shattering but that does tweak the semantics a little bit.

pganssle · May 10, 2019, 2:18pm

I think it wouldn’t hurt for the PEP to set a “floor” for how implementations should treat the “yanked” attribute, but leave it up to the specific front-end how they want to enforce the semantics, if that’s possible.

Maybe some language like this:

An implementation MUST ignore yanked releases if the selection constraints can be satisfied with a non-yanked version, and MAY refuse to use a yanked release even if it means that the request cannot be satisfied at all. An implementation SHOULD choose a policy that prevents new dependencies on yanked releases - so, for example, a project may refuse to install yanked projects other than when installing from a lock file.

This seems like the maximum freedom for projects while also preserving the integrity of the “yanked” attribute.

dstufft · May 10, 2019, 2:53pm

Alright, I’ve updated the PEP, you can see a diff at https://github.com/python/peps/pull/1042 and I’ve updated the first post to have the latest text.

Major differences:

Explicitly call out the python-requires case as a motivation.
Specify that yanking a file is not permanent, and that tooling has to be able to cope with a file being unyanked (and potentially yanked again in the future).
Specify what will get logged to the journal for yanking or unyanking a file.
- This isn’t really part of the simple repository API, but I wanted to include it because I think it’s an important part of this feature.
Expanded the section how consumers of the API should treat yanked files, both as installers and as mirrors.
- For installers, it takes @pganssle general approach above while also adding some specific recommendations.
- For mirrors it gives two options, depending on how that particular mirror wants to operate.
Says that installers that do choose to use a yanked file, should emit a warning about the fact that they did. This is only a SHOULD so installers can choose to not do that if they want, but ideally they all will.

I’m pretty happy with the current state of this PEP now, I think it gives us a fairly powerful and general feature that projects can use to guide people away from releases that have major known issues, without all of the downsides of outright deleting files.

The only real open question in my mind, isn’t really specific to the simple repository API but rather a general question and then depending on the result of that, maybe a question for how to represent that in the journal.

Because of the nature of the simple repository API, yanking has to be indicated on a file by file basis but that doesn’t answer the question of conceptually what do project authors want to yank? Is it file by file? Is it entire releases? Both?

At first I was thinking of this feature that projects would yank files, but the more I think about it the more I think it might only make sense to yank entire releases. I think there are three major cases that can occur if we yank file by file:

There is a sdist and 1 or more wheels, they are all the same version so they’re all equally acceptable.
- If you yank wheel(s), those yanked files will never be used, so it’s roughly the same as deleting because the sdist is always acceptable for anything the wheel is acceptable for, so an installer would end up using the wheel.
- If you yank the sdist, then the wheels will still be used by default, except on platforms where there aren’t wheels then the normal yanking semantics will come into play. It feels weird to me that in this case yanking or not will vary on a platform basis, and potentionally seems like a footgun for user confusion.
There is no sdist, and only wheels.
- If there are multiple wheels that are acceptable for each platform, then this case is the same as the first sub bullet above.
- If there is only one wheel for a given platform, then the weird platform varying-ness still applies.
There is only a sdist, and no wheels.
- Yanking the only file on a release is exactly the same as yanking the entire release, so there is no difference.

So I think that as a concept, it should be thought of as yanking an entire release, and just the implementation of the simple repository API means that gets represented on a file by file basis. In that case, I think the journal entry should be updated so it’s just yank release and unyank release and doesn’t specify a file at all, because it is all the files, but maybe mirroring software authors would find it more useful to still have it represented in the journal as if it was yanking file by file?

Another open question I think is if we’re yanking entire releases, should new uploads to that release be accepted? Or once a release is yanked should we treat it as locked in some way?

These aren’t exact fits for specifying in the simple repository API, but I think they’re important questions for actually implementing it, and we should probably specify it in the PEP.

pf_moore · May 10, 2019, 3:19pm

Spelling: dependencies

Missed closing parentheses after poetry.lock.

I’d be inclined to make that a MUST NOT.

The whole section about the journal isn’t technically part of the spec, so I’d rather see it as a separate section, maybe entitled “Implementation Notes - Journal Handling”. I’m not exactly sure to what extent the journal is a standardised mechanism, as opposed to an implementation detail of Warehouse, so maybe even note that this is Warehouse-specific if that’s the case.

The question about yanking at release level is important, but if we’re sure it’ll be purely a front end issue, and won’t impact the API spec at all, it doesn’t need to hold up acceptance of the PEP. Maybe it’s worth adding a brief note clarifying that repository implementations may have a higher level interface to control yanking, possibly at the level of yanking a release, but it will be implemented in the API at the file level using the mechanism defined here.

Other than these minor points, this looks good to me now.

dstufft · May 10, 2019, 3:37pm

Alright, another update to the PEP then located at https://github.com/python/peps/pull/1043, this is mostly just typo fixes, but it does include one “real” change, which is moving the bits about Journal to a “Warehouse/PyPI Implementation Notes” section at the very end, which is not directly about the specification but rather just documenting how Warehouse is going to expose this to users.

I figure it’s important to have this information at least documented here since PyPI has a special place in our ecosystem of tools, and the journal API isn’t exactly standardized at all but still generally useful to document how this PEP affects it.

I went ahead and made the decision to say that Warehouse is going to expose yanking/unyanking as an operation on releases rather than individual files. If someone feels strongly that it should operate on files I’m happy to have that discussion, but as @pf_moore has pointed out, it’s not really part of the spec so we can update those notes at any time since they’re just that, notes.

pganssle · May 10, 2019, 3:49pm

I definitely think this is the right decision, since a very common use case for this will be “bad metadata”, which applies per-release and not per-file.

pradyunsg · May 11, 2019, 12:53pm

@dstufft would you be willing to add PEP 503 and PEP 592 as a single document in packaging.python.org (once PEP 592 is accepted)?

@ncoghlan told me that we’re being actively moving specifications to packaging.python.org whenever we have a related PEP, to use the PEP process to describe changes to the specifications.

dstufft · May 11, 2019, 6:03pm

That’s mentioned in the abstract of this PEP yes.

pf_moore · May 12, 2019, 11:02am

OK, this looks good to me now, so I’m formally accepting it.

Thanks to everyone for the feedback, and to @dstufft for updating the PEP to address those responses.

(And to be clear, I agree that if any changes to the “Warehouse/PyPI Implementation Notes” are needed, that’s OK without affecting my acceptance of the PEP).

Congratulations, @dstufft!

btskinn · May 12, 2019, 11:25am

Is it technologically possible, if foolhardy, to publish files to the same release with different metadata? E.g.:

Publish wheel at vX.Y with metadata set A
Change setup.py, rebuild sdist
Publish sdist at vX.Y with metadata set B

Would the above be allowed, or would this inconsistent sdist get rejected? (EDIT: It struck me that sdist metadata checking might be fraught, so this could be a bad example. If so, then say it’s wheel-1 with metadata A and wheel-2 with metadata B.)

If it’s allowed, and should be, what metadata does Warehouse report, and how does this affect PEP592?

If it shouldn’t be allowed, but currently is, should a test for this be added to the ‘stricter upload checking’ discussed at the summit? If so, an explicit error message as to where the metadata inconsistencies are would assist the publisher in correcting them.

dstufft · May 12, 2019, 4:08pm

Awesome, thanks!

btskinn:

Is it technologically possible, if foolhardy, to publish files to the same release with different metadata? E.g.:

Publish wheel at vX.Y with metadata set A

Change setup.py, rebuild sdist

Publish sdist at vX.Y with metadata set B

Would the above be allowed, or would this inconsistent sdist get rejected? ( EDIT: It struck me that sdist metadata checking might be fraught, so this could be a bad example. If so, then say it’s wheel-1 with metadata A and wheel-2 with metadata B.)

If it’s allowed, and should be, what metadata does Warehouse report, and how does this affect PEP592?

If it shouldn’t be allowed, but currently is, should a test for this be added to the ‘stricter upload checking’ discussed at the summit? If so, an explicit error message as to where the metadata inconsistencies are would assist the publisher in correcting them.

It might currently be possible, but we generally do not want people to do it and if possible we will likely restrict it in the future. I don’t think it affects PEP 592 at all though and it would make sense to make it part of stricter upload checking.

sumanah · June 12, 2019, 7:55pm

There’s a WIP PR implementation by @dstufft.

ncoghlan · November 12, 2019, 11:50pm

Just noting the items required to move this PEP for Accepted to Final:

finish up the Warehouse PR at https://github.com/pypa/warehouse/pull/5838
update the specification page at https://packaging.python.org/specifications/simple-repository-api/ to cover both the original PEP 530 API and the PEP 592 additions

sumanah · April 23, 2020, 12:57am

Yay! The first item is done, and I’ve now created https://github.com/pypa/packaging.python.org/pull/723 for the second. @dstufft or @dustin please review?

sumanah · April 23, 2020, 8:44pm

That’s now merged, so I’ve made a PR to change the PEP status to Final.

[Edited <1hr later to say]: And done!