I’d say the goal of the restrictions is not to force user-readable labels (which it can’t achieve, per trying to solve social problems with technical means), but merely to avoid filesystem issues. I’m open to slightly extending it (e.g. to block non-asciinum at start), but I agree with @jonathandekhtiar that it feels like adding implementation complexity without solving a real issue.
Actually, I wouldn’t even go that far. Note that the restrictions aren’t just for PyPI publishing but for any wheels built. I don’t think we should prevent people from naming their local variants `_` (a perfectly valid Python identifier) or `1`, `2`…
We chose the 16 char limit to be conservative, but I’m also fine with removing the limit. You can already create dubiously long package names or extremely high version numbers, which can also cause long filenames, and there aren’t any agreed-upon restrictions for either.
Given the open-ended intentions for wheel variants, I think it makes sense to be more permissive at first, and restrict the length later if needed.
Going from a short limit to a longer one is backwards compatible for all existing variants, but not for tools that rely on the existing pattern. Going from a long limit (or no limit) to a shorter one is backwards compatible for tools (their pattern still works) and might require renaming some variants (presumably just the ones causing a problem). So the latter path makes more sense to me.
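To make that asymmetry concrete, here is a minimal sketch. The two patterns and the sample labels are assumptions for illustration (in particular, the exact form of a 16-character pattern is a guess, not quoted from the PEP).

```python
import re

# Hypothetical patterns: one tool written against a 16-character limit,
# another written against no length limit at all.
SHORT_LIMIT = re.compile(r"^[0-9a-z_.]{1,16}$")
NO_LIMIT = re.compile(r"^[0-9a-z_.]+$")

def accepted(pattern, labels):
    """Return the labels that a tool using `pattern` would accept."""
    return [label for label in labels if pattern.fullmatch(label)]

short_labels = ["cpu", "x86_64_v3"]
long_labels = ["cuda12.8_x86_64_v3_openblas"]  # longer than 16 characters

# Tightening later: a tool built for "no limit" still parses every label
# that remains valid under the shorter limit.
print(accepted(NO_LIMIT, short_labels))

# Loosening later: a tool built for the 16-character limit rejects labels
# that only became valid once the limit was raised.
print(accepted(SHORT_LIMIT, short_labels + long_labels))
```

In the second case the long label is silently dropped by the older tool, which is the tool-side breakage described above.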
Actually, perhaps the key point here is that we’re defining installer behavior, and I would expect the installers to be lenient about this. So perhaps the right thing would be to leave the length undefined (with perhaps a generic warning about filename length), and impose limits (if any) in a subsequent PEP governing building wheels.
I think we should continue to leave it up to package maintainers how they spend their character “budget”, and how conservatively (or not) they want to spend it.
Ahahaha okay, it’s not every day DPO makes me actually laugh. @bwoodsend kudos to you, that was funny
Alright there seems to be consensus among everybody including the PEP authors, so I suppose we can remove the arbitrary limit with the next PEP revision (please correct me if I’m wrong).
Thanks to community feedback we made the following changes since the initial version published in the PR:
Variant Label Length: Limit completely removed, giving the regex: `^[0-9a-z_.]+$`
The variant JSON file served by the index is now considered mutable to facilitate sequential releases on the index (some variants one day, some more variants another day).
Variant JSON schema versioning is now explained.
pylock.toml now fully inlines the content of the variant JSON, for performance reasons and to accommodate mutability.
Rejected Ideas Added: Hash as variant label
We hope these changes will resonate with the community.
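For reference, a minimal sketch of what validating a label against the unbounded regex could look like. The regex is the one quoted in the list above; the helper name is hypothetical.

```python
import re

# Regex quoted in the change list above; the helper name is an assumption.
VARIANT_LABEL_RE = re.compile(r"^[0-9a-z_.]+$")

def is_valid_variant_label(label: str) -> bool:
    """True if `label` is non-empty and uses only a-z, 0-9, `_` and `.`."""
    return VARIANT_LABEL_RE.fullmatch(label) is not None

for candidate in ["_", "1", "x86_64_v3", "CUDA", "a-b", ""]:
    print(repr(candidate), is_valid_variant_label(candidate))
```

Labels such as `_` or `1` pass, uppercase letters and hyphens do not, and there is no upper bound on length.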
I have a concern about this part, as allowing it to be mutable allows changing what a set of variants points to, not just adding more variants unless there are specific restrictions on how this file can be updated.
The actual language of the proposal is:
They MAY require that keys other than variants have exactly the same values, or they may carefully merge their values, provided that no conflicting information is introduced, and the resolution results within a subset of variants do not change.
This file SHOULD NOT be considered immutable and MAY be updated in a backward compatible way at any point (e.g. when adding a new variant).
I understand the desire to be able to upload different artifacts at different times, but I don’t think this is the best way to enable that.
Would it be possible to say that all variants, even those for which a wheel has not been uploaded yet, must be declared in the initial version of the file, along with a way to signify a “to be uploaded” placeholder? This would allow stricter language about what parts may be updated and how they should be allowed to change, as well as give installers an indication of when a variant json is safe to cache based on content.
I think that the only possibility this rules out is adding support for new variants that weren’t even considered at time of release for a version, but that seems natural to me. If you’re supporting new variants, you should probably cut a new release.
I interpret “in a backward compatible way” as not allowing changes to existing variants. Is there an example where that’s not true?
This complicates the upload, as either the uploader or the index has to check that new variant files are allowed to be uploaded.
In addition, what can installers do with this information? They can’t cache, as the to-be-uploaded flag would persist.
The simplest language would be to say “variants can only be added, and all other fields may not be changed”.
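A rough sketch of what checking that rule could look like. The document layout is an assumption: a dict with a `"variants"` mapping of label to properties plus arbitrary other top-level keys. The function name and sample data are made up.

```python
def is_additive_update(old: dict, new: dict) -> bool:
    """Check the "variants can only be added" rule for an updated document.

    Assumed (hypothetical) layout: a "variants" mapping of label -> properties,
    plus any number of other top-level keys.
    """
    old_variants = old.get("variants", {})
    new_variants = new.get("variants", {})

    # Every existing variant must still be present with identical content.
    for label, properties in old_variants.items():
        if label not in new_variants or new_variants[label] != properties:
            return False

    # Every key other than "variants" must be unchanged.
    old_rest = {k: v for k, v in old.items() if k != "variants"}
    new_rest = {k: v for k, v in new.items() if k != "variants"}
    return old_rest == new_rest

# Made-up documents, only to exercise the check.
old = {"schema": 1, "variants": {"v2": {"level": "v2"}}}
added = {"schema": 1, "variants": {"v2": {"level": "v2"}, "v3": {"level": "v3"}}}
redefined = {"schema": 1, "variants": {"v2": {"level": "v3"}}}

print(is_additive_update(old, added))      # True: only a new label appeared
print(is_additive_update(old, redefined))  # False: "v2" changed meaning
```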
I think the fallacy here is treating the version variants JSON as a file to be downloaded, rather than treating it as the same class as the project list and project file list pages.
I disagree, I believe the same code should be the same version, where possible (although there’s an argument for post releases with metadata-only changes).
The specification uses “may” without a corresponding “must not” for it to carve an exception out of, and the default state of the file is “this is mutable”. That leaves backward-incompatible changes allowed, with the backward-compatible change (and any restriction on it) appearing only as an example.
If they encounter a variants-json without such a placeholder, paired with it not being allowed to add new variants, that makes it safe to cache.
If they encounter a variants-json with a placeholder for a variant the user’s system declares it wants, they can inform the user of this, and maybe the user waits on upgrading versions until the wheel that matches their system is available, rather than only some non-specialized fallback wheel.
There’s a lot more installers may be empowered to do if they have information to make decisions with rather than having to assume all releases ever might have this file change.
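To illustrate the kind of decisions a placeholder would enable, here is a minimal sketch. Everything about the representation is hypothetical: the marker string, the idea that a pending variant's entry is replaced wholesale by the marker, and both helper names.

```python
PENDING = "to-be-uploaded"  # hypothetical placeholder marker, not from the PEP

def is_safe_to_cache(variants: dict) -> bool:
    """With no placeholders left (and no new variants allowed), the file is final."""
    return PENDING not in variants.values()

def awaited_labels(variants: dict, wanted: list) -> list:
    """Labels the user's system wants that are declared but not uploaded yet."""
    return [label for label in wanted if variants.get(label) == PENDING]

# Made-up example: one variant uploaded, one still pending.
variants = {"x86_64_v2": {"level": "v2"}, "x86_64_v3": PENDING}

print(is_safe_to_cache(variants))                          # False
print(awaited_labels(variants, ["x86_64_v3", "x86_64_v2"]))  # ['x86_64_v3']
```

An installer could use the second result to tell the user that a better-matching wheel is announced but not yet available.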
Variants have been presented as something that isn’t solvable without explicit support for them. How would the same code handle a new variant it doesn’t declare? Variants have been presented as something that isn’t solvable strictly statically, which would indicate that this isn’t just “if you run this with an updated build system it will suddenly know how to make the right wheel”, and even if it was, that would imply a new minimum requirement not previously expressed by the project.
I’d like to thank the authors for splitting this up.
I have no significant issues with anything in this portion of the overall set of things needed to support variants.
I think I agree in theory with @Liz about handling the mutability of the variant metadata. However, I believe any issues with this might be better served orthogonally by future changes that people have discussed around atomicity of releases and restricting deletion, paired with a way for index servers to advertise which files the index considers immutable and a way to handle exceptions to this, such as advertising when an artifact was removed for some other reason (i.e. allowing malicious artifacts to be deleted on a package index that otherwise intends to be immutable).
It might be possible to make the wording slightly more prescriptive, saying that indexes SHOULD enforce non-conflicting data. But without either requiring this or allowing indexes to advertise whether they enforce it, tools shouldn’t rely on it, so the benefit would mostly be in setting social expectations.
I don’t think this rises to a security concern given the pieces in play, so while I have other preferences here, I have no issue with deferring any advocacy of those preferences to another proposal that may be able to handle those concerns more comprehensively.
There are still some minor auditability concerns here, but the remaining ones that are in scope for this portion of the proposal are things I feel are very easy for different people to reach different conclusions on, and the updated wordings make it clear enough for those who care how they should handle it.
I’m not one of the PEP authors, but I would expect most maintainers will not upload every possible combination of variants that exist. I would not try to treat them as closed sets, and I would expect that at least a few of the variant axes will have values added without needing detection code modification (e.g. CPU features).
That’s not the problem at all. There is no requirement that you actually upload wheels for all the variants listed in the metadata. The metadata is merely a lookup table for the tools, and while listing additional variants technically makes it larger than necessary, it’s not wrong per se.
Precisely this. It is not an artifact but metadata. The primary problem that prompted the change is that the index can generate this metadata dynamically, rather than requiring the user to upload it (which is reasonable; why require the uploader to go through extra hoops, when the data is already there?). This implies that the actual metadata grows as subsequent wheels are uploaded.
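As a rough sketch of an index building this metadata incrementally, assuming each uploaded wheel carries its own small variant document that gets merged into the index-level one. The layout (`"variants"` mapping plus other top-level keys) and the function name are assumptions, not the PEP's schema.

```python
def merge_variant_metadata(index_doc: dict, wheel_doc: dict) -> dict:
    """Merge one wheel's variant metadata into the index-level document.

    Hypothetical layout: both documents have a "variants" mapping
    (label -> properties) and other top-level keys that must not conflict.
    """
    merged = {key: value for key, value in index_doc.items() if key != "variants"}
    for key, value in wheel_doc.items():
        if key == "variants":
            continue
        if key in merged and merged[key] != value:
            raise ValueError(f"conflicting value for key {key!r}")
        merged[key] = value

    # Variants may be added, but an existing label must keep its meaning.
    variants = dict(index_doc.get("variants", {}))
    for label, properties in wheel_doc.get("variants", {}).items():
        if label in variants and variants[label] != properties:
            raise ValueError(f"variant {label!r} redefined")
        variants[label] = properties
    merged["variants"] = variants
    return merged

# The document grows as wheels arrive, possibly on different days.
doc = merge_variant_metadata({}, {"schema": 1, "variants": {"a": {"x": 1}}})
doc = merge_variant_metadata(doc, {"schema": 1, "variants": {"b": {"x": 2}}})
print(sorted(doc["variants"]))
```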
Ideally, we’d use a mechanism like publishing sessions to ensure that all wheels actually land before this metadata is published. However, to the best of my knowledge, there is no formal requirement to actually require people to use them (and PEP 694 is not even approved yet).
Besides, PyPI is not the only index out there. Technically, any HTTP server can serve as an index, and we can’t guarantee that people won’t be updating their variant metadata files there. So I think it’s better to make it clear that tools can’t rely on the file being immutable (irrespective of whether it changes because the maintainer uploaded new variants or because you happened to fetch it in the middle of release being uploaded), and instead need to be prepared to handle the mutability.
Of course, we can discuss whether PyPI should “lock down” old releases and block uploading new wheels to them, but that’s tangential to this PEP.
It’ll be unsurprising to you that I feel this way but yea I totally agree with this take.
Whatever someone’s opinion is on the mutability of releases, that doesn’t particularly matter because the ground truth is that releases are mutable today, and thus any PEP has to contend with that in some way.
There’s nothing special about variant metadata here, it’s intrinsically similar to the metadata that is encoded in the wheel filename, just more expressive and powerful, and we don’t currently have any requirements that the set of tags for a given (project, version) are a closed set, and it’s both confusing and technically difficult to do that for variants but not the other compatibility tags in the current state of the world.
If someone is of the opinion that the set of wheels for a given project should be a closed set, that’s a perfectly valid opinion, but that needs to be its own PEP that holistically enables that; it should not be partially thrown into a PEP that is only tangentially related to that problem. After all, whether the variants are a closed set or not isn’t going to affect whether the release as a whole is a closed set, so trying to shoehorn it in is only going to cause pain because tools won’t be able to optimize for either closed or open sets, they’ll have to support both.
One thing I’d note, and this might be in the PEP already (I’m about to run out so I can’t look), is installers should probably check, once they select a wheel, that the variant information in that wheel matches the variant information that was in the repository metadata.
That’s also not specifically related to variants. Installers should probably check that for all of the metadata they use, if for nothing else than as a sanity check that the file they’re installing matches what they think it should be.
That also acts as a forcing function to say that the definition of what a variant label means cannot change for a given (project, version), unless you also change the metadata inside of all of the wheels for that as well, which isn’t possible to do on repositories like PyPI.
The thing is that’s not inherently different from saying “the platform tags a project supports for a specific version” should be a closed set. Having it apply to one and not the other makes the system as a whole more confusing.
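A minimal sketch of the consistency check described above, assuming the installer has already parsed both sides into plain dicts. The function name and data layout are hypothetical.

```python
def check_wheel_matches_index(label: str, index_variants: dict, wheel_properties: dict) -> None:
    """Raise if the selected wheel disagrees with the index about `label`.

    `index_variants` is assumed to map label -> properties as served by the
    index; `wheel_properties` is what the downloaded wheel itself declares.
    """
    if label not in index_variants:
        raise ValueError(f"index metadata does not list variant {label!r}")
    if index_variants[label] != wheel_properties:
        raise ValueError(f"variant {label!r}: wheel metadata differs from index metadata")

# Made-up data: the wheel agrees with the index, so nothing is raised.
index_variants = {"x86_64_v3": {"level": "v3"}}
check_wheel_matches_index("x86_64_v3", index_variants, {"level": "v3"})
```

If the index redefined a label after the wheel was built, the second check fails, which is the forcing function mentioned above.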
An example platform tag might be amd64 on Windows, and I can later decide to expand that to amd64 on macOS.
An example variant might be amd64 v2 (with the later instruction sets) on Windows, and if variants are a closed set, I can’t expand that to amd64 v2 on macOS.
Trying to explain why one is allowed but not the other is difficult without just boiling down to “because they use different mechanisms and we just decided to add a limit to one of them”.
Remember that nothing in the packaging standards even requires that artifacts themselves are immutable. It’s perfectly legal to have an artifact change over time. It’s a PyPI constraint that you can’t do that, not a general constraint on the system as a whole. There’s also no requirement in the system that a name is some immutable thing for all time— PyPI applies some constraints that make re-using names less convenient, but in a local repository I might delete a version or a name completely and re-use it.
This isn’t uncommon in systems that have a process where they have QA testing that happens on the artifact prior to publishing it for wider release.
It’s important to separate the requirements that we put on the system as a whole, versus the requirements we put on specific systems like PyPI, as well as making sure that the constraints make sense in the wider picture.
I thought I was clear that what I proposed doesn’t require artifact immutability since I proposed something that explicitly includes placeholder values for any variant tag the project wants to support for that version, but isn’t uploading currently, and for the placeholder to be replaced if and when the corresponding wheel is uploaded. It’s strictly providing enough information for tools to make optimizations and user-presentation in some cases.
I don’t think it’s unreasonable to say that if a project intends to support a variant combination, they can explicitly include that fact without having to upload the wheel that corresponds to that tag immediately. I’d expect that, if that’s specified, it’s also automatically done from the matrix of variants in the source distribution. For any new variants not in that, you need new source, so it needs a new version for wheels to match the source, which, even though it isn’t enforced, should be noncontroversial.
This isn’t the same argument as immutable and atomic releases. I want those long term, but I don’t know how we get those if we’re supporting the ability for 3rd party indexes to do anything they want.