I had to wait for written confirmation that an internal tool was not considered secret. I have also been asked to say that this is not a formal request from my employer, and that we’ll work around this if the community goes through with it anyway.
The fact that this is even being seriously considered has shaken faith in some of our internal tooling.
The example I have, and the one I needed permission to speak about, involves static analysis.
We regularly download dependencies (using pip download --only-binary=:all:, though it isn’t clear to me that variants don’t count as binaries, and the proposal leaves this up to individual tools?) and analyze them statically to check for API changes in libraries. We also read the changelogs; this isn’t an abdication of the review process, but a tool meant to spot changes we might be impacted by sooner. Everything from docstrings changing to calling conventions changing is detected statically, and we get automatic notes to verify specific things. This runs regularly, before we even get to the point of asking our security team to approve a version for production use (we have an internal wheelhouse and build everything ourselves). That this is even being considered, when wheels were supposed to not execute code on download, has caused that tool to be frozen and flagged for reassessment of risk, because we can’t know going forward that the community won’t change an important property of wheels in the future.
We were specifically using pip in a script for this so we could match the resolver logic without having to reimplement it, and without importing pip (which is not a supported way to use pip).
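To make that workflow concrete, here is a minimal sketch of the kind of check described above. It is not our actual internal tool; the requirements.txt path, the download directory, and the idea of diffing against a stored snapshot are all assumptions made for illustration. The point is that everything happens by downloading binary wheels and parsing them, never by importing or executing the downloaded code.

```python
# Sketch only: paths, file names, and the reporting format are assumptions.
import ast
import subprocess
import sys
import zipfile
from pathlib import Path

DOWNLOAD_DIR = Path("downloaded-wheels")

# Let pip's resolver pick the versions; only binary wheels are fetched,
# so nothing from the downloaded packages is executed.
subprocess.run(
    [sys.executable, "-m", "pip", "download",
     "--only-binary=:all:", "-r", "requirements.txt",
     "-d", str(DOWNLOAD_DIR)],
    check=True,
)

def public_api(wheel_path: Path) -> dict[str, str]:
    """Collect function signatures by statically parsing the wheel's sources."""
    api = {}
    with zipfile.ZipFile(wheel_path) as whl:
        for name in whl.namelist():
            if not name.endswith(".py"):
                continue
            try:
                tree = ast.parse(whl.read(name), filename=name)
            except SyntaxError:
                continue  # skip templates or non-importable files
            for node in ast.walk(tree):
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    api[f"{name}:{node.name}"] = ast.dump(node.args)
    return api

# Compare against a previously stored snapshot to flag signature changes
# (snapshot handling omitted here).
for wheel in DOWNLOAD_DIR.glob("*.whl"):
    print(wheel.name, len(public_api(wheel)), "functions found")
```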
To be clear here, variants are still binary wheels.
The behavior of pip download is specific to that tool. Whether pip (or any other tool) decides to execute third-party code is not determined by a specification; it can choose to do so at any time.
I do not think this specification should require any tool to execute third-party code (i.e., variant providers). Right now, the verbiage is that installers “should” use providers to determine available variants. I don’t even think that’s necessary; it seems fine for it to be “may”.
The intent of the specification is to create a structure that allows packages to express variants and gives tools the choice to invoke providers to determine the variants to use for wheel selection and validation.
I’ve not seen anything that requires (or even suggests) making any sort of change without warning [1].
At least in my head, the questions of “is this the right change” and “how do we communicate this change” are related, but separate questions.
Funny thing: the spec never requires, guarantees, or even mentions that wheels avoid code execution, just that they avoid needing to install build tools. Though I think conventionally everyone generally expects that currently. Gotta love the ambiguity of the old specs ↩︎
The behavior of unpacking a wheel is well specified. At no point in that specification is any code within the wheel executed. This changes that, and glossing over existing behavior and assumptions that people rely on, in a format we were sold as an improvement over source distributions, isn’t helping with the perception that this is a change we are going to have to redesign our processes around.
Yes, pip could decide to do something not specified with the current wheel specification, but if it did that in a way that executed code provided by wheels it was downloading, we’d discard all trust we have in pip. This is instead specifying that wheels can provide code that can be executed to determine what to install.
Yeah, I think this might be doable if we’re willing to constrain the set of axes and govern them through standards, i.e., create environment markers (e.g., add cuda_version_lower_bound and sm_arch environment markers) for these things and (presumably) encode them in wheel tags. (I would need to run through a few examples to figure out the sticking points with all we’ve learned from the current design.) That approach does have downsides, though. In that world, we’re signing up to maintain the available markers and their semantics through standards when many of them will ultimately be vendor- or tool-specific. We also need standardized implementations of all the various providers for detection, which will require coordination with vendors (like NVIDIA – defining and detecting these properties is not totally trivial). Package managers would need to vendor those implementations, etc. Part of the thinking behind the current design is that it enables this development to happen without requiring constant changes to specifications by standardizing on the general schema and principles (variants). I don’t mind pushing on that concept though. It ends up being similar to variants, but with a pre-defined set of properties and providers that are governed by a standard.
Edit: Just as an example: if NVIDIA introduced a new versioning scheme, then in this world, we’d need to update the standard / specification to include that new marker, add an implementation for it, and propagate the changes to the marker grammar and the corresponding detection implementation out to all packaging tools.
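Purely as an illustration of that world (none of these markers exist today; the names cuda_version and sm_arch are taken from the discussion above rather than from any standard), an installer’s marker environment might be extended along these lines:

```python
# Hypothetical sketch only: neither marker exists in any standard today.
from packaging.markers import default_environment

env = default_environment()          # the PEP 508 environment that exists today
env.update({
    "cuda_version": "12.4",          # hypothetical: detected CUDA runtime version
    "sm_arch": "sm_90",              # hypothetical: detected GPU compute capability
})

# A dependency could then (hypothetically) be guarded the same way platform
# markers are used now, e.g.:
#     somepackage ; cuda_version >= "12.0" and sm_arch == "sm_90"
print(env["python_version"], env.get("cuda_version"))
```

The appeal is that the guard above would be evaluated statically, the same way platform markers are today; the cost is that every such marker and its detection logic has to be standardized and kept up to date centrally.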
I’m not looking to debate the sentiment behind your comment or gloss over any changes, but I do feel the need to clarify a few details because I want to make sure the proposal is well understood in this thread:
Unpacking a wheel does not require executing code. The wheel variant specification does not change this.
Wheels themselves do not provide code that can be executed to determine what to install. A JSON file is published alongside the wheels that lists the available variants, the features they require, and the providers that you can run to determine whether those features are available on your machine. You can still install a wheel without running any code, if you know what wheel you want. You should also be able to define, in advance, the set of features that you want the installer to “assume” are available on your machine, without running any of the provider code. If you want the installer to infer the correct wheel automatically, then it would need to run the provider code.
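As a rough sketch of that flow (the real file name, schema, and feature syntax are defined by the proposal and are not reproduced here, so every field and string below is an assumption), an installer could pick a variant from a user-declared feature set without running any provider code at all:

```python
# Rough illustration only; all field names and feature strings are made up.
variants_index = {
    "variants": {
        "cu124_sm90": {
            "requires": ["cuda>=12.4", "sm_arch==90"],
            "provider": "example-gpu-provider",  # code you *may* run to detect support
        },
        "cpu": {"requires": [], "provider": None},
    }
}

# A user-declared set of features the installer should assume are present,
# so that no provider code ever runs.
assumed_features = {"cuda>=12.4", "sm_arch==90"}

def pick_variant(index: dict, assumed: set[str]) -> str:
    for name, info in index["variants"].items():
        if set(info["requires"]) <= assumed:
            return name
    raise LookupError("no variant matches the assumed feature set")

print(pick_variant(variants_index, assumed_features))  # -> "cu124_sm90"
```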
Replying to your footnote that Discourse won’t quote: I think it’s a reasonable assumption that we aren’t executing random code without a specification for it, and the PEP that introduced wheels specifically called out arbitrary code being run as part of the rationale. If people actually think it’s reasonable to execute remotely downloaded code without user input or specification, when the specification only provides that it’s a zipfile and what to do with those files, then I think we have a serious problem in terms of what people are presenting as reasonable behavior, and all future specifications would need language like “installers must not do things not specified…”. The absence of that language does not make doing so reasonable.
The current proposal says it should be opt out, not opt in. If a tool were to adopt this immediately as opt out, I would consider that as no warning, regardless of any other sort of user outreach.
To provide sufficient warning I think it must start as opt in (at least for existing tools), and then it can be up to the tool how to communicate and transition to opt out, assuming that’s agreed on as the recommended default.
I feel like this is actually a problem. This is a remote file providing something to run in a context where something previously wouldn’t be run. Whether the existing expectations people have are well specified or not, they do seem to me to be reasonable expectations that are being broken here, and making this something that tools can choose to do means the mental model people have will no longer be packaging-standard focused, but tool specific.
I put that in a footnote specifically because I don’t think “the spec doesn’t say this” is really a great argument for anything, but I think people are being a little disingenuous when they’re using that as an argument.
I think the fact that wheels let you avoid code execution is a good and useful feature, one that isn’t promised by the spec but that people are currently relying on. That, I think, is a much stronger argument than trying to nitpick over what some ambiguously worded phrase from 13 years ago meant.
I think it can be reasonable. It can also be unreasonable. Life and engineering are about trade-offs, not absolutes. Maybe for this feature the trade-offs are worth it, maybe they’re not. Software exists to evolve with the needs of the day, or it gets replaced.
… and the intersection of those two questions is the “How do we teach this?” section of a PEP. And any specification of a transition plan in a PEP.
I’m used to evaluating PEPs, so for me, answering the question “is this the right change” without having good information about how the change authors propose that we communicate the change and handle transition is pretty much impossible. It’s not the right change if we can’t educate people about the impact, or manage the disruption caused by the change.
I think the way the discussion has been going here, there’s a very clear suggestion that the proposers expect automatic evaluation of the selector plugins, and a lot of comments are being made from a perspective where that’s the norm.
I think it would be much easier to have this discussion if the proposal was carefully neutral over how selector plugins get executed. For now, omit all discussion of how the consumer knows whether the target environment supports selector XYZ, and concentrate on how the right wheel will be picked assuming the answer is known. This is very much like the existing wheel tags - the spec describes how a wheel is selected based on tags, but says nothing about how tools know what tags the environment supports.
By doing this, we can have a discussion about the mechanism in the abstract, without the underlying approach being eclipsed by the question of “running arbitrary code”.
Then, as a separate discussion, we can debate how we handle selector evaluation for environments. That could be as simple (and vague) as “tools have to invent their own mechanism and there’s no standard”, or a static approach that allows the user to build an environment spec file, probably by using tools that implement the same code as the proposed plugins do, or a fully dynamic and automatic approach using load-on-demand plugins. We can define the minimum approach that tools must support, and we can define interoperability standards for more complex approaches, so that tools can share and reuse selection code. That discussion can also establish standards and guidelines around how to handle security, reproducibility, trust, and all of the other issues that the “dynamic plugins” approach is raising.
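To make the wheel-tag analogy concrete, here is roughly how tag-based selection already works with the packaging library: the list of supported tags comes from somewhere (today, packaging computes it for the running interpreter), and the selection itself is a purely static comparison. The wheel filenames below are made up.

```python
# Sketch of today's tag-based selection; filenames are invented examples.
from packaging.tags import sys_tags
from packaging.utils import parse_wheel_filename

available = [
    "example-1.0-cp312-cp312-manylinux_2_28_x86_64.whl",
    "example-1.0-py3-none-any.whl",
]

def best_wheel(filenames: list[str]) -> str:
    # sys_tags() is ordered from most to least preferred for *this* interpreter.
    for preferred in sys_tags():
        for filename in filenames:
            _, _, _, tags = parse_wheel_filename(filename)
            if preferred in tags:
                return filename
    raise LookupError("no compatible wheel")

print(best_wheel(available))
```

A static “environment spec file” approach would essentially swap sys_tags() for a list read from a file that the user (or a separate detection tool) generated ahead of time.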
Does it matter? any is not a supported Python tag.
Do all tools ignore invalid wheel filenames? I know I used to write a lot of scripts that bulk-processed the contents of PyPI, and I’m pretty sure I never wrote any code to skip invalid wheel filenames.
Does your tool happen to verify wheel version?
Besides, our goal wasn’t to make sure that no tool could ever do anything with a variant wheel without explicitly being updated for it. Our goal was to ensure that installers won’t accidentally install them.
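For what it’s worth, a tool that parses filenames with packaging rather than ad-hoc string splitting already rejects names it does not recognize. The extra trailing component below is a made-up stand-in, not the proposal’s actual variant filename syntax:

```python
# Illustration of the "do tools skip invalid filenames?" question.
from packaging.utils import parse_wheel_filename, InvalidWheelFilename

names = [
    "example-1.0-py3-none-any.whl",               # ordinary wheel
    "example-1.0-py3-none-any-somevariant.whl",   # hypothetical extra component
]

for name in names:
    try:
        parse_wheel_filename(name)
        print(name, "-> parsed")
    except InvalidWheelFilename as exc:
        # A strict tool errors out here; a lax script that just splits on "-"
        # might silently mis-handle the file instead.
        print(name, "->", exc)
```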
The label length is strictly limited to prevent wheel filenames from becoming much longer than they are now and causing issues on systems with smaller filename or path length limits.
I don’t think it impossible to agree on a longer limit.
Okay, perhaps we should have made one point clearer: the specification as linked was frozen at a certain point so we could test the current implementation. As the security implications section shows, we are aware of the issues with an opt-out approach, and we are definitely going to address them. We did not discard them as irrelevant; we deferred working on them until the other technical details are more settled and we have a clearer idea of what the solution space is.
Since I keep seeing people being confused as to when providers would plug into the resolution and installation story, here’s some pseudocode to help illustrate where providers would come into play (based on my understanding):
```python
# Pseudocode: `context` and helpers such as unresolved_deps(), resolve(),
# get_providers(), and add_variants() are placeholders, not real APIs.
import packaging.tags

# Resolution: pick a concrete wheel for every unresolved dependency.
while unresolved_deps(context):
    dep = get_next_unresolved_dep(context)
    release = resolve(dep, context)
    wheel = find_wheel(release)
    add_wheel(context, wheel)

# Installation: no provider code runs here.
for wheel in wheels(context):
    install(wheel)


def find_wheel(release):
    for tag in platform_tags(release):
        if tag in release:
            return release[tag]
    raise Exception("No compatible wheel found")


def platform_tags(release):
    variants = []
    for provider in get_providers(release):
        new_variants = provider()  # ⚠️ 3rd-party code execution.
        variants.extend(new_variants)
    # Combine the detected variants with the ordinary environment tags.
    yield from add_variants(variants, packaging.tags.sys_tags())
```
Notice the execution is only at the step of choosing a wheel, not installing one.
I’m purposefully not commenting on the proposal, just trying to help clarify a key detail.
We would have to centrally approve and design every variant axis.
We would have to centrally maintain implementations of every axis for every possible system, and keep updating them to account for new values that can’t be predicted.
The users would have to keep updating the relevant tools (installers? CPython?) to take advantage of new values that were just implemented.
If our design turns out to be incorrect and does not work for some package, we end up either telling them “sorry, can’t do” or implementing a replacement axis to cover the new use case.
I’m not saying that it’s impossible. However, in my opinion this is going to be a significant effort and maintenance burden. We are talking about people having to centrally maintain support for hardware they don’t have and platforms they aren’t able to use, and in the end, they won’t be able to reliably test that they still work.
Compared to that, a decentralized (or even semi-decentralized, if we consider vetting available plugins) system has the advantage that:
Plugins for specific variants are maintained by people with direct stake in these variants, who are knowledgeable about the hardware in question and able to test them.
Plugins can be released and updated independently, to account for changes required by specific variants.
If an existing plugin for a given topic does not fit the needs of your package, you can always create a new plugin that does it better.
By no means am I claiming that this is a perfect solution. On the contrary, it is a solution designed to work in an imperfect world.
I think this makes a lot of sense, especially as the variants proposal moves to formal PEP stage. This would be especially helpful in order to keep the PEPs “reasonable” in size and comprehensibility. One PEP could be a Standards Track PEP describing the mechanisms, file formats, etc. of how a static selector matrix is used to resolve wheel variants, and the other could be an Informational PEP giving (non-binding) guidance and recommendations to the tool ecosystem for how they might implement the discovery of machine capabilities and how that maps into the variant resolution process.
The tension here is between secure-by-default and maximising convenience in the common case especially for novice users.
I’m singling out this particular misconception with an important reminder that it’s not the dichotomy a lot of people seem to think it is. Novice users are the ones most in need of secure-by-default solutions because they don’t understand the risks enough to know what should be secured, much less how to do so.
Indeed. I think I understand what @steve.dower was suggesting now, which is effectively “fat” binary wheels that support every possible variant, along with runtime code that “imports”[1] the right bits of that fat binary. Apologies if I still misunderstand.
That’s indeed the problem. This stuff can get huge, and even if wheel size limits weren’t a problem[2] being able to download just the bits you need will provide a much better end user experience.