Honestly? I’m not sure how much clearer I could be than I have been before. It’s not just about something being exploitable, but about behavior being easy to audit. We shouldn’t be unnecessarily creating situations that allow sophisticated attackers to hide more easily.
Using a compromised supply chain to target specific users rather than all users isn’t a new concept, and it tends to increase the longevity of a compromised part of the chain. Why would we not want the names to be meaningful and tied to something that would be more immediately noticeable if something unexpected was happening?
If variant naming were centrally agreed upon and a validated requirement, and the name of a wheel had to be predictable from the variants it was selected from, then even a cursory human review should notice that the variant selection must have changed in some way and go look at that.
Is there a typosquatting issue here? I’m not sure how it could happen, but I’m imagining a malicious wheel with a variant label that’s very close to a well-known one, getting past an audit check because no one spotted the typo.
I’m not trying to spread FUD here, but I’m genuinely asking the question whether this is something the authors thought about.
Edit: Actually, there’s no need for even a typo here. A malicious package could simply reuse a name like cuda12_6 to mean something entirely different.
Without this, if two providers both have the same marker for some reason, how would you know which axis was being referred to by the marker being present? What if someone does the seemingly obvious thing of naming the marker of an abi provider after just the version the provider detected itself? That’s almost sure to conflict at some point.
This doesn’t seem like a source of ambiguity we should have.
It’s designed on purpose to be that way. We can argue that it’s a poor design choice, but it’s on purpose.
Here is the design rationale:
Tools / Index / Installers: MUST reject 2 providers that use the same namespace on the same package.
It’s not even possible, because the way you “declare a namespace” as needing provider A or B requires you to specify the namespace as a dictionary key => keys are unique => it’s physically impossible to get a conflict.
If provider A becomes “unmaintained”, the community can fork it and create “provider A reloaded”, which leverages the same namespace but continues the support.
For corporate reasons, a company may want to re-implement namespaceX and cook their own provider to best fit their own infra. Or for control, or for security, or for extension (hundreds of good reasons here).
Forking is at the core of OSS philosophy and this design embraces this concept. And supports it.
This namespace approach guarantees by design that:
multiple providers CAN use the same namespace (on the package index) => and it’s a very desirable feature
it is impossible for two providers declaring the same namespace to be used simultaneously for one package. It’s not something that would cause a problem - it’s purely and mechanically impossible to occur - by design.
In other words => when a package declares that abc-variant-provider is needed for the abc namespace, it becomes the “authority” for the abc namespace for the given package & version (this can change next release).
Please keep in mind that what you responded to is part of a chain on how the resulting environment markers translate for use in pylock.toml, not about namespaces.
Unless I’ve missed something, the markers are “by design” allowed to collide and aren’t namespaced, because you’ve assumed this is solved by the namespacing done elsewhere. This doesn’t fit into existing places that need this information.
In pylock.toml, it’s an array of environment markers. It’s also pretty strongly suggested that these should be non-overlapping.
Both of those conflict with things 817 is doing “by design”.
Well I don’t think the question about overlap is specific to pylock. I mentioned that I do think the PEP missed the ability to specify what providers the pylock file itself needs. If that’s added, then the answer becomes the same— don’t emit a pylock with providers that have overlapping namespaces, and fail if someone does.
@ncoghlan this comment from ~80 posts back fell through the cracks, I think (DPO scaling poorly, yay), but it asked a pretty fundamentally important design question. So let me attempt an answer.
The design doesn’t allow for this “call back into the installer” to determine “co-installed packages”, except for the special abi_dependency[1]. We discussed this multiple times (e.g., in pep_817_wheel_variants#7, the longest discussion thread). It turns out to be extremely difficult to implement that, and probably for limited gain. It would be easier if all the metadata for all packages and their dependencies was all available at once, then one could have something like “mutex packages” a la conda-forge and resolve things in a consistent fashion. But that’s not, and will never be, the case on PyPI. Having all the metadata and the idea of calling back into the installer are both showstoppers.
Take a concrete example: two packages, pkg1 and pkg2, using the blas provider, with two sets of variants, built against mkl and openblas. You can then solve the problem of getting a consistent install (say both mkl variants) in two ways:
Let the resolver sort this out dynamically and ensure the same variants are selected. If it deals with pkg1 first, and then with pkg2 later and comes to a non-uniform choice, it may have to backtrack.
We can use the “ahead-of-time” provider. At that point there is a fixed priority order, which the user can override. The choices for pkg1 and pkg2 are made by the installer completely independently, with no verification between them and no backtracking for consistency.
The way this is done will result in a consistent choice as long as pkg1 and pkg2 were built with the same blas provider and didn’t manually change the default priorities of that provider. If mkl was the higher-prio one, then if the user provided no input both packages would get their mkl variant selected. If the user actively selected openblas by overriding the priority order or by forcing that variant, the end result would still be consistent. So in practice, this will work as long as the same providers are used.
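Here is a minimal sketch of option (2), the ahead-of-time selection, with each package resolved independently against one fixed priority list (the names and API are illustrative, not the PEP’s):

```python
# Provider's default priority order: earlier entries win.
PRIORITY = ["mkl", "openblas"]

def select_variant(available, priority=PRIORITY):
    """Pick the highest-priority variant this package offers.

    Each package is handled completely independently; no
    cross-package verification or backtracking happens.
    """
    for choice in priority:
        if choice in available:
            return choice
    return None  # fall back to the non-variant wheel

pkg1 = {"mkl", "openblas"}
pkg2 = {"openblas", "mkl"}

# Independent selection still yields a consistent result,
# because both packages consult the same priority list.
assert select_variant(pkg1) == select_variant(pkg2) == "mkl"

# A user override (e.g. preferring openblas) stays consistent too.
user = ["openblas", "mkl"]
assert select_variant(pkg1, user) == select_variant(pkg2, user) == "openblas"
```

The consistency falls out of both packages sharing the same provider and priority list, which is exactly the condition stated above.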
I suspect that what made your intuition focus on this is that option (1) above is strictly more powerful than (2), and has stronger consistency guarantees. However, we tried hard to make that work and didn’t get there; I doubt it’s feasible at all, given the constraints we’re working under with PyPI.
One last point: the “wheels are isolated” nature of how we’re dealing with shared libraries in Python packaging is still present, wheel variants don’t change that. Regular distros need a design that enforces complete consistency (option 1 above) because otherwise there’ll be lots of conflicts. With wheels/PyPI we’re already allowed/assumed to be in an inconsistent state, and adding variants can only make it better by adding more consistency through shared providers.
Also see the PEP text “Unfortunately, such a variant provider cannot be implemented within the plugin API defined by the specification. Given that a robust implementation would need to interface with the dependency resolver” in the `abi_dependency` section ↩︎
Agreed that would solve that problem, but unless this is suggesting a new field for it, I think this runs into the same problems that have led to suggestions that plugins installed only to determine what to install shouldn’t be installed into / imported from the environment being managed.
That underlying issue is explicitly about it being possible to change the intended execution scope, and is about more than predictable behavior auditing.
Humans don’t directly interact with variant labels in a way that really affects anything like that. They’re technically scoped to (project, version), so the automated systems don’t care what the values are, and they don’t “leak” between different sets of (project, version). Having them match is mostly just an aesthetic thing.
No. Package names come first. How would a malicious wheel find its way into a release of a regular package?
Again, very clearly no - if the whole package is malicious, who on earth cares whether it’s the default (non-variant) wheels, or just a variant that was labeled cuda12_6? Malicious packages don’t just magically show up in an install, someone has mistyped the actual package name and added that to its dependencies - nothing changes here. No label or marker name ever causes a provider package to be pulled in out of thin air - there’s a dependency declaration needed.
I think the answer is just yes: 817 makes it harder to hand audit wheels based solely on the filename, if you’re concerned about a malicious package targeting only a subset of platforms while pretending to target a different subset.
I personally think that is not a super compelling edge case:
This is already somewhat the case anyways given how unwieldy filenames can become with compressed tags or even just implementing the targeting at runtime, though both are more limited.
It doesn’t affect the ability to execute an attack at all; it’s strictly limited to making an attack harder to discover by narrowing its targeting.
It only affects hand auditing wheels by file name alone. If you inspect the wheel at all you’re able to see the targeting completely— which means it can be resolved by either hand auditing the wheel contents or using tooling to surface the provider information when you’re auditing.
For me, I think that’s a reasonably small cost and the goals of the PEP represent a large enough win to pay that. It’s ok if you disagree! I’ll make sure it’s documented in the PEP at least.
I’m on my phone so I haven’t had a chance to look at your marker post to see if that seems like it would change anything.
You need some sort of short string to put in the file name, so you need a name (not saying you’re suggesting otherwise, just stating ground truth).
The way variants work, you’re basically defining your own “platform”. It’s hard to imagine how to make that naming centralized.
My wheel might be built against x64-v3, require Cuda 11.x, a GeForce 5xxxx, and OpenBlas.
Someone else’s wheel might only care about BLAS.
There’s a large list of possible combinations for even a small number of axes. How do you propose to centralize naming all of them?
Sorry, I think I did my own argument a disservice with imprecise wording.
The variant names don’t need to be fully centralized and agreed upon to accomplish both parts of what I’m suggesting is important, though standardizing the format they should be expected to be expressed in might be useful for discoverability reasons.
We don’t need some pre-agreement on cuda65, or anything like that, but it might be nice to standardize that these should actually be human-readable values, even without enforcement, for discoverability reasons.
Separate from that specific part: if the variant label of the wheel (the portion of the filename contributed from variant selection, sorry if this is no longer the right term) needs to be predictable from the variant provider → markers that contributed to selecting it (and indexes verified that this matched the info they were given to advertise to installers), even if it’s literally something like a hash function, it would create immediate visibility into whether a package has changed its use or calculation of variants, which could then prompt a double check that everything is normal on the variant side.
I.e., if nothing has changed in how the package is selecting a variant, you should be able to expect the filename to be consistent, outside of the version number, when updating.
This little bit of making it predictable and based on input might seem small, but little things like this are how people notice things aren’t right when automated systems are evaded. Every little detail contributes to how people notice things.
I’ll try and do a more thorough review of each place I’ve found what appears to be unneeded flexibility over the next week.
I think overall the plugin system isn’t going to pan out, but enumerating each place the proposal doesn’t specify behavior where there isn’t a concrete benefit to not specifying it will at least create a list of things that can be specified to make behavior more consistent and predictable without detracting from the plugin approach’s reason for existing.
At the level of abstraction everything exists at currently, it’s not easy to look at it and evaluate each part where something is left to tools for potential negative impact. Ideally, since this is for interoperability, the vast majority of this would be specified, and it would only be the plugins themselves that are acting as a black box.
No worries! I’d be lying if I said I never did the same.
Ah, OK. So the idea of a standardized mechanism/formatting makes more sense than a fully central list. There are some tradeoffs to doing that (more in a sec), and I don’t think those tradeoffs are worth it, but I don’t think it’s inherently wrong to make the other tradeoff.
This gets into one of the tradeoffs!
At the risk of repeating myself, and just to make sure I’m being clear since I’ve personally mixed up the words in this thread already, let me define the two relevant terms real quick:
variant label: The discriminator in the wheel filename that uniquely identifies this particular wheel. It can be any arbitrary string and only has meaning within the same (project, version). Each individual wheel can have only 1 (it can also have 0, in which case it’s not a variant wheel).
variant property: A value that describes some specific aspect of the system that we’re trying to determine compatibility for.
So the variant label ends up basically identifying a collection of static variant properties describing the system the wheel is compatible with, and the variant label is what is visible to humans without looking inside the wheel (or at the metadata that was lifted out of the wheel).
One piece of the puzzle here that may not be obvious is that the collection of variant properties that a variant label on a specific wheel identifies can contain any number of properties, and they can span multiple providers.
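A rough sketch of that label-to-properties relationship, with made-up labels and a made-up property syntax (not the PEP’s exact format):

```python
# One variant label identifies a fixed collection of variant
# properties, scoped to a single (project, version). A label's
# properties can come from several different providers.
variants = {
    "mkl": [
        "blas :: variant :: mkl",    # from a hypothetical blas provider
        "x86_64 :: level :: v3",     # from a hypothetical CPU provider
    ],
    "openblas": [
        "blas :: variant :: openblas",
    ],
}

def properties_for(label):
    """Metadata lookup: the properties a label stands for."""
    return variants.get(label, [])

# The label is the only part visible in the filename; the
# properties are what tooling actually matches against.
assert "x86_64 :: level :: v3" in properties_for("mkl")
```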
To use a slightly silly example (since some of these could just use the platform tags).
These 4 variant properties would be the “real” data that the system is looking at, and they could come from 3 different providers.
So we need to come up with a variant label for our wheel that has those 4 properties.
Currently the PEP treats this as an opaque string provided by the author of the package being installed (not the provider, since this can span providers) and I believe it specifies that it should be human readable (if it doesn’t then that can be changed of course). Obviously there’s not a reasonable way to require it be human readable or enforce that in any way.
Where the variant label is x86_64v3_cu11_120_windows.
The second case is the one you’re worried about, where the opaque variant label provided by the package author is misleading (the properties say linux, which is what matters, but the label says windows, which doesn’t matter except to the human eye).
Since that label is opaque, we could make it deterministic and use a hash. Say 8 characters of sha256 of the 4 variant properties (sorted and such to make sure it’s deterministic).
That’s deterministic and is guaranteed to change if any of the variant properties change (it could still be tricked, I suppose, by using different provider plugins or something, but we can just extend the hash to include those if we want).
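A minimal sketch of that hashing scheme, with made-up property strings:

```python
import hashlib

def label_for(properties):
    """Derive a deterministic 8-char label from sorted properties."""
    canonical = "\n".join(sorted(properties))
    return hashlib.sha256(canonical.encode()).hexdigest()[:8]

props = [
    "x86_64 :: level :: v3",
    "cuda :: version :: 11",
    "cuda :: driver :: 120",
    "os :: name :: windows",
]
label = label_for(props)

# Same properties, same label; any changed property changes it.
assert label_for(props) == label
assert label_for(props[:-1] + ["os :: name :: linux"]) != label
```

Sorting before hashing is what makes the label independent of the order the providers happen to emit properties in.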
My concern with that is that the variant label is now just meaningless gibberish, and humans can gain nothing useful from it even in the normal case (other than whether it’s changed since last time). Particularly when you look at a list of these, I feel like it’s going to very quickly become noise that’s impossible to differentiate anyway. IOW, yes, you can tell something changed with your eyeball if you’re comparing a single previous wheel to a single new wheel... but what if you have 20 of them? Are you going to spot the one that’s different between a list of 20 old ones and 20 new ones?
We could mitigate this somewhat by combining the two ideas, human readable and hashed, and do something like
Where the variant label is x86_64v3_cu11_120_ubuntu_e3b0c442.
I guess that’s a little better since the label is human readable again, but it could still be the confusing label again so you have to ignore that label if you care and use just the hash identifier on the end. I think that doesn’t really fix the other problem with hashes, because it still really only works for comparing a single old wheel and a single new wheel, but doesn’t fare well if you’re comparing 20 to 20.
Does that sound like what you’re thinking of, or are you thinking of a different scheme?
To be honest, this feels like Zooko’s triangle territory to me personally, except I think it’s misleading to assign the possible confusion here to the “security” side. It’s not really a question of “secure”, since it’s only relevant if several catastrophic failures have already occurred, and even then it relies on a human manually auditing wheels by hand, with no support from tooling, who just happens to notice that the platform tags look different.
Reviews are great!
I definitely agree with trying to specify behavior where possible and minimizing the blackbox-ness to the plugins as much as possible.
I think the PEP needs, at a minimum, some refactoring to make it easier to read and follow along, and some of the authors have started working on that. I think it needs to lay out how the pieces fit together better too, because most of the concepts are not super hard once you wrap your head around them, but I personally had a hard time trying to connect the dots on the moving pieces.
Maybe! I’m not personally married to anything here (though a lot of the original hashing out of the idea was done by others, so they may feel more strongly).
I think the problem it’s trying to solve is a pretty useful problem to solve, though, and looking at the responses to my vague and high-level framing the other day, it seems to me like folks generally are on board with trying to solve the problem somehow. Which is great.
I think we can find a solution that has reasonable tradeoffs; maybe it looks like PEP 817, maybe it doesn’t. I know personally I’ve never been someone willing to sacrifice practical security or safety, and I don’t think we have to. We’ve just gotta sort out the right set of tradeoffs.
I actually intended to still be focused on higher level questions, trying to go a layer more detailed than my high level one sentence “design” the other day and try to poke at some of these foundational questions about the “broad shape” of what an implementation of that idea might be.
Then I got distracted by the discussion today focusing on various details. That’s alright though, I think it was good and productive discussion, and at least on my end I found it super useful and I think it’s helped move the needle.
Anyways, all that to say: if you’ve got time to review it, that’s great and super appreciated, but I’m personally still less worried about the specific details in the PEP, so if you want to think high level for a while still, I think that’s fine! (Though obviously that depends on how successful my ADHD brain is at keeping focused on what I planned to do.)
I’ve gotta get dinner and such so I don’t have time tonight, but likely tomorrow I’ll be back to poke at more high levels!
Well, I’m not sure where the right tradeoffs are here. It’s “easy” for me personally to fall back to “the plugin system is far too open”, because I would still be fine with requiring hardware to get enough community buy-in to make it into a centralized package, avoiding the issues of a plugin system.[1]
So to keep things on the constructive path of “okay, if the authors really want plugins, let’s at least ensure it’s as good as it can be”, I’m trying to go through it for the parts that actually feel like they trade something away, and minimize those, rather than keep falling back on every bit of my brain telling me that the install-time plugin system is inherently more complicated than the problem space warrants.
My thought process on this particular part of it was that if it is well communicated that this portion is deterministic, and how to reference it when needed, then in normal workflows, like an automated update notice via dependabot, a “simple dep update” that changes variant behavior in some way becomes more noticeable in a diff because more than just the version component of the filename would change.
That would come at the cost of that portion of the name being meaningful beyond “yes/no has this changed” without following through to the underlying information, but it feels like that portion is already on a path to being something that we want to present via better UI in the happy path rather than by that part of the name, since people have suggested an ideal outcome where pypi shows users the specific wheels that correspond to a specific variant selection combination.
I do think there’s something to be said about there already having to have been failures to get to the point where this becomes a useful breadcrumb, but this is a big target, it’s under constant attention from malicious actors, there are still continuous failures, and we don’t live in a world where even those running their own index would be fully insulated in all cases[2].
And yes, it’s a little late for this to be where something gets noticed as not quite right, but people do notice these little details when all else fails. One of the most sophisticated supply chain compromises was noticed because ssh “felt just a little slow”. If you give people something reliable, some portion of people will notice when it changes.
The more things we can say have defined and consistent behavior, the more anything out of the ordinary should stand out more automatically.
As for reviewing this for potential impacts like this: it’s currently almost distracting how many things seem loosely specified in this proposal, because it really forces examining every bit for the worst-case outcomes rather than “there’s only one outcome here”, and while various comments after things were brought up have said there was a considered tradeoff, many of the tradeoffs considered aren’t exactly accessibly documented in the proposal.
To start with, if the authors still have “$tool install torch” just works without an opt-in being needed, then there’s an expectation that the providers’ torch needs will already be in a state of trust and community management compatible with that. ↩︎
Despite that at $dayjob, we run our own index, I’ve pretty consistently held that $dayjob needs to be invested in open source tools being secure by design, because we can’t control what other companies we work with use and do. ↩︎
Do I understand correctly that we’re now considering an elaborate attack vector wherein the release publishing workflow was compromised, and the attacker decided to publish an additional variant wheel that targets a relatively narrow subset of users, with the assumption that targeting a wider subgroup or a “more predictable” label is more likely to be noticed?
In my opinion, the attack vector here is not much different from platform compatibility tags. Say, the attacker could publish a manylinux wheel with a lower glibc boundary. Admittedly, the sole presence of more axes makes it easier to find a niche, say, x86_64 v2 users. However, I don’t really see why having a new variant with an explicit x86_64_v2 or similar label would make users more suspicious than, say, cpu.
Then, there’s the matter that if your release pipeline is compromised, all bets are off. Given that variant labels will likely be used primarily on packages that aren’t pure Python, the primary attack vector is actually modifying the binaries — adding malicious code to compiled extensions that is generally invisible to users, since they tend not to disassemble and analyze code.
This potential for increasingly narrow attack targeting was brought up in a prior round of feedback. In any case, it’s not an elaborate attack, and attackers choosing this route do tend to go undetected for longer periods of time.
Once a compromise happens, it generally takes someone noticing it from the impact, and any disassembly tends to (if at all) only happen for the post-mortem. The fewer users impacted, the easier it is for a compromise to go undetected. Deterministic naming would increase the odds that a change made specifically to narrowly target users wouldn’t be able to fly under the radar downstream of the initial compromise, and therefore reduce the ability of attackers to use such targeting to stay unnoticed for longer, but this isn’t a replacement for the detection of malicious actions taken by a compromised library.
Variants will inherently make the potential for narrow targeting greater, and that on its own is just a necessary tradeoff. But there are portions of behavior that don’t need to be a part of that tradeoff.
The impact of this one portion of the problem is primarily on the audit side of the equation, with some potential security impact because of the reliance many people have on auditing happening when it comes to “popular software” (the majority of users not running their own index are relying on safety in numbers).
I do think there is room for reasonable disagreement on whether this particular detail is important.
With that said, given that this is a change from the current wheel name semantics on indexes, where wheel names do currently have a level of determinism attached, if the authors don’t see this as a property worth preserving, it should at least be explicitly noted that it’s a property the authors were aware of and made a decision to allow to be arbitrary going forward.
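For comparison, today’s wheel filenames are fully determined by their fixed components, so a change in targeting is visible in the name itself. A rough parser following the standard {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl convention:

```python
import re

# Every component of a current wheel filename is pinned down by
# the binary distribution format spec; nothing is free-form.
WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)"
    r"(?:-(?P<build>\d[^-]*))?"
    r"-(?P<python>[^-]+)-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)

m = WHEEL_RE.match("numpy-2.1.0-cp312-cp312-manylinux_2_28_x86_64.whl")
assert m is not None
assert m.group("platform") == "manylinux_2_28_x86_64"
```

An arbitrary variant label would be the first component of the filename whose value isn’t derivable from spec-defined inputs.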
Now that I’ve slept a bit, I want to poke at the plugin topic a little bit! Again at a high level, so specific implementation details about what’s in 817 aren’t super interesting to me currently.
I don’t believe anyone disagreed with my assertion that determining whether the current machine has some property requires having the “installation process” [1] execute some code to introspect the machine [2], and that this holds true independently of Variants (e.g. it exists today for all of our compatibility tags).
So regardless of how we express a constraint on a piece of hardware, we need code that can detect it.
For our existing “axes”, we just defined what axes must be supported, and we left it up to each installer to figure out how to implement them. That’s something that we could do [3], and it would solve the $tool install torch by default problem and simplify the execution model. One of the tradeoffs here is that each installer has to figure out how to implement them on their own (though in practice, for the Python installers, there will likely be a default implementation that most people use). The other major tradeoff is that it excludes the ability to handle more niche hardware.
While this was being worked on, a use case that came up was a robotics lab that couldn’t share their libraries because of the inability to specify hardware. I think this would also apply to things like Piwheels, and the whole embedded space in general?
We can of course decide that we’re OK if we exclude solving the “selecting the right wheel for niche hardware” and they’ll have to use one of the available workarounds. That’s just a tradeoff we have to decide on.
Another option is some sort of runtime loading of that code (e.g. plugins), which is what PEP 817 has (again, I don’t care about the details right now for how this actually selected/fetched/executed).
That allows us the ability to handle all use cases, regardless of how niche they may be. I think that’s a pretty useful benefit, but it may not be worth the tradeoffs here.
Installers would not have to handle implementing the detection of any individual “axis” we decide on, as that would all move into the code loaded at runtime, so they’d no longer have to care at all about the specifics of various hardware and could focus on more important aspects of their code base.
I agree with the idea that allowing arbitrary plugins by default wouldn’t be secure. That means that to make $tool install torch work by default, we need to reintroduce some mechanism to select certain plugins that are trusted, either at the ecosystem level or at the tool level (presumably, if we chose the tool level, an allowed choice a tool could make would be to not trust anything by default). This could be an allow list, a central PyPA-owned library, vendoring, or whatever.
There’s another option that has come up which is rather than some sort of plugin have a data file that can be passed into an installer that declares what properties the system has, and leave generating that file as unspecified.
I mostly see this as a special case of the plugin idea: we’re just moving the plugins out of the installer while hand-waving away the fact that, for the system to work, the needed code still has to run, so someone still has to deal with that (presumably just not us). It also has the same tradeoff with the “$tool install torch works by default” goal: we’d have to pick some of the generating code to trust (either ecosystem or tool level) and then merge that generated data with the passed-in data.
I don’t think that’s a bad feature, but I don’t think it meaningfully changes the distinction between whether we should define our axes directly in the spec or whether we should define a plugin system that allows the community to deal with defining axes.
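A sketch of that data-file flow, with hypothetical field names, where the installer merges its own trusted detection output with the user-supplied file:

```python
import json

# What a trusted/default detector produced on this machine
detected = {"x86_64 :: level": "v3", "cuda :: version": "12.6"}

# A user-supplied properties file covering niche hardware the
# installer can't detect itself (contents are illustrative)
user_file = json.loads(
    '{"robot_arm :: model": "rx200", "cuda :: version": "11.8"}'
)

# User-provided data wins on conflict, since the user opted in
properties = {**detected, **user_file}
assert properties["cuda :: version"] == "11.8"
assert properties["x86_64 :: level"] == "v3"
```

The merge step is where the “pick some generating code to trust” decision shows up: something had to produce `detected` in the first place.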
There are also, of course, options where we do both: I could imagine a world where the things we want to provide by default are defined by spec, but we provide a plugin system that allows for the more niche hardware cases (though that sort of is pretty similar to the idea of an allow list, I guess?).
That leaves us I think with these tradeoffs:
Define the axis directly as part of the spec
Pro: No impact to the execution model of an installer.
Pro: What an axis means is defined in spec and doesn’t require knowing who provides the axis.
Con: Installers are on their own to implement those axes.
Con: Only handles use cases that are popular enough to rise to the level of a PEP.
Con: Limits the ability for axes to be “agile” (e.g. changing semantics to match the real world, like when we had to adjust macOS to be >= instead of ==).
Con: Limited ability to
Define a plugin system and let plugins define the axis.
Pro: Handles all hardware regardless of popularity.
Pro: Eliminates the need for installers to think about specific axis and how to implement them.
Pro: Axes can be more agile and respond to changes more easily.
Con: Changes the execution model to require executing these plugins, though it would likely match build backends?
PEP 817 obviously goes for the second option, and I think it’s pretty easy to say that, at a minimum, a plugin solution must be “opt in” as the default behavior. I think there is room within that model to either allow for a specific set of trusted plugins to be used by default OR combine spec defined (for the “by default”) with plugin defined (for the “arbitrary”) OR just leave it “opt in” and allow the installers to decide what, if anything, they want to provide by default.
I’d be curious what people’s thoughts are. Do we think being able to handle all the use cases, regardless of whether a platform is popular enough to write a PEP for, is an important enough benefit? I suspect that each individual one of those pieces of hardware is minor, but the aggregate might be large enough for us to care? As we add more axes, is the need to implement the different detection code going to become increasingly difficult for installers to implement and maintain, and would punting on that to plugins be useful?
It doesn’t specifically have to be the installer doing this. ↩︎
Excluding things where a human uses prior knowledge to manually indicate what properties the machine has. ↩︎
This is independent of how those constraints are expressed in the system (e.g. platform tags vs variant properties/labels vs something else). ↩︎
Wouldn’t that actually make it easier for e.g. Rust tools?
Also, a simple library that helps with detection, e.g. hw_info.is_available(feature), could be provided anyway.
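Something like the following, borrowing the is_available name from the post above (the detection logic is a stand-in; a real library would probe CPU flags, GPUs, and so on):

```python
import platform

class HwInfo:
    """Tiny stand-in for a shared hardware-detection helper."""

    def __init__(self, features=None):
        # Real detection would probe the machine; allowing
        # injection keeps this sketch deterministic and testable.
        if features is None:
            features = {platform.machine().lower()}
        self._features = features

    def is_available(self, feature):
        return feature.lower() in self._features

hw_info = HwInfo(features={"x86_64", "avx2"})
assert hw_info.is_available("AVX2")
assert not hw_info.is_available("cuda")
```

Installers (in Python or otherwise) could share one such library rather than each re-implementing detection for every axis.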
Another option is defining the important axes (mainstream GPUs and CPUs) in the spec, and leaving the door open for opt-in (via plugins or manual flags to indicate availability) for exotic scenarios. Is there even an example (non-mainstream, no GPU or CPU) where manual opt-in is unacceptable?
Do most people really want to install and run every single “detect very exotic hardware” provider just because one project theoretically supports it?