I feel like this is a similar situation to when pip introduced the proper resolver, and things that used to be possible no longer worked.
For situations where mutual exclusivity is necessary for correctness, if an installer allows multiple implementations to be installed, then the result is undefined behavior. It might work, but it would probably rely on delicate import order to get the “right” thing. There are surely some environments where this is the current reality. I would argue that these environments are broken, but if people really need to keep using them as-is, we should provide per-package variant overrides to allow these to continue (perhaps especially as an escape hatch for incorrect variant metadata). In practice, we may not need that, because these mixes happen today by accident and arguably only because of a lack of metadata.
It will probably depend on the dependencies in question–in some cases it isn’t about importing Python, but about C extensions looking for specific libraries. If both versions of the library are in the expected place then they should be able to co-exist.
I think the example I gave is plausible enough–I have my standard BLAS1 scientific environment set up and I want to install new package X, which for whatever reason has only released a variant for BLAS2. I think it’d be kind of a shame if installation failed, or tried to backtrack, because it didn’t want to install both versions.
I know, and this is what concerns me. My understanding of what people are saying is that this rule is a requirement, and there’s a lot of discussion about how to support it, but the discussion seems to be ignoring the fact that it breaks existing assumptions that the ecosystem is currently based on.
Dependencies on variants are irrelevant at the level I’m talking about. My concern is that having to select the variant of A based on what’s already installed for a totally unrelated package B is a completely new requirement, and we don’t know whether existing tooling can support it. Recording variant choices in an environment-specific file doesn’t really affect the fundamental problem: reading one variant-choice file is no different, except in performance terms, from reading the metadata for each installed package to get that same data.
No, it’s no harder. We already need to know all of the installed variants to ensure we don’t pick an incompatible variant of A. Having to also make that variant compatible with whatever variant C expects is just a single extra check.
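To make the “single extra check” concrete, here is a rough sketch of what knowing the installed variants could look like, assuming (purely hypothetically) that each installed distribution recorded its chosen variant under a Variant metadata field; the field name and the matching rule are invented for illustration:
import importlib.metadata

def installed_variants():
    # Map each installed distribution to its recorded variant, if any.
    # "Variant" is a hypothetical metadata field, used only for illustration.
    found = {}
    for dist in importlib.metadata.distributions():
        variant = dist.metadata.get("Variant")
        if variant is not None:
            found[dist.metadata["Name"]] = variant
    return found

def candidate_is_compatible(candidate_variant, installed):
    # Simplified rule from this thread: everything in the environment has to
    # agree on the variant, so a candidate is acceptable only if it matches
    # every variant already pinned.
    return all(v == candidate_variant for v in installed.values())
Whether that data comes from one recorded variant-choice file or from walking per-package metadata as above only changes the cost of the loop, not the check itself.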
Disclaimer: As usual, this discussion is all hopelessly theoretical. Framing it in terms of numpy and BLAS would be more helpful, but apparently the numpy developers have no intention of publishing multiple BLAS variants of numpy, so it’s still not realistic. I’m happy to keep discussing theoretical scenarios, if that’s helpful to people, but I personally think it would be more useful to pin down exactly what the real requirements are here. If numpy/BLAS don’t need things to follow my assertion that “everything has to use the same variant”, then that’s fine, but what about other uses of variants, and what are the rules numpy/BLAS need us to follow?
That may be true for “BLAS” when library names and symbol names do not overlap. Namespace collisions are subtle but often catastrophic. When I say “importing Python,” I implicitly mean loading particular C extension libraries with each import. This is a good exploration of the issue with regard to how Linux works: How do shared library collisions break?
It is completely fine with me if we don’t try to solve that case. I expect automation to be specifying variant parameters in the vast majority of cases, and that automation would produce the values the same way each time. In the cases where a user overrides those settings, or there is no automation, then it’s up to the user to choose and the packager to provide reasonable defaults. It’s not the PyPA’s problem.
I’m trying to convince you that’s not a requirement at all.
I can’t speak to the numpy requirements. For the other cases, this is why I want selector packages. Those selectors will feed data into the selection process in a way that ensures repeatability for different packages, even across invocations of the installer.
I’m working on a prototype, but I’m also on deadline at work so it may take a little time.
numpy has to use a specific BLAS that is either compatible with any Python library that also loads it in the same process, or else name-mangled to achieve decoupling. If it is unrealistic to think that numpy would ever provide builds against anything but OpenBLAS, it is at least realistic to say that newer versions of OpenBLAS may not be compatible with older versions, and that the dependency on OpenBLAS must be versioned to ensure compatibility. If OpenBLAS is available as a package on PyPI, we can treat it as a normal dependency. However, having it on PyPI of course opens up questions of who builds and maintains that, which goes back to Doug’s point:
If OpenBLAS is not available on PyPI, we can’t express a dependency on a particular version, at least not without something like PEP 725. My point is that the numpy/BLAS situation is not merely theoretical. Moving OpenBLAS to being a shared resource makes alignment across its consumers a real concern. Even without libraries being a separable shared resource, symbol collisions are a current problem with wheels: Do name mangling on individual symbols? · Issue #79 · pypa/auditwheel · GitHub, and we’d benefit from alignment anyway.
Not entirely. The old resolver had known bugs, and allowed things that aren’t allowed under the new resolver, but we always had pip check which would validate if the environment followed the actual rules. Nothing that stopped being possible was ever acknowledged as being valid behaviour.
Here, we don’t even have a clear definition of the actual rules yet, much less a checker that validates if an environment follows those rules.
I would agree the environments are broken. I would not support giving people tools to manage such broken environments - that’s a slippery slope that (IMO) we should not start down. And I would fully support any tool that considered the ability to create such an environment to be a bug, and actively worked to prevent it.
I think I’m back to being confused, then. Isn’t “that case” basically what people are saying needs to happen for BLAS libraries?
I can pretty much guarantee it’ll end up being the pip maintainers’ problem, though.
There are actually different (incompatible) builds of openblas in the current wheels:
$ ls site-packages/*/*blas*.so
site-packages/numpy.libs/libscipy_openblas64_-99b71e71.so
site-packages/scipy.libs/libopenblasp-r0-01191904.3.27.so
The NumPy-vendored library is a 64-bit openblas but I think that the SciPy one is a 32-bit build. They would need to be able to use a compatible build of openblas before they could share that build, but that is definitely an end state that they would like to get to. If the general packaging system made it straightforward to support sharing the library then there would be a good incentive for people to put the work into harmonising the two builds.
When I say that these two builds of openblas are incompatible I mean that the other binaries in the SciPy wheel have been built for their bundled BLAS and could not directly make use of the one bundled by NumPy. It mostly works okay to have both BLAS libraries installed as long as they are mangled and isolated. It is just redundant to have two BLAS libraries in the same venv and maintaining the two separate builds is more total work for the two projects.
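For anyone who wants to check this locally, both projects can report the build configuration of their wheels, including which BLAS/LAPACK they were built against (the output format varies between versions):
import numpy
import scipy

# Each call prints the build-time configuration of the installed wheel,
# including the BLAS/LAPACK it was linked against.
numpy.show_config()
scipy.show_config()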
I think that the expectation should be that there is a common base distribution that is the decider of the variants and upon which all of the other distributions that care about the variant depend.
If you have A and B and they both want BLAS, which can be blas1 or blas2, then you need to make a separate base package called say blas, with variants blas[blas1] and blas[blas2]. Then A and B both need to depend on blas and need to ship variant wheels that require the particular variants of blas (e.g. A[blas1] requires blas[blas1]). The selection logic belongs in the blas package and all dependent packages should respect that.
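Spelled out as dependency edges (using the bracket spelling from this thread purely as shorthand, since no actual variant syntax exists yet), the shape is:
A[blas1]  requires  blas[blas1]
A[blas2]  requires  blas[blas2]
B[blas1]  requires  blas[blas1]
B[blas2]  requires  blas[blas2]
Because blas[blas1] and blas[blas2] are mutually exclusive builds of the same base package, an installer that honours these edges can only ever end up with matching variants of A and B in one environment.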
It should be the expectation that the maintainers of A and B will provide wheels for all variants of the base blas package or otherwise it is fine for pip to bail out with an error message. Ideally the variants are usually selected automatically but let’s suppose that the user requests incompatible variants explicitly:
User does pip install A[blas1].
pip dutifully installs the required blas[blas1] as well.
User then does pip install B[blas2].
B[blas2] requires blas[blas2] which is not the installed variant of blas.
pip exits with an error message about incompatible variants.
Probably the above makes more sense when talking about CUDA rather than BLAS, e.g. you have cuda_base with variants cuda_base[cu11] and cuda_base[cu12]. The cuda_base package does not necessarily need to ship any meaningful code but it represents the choice about which CUDA version the environment will use. Every other CUDA-using package should depend on cuda_base and should ship both variants, like cudf[cu11] which requires cuda_base[cu11], etc.
Sorry to be pedantic, but is that an “expectation” (as in, we hope people will do this, but we need to be prepared for the possibility that they don’t) or a “requirement” (as in, we formally disallow violating this rule)?
Because I’m flagging issues based on the question of what we do when people don’t do things the way we expect them to. And if the plan is actually to turn that “expectation” into a “requirement”, then a lot of the edge cases go away (at the cost of making it more difficult to enforce the rules we’ve just added, of course).
Again, does “should” mean “tools actively disallow violating this rule” or “everything falls apart if people don’t do this”?
OK, cool. I get that, and it seems reasonable to me. But if we’re requiring users to explicitly specify the variant everywhere, that’s not much different from now (where users specify an index that contains the desired variant). I thought the required UX was that users would almost never specify which variant they wanted?
I know so little about either ecosystem that it makes little difference to me personally. But the idea of packages having to depend on a cuda_base package that contains no functionality sounds like something that would be very easy to forget. And once there’s a version of package A that has no cuda_base dependency out in the wild, it would be all too easy to end up with installers backtracking to that version rather than failing when there’s a variant incompatibility.
It is an expectation in the sense that it is what project maintainers need to do or otherwise users will end up with problems. This is the same as specifying any dependencies: there are no formal rules around project maintainers including accurate dependency information in metadata but they usually want to do so so that people can install and use the project.
That’s why I said:
We mostly do not want users to request specific variants explicitly but there are cases where it would make sense to do so. We also mostly do not want distributions to require particular variants of other distributions unless there is some actual reason for the requirement.
For example, suppose we have variants but do not have an automatic selector mechanism. That still makes it possible for someone to pip install python-flint[x86_64_v4] to get the build that is optimised for their CPU. It would be better if it were done automatically by pip install python-flint, but being able to select it explicitly is still useful. Even if we have a selector mechanism, if there were such variants then I would want a way to select them explicitly so that I could run e.g. comparative timings or perhaps debug issues with particular variants.
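As an aside, the selector for that kind of CPU variant can be fairly mundane. A Linux-only sketch (the flag set is the AVX-512 subset required by the x86-64-v4 psABI level; the surrounding mechanism is of course hypothetical):
def supports_x86_64_v4():
    # Read the CPU flag names from /proc/cpuinfo (Linux only) and check for
    # the AVX-512 features that the x86-64-v4 level requires.
    required = {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}
    with open("/proc/cpuinfo") as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                return required <= flags
    return False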
In the CUDA version case it is possible that someone has both CUDA 11 and CUDA 12 installed. They might prefer to use one rather than the other and so they can do pip install cuda_base[cu11] when creating an environment and then pip install cudf does the right thing afterwards. Most users would not need to do this and could do pip install cudf which would install cuda_base and run the selector mechanism to find which CUDA version is installed.
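The CUDA selector could be similarly small. One plausible sketch (Linux library name shown; everything else about the mechanism is still undefined) is to ask the installed driver which CUDA major version it supports:
import ctypes

def detect_cuda_variant():
    # Ask the NVIDIA driver which CUDA version it supports; returns e.g.
    # "cu12", or None if no usable driver is present.
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None
    version = ctypes.c_int()
    if libcuda.cuDriverGetVersion(ctypes.byref(version)) != 0:
        return None
    return f"cu{version.value // 1000}"  # e.g. 12020 -> "cu12"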
I don’t think that there should be backtracking here. Rather, this is like not having a wheel for the given platform. It should fail with an error somehow (or try to build the wheel, as is the current situation…).
If I understand your suggestion correctly, we suppose that project A has old versions that do not depend on cuda_base and new versions that do. For some reason only A[cu12] wheels are provided but not A[cu11]. What is installed is cuda_base[cu11], but the available wheels for A are not compatible with that. You are concerned that pip would backtrack to find the old version of A, but I don’t think pip should do that. The options as I see it are:
Exit with error
Replace cuda_base[cu11] with cuda_base[cu12] which may require replacing other dependent packages as well.
Try to build A[cu11] if that is a valid build of A.
I see it as being up to project A either to provide the A[cu11] wheels or to declare that users should use CUDA 12 if they want to use project A. This is just like saying that you either provide wheels for, say, Linux on ARM or you tell users “sorry, go build it yourself”. I don’t think that an installer like pip should try to solve this as part of the resolution by backtracking.
Maybe it’s a nice-to-have, then. I don’t think solving that blocks other progress. Giving the user a way to express their intent, either explicitly on the command line or via a configuration file, gets us a first implementation. Future improvements might include pulling variant parameters from the metadata of installed packages automatically, but that’s not part of the MVP, as far as I’m concerned.
Right. Variants are for selecting builds, not for specifying dependencies.
The fact that you have to pick the same variant of two things for them to work together is no different from having to have the same ABI level or system architecture. Those just happen to be implicitly managed by the installer, because it can figure out those values for itself and those axes of variation are built into the tools. That’s the experience we should be providing for these extensible variants.
OK, cool. Sounds like there’s general agreement, in which case any concerns I have about how pip will actually do this can wait till later. Apologies for how long it took me to understand what you were describing here.
I’ve been thinking about PyPI “staged” releases somewhat along the lines of “draft releases”. I think we’ll need these for variant support one way or another, but I also don’t think it would solve the “migrations” use case, because the latter involves a wide swath of the dependency graph with likely dozens of owners.
In my mind it would be both. The static variant selectors would come from the config file, while the dynamic variant selectors would come from the record file. For the B package, which doesn’t show up in the dependency graph of the subsequent install command, you don’t have to track B itself, just whatever dynamic variant selector set B has narrowed your venv into.
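A minimal sketch of combining the two sources, assuming both boil down to {selector_name: value} mappings (the file formats, names, and precedence are all still open questions; here the venv record wins so that later installs stay consistent with what is already installed):
def load_selectors(static_config, venv_record):
    # static_config: selectors that don't change between runs (user/config file).
    # venv_record: selectors recorded by earlier installs into this venv,
    # e.g. the axis that installing B narrowed the environment to.
    selectors = dict(static_config)
    selectors.update(venv_record)
    return selectors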
We should have staged or draft releases regardless of the variant discussion. It is already a problem that not all files are uploaded at the exact same time. It would be better to have a way to sign off on the release on the PyPI side rather than having to manage everything only through GitHub’s complicated access controls.