Wheel caching and non-deterministic builds

I’m a maintainer for the cocotb project. We are finally trying to solve a particularly annoying bug our users hit in our build process.

One of the C libraries we build requires us to pull headers from the user’s system (they are proprietary headers). If those headers are not found, that library is not built and a message is printed. Any attempt to use the library that failed to build then fails at runtime.
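A minimal sketch of the kind of build-time check described above. The header name `mti.h`, the library name `libfli` and the search directories are hypothetical stand-ins, not cocotb’s actual setup.py logic:

```python
import os


def find_header(header, search_dirs):
    """Return the first path where `header` exists, or None if it is absent."""
    for directory in search_dirs:
        candidate = os.path.join(directory, header)
        if os.path.isfile(candidate):
            return candidate
    return None


def optional_libraries(search_dirs):
    """Decide which optional C libraries to build.

    "mti.h" and "libfli" are hypothetical names standing in for the
    proprietary header and the library that depends on it.
    """
    libs = []
    if find_header("mti.h", search_dirs) is not None:
        libs.append("libfli")
    else:
        # The build still succeeds; the optional library is simply skipped.
        print("warning: proprietary headers not found; skipping libfli")
    return libs
```

The set of libraries in the resulting wheel depends on what is in `search_dirs` at build time, while the wheel’s name and version stay the same.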

So, we have a non-deterministic build process. However, pip always caches the built wheel. The user later attempts to load the missing library and it fails. They try reinstalling our package, or even force-reinstalling, to no avail. Most of them are unaware that pip caches built wheels, and that actually rebuilding the wheel to include the library they need requires an option that turns off that behavior.

This isn’t the first time our project has had issues with the wheel cache. In the earlier case, the wheel cache stored wheels containing C libraries whose RPATHs were hardcoded to point at a particular Python environment. That wheel and its RPATHs would be reused in later installs into other environments on the system, but the Python environment the RPATH pointed to could be deleted.

So I have some questions:

  1. Where are the rules and assumptions of pip, wheel, or whatever else dictates the current PEP 517-compliant, setuptools-based build process documented?
  2. Is having a non-deterministic build against the rules?
  3. It would be a serious effort, but would it be preferable to separate out the non-deterministic part into its own package that can deterministically build or fail, thus making the combined build process deterministic?
  4. Why is there no way to state in the project source that wheels created from that project are not safe to cache?
  5. Why is caching the default behavior? Any code in setup.py has the potential to make builds non-deterministic and potentially not reusable. Any difference in the user’s shell environment could also affect the build.

I took a quick look and was surprised to find that there’s no statement in any of the “obvious” places in the specs, but the basic rule used by pip is that two wheels with the same name and version are assumed to be functionally identical.

A non-deterministic build would break that rule, yes. That said, I don’t like the term “non-deterministic” here: the build is deterministic, but it adapts to the user’s environment in ways that cannot be captured in build tags. In particular, the wheel is safe to cache unless the user installs or uninstalls the optional dependency, which is not something that happens randomly.

The normal way of handling this appears to be to split out the part that depends on the optional library, as you suggest. But I don’t work with libraries that have this sort of dependency, so I’m not the best person to advise on options here.

There’s no way to state that wheels are not safe to cache because (1) we never thought there would be a need, and (2) projects have never suggested in the past that it would be useful. It would also have a significant performance impact (see next point).

Caching is the default because it provides massive speedups, particularly for packages that are reinstalled a lot (common in situations that use lots of virtual environments, for example testing). Making users enable a cache every time would negate most of that benefit. And while it’s certainly possible for setup.py to build totally different things every time, that’s not the norm, and the trend is moving away from that sort of scenario (static build definitions, like setup.cfg, are becoming much more common).

Presumably you already have to tell your users not to build their own wheel of your package and use that? So you could also advise them at the same time to add --no-binary <your-package> when invoking pip.

Why not make the cache opt-out, e.g. by adding a pyproject.toml or setup.py option to disable it for a particular project that knows it can produce different wheels every time?

That’s an option.

It wouldn’t work as a setup.py option, as the cache is a pip feature, and setup.py is for setuptools¹. But it could be in pyproject.toml as a tool-specific option for pip, although pip currently doesn’t use pyproject.toml for any of its options so there could be unexpected complexities to iron out there.

I don’t personally think it’s worth the effort (IMO telling users to use --no-binary if needed is a sufficient workaround), but if someone is motivated they could create a PR.

¹ I’m assuming no-one is suggesting making this setting part of the standard project metadata.

Can this not be solved by build tags? If two wheels have different content, they should be named differently, and the build tag distinguishes between two wheels when everything else is equal.

If you don’t want different variants to be orderable (build tags are sortable), the local version segment can also be used to achieve this (PyTorch uses this approach), at the cost of not being able to distribute on PyPI. I assume that is not an issue for this use case, since the project needs to perform environment detection at build time and therefore can’t distribute wheels anyway.

Either way, the cache problem is automatically solved since the wheels have different names and become different entries.
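Since this suggestion hinges on wheel filenames, here is a minimal, simplified sketch of how a filename is composed under the binary distribution format, {name}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl. Real tools also normalize the project name further; this only demonstrates that the variants get distinct names and does not model how pip chooses among cached wheels.

```python
def wheel_filename(name, version, python_tag, abi_tag, platform_tag, build_tag=None):
    """Compose a wheel filename (simplified sketch of the wheel naming rule)."""
    # "-" separates fields, so it is escaped to "_" inside name and version.
    parts = [name.replace("-", "_"), version.replace("-", "_")]
    if build_tag is not None:
        # The optional build tag must start with a digit.
        if not build_tag[:1].isdigit():
            raise ValueError("build tag must start with a digit")
        parts.append(build_tag)
    parts += [python_tag, abi_tag, platform_tag]
    return "-".join(parts) + ".whl"


# Plain build, build-tag variant, and local-version variant:
print(wheel_filename("cocotb", "1.0", "py3", "none", "any"))
print(wheel_filename("cocotb", "1.0", "py3", "none", "any", build_tag="0_fli"))
print(wheel_filename("cocotb", "1.0+fli", "py3", "none", "any"))
```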

I don’t object to providing a way for a project to declare “don’t cache me”, but it does not seem to me to be the correct solution to this particular use case. The problem is not wheel-caching, but correct wheel-caching, which we already have enough mechanism for.

If it can, then that would be ideal. But my understanding is that this is “does library X exist on the system?” which is not covered by wheel tags.

Build tags will not work. We cannot distribute the binaries in question. We are simply trying to avoid reuse of the built wheel in local caches. It comes from the same source code.

--no-binary is also not a solution. We have adequate documentation on the need to either have the application and its headers available, or to reinstall while ignoring the cache. We still have users come to us with problems. They are easily addressed, but it shouldn’t be the users’ problem. The current solution of “just pass more flags” requires users to already know about the issue, or to specifically read our installation instructions for what is supposed to be “just another Python package”. And the fact that the problem is (nearly) universal tells me that requiring all users to know about it and pass a flag is wrong.

A pyproject.toml flag would work brilliantly. A suggestion of an opt-out flag is what I was hoping for. Not being able to opt out of assumptions that are breaking you is a bug IMO.

Long term we will try to become more compliant, but it is currently a huge amount of effort to do so. We are currently generating side-by-side assemblies in our setup.py to emulate RPATH on Windows to work around several issues our use case dictates. Having the library built and installed into a different package makes that solution not viable. So we will have to rethink our entire library loading system and rip up a lot of hard work.

Can you elaborate why build tags will not work? A wheel’s build tag is in its file name, so giving the built wheel a different build tag gives it a different name, makes it cached under a different key, thus not incorrectly reused.

I’m impressed! I’ve never heard of anyone who managed to get SxS assemblies to do something useful before 🙂

There’s a much simpler trick that might help, depending on what exactly you’re doing: when Windows needs to find a .dll, the very first place it checks is the list of .dlls that have previously been loaded in the same process. This happens before it starts searching the filesystem or anything like that. So, if you do handle = ctypes.windll.LoadLibrary(r"explicit\path\to\your.dll"), then that will effectively add your.dll to the RPATH for all future dlls loaded in that process. (You don’t have to do anything with the handle, except maybe keep it around so it doesn’t get garbage-collected. You just create the handle in order to trigger this weird side-effect inside the Windows dll loader code.)

You may also find that os.add_dll_directory is enough RPATH emulation for your needs (it’s new in 3.8, which is when we stopped Python treating PATH as an RPATH substitute).
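A minimal sketch of using `os.add_dll_directory` for this, assuming the dependency lives in a directory inside the installed package (the helper name is hypothetical). Per the Python documentation, the added directory is used when resolving dependencies of imported extension modules and by ctypes; whether native code that calls the Windows loader itself picks it up is not guaranteed.

```python
import os
import sys


def add_package_dll_directory(directory):
    """Add `directory` to the DLL search path on Windows (Python 3.8+).

    Returns the object from os.add_dll_directory (call .close() on it, or use
    it as a context manager, to remove the directory again), or None where the
    API is unavailable.
    """
    if sys.platform != "win32" or not hasattr(os, "add_dll_directory"):
        return None
    return os.add_dll_directory(os.path.abspath(directory))
```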

Can you explain how to do that? It’s not immediately clear to me how I could leverage build tags to prevent wheel reuse.

The typical user experience is:

  1. pip install cocotb
  2. pip install cocotb

But the second time we don’t want to reuse the wheel if it is missing FLI support, and the FLI is detected in the environment.
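A deliberately simplified toy model of why the second install reuses the first wheel: the cache key contains nothing about the build environment. This is not pip’s actual implementation, and `build_cocotb` is a hypothetical stand-in for the real build.

```python
class ToyWheelCache:
    """Toy model of 'same name and version means an identical wheel'."""

    def __init__(self):
        self._wheels = {}
        self.builds = 0

    def install(self, name, version, build):
        key = (name, version)  # nothing about the build environment is in the key
        if key not in self._wheels:
            self.builds += 1
            self._wheels[key] = build()  # only built on a cache miss
        return self._wheels[key]


env = {"fli_available": False}


def build_cocotb():
    # Hypothetical build step whose output depends on the environment.
    return {"fli": env["fli_available"]}


cache = ToyWheelCache()
first = cache.install("cocotb", "1.0", build_cocotb)   # built without FLI
env["fli_available"] = True                            # user installs the simulator
second = cache.install("cocotb", "1.0", build_cocotb)  # cache hit: no rebuild
print(first, second, cache.builds)  # prints: {'fli': False} {'fli': False} 1
```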

We are embedding Python into an existing process. The first thing it loads is a pure C library, that then loads another pure C library that is in a different directory (but the same package), which is why we need SxS or RPATH. Maybe the whole reason we needed SxS is because we have everything in separate directories =/

If we move the library in question (which is the first library loaded by the application) to another package, we are going to have to add a DLL entry handler and pass it location information via an environment variable so it can load the second library at runtime. Which is approaching levels of grody I never thought possible. Everything sucks when you aren’t in charge.

With setuptools, for example, you can do something like

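```python
from setuptools import setup

try:
    # setuptools >= 70.1 ships its own bdist_wheel command.
    from setuptools.command.bdist_wheel import bdist_wheel as _bdist_wheel
except ImportError:
    from wheel.bdist_wheel import bdist_wheel as _bdist_wheel


def _has_fli_support():
    # Hypothetical placeholder: replace with real detection of the FLI headers.
    return False


class bdist_wheel(_bdist_wheel):
    def finalize_options(self):
        # build_number is a regular command option, so set it before the base
        # class validates it (a read-only property would break option setup).
        if _has_fli_support():
            self.build_number = "0_fli"  # Build number must start with a digit.
        super().finalize_options()


setup(
    name="cocotb",
    version="1.0",
    cmdclass={"bdist_wheel": bdist_wheel},  # Override the wheel-building command.
    # ... other setup() arguments
)
```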

Now when you pip install cocotb with FLI available, pip will build a wheel named something like cocotb-1.0-0_fli-py3-none-any.whl. The extra 0_fli part distinguishes it from the non-FLI wheel named cocotb-1.0-py3-none-any.whl.

Or, if you go with the local version approach, you’d dynamically calculate version instead, and call the versions something like 1.0 and 1.0+fli.
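A minimal sketch of computing that version dynamically. The helper name is hypothetical, and the check enforces the canonical PEP 440 local-label form (alphanumeric segments separated by periods):

```python
import re

# Canonical PEP 440 local version label: alphanumeric segments joined by ".".
_LOCAL_LABEL = re.compile(r"^[a-z0-9]+(?:\.[a-z0-9]+)*$", re.IGNORECASE)


def with_local_label(base, label):
    """Return e.g. "1.0+fli" for label "fli", or the base version if label is None."""
    if label is None:
        return base
    if not _LOCAL_LABEL.match(label):
        raise ValueError(f"invalid local version label: {label!r}")
    return f"{base}+{label}"


# In setup.py this might be used as (with a hypothetical detection helper):
#   version=with_local_label("1.0", "fli" if _has_fli_support() else None)
print(with_local_label("1.0", "fli"))
print(with_local_label("1.0", None))
```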

I will point out that conda (and conda-build) have solved those issues. It lets you package non-Python dependencies and install them naturally in your environment of choice, regardless of its location in the filesystem. It handles RPATH-like hackery for you transparently.
