Packaging DLLs on Windows

(Jeroen Demeyer) #1

Continuing the discussion from Drawing a line to the scope of Python packaging:

That’s news to me. Do you know of any documentation or simple example package demonstrating how to do that on Windows? For example, something like https://github.com/pypa/python-manylinux-demo (which shows how to use auditwheel) but for Windows would be great.

CC @pf_moore

(Nathaniel J. Smith) #2

The thing Steve is talking about isn’t really helpful for shipping DLLs inside wheels. The challenge is that you want to isolate each wheel, so that if two wheels happen to ship different versions of (say) openssl, they don’t accidentally interfere with each other. To do this properly, you can’t put your openssl.dll on the search path, because that’s shared by all packages, and you can’t even call it openssl.dll, because the Windows DLL loader assumes that if it’s ever seen a file called openssl.dll, then that’s what it should use for all future files called openssl.dll, even if they’re being loaded from a different path.

AFAIK right now the only reliable ways to ship DLLs in Windows wheels are:

  • Manually give all your DLLs unique names. This probably requires manually hacking your build system to use the new name, maybe using black magic to generate some new .lib files, or else using a hacky tool built by some random person on GitHub to patch your built binaries in-place. Then, hack your package’s __init__.py to either manually pre-load all these DLLs by absolute path, or else mutate the process’s PATH environment variable to add a new directory you control, where you’ve placed all your DLLs. Or something involving AddDllDirectory (but that’s Win8+ only, so you can’t rely on it). The details are extremely complicated.

  • Don’t ship DLLs; use static linking instead.
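To make the "pre-load by absolute path" idea in the first option concrete, here is a minimal sketch of what such an __init__.py helper could look like. The function names and the ".libs" layout are assumptions for illustration, not what any particular project does, and it assumes the DLLs have already been given unique names.

```python
import ctypes
import os
import sys


def find_vendored_dlls(libs_dir):
    """Return absolute paths of the .dll files in libs_dir, sorted by name."""
    libs_dir = os.path.abspath(libs_dir)
    if not os.path.isdir(libs_dir):
        return []
    return sorted(
        os.path.join(libs_dir, name)
        for name in os.listdir(libs_dir)
        if name.lower().endswith(".dll")
    )


def preload_vendored_dlls(libs_dir):
    """Load every vendored DLL by absolute path; do nothing off Windows.

    Assumption: alphabetical order is good enough. A real package may need
    to load dependencies before the DLLs that import them.
    """
    if sys.platform != "win32":
        return []
    # Return the handles so the caller can keep a reference to them.
    return [ctypes.WinDLL(path) for path in find_vendored_dlls(libs_dir)]
```

In a package's __init__.py this might be called as `_handles = preload_vendored_dlls(os.path.join(os.path.dirname(__file__), ".libs"))` before importing any extension module that needs those DLLs.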

NumPy actually does the first option – if you look at numpy-1.16.2-cp27-cp27m-win32.whl on PyPI, you’ll see it contains a file called numpy/.libs/libopenblas.JKAMQ5EVHIVCPXP2XZJB2RQPIN47S32M.gfortran-win32.dll. I honestly have no idea how this is accomplished – I just spent 20 minutes searching through our wheel build infrastructure and can’t find it. Possibly Matthew Brett is the only person who knows :slight_smile:
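If you want to check this kind of thing yourself: a wheel is just a zip archive, so listing the DLLs it bundles takes a few lines. This is only a sketch; the helper name is made up, and it assumes you have the wheel file downloaded locally.

```python
import zipfile


def list_bundled_dlls(wheel_path):
    """Return the sorted names of all .dll members of a wheel (a zip archive)."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(
            name for name in wheel.namelist() if name.lower().endswith(".dll")
        )
```

For example, `list_bundled_dlls("numpy-1.16.2-cp27-cp27m-win32.whl")` should include the numpy/.libs/ entry mentioned above.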

I think sensible people generally use static linking.

IMO, comparing the three major platforms, they each have their own weird quirks, but in the end they’re all pretty similar in terms of how well they can support portable wheels. But currently our tooling for macOS and Linux is better than our tooling for Windows, so supporting Windows is probably harder in practice. Windows really needs an auditwheel equivalent to catch up. (And it would be technically straightforward to do this, just no-one has done it.)

(Steve Dower) #3

Nathaniel is talking about the integration work I mentioned (e.g. Conda handles this for the most part by building packages against the same DLLs so they don’t conflict).

Otherwise, adding the DLL as package_data, so that it sits alongside the extension module that requires it, is sufficient.

For 3.8 I also enabled better control over DLL resolution, so you can keep them in a separate folder from your module if you prefer: https://mobile.twitter.com/zooba/status/1112204206071373826 (It doesn’t automatically solve the problem Nathaniel is talking about; you need SxS assembly manifests for that, but so few people ever got those to work that they’ve been abandoned now.)
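As a rough sketch of how a package might use the 3.8 API (`os.add_dll_directory`) while still working on older versions, something like the following could go in __init__.py. The function name is hypothetical, and the PATH fallback is an assumption about what older versions would need rather than a recommendation.

```python
import os


def register_dll_dir(dll_dir):
    """Make dll_dir searchable when resolving DLL dependencies.

    Returns the object from os.add_dll_directory where that exists
    (Python 3.8+ on Windows), otherwise None after prepending to PATH.
    """
    dll_dir = os.path.abspath(dll_dir)
    if hasattr(os, "add_dll_directory"):
        # Calling close() on the returned object removes the directory again.
        return os.add_dll_directory(dll_dir)
    # Fallback for older Pythons: PATH is shared by the whole process,
    # so this affects every other package too.
    os.environ["PATH"] = dll_dir + os.pathsep + os.environ.get("PATH", "")
    return None
```

Neither branch does anything about name clashes between packages; it only controls where the loader looks.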

(Nathaniel J. Smith) #4

Oh cool, I’d missed that AddDllDirectory got backported to Win7. And it’s nice to have a wrapper in the stdlib. But it doesn’t make a huge difference? You still have to make sure to embed some kind of unique hash into the filename of every DLL you vendor, and you still have to add some code to __init__.py to set things up. The difference is that the code you add to __init__.py can be a little simpler (especially once projects drop support for 3.7).
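To illustrate the "unique hash in the filename" part, here is a minimal sketch that derives a name from the DLL's contents. The naming scheme and the function name are assumptions for illustration. Note that this only computes the new name: any binary that imports the DLL would also have to be patched to refer to it, which is the hard part and is not shown.

```python
import hashlib
import os


def uniquified_name(dll_path, digest_chars=8):
    """Return a file name with a content hash embedded, e.g. foo-1a2b3c4d.dll."""
    with open(dll_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()[:digest_chars]
    stem, ext = os.path.splitext(os.path.basename(dll_path))
    # Different builds of the same library get different names, so the
    # loader cannot confuse one package's copy with another's.
    return "{}-{}{}".format(stem, digest, ext)
```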

You still need an auditwheel-like tool to make this usable in practice, I think.

Please don’t tell people they can stick random DLLs with non-uniquified names in their packages. That’s the biggest contributor to “DLL hell” (= trying to load one version of a DLL but getting a different one).

(Steve Dower) #5

System integrators can totally do this, and provided they’re integrating their packages correctly into the final environment (e.g. matching builds) they won’t have any issues.

Please don’t tell people they can safely use wheels without also learning how to be a system integrator. That’s what leads people into trouble, because it’s not true.

(Nathaniel J. Smith) #6

Sorry, when I said “packages” I meant “Python packages”.

Sure, if you’re packaging for conda you can and should use conda’s mechanisms for this. But most people writing Python packages are not exclusively targeting conda. Your tweet specifically says “If you bundle precompiled DLLs with your wheels or use ctypes, you should read […]”. Anyone who’s bundling precompiled DLLs with their wheels or using ctypes needs to know about the tricky issues around naming and accidental cross-package conflicts.

Nobody using wheels should be reading about add_dll_directory; this is all about what people who are building wheels or other packages need to know.

(Steve Dower) #7

I didn’t mention conda, why bring that up?

The system integrator is the person who installs packages to create an environment for their app to execute in. None of the tools we currently have do that perfectly by themselves, so there’s often some manual help required to choose matching builds or put them in a valid layout. That’s all I’m saying here. No need to turn this into another pip vs conda thread.

(Nathaniel J. Smith) #8

OK… I just have no idea what you’re talking about then :-). Unless you’re saying that you think that if users create a fresh venv on Windows and then pip install multiple packages from PyPI, then they shouldn’t expect to get something working?

(Steve Dower) #9

Yeah, that’s basically what I’m saying.

A lot of the time it works, often thanks to huge efforts like what you and others have put in, but ultimately there’s not enough coherence between build and runtime environments for packages on PyPI for it to actually be reliable.

Someone has to do some extra work to make the packages work together, either the distro maintainers (who convert sdists and other packages into their own package formats) or the end user (who doesn’t necessarily have any idea what to do, so they mostly just complain to the package developers). This person is the system integrator.

(Nathaniel J. Smith) #10

Well, I’m surprised to hear an MS employee saying that we can’t expect precompiled binaries to work as well on Windows as they do on Linux and macOS :slight_smile: But I think you’re overly pessimistic. Making Windows wheels that work together reliably is totally doable, and people are doing it right now. We just need to make the knowledge of how to do it more accessible.

(Steve Dower) #11

Heh, maybe I’m just most informed about the way things may not work :wink:

Agreeing on specific DLLs is by far the best way to do it. For example, if scipy pins a particular numpy build and agrees to use its DLLs then it’ll be fine, even without renaming anything. But it requires an agreement on an API between the packages that are using that same shared library.

But as long as libraries pretend to be totally independent while relying on the same shared dependencies, it genuinely can’t be relied upon to “just work”.

(Nathaniel J. Smith) #12

Yeah, we’ve tried that… it sucked. There is no way for scipy to pin a specific numpy build. (You can pin versions, but not builds.) And then you ask someone to try building numpy from git to see if it fixes a bug, and their scipy starts segfaulting, etc.

Vendoring libraries in wheels works well, but you really want each wheel to be self-contained, and not make any assumptions about any other packages, except for whatever public API contracts they provide. (Usually this means depending on their public Python APIs only.) And to make your wheels self-contained on Linux and Windows you have to give any DLLs unique names. (On Linux, auditwheel takes care of this for you.)

If you want to reliably share DLLs between packages, then, well, conda does handle this case well :-). Or, we could support it for wheels via something like the “pynativelib” proposal I wrote up a while ago, to create Python packages that mediate access to a specific DLL. The technical details turn out to be surprisingly complicated (this time it’s macOS’s fault), but it’s totally doable. But no-one is actually working on this right now.

(Jeroen Demeyer) #13

Absolutely.
