Notes on binary wheel packaging for C++ library wrappers

Recently I’ve been putting a bunch of spare coding time into a package I’m calling cyexiv2. It provides a Pythonic interface to the C++ library libexiv2, which reads and writes metadata embedded in digital photographs. It’s an API-compatible fork of py3exiv2. (Note: py3exiv2’s README describes it as an updated version of an older package called “pyexiv2” that only supported Python 2, but the “pyexiv2” package that’s currently on PyPI is a completely unrelated project. I haven’t been able to find a copy of the older pyexiv2.)

I started working on this package because my spouse, who is an avid amateur photographer, wanted to be able to use it on Windows without installing MSVC herself—so my goal from the beginning was to put binary wheels for Windows on PyPI. This has proven much more difficult than I anticipated—three weeks and I’m still not done—and I think my experiences might be relevant to the current conversation about improving packaging process and tools.

cyexiv2 is not on PyPI yet; the link above goes to the Git repo, and you might find it useful to page through the recent commit history while, or before, you read this.

We do have good tools for much of the job

The bulk of this post describes problems, so I want to start by saying that it could be much, much worse. I wouldn’t be attempting this if cibuildwheel didn’t already exist. I couldn’t attempt this if various CI companies weren’t willing to give back a little by offering free build cycles to open source projects. (I’m using Azure Pipelines because they offer Windows, MacOS, and Linux build environments all on the same service, but Microsoft isn’t the only outfit doing this.) My CI driver script is over a thousand lines long, but easily a third of it is devoted to making it easier to troubleshoot problems when I can’t interact directly with the build workers, and another big chunk is all about creating and validating sdists, not wheels. (And it would be shorter as a shell script, but also harder to debug.) My setup script is quite short and straightforward, and would be shorter if some bugs in distutils and setuptools were fixed (see below).

The best way to think of the rest of these notes is as a real-world example of the infamous ninety-ninety rule.

Compiling third party C++ libraries: such a hassle

The bulk of the work I’ve put into this package has nothing directly to do with Python. Rather, it’s related to the difficulty of compiling third-party C++ libraries on MacOS and Windows in a way that Python modules can reliably use them. The main reason cyexiv2 is a fork rather than an in-development pull request for py3exiv2 is that py3exiv2 uses Boost.Python for glue between CPython and C++. I have had extremely bad experiences with Boost in the past; I wasn’t even going to try using it. I spent a solid week of evenings writing a new glue layer using Cython, and I am convinced that this was still less work than it would have been to get a Boost library built for three different operating systems × 32- and 64-bit pointers × five different versions of CPython (3.4 through 3.8 inclusive).

(I would have tried to work with py3exiv2 unmodified if system packages of Boost.Python were available for all the necessary configurations. But, for instance, CentOS 6 (the base environment for manylinux2010) only offers a pre-built package of Boost.Python matching CPython 3.5, and choco only has Boost at all for Visual Studio 2013 and 2015, whereas py3exiv2’s shim uses C++11 features so VS2017 is required.)

It was absolutely necessary to build libexiv2 from source, because CentOS 6’s packages are too old, Homebrew and choco don’t have it at all, and if you build the library without special options it writes debugging messages to std::cerr. py3exiv2 worked around that by eating everything written to cerr, but I felt that was inappropriate. Instead I make a point of building libexiv2 with the right options, and I hooked pytest to fail the testsuite if anything showed up on stderr. (And then I had to patch libexiv2’s own testsuite to not expect any debugging messages on stderr. Apparently the libexiv2 developers never use the “no debugging messages” mode themselves. Sigh.)
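The stderr check can be sketched without any pytest specifics. What follows is a minimal illustration, not the hook cyexiv2 actually uses: assert_silent_stderr is a hypothetical helper that redirects file descriptor 2 to a temporary file, so that output written by C++ code through std::cerr is caught too (Python-level writes are caught as long as sys.stderr is attached to descriptor 2, as it normally is).

```python
import contextlib
import os
import sys
import tempfile


@contextlib.contextmanager
def assert_silent_stderr():
    """Fail if anything is written to file descriptor 2 inside the block.

    Hypothetical helper, for illustration only.
    """
    sys.stderr.flush()                # push out anything already buffered
    saved_fd = os.dup(2)              # remember the real stderr
    with tempfile.TemporaryFile() as tmp:
        os.dup2(tmp.fileno(), 2)      # descriptor 2 now points at the temp file
        try:
            yield
        finally:
            sys.stderr.flush()
            os.dup2(saved_fd, 2)      # always restore the real stderr
            os.close(saved_fd)
        tmp.seek(0)
        leaked = tmp.read()
    if leaked:
        raise AssertionError("unexpected output on stderr: %r" % leaked)
```

Wrapping each test body in this context manager (for example from a test fixture) would turn any stray debugging message into a test failure.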

Building libexiv2 from source is the point at which CPython becomes part of the problem. The 2011 revision of the C++ standard introduced enough breaking changes that “C++98” and “C++11” are effectively two different languages. In particular, C++11 is not fully supported by Apple for any MacOS release prior to 10.9, but distutils invokes the MacOS C++ compiler with settings (MACOSX_DEPLOYMENT_TARGET environment variable) that request backward compatibility all the way to 10.6. If the “deployment target” for the extension module is lower than the “deployment target” for any of the shared libraries it uses, the linker will error out. I did manage to get everything building after a fair bit of head-scratching. However, pip wheel is still producing wheels tagged as compatible with 10.6, even though I’m pretty sure they require 10.9, and I have no idea how to fix that.
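The deployment-target constraint is easy to get wrong if versions are compared as strings, since "10.10" sorts before "10.9". Here is a small sketch of the comparison, assuming targets are dotted version strings like the ones given in MACOSX_DEPLOYMENT_TARGET; all function names are hypothetical.

```python
def parse_target(text):
    """Turn a dotted version string such as "10.9" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def lowest_usable_target(library_targets):
    """Return the newest target among the libraries.

    The extension module cannot safely target anything older than this.
    """
    return max(library_targets, key=parse_target)


def extension_target_ok(extension_target, library_targets):
    """True if the extension's target is not lower than any library's target."""
    needed = parse_target(lowest_usable_target(library_targets))
    return parse_target(extension_target) >= needed
```

The sketch only covers the comparison. It says nothing about how the wheel's platform tag is chosen.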

On Windows I have a similar problem: distutils wants to use different settings for MSVC’s C++ runtime’s internal debugging modes than libexiv2’s build does, leading to a flood of LNK2038 error messages. This wouldn’t be a problem if I could build libexiv2 as a DLL on Windows, but that brings us to the next problem…

auditwheel for Linux, delocate for MacOS, ??? for Windows

…no one has yet written a tool that does for Windows what auditwheel does for Linux and delocate does for MacOS: find all of the DLLs that need to be packed into a wheel so that it works when installed on a system that doesn’t necessarily have any of those DLLs.¹ This is already being discussed in other threads [1] [2], so I won’t rehash it too much here, just emphasize that it’s a potential tripwire for anyone who wants to build binary wheels for Windows.

At least cibuildwheel’s internal testing process did catch the fact that I needed this nonexistent tool, which is better than what happened on MacOS and Linux. On those OSes there’s a system-wide default installation location for locally compiled shared libraries (/usr/local/lib) and the dynamic loader automatically looks in there to resolve dependencies, which is great when you just want to compile stuff yourself and run it locally, but not so great when you’re trying to build something redistributable. So, when auditwheel was crashing because the most recent version of auditwheel is incompatible with the most recent version of wheel (yes, really, sigh) testing was not catching that because the wheel appeared to work anyway. Two points for reading build logs very carefully, I guess.
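Since the tests can pass even when the library was never bundled, one cheap cross-check is to look inside the wheel itself. Wheels are zip archives, so the standard library is enough. This is a minimal sketch: the helper names and the suffix list are assumptions for illustration, not part of auditwheel or delocate.

```python
import zipfile

# File name endings treated as shared libraries (an illustrative list).
SHARED_LIB_SUFFIXES = (".so", ".dylib", ".dll", ".pyd")


def looks_like_shared_library(member_name):
    """Heuristic: match foo.so, foo.dylib, foo.dll, and versioned foo.so.N."""
    base = member_name.rsplit("/", 1)[-1]
    return base.endswith(SHARED_LIB_SUFFIXES) or ".so." in base


def bundled_libraries(wheel_path):
    """List every archive member that looks like a shared library."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(n for n in wheel.namelist() if looks_like_shared_library(n))


def wheel_bundles(wheel_path, library_prefix):
    """True if some bundled library's file name starts with library_prefix."""
    return any(
        name.rsplit("/", 1)[-1].startswith(library_prefix)
        for name in bundled_libraries(wheel_path)
    )
```

A CI step could then fail the build when, say, wheel_bundles(path, "libexiv2") is false, instead of relying on someone noticing a crash in the build log.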

¹ I get twitchy about this process because it’s one of the points where pip’s violation of the Highlander Principle of Package Management becomes likely to cause real serious problems, but that’s a whole other thread.

Python packaging should have an equivalent of make distcheck

Probably everyone reading this has left something important out of an sdist at least once. There’s a similar mistake that’s just as easy to make when packaging C programs with traditional Makefiles, and so automake has a handy feature to help you not do it: make distcheck, which builds (the equivalent of) an sdist tarball, and then unpacks that tarball into a temporary directory and does a test build, test install, etc. to make sure nothing is missing. The standard packaging tools (or any future replacement) should grow a similar feature and people should be encouraged to run it in CI. (It’ll probably be too slow to be used as a pre-commit hook, unfortunately.)

I implemented something like this in the CI driver script I linked to above: it builds an sdist tarball and a wheel directly from a Git checkout, and then it unpacks the sdist in a temporary directory and builds another sdist tarball and wheel from that. The results are required to be byte-for-byte identical to the first sdist and wheel. Test results are also required to agree, although not exactly, because test-results.xml files record the time each test took, which obviously won’t be an exact match.
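The two comparisons described above can be sketched as follows. This is an illustration under assumptions, not the CI script itself: the function names are hypothetical, and the XML handling assumes JUnit-style reports in which durations live in time attributes (plus, in some reports, a timestamp attribute).

```python
import hashlib
import xml.etree.ElementTree as ET

# Attributes expected to differ between otherwise identical test runs
# (an assumption about the report format).
VOLATILE_ATTRIBUTES = ("time", "timestamp")


def file_digest(path):
    """SHA-256 of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def byte_identical(path_a, path_b):
    """True if two artifacts (sdists or wheels) have identical contents."""
    return file_digest(path_a) == file_digest(path_b)


def normalized_results(xml_text):
    """Serialize a test report with the volatile attributes removed."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        for attribute in VOLATILE_ATTRIBUTES:
            element.attrib.pop(attribute, None)
    return ET.tostring(root)


def results_agree(xml_a, xml_b):
    """True if two reports match once timing information is ignored."""
    return normalized_results(xml_a) == normalized_results(xml_b)
```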

Reproducibility is not just for binary packages

I’m a big fan of the reproducible builds project and not just because of worries about malicious infrastructure and injected malware. If not for the work that went into making wheels reproducible, I wouldn’t have been able to require byte-for-byte agreement between a wheel built from a git checkout and a wheel built from an sdist tarball. (There are still a few weird tricks one needs, like manually passing -fdebug-prefix-map options to GCC, but I have the impression that that’s on the reproducible builds people’s radar and not something that needs messing with in distutils or whatever.)
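For reference, the GCC option has the form -fdebug-prefix-map=OLD=NEW, which records NEW in place of the directory prefix OLD in debugging information. Here is a small sketch of generating it for a build directory. The helper names, the choice of "." as the replacement, and the use of CFLAGS to deliver the flag are all assumptions for illustration.

```python
import os


def debug_prefix_map_flag(build_dir, replacement="."):
    """Build the GCC flag that hides build_dir's absolute path."""
    return "-fdebug-prefix-map=%s=%s" % (os.path.abspath(build_dir), replacement)


def with_reproducible_cflags(environ, build_dir):
    """Return a copy of environ with the flag appended to CFLAGS.

    Assumes the build being launched honours CFLAGS.
    """
    env = dict(environ)
    flag = debug_prefix_map_flag(build_dir)
    env["CFLAGS"] = (env.get("CFLAGS", "") + " " + flag).strip()
    return env
```

Without something like this, two builds of the same source in differently named temporary directories can embed different paths and so fail a byte-for-byte comparison.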

To date this project hasn’t concerned itself much with reproducibility of source packages. However, considering pip’s habit of building things from source, it’s equally valuable to be able to assure yourself that the source tarball you pulled off of PyPI does exactly match the corresponding release tag in the developer’s VCS. distutils and setuptools can almost build reproducible sdist tarballs. There are only a few bugs in the way, and all of them have already been filed: bpo#38632, bpo#38725, bpo#38726, bpo#38727, and setuptools#1893.

Fixing these would also be valuable progress toward a make distcheck equivalent. I’m actually monkey-patching them in my setup script because of that.
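To make concrete what a reproducible source tarball involves, here is a minimal sketch using only the standard tarfile module. It is not the monkey-patch mentioned above and does not go through distutils or setuptools at all; the function names and the normalization choices (owner 0, timestamp 0, two permission modes) are assumptions for illustration. Compression is left out because gzip headers carry their own timestamp.

```python
import io
import os
import tarfile


def normalize(info):
    """Strip the metadata that varies between machines and checkouts."""
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    info.mtime = 0
    # Keep only "executable or not", so the umask does not leak in.
    info.mode = 0o755 if info.mode & 0o100 else 0o644
    return info


def build_reproducible_tar(root, archive_root):
    """Return the bytes of an uncompressed tar of the files under root.

    Files are added in sorted order under the prefix archive_root.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()           # make the traversal order deterministic
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                relative = os.path.relpath(full, root).replace(os.sep, "/")
                tar.add(full, arcname=archive_root + "/" + relative,
                        filter=normalize)
    return buffer.getvalue()
```

Two calls on the same tree should then produce identical bytes even if file timestamps or ownership changed in between.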

The distutils/setuptools split is still confusing

In recent releases of core Python, the distutils documentation discourages you from reading it, pointing you at the setuptools documentation instead. The problem with this is that the setuptools documentation is written for people who have already read and understood the distutils documentation. It’s full of unexplained terminology and outright gaps that you have to refer back to the distutils documentation to get filled in on. It doesn’t even explain all of the arguments to setup() that you need for a basic, pure-Python package. I hope it is obvious why this is bad. There’s text in the latest version of the “legacy” distutils manual implying that people are working on improving the setuptools manual. I hope that gets finished soon.

Not all of the changes imposed by setuptools are documented, and some of them seem to have been mistakes. The example I know about right now is that in bare distutils, sdist has options -u and -g to override the user and group IDs written to tarballs, which is, again, valuable for reproducibility. Setuptools, however, disables those options by overwriting the user_options array for the sdist command. This is setuptools bug #1893, mentioned above. I bring it up again because it’s really confusing when the distutils documentation (and the code) say that sdist should accept some option, and the setuptools documentation doesn’t specifically say that the option has been removed, but it doesn’t work.

In this case, the setuptools documentation says “because setuptools’ approach to determining the contents of a source distribution is so much simpler, its sdist command omits nearly all of the options that the distutils’ more complex sdist process requires” but that’s not good enough. Every single one of the removed options should be listed by name, so you can search for it, and there should be a one- or two-sentence explanation of why you don’t need it. A project policy of writing documentation in that level of detail for every last difference between setuptools and distutils would also mean that unintentional differences are more likely to get noticed.