Archspec: a library for labeling optimized binaries

tgamblin · February 9, 2020, 6:51am

There hasn’t (so far) been a standard for naming and comparing optimized binaries; most systems just call things by the ISA family (e.g. x86_64 or ppc64le), without much more information.

We’ve pulled a library out of Spack that could be used to label optimized wheels. We’re calling it archspec.

The library does several things that will be of interest to packagers:

Detects the microarchitecture of your machine (i.e., not just x86_64, but haswell, skylake, thunderx2, or power9le).
Compares microarchitectures for compatibility. You can write things like skylake > haswell to test whether a skylake machine can run haswell-optimized binaries (you’ll get True).
Queries the features available on a particular microarchitecture. You can ask, e.g., 'sse3' in haswell or 'neon' in thunderx2.
Reports which compiler flags to use to get different compilers to output binaries for a specific target. E.g., you can ask what flags to use with gcc, at a particular version, to get a binary for haswell.
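
To make the comparison and feature queries in the list above concrete, here is a minimal toy model in Python. It is not archspec’s implementation: the Microarchitecture class, its fields, and the abbreviated feature sets are all assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Microarchitecture:
    """Toy stand-in for a named target (hypothetical, not archspec's class)."""

    name: str
    features: frozenset
    parents: tuple = ()

    def ancestors(self):
        """Return the names of every target this one descends from."""
        found = set()
        for parent in self.parents:
            found.add(parent.name)
            found |= parent.ancestors()
        return found

    def __contains__(self, feature):
        # Supports queries such as: 'sse3' in haswell
        return feature in self.features

    def __gt__(self, other):
        # a > b means "a descends from b", so a can run binaries built for b
        return other.name in self.ancestors()


# Abbreviated, illustrative feature sets (assumed for this sketch).
x86_64 = Microarchitecture("x86_64", frozenset({"sse2"}))
haswell = Microarchitecture(
    "haswell", frozenset({"sse2", "sse3", "avx", "avx2"}), (x86_64,)
)
skylake = Microarchitecture(
    "skylake", haswell.features | {"adx", "clflushopt"}, (haswell,)
)

print(skylake > haswell)  # True
print(haswell > skylake)  # False
print("sse3" in haswell)  # True
```

Modeling compatibility as ancestry in a graph, rather than as a version number, leaves room for targets that are not comparable at all (say, a zen2 and a skylake), where neither direction of the comparison holds.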

The library also defines a set of canonical and hopefully familiar names for microarchitectures. The list currently looks like this (from spack output):

$ spack arch --known-targets
Generic architectures (families)
    aarch64  arm  ppc  ppc64  ppc64le  ppcle  sparc  sparc64  x86  x86_64

GenuineIntel - x86
    i686  pentium2  pentium3  pentium4  prescott

GenuineIntel - x86_64
    nocona  nehalem   sandybridge  haswell    skylake  skylake_avx512  cascadelake
    core2   westmere  ivybridge    broadwell  mic_knl  cannonlake      icelake

AuthenticAMD - x86_64
    k10  bulldozer  zen  piledriver  zen2  steamroller  excavator

IBM - ppc64
    power7  power8  power9

IBM - ppc64le
    power8le  power9le

Cavium - aarch64
    thunderx2

Fujitsu - aarch64
    a64fx

We use this library in Spack to ensure:

that every binary package is built for a specific target
that we can use a particular binary package on a given host machine.

The detection logic and library bindings are currently for Python, but the domain knowledge (features, arch names, etc.) all lives in a generic JSON file with a schema, which we hope will make it easy for people to build bindings for other languages. Currently the library supports macOS and Linux (Windows help would be great).
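
As a sketch of why a language-neutral data file helps, the snippet below reads a small JSON fragment and looks up compiler flags for a target using only the standard library. The key names, the min_version layout, and the flags_for helper are hypothetical and chosen for illustration; the real file and its schema may be organized differently.

```python
import json

# Hypothetical fragment in the spirit of the JSON data file; the key names
# and the version layout are assumptions made for this sketch.
DATA = json.loads("""
{
  "microarchitectures": {
    "x86_64": {"from": [], "vendor": "generic", "features": []},
    "haswell": {
      "from": ["x86_64"],
      "vendor": "GenuineIntel",
      "features": ["sse3", "avx", "avx2"],
      "compilers": {
        "gcc": [
          {"min_version": [4, 9], "flags": "-march=haswell -mtune=haswell"},
          {"min_version": [4, 8], "flags": "-march=core-avx2 -mtune=core-avx2"}
        ]
      }
    }
  }
}
""")


def parse_version(text):
    """Turn '9.2.0' into (9, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def flags_for(target, compiler, version):
    """Return flags from the first entry whose minimum version is satisfied."""
    entry_list = (
        DATA["microarchitectures"][target].get("compilers", {}).get(compiler, [])
    )
    wanted = parse_version(version)
    for entry in entry_list:  # entries are listed newest-first
        if wanted >= tuple(entry["min_version"]):
            return entry["flags"]
    raise ValueError(f"no known flags for {target} with {compiler} {version}")


print(flags_for("haswell", "gcc", "9.2.0"))
print(flags_for("haswell", "gcc", "4.8.5"))
```

Because nothing here depends on Python beyond a JSON parser and tuple comparison, the same lookup could be written in any language that can read the data file.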

We’d love to get more contributions to keep the data up to date, and we’re hoping that if this takes off, it’ll enable people to more easily distribute optimized binary packages and containers.

Comments/suggestions/contributions welcome! For more info, see this talk from FOSDEM.

sumanah · February 10, 2020, 11:38pm

This is SO GREAT (in my opinion)!!

arcivanov · February 14, 2020, 5:06pm

One issue with keying only on the architecture is that large cloud providers order customized CPUs that lack some of the architecture’s features. I’ve had numerous cases, going back as far as 2010, where you compile for the AWS architecture and get an Illegal Instruction fault, dig into the spec, and find that a specific instruction set is missing from the otherwise standard architecture.

So technically you need to introduce an arch-specific CPU capability bit field and AND it with the bit fields of the available distros to see which ones are maximally compatible, with a fallback if none match.
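
A minimal sketch of the bit-field idea, assuming a made-up set of capability bits and build names (none of this comes from archspec or any real distribution): a build is runnable when ANDing its required bits with the host’s bits gives back the required bits unchanged, and among runnable builds the one requiring the most capabilities wins.

```python
from enum import IntFlag


class Cap(IntFlag):
    """Hypothetical capability bits for one ISA family."""

    SSE3 = 1
    AVX = 2
    AVX2 = 4
    AVX512F = 8


def best_build(host_caps, builds):
    """Pick the runnable build that requires the most capabilities.

    A build is runnable when (required & host_caps) == required.
    Returns None when nothing is runnable.
    """
    runnable = {
        name: required
        for name, required in builds.items()
        if (required & host_caps) == required
    }
    if not runnable:
        return None
    return max(runnable, key=lambda name: bin(int(runnable[name])).count("1"))


BUILDS = {
    "generic": Cap(0),  # fallback that requires nothing
    "avx2": Cap.SSE3 | Cap.AVX | Cap.AVX2,
    "avx512": Cap.SSE3 | Cap.AVX | Cap.AVX2 | Cap.AVX512F,
}

# A cloud CPU that is otherwise standard but lacks AVX-512.
cloud_host = Cap.SSE3 | Cap.AVX | Cap.AVX2
print(best_build(cloud_host, BUILDS))  # avx2
print(best_build(Cap.SSE3, BUILDS))    # generic
```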

tgamblin · February 14, 2020, 5:22pm

The library determines the architecture by the available features, so if a cloud provider disables key features (or at least the ones that archspec models), then we detect the platform as a different architecture – e.g., if you were to disable avx-512 features on skylake, we’d identify the machine as a haswell and only use haswell binaries. So from that perspective, the features you’re talking about are already modeled.
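
Roughly, feature-based detection can be sketched like this. The table and the tie-breaking rule (prefer the candidate with the most required features) are simplifying assumptions for illustration, with only three targets and abbreviated feature lists, not the library’s actual data or algorithm.

```python
# Illustrative, abbreviated feature requirements (assumed for this sketch).
KNOWN_TARGETS = {
    "x86_64": set(),
    "haswell": {"sse3", "avx", "avx2"},
    "skylake_avx512": {"sse3", "avx", "avx2", "avx512f"},
}


def detect(host_features):
    """Return the most demanding known target the host fully supports."""
    candidates = [
        name
        for name, required in KNOWN_TARGETS.items()
        if required <= host_features  # every required feature is present
    ]
    # "x86_64" requires nothing, so there is always at least one candidate.
    return max(candidates, key=lambda name: len(KNOWN_TARGETS[name]))


full_chip = {"sse3", "avx", "avx2", "avx512f"}
print(detect(full_chip))                # skylake_avx512
print(detect(full_chip - {"avx512f"}))  # haswell
```

With AVX-512 masked off, the host no longer satisfies the skylake_avx512 requirements, so the name that comes back is haswell, and only haswell-or-older binaries would be selected.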

arcivanov · February 14, 2020, 10:34pm

Then how do you designate the architecture used in cloud-specific CPUs?

arcivanov · February 14, 2020, 10:34pm

Because optimizing for haswell is not the same as optimizing for skylake-avx512.

arcivanov · February 14, 2020, 10:38pm

There is also the PyTorch cpuinfo library, which provides very precise CPU detection, including SoC levels.

tgamblin · March 1, 2020, 12:19am

If there are features missing from the architecture, it needs its own name in this model. So we would add a cloud-specific CPU name (e.g., graviton2).

If the chip doesn’t have the features we expect, we’ll build for the next best thing we can find (see Incorrect arch detection · Issue #15151 · spack/spack · GitHub), so if the cloud chip isn’t in our list, we’ll pick something else.

The goal here is to have optimized binaries and to know where we can reuse them, not necessarily to build as natively as possible for the host. We’re sacrificing some degree of specificity by having well-defined names. But like I said, we should add the cloud-specific CPUs. We don’t have to only support names that, e.g., Intel defines.