Hey folks, I think it’s time we seriously consider adding SIMD support for AArch64 in our libraries. Here’s why this makes sense now.
ARM servers are everywhere these days
Look around and you’ll see ARM servers popping up everywhere. AWS has their Graviton chips, Azure is running Ampere processors, and honestly, the ARM server ecosystem has gotten pretty solid. It’s not some experimental thing anymore - companies are running real production workloads on these machines because they often get better bang for their buck.
The writing’s on the wall: ARM isn’t going anywhere, and we’re probably going to see even more of it.
We’ve already done the hard work for x86
Here’s the thing - we already know how to do SIMD optimization. Libraries like blake2 in Python already use AVX2 and other x86 SIMD instructions to speed things up significantly.
The cool part is we don’t have to start from scratch. There are tools like SIMD Everywhere (SIMDe) that basically let you write SIMD code once and run it on both x86 and ARM. It translates x86 intrinsics to ARM NEON instructions, so you can take existing optimized code and get it working on ARM without rewriting everything.
Plus, we already have some experimental support for AArch64 on macOS, which means we’ve got a head start on the ARM SIMD work. Getting this working on Linux should be much easier since we can build on that existing foundation and experience.
This makes prototyping and testing way easier than you’d expect.
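To make the "write once, run on both" idea concrete, here’s a minimal hand-written sketch of the kind of mapping a layer like SIMDe automates: the same 16-byte XOR loop using NEON intrinsics on AArch64, SSE2 intrinsics on x86, and a scalar loop everywhere else. This is only an illustration, not how SIMDe or our blake2 code is actually structured; `xor_bytes` is a made-up helper name, and the sketch assumes GCC or Clang (it relies on the `__aarch64__` and `__SSE2__` predefined macros).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pick an intrinsics header at compile time (GCC/Clang macros assumed). */
#if defined(__aarch64__)
#  include <arm_neon.h>
#elif defined(__SSE2__)
#  include <emmintrin.h>
#endif

/* Hypothetical helper: dst[i] = a[i] ^ b[i] for every i in [0, n). */
static void xor_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    size_t i = 0;

#if defined(__aarch64__)
    /* AArch64: 128-bit NEON, 16 bytes per iteration. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(dst + i, veorq_u8(va, vb));
    }
#elif defined(__SSE2__)
    /* x86: 128-bit SSE2, 16 bytes per iteration, unaligned loads/stores. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_xor_si128(va, vb));
    }
#endif

    /* Scalar tail; also the whole loop on targets with neither backend. */
    for (; i < n; i++) {
        dst[i] = (uint8_t)(a[i] ^ b[i]);
    }
}
```

The scalar tail doubles as the portable fallback, so all three builds give identical results and can be checked against each other in tests.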
ARM’s SIMD story keeps getting better
ARM’s SIMD capabilities used to be pretty basic with just 128-bit NEON, but that’s changed a lot. The newer ARM architectures add SVE and SVE2 (SVE2 is part of Armv9-A), which allow vectors anywhere from 128 to 2048 bits wide depending on the chip, so 512-bit SIMD is possible. That’s the same width as AVX-512 on x86.
So we’re not talking about settling for worse performance on ARM - in many cases, we can match or even beat x86 performance with the right optimizations.
Which modules?
For now, I don’t think any API change is needed. The details stay under the hood.
I think we just need to support SIMD on aarch64 Linux for 2 modules:
- hmac
- blake2
Those two modules are very commonly used on Linux servers.
Both modules already support aarch64 on Linux, so I think adding SIMD support there should be relatively easy.
cc @diegor