I am writing to discuss the potential of using clang-cl as the default compiler for Windows on ARM (WOA) devices to ensure consistent performance across platforms. Given its widespread acceptance in iOS devices with ARM SoCs, I believe clang-cl is highly optimized for ARM64 and can significantly enhance performance on WOA devices.
My goal is to ensure that every end user experiences the same performance on WOA devices as they do on other platform devices.
In previous discussions, the primary concern raised was ABI compatibility. I have conducted a few experiments with clang-cl-compiled Python and have not encountered any compatibility issues with MSVC-compiled binaries. While there were issues in the past, the LLVM team has made considerable progress in addressing them. I am specifically focused on Windows on ARM, as it is fundamentally different from Windows on x64. Currently, MSVC for ARM64 does not match the performance of clang-cl, which is far superior.
I would greatly appreciate it if anyone could provide specific examples or insights into where clang-cl might be causing trouble in terms of ABI compatibility or other issues.
As this is clearly targeted at convincing me, I'll just say that the absence of specific examples is not convincing (in either direction).
I've listed a number of suitable approaches that could do this for performance critical parts of CPython, and also pointed out that your benchmarks have been significantly outdated. I've also given things you could do that might produce convincing outcomes - none of those were "post on Discourse".
The easiest way to prove it will be to release clang-cl binaries and watch what happens. I don't have the bandwidth to provide support to people who suffer breakage, which is why I'm not considering doing it with the upstream releases. But if you want to make your own binaries and promote them as higher performance (and take the bug reports yourself, only passing them upstream if they reproduce with the standard compiler), you're more than welcome. If your build becomes more popular, it'll be pretty convincing proof (and there's a number of popularity metrics - e.g. it might be popular with package developers who receive fewer complaints about broken builds from their users, or it might be popular with "news" websites - one of those is better than the other).
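For anyone triaging reports against an alternative build, it helps to record which compiler produced the interpreter before deciding whether a bug should go upstream. The following is only a minimal sketch: `compiler_family` and `build_summary` are hypothetical helper names, and the exact banner that a clang-cl build reports through `platform.python_compiler()` is an assumption that should be checked on a real build.

```python
import platform


def compiler_family(compiler_string: str) -> str:
    """Classify a compiler banner as 'clang', 'msvc', or 'other'.

    Clang is checked first on the assumption that a clang-cl banner could
    mention both Clang and an MSC version number.
    """
    lowered = compiler_string.lower()
    if "clang" in lowered:
        return "clang"
    if lowered.startswith("msc"):
        return "msvc"
    return "other"


def build_summary() -> str:
    """Return a one-line description of the running interpreter for a bug report."""
    compiler = platform.python_compiler()
    return (
        f"Python {platform.python_version()} on {platform.machine()}, "
        f"built with {compiler} ({compiler_family(compiler)})"
    )


if __name__ == "__main__":
    print(build_summary())
```

A report that only reproduces when the family is `clang` would stay with the maintainers of the alternative build; one that also reproduces on the standard build could be passed upstream.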
I have collected pyperformance data on Python 3.14.0b2. This is a comparison between the beta release and the repository compiled with clang-cl (with PGO and computed gotos enabled). The clang-cl version I used is 20.1.4.
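A comparison like this is usually boiled down to one number by taking the geometric mean of the per-benchmark time ratios. Here is a rough sketch of that calculation; the function name is hypothetical and the timings in the usage example are made-up placeholders, not measurements from this thread.

```python
import math


def geometric_mean_speedup(baseline: dict[str, float], candidate: dict[str, float]) -> float:
    """Geometric mean of baseline/candidate time ratios over shared benchmarks.

    Inputs map benchmark name -> mean time in seconds. A result above 1.0
    means the candidate build is faster on average.
    """
    shared = sorted(baseline.keys() & candidate.keys())
    if not shared:
        raise ValueError("no benchmarks in common")
    log_sum = sum(math.log(baseline[name] / candidate[name]) for name in shared)
    return math.exp(log_sum / len(shared))


if __name__ == "__main__":
    # Placeholder timings for illustration only.
    msvc_times = {"bench_a": 2.0, "bench_b": 8.0}
    clang_cl_times = {"bench_a": 1.0, "bench_b": 2.0}
    print(f"{geometric_mean_speedup(msvc_times, clang_cl_times):.2f}x")
```

The geometric mean is used rather than the arithmetic mean so that one benchmark with a large ratio does not dominate the summary.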
I'd like to reiterate here that I find the benchmarking results a little suspect. The OP says:
It's clear that x64 performance has improved with each release, while ARM64 performance has been inconsistent, with a noticeable regression in the latest version.
However, they benchmarked minor versions of 3.11 and 3.12. We did not do any major performance work between CPython 3.11 and 3.12. Any difference they're seeing is likely down to noise. In fact, some benchmarks regressed slightly due to immortal objects.
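One rough way to tell a real regression from noise is to compare repeated timings from both builds and look at how large the difference in means is relative to the run-to-run spread. The sketch below computes Welch's t statistic with the standard library; the function names and the cut-off of 2.0 are assumptions chosen for illustration, not a rule used by any particular benchmarking tool.

```python
import math
import statistics


def welch_t(sample_a: list[float], sample_b: list[float]) -> float:
    """Welch's t statistic for two samples of timings (each needs >= 2 values)."""
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    if standard_error == 0:
        # No spread at all: identical means are noise, anything else is not.
        return 0.0 if mean_a == mean_b else math.copysign(math.inf, mean_a - mean_b)
    return (mean_a - mean_b) / standard_error


def looks_like_noise(sample_a: list[float], sample_b: list[float], threshold: float = 2.0) -> bool:
    """True when the difference in means is small relative to the spread.

    The threshold is a rule-of-thumb assumption, not a formal significance test.
    """
    return abs(welch_t(sample_a, sample_b)) < threshold
```

With only a handful of runs per build this is a coarse filter at best; more repetitions on a quiet machine give a much more trustworthy answer.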
My apologies if I am bothering the community regarding this topic once again, but WOA platforms are suffering significantly due to the immature software ecosystem available for Windows ARM64. My intention is not to move away from MSVC, but rather to ensure that mature and optimized binaries are available for WOA devices. One reason for advocating clang is its high efficiency on ARM64 SoCs, a great example being its widespread use on ARM64-based Macs.
Although we are actively exploring it, our primary concern remains the Python interpreter. Unfortunately, we haven't seen promising results so far. MSVC for ARM64 has not been able to effectively optimize the interpreter loop, which is why we've reverted to using clang-cl for better performance.
We've tried encouraging our target audience to use our locally compiled binaries for performance gains, but that hasn't helped much unless the Python community officially accepts them. After all, the end goal is to improve the experience for everyday users. results, received poor performance ratings of device.
I personally don't believe that adopting clang-cl as the default for WOA would immediately trigger a flood of issue reports. Python on WOA still lacks a broad ecosystem of extensions compared to Windows on x64. For instance, numpy, one of the most widely used libraries, only released its first WOA version a few weeks ago. I am also seeing discussions on GitHub in which major Python libraries (numpy, scipy, openblas) are considering whether to adopt clang-cl for their Windows on ARM64 release binaries to enhance performance.
Also, I believe it's worth considering the use of clang-cl for Windows on ARM64 beta releases to assess whether it poses potential risks or is a promising idea to adopt.
I can't speak for the differences between ARM and x64 in MSVC, but we have known for some time that clang-cl can produce faster binaries than MSVC on x64. Scroll down to the "Effect of tail calling interpreter" graph in the faster-cpython/benchmarking-public repository on GitHub (a public mirror of our benchmarking runner repository). The main reason, I suspect, is the lack of explicit computed gotos in MSVC.
The problem is that any switch in how Python is built has to occur, at the latest, by the first beta of the in-development CPython version (currently 3.15). This is to avoid any possible problems arising from ABI incompatibilities between the different compilers (these should be minimal, but it's not fully ruled out).
@Akash9824 when you say "we", who are you referring to? Sorry, I'm a little confused when you use "we" in that way.
Finally, I am willing to look into ARM64 performance on Windows more, but I do not have an ARM laptop or desktop, so I can't at the moment.
Initially, I commented on the performance of x64, but later I realized it would not be fair to directly compare MSVC x64 with MSVC ARM64 due to the different SoC architectures. However, if the two devices have the same Geekbench performance, then we can compare them, I guess. Anyway, leaving x64 aside…
Even if I disable "computed gotos" while compiling, I still see a significant difference between MSVC ARM64 and clang-cl ARM64 compiled binaries. I can share the data if you are interested.
My intention here is not just to discuss using clang-cl as the default compiler for WOA, but also to explore possible ABI-related issues we might face. Although I am investigating and trying to reproduce potential issues, I haven't found any so far. If there are any known issues, they should be fixed by LLVM. I would love to work on those issues as well.
When I say "we," I mean the team working on the performance of Windows on ARM64 applications. We have largely adopted clang-cl for WOA apps.
To be more specific: compare the performance data of the ARM64 binaries for 3.12.0 and the later patch release 3.12.10, and you will see a visible regression in 3.12.10. If you consider that outdated, compare 3.12.0 with 3.13.0 instead; you will see a regression in 3.13.0 relative to 3.12.0.
Note: I am talking about ARM64 binaries only, and I am running pyperformance to compare the performance.
Personally I think the main reason is that the people working on optimising Python are using GCC and just hoping that every other compiler will behave the same. I base this assumption on the complete surprise I hear from those people whenever they test with a different compiler[1] and get different results. It looks like a classic case of using bias to reinforce bias (you know, that thing everyone works so hard to prevent ML from doing).
This is starting to feel deceptive. If you work for an organisation that doesn't allow you to disclose that you work there, please use either "Three Letter Acronym" (if it's one of those) or "Fruit" (if it's that one) in your next post. But most companies have social media policies that require you to disclose, and we operate on openness here.
Either way, it sounds like you're funded and in a position to demonstrate the advantages and reliability of an alternative compiler, and possibly to just benefit directly from using it, so don't let the volunteers refusing to take on more work stop you from doing it.
This is not true and this entire "who do you work for?" line of reasoning feels disingenuous. Let's please not make such demands. Nobody has to tell us publicly who they work for here, and if we ask and don't get an answer or don't want to believe it, we need to move on. It is gatekeeping and isn't relevant.
Akash: From your side the best thing to read into such lines of questioning is that people are searching for a reason to trust you. If you are willing to both contribute the work to get such a clang-cl based Windows release build infrastructure and automation set up and commit to ongoing support, maintenance and bugfixing of it within the CPython project in the future, that'd be the most likely way to ever see it happen. Yet there is still no guarantee it'd be accepted. As is, you're being met with a cold reception because you are coming into a community with a presumed-real desire but not offering signs that you can help do both the up front and long term work to make it real.
People see no reason to rally around your, to them esoteric, cause.
Using clang+llvm as our compiler on Windows (all platforms) would be a good thing in general IMNSHO. But to me the biggest obstacle to doing that is that, frankly, >90% of core devs do not care at all about working on the Windows platform. On top of that, Microsoft has proven they do not care about CPython. This leaves the very few core devs willing to spend their time caring, ex: Steve, who is (unfortunately for him) heroically holding a lot of our Python on Windows support together.
If someone claims it works better for Windows on ARM, be inclined to believe them. There's no reason to come telling us that if it weren't true, because it'd quickly become obvious if any of us cared to spend time on such an unusual platform. We don't appear to have anyone associated with the core team with the time or ability to care? So such a change is a non-starter until we do.
Why do I have my opinion? Rust on Windows is LLVM based and its use is rapidly increasing, including within existing C and C++ application stacks, including with CPython (third party extension modules today, eventually we will wind up with Rust bits within CPython core, it's just a matter of time and effort, but Rust is clearly the future of compiled languages). There is no reason I can see to believe clang has Windows compatibility issues other than FUD at this point. Meaning the FUD needs proving to change minds. But given our limited core team dedicated to the Windows platform is understandably biased against spending their valuable time finding out and working through how to validate and guarantee that in our project, we cannot make such demands of them.
The entire clang-cl Windows toolchain exists in significant part because Visual C++ was growing incapable of compiling and linking the Chromium codebase, thus Google invested in making it a thing. ABI compatibility was a specific goal of the project. Rust's reliance on LLVM and linkage into existing C/C++ applications makes this only more so. I'm not worried. But I'm not who needs convincing, as I don't do Windows.