I’m not suggesting using git submodules; I find them awkward to use as well. Just a regular git repo that you check out. The build could do the following: if you don’t have LLVM installed and have the JIT enabled, look for the stencil git checkout folder. If it’s not there, either do the checkout automatically or ask the user to do it themselves. If it is there, check that the correct version of the stencil file exists. If not, the stencil repo needs to be updated (or the user needs to install LLVM).
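A rough sketch of that decision logic, in the spirit of the Python tooling under Tools/jit (the checkout path, version pinning, and helper name here are hypothetical, purely to illustrate the flow):

import shutil
import sys
from pathlib import Path

# Hypothetical location of the stencil checkout and its pinned version.
STENCIL_CHECKOUT = Path("externals/jit-stencils")
EXPECTED_STENCIL_VERSION = "pinned-version"  # would track the current Tools/jit inputs

def resolve_jit_stencils(jit_enabled: bool) -> Path | None:
    """Decide at build time where the JIT stencils come from."""
    if not jit_enabled:
        return None
    if shutil.which("clang"):
        # LLVM is available: regenerate the stencils as we do today.
        return None
    if not STENCIL_CHECKOUT.exists():
        sys.exit(f"JIT enabled but no LLVM found; clone the stencil repo into "
                 f"{STENCIL_CHECKOUT} or install LLVM.")
    version = (STENCIL_CHECKOUT / "VERSION").read_text().strip()
    if version != EXPECTED_STENCIL_VERSION:
        sys.exit("Stencil checkout is out of date; update it (or install LLVM).")
    return STENCIL_CHECKOUT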
This makes the production of the release tarballs more complicated, though (since those would need to include the stencils if we go down this path), as well as making them diverge further from the git repo contents.
I’m wondering if, rather than trying to support JIT-enabled builds without LLVM installed, we should instead consider the JIT as being akin to the PGO and LTO optimization features: leave it disabled in the default build, but turn it on for ./configure --enable-optimizations.
That way, git clone ... && cd cpython && ./configure && make will continue to work without LLVM, but ./configure --enable-optimizations will fail if you don’t have a suitable version of LLVM installed.
Folks actively working on the JIT would still need to pass ./configure --enable-jit=yes or ./configure --enable-jit=yes-off.
(I’m assuming a hypothetical future where the JIT config flag has lost the “experimental” qualifier, since that’s the premise of PEP 774)
(I’m also not sure how all this works with the Windows build)
+1 to considering it akin to PGO and LTO. I think it’s a safe thing to do. I think that would require updating PEP 744.
You are sadly right.
Regarding the growth of the repo size:
However, this approach does result in a slight increase in overall repository size. Comparing repo growth on commits over the past 90 days, the difference between the actual commits and the same commits with stencils added amounts to a difference of 0.03 MB per stencil file. This is a small increase in the context of the overall repository size, which has grown by 2.55 MB in the same time period. For six stencil files, this amounts to an upper bound of 0.18 MB. The current total size of the stencil files for all six platforms is 7.2 MB.
This growth is my main concern with putting the stencil files directly in the repo. If it was a one time 7.2 MB cost, I’d say fine, do it. However, it’s not. The JIT_DEPS in the Makefile are:
JIT_DEPS = \
$(srcdir)/Tools/jit/*.c \
$(srcdir)/Tools/jit/*.py \
$(srcdir)/Python/executor_cases.c.h \
pyconfig.h
Any time one of those files changes, the stencils need to be regenerated. Because they are output from the compiler, a one line change in one of those input files could potentially cause a lot of changes to the stencils. Over the past few weeks, the executor_cases.c.h file has changed many times.
In 6 months, could the storage used in the repo for stencils be 100 MB, rather than 7 MB? If so, I think that’s kind of a heavy cost to pay in order to avoid installing LLVM. The current .git folder is about 800 MB. Once that data goes in there, it’s in there forever. You can’t remove it later.
I won’t “fight to the death” on this issue. If other core devs think it’s fine to add the JIT stencils directly to the repo, I’ll accept the consensus. To me it doesn’t seem worth the storage cost.
[Edit: note this size estimation omits running git gc, which makes a huge difference. See below.]
My “100 MB” size increase is a wild guess, so I thought it would be a good idea to try to get a better estimate. I generated a list of git IDs for all recent changes to executor_cases.c.h and pyconfig.h.in. I then ran configure and make regen-jit on each of those versions and committed the stencil file into a separate repo.
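Roughly, that replay looks like the following (a hedged sketch, not the actual script; the repo paths, the stencil filename, and the configure flag are assumptions on my part):

import subprocess
from pathlib import Path

CPYTHON = Path("cpython")        # working checkout being replayed (assumed path)
MEASURE = Path("stencil-sizes")  # separate, already-initialized git repo (assumed path)
STENCIL = "jit_stencils.h"       # generated stencil file for one platform (assumed name)

def run(*args, cwd, check=True):
    return subprocess.run(args, cwd=cwd, check=check)

# All revisions that touched the JIT-relevant inputs, oldest first.
revs = subprocess.run(
    ["git", "log", "--reverse", "--format=%H", "--",
     "Python/executor_cases.c.h", "pyconfig.h.in"],
    cwd=CPYTHON, capture_output=True, text=True, check=True,
).stdout.split()

for rev in revs:
    run("git", "checkout", rev, cwd=CPYTHON)
    run("./configure", "--enable-experimental-jit", cwd=CPYTHON)
    run("make", "regen-jit", cwd=CPYTHON)
    (MEASURE / STENCIL).write_bytes((CPYTHON / STENCIL).read_bytes())
    run("git", "add", STENCIL, cwd=MEASURE)
    # git commit exits non-zero when nothing changed, so don't check it:
    # this only records a new revision when the stencil actually differs.
    run("git", "commit", "-m", rev, cwd=MEASURE, check=False)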
My changes cover development from Feb 2, 2024 until today (Jan 30). There are 180 revisions of the stencil file. Note that this is only the stencil for one platform, so multiplying these sizes by 6x would be reasonable. The total size of the .git folder is 35 MB. From that, I would estimate the cpython repo to have grown by about 210 MB (35 MB × 6 platforms) over those approximately 12 months if stencils had been committed over that time period for all 6 platforms.
That would be about 1.26x the current repo size.
Looking closer at how the JIT tools work, I think this is actually an underestimate. The JIT_DEPS list of files is not actually the true list of dependencies. The compiler needs most of the header files from under Include (otherwise executor_cases.c.h could not be compiled). So any change in those headers that affects code in the bytecode implementations would also result in a change to the generated stencils.
Two relevant points here in response to the assertion that this is an underestimate:
- The stencils are only committed back if they change, and not every change that touches JIT-related files requires stencil regeneration.
- GitHub does pretty aggressive repo packing on clone etc. (very similar to what you can emulate with git gc --aggressive).
FWIW, the calculation I added to the PEP resulted from replaying every commit over the last 90 days and then committing the stencils if they changed. For posterity, here’s the script I used to calculate the change over time: Calculate JIT stencil growth · GitHub.
My approach does the same in terms of only committing when there is a change. The key difference is using git gc. That makes a huge difference for me, and the repo is now only 1.7 MB. Adding 10 MB of data to the repo over a 12-month period is not a problem, in my opinion.
We could reduce the size by stripping the comments before committing, as mentioned in the PEP. If they provide value to people working on the JIT (wouldn’t they have LLVM anyhow though?), then I would say they can stay for now.
Another easy optimization that would save a little space: I see changes like this:
@@ -359,7 +359,7 @@ static const Hole _BEFORE_ASYNC_WITH_data_holes[1];
// _BEFORE_WITH
//
-// /tmp/tmpwhrw5lfl/_BEFORE_WITH.o: file format elf64-x86-64
+// /tmp/tmp5o8ugd7x/_BEFORE_WITH.o: file format elf64-x86-64
//
// Disassembly of section .text:
//
It would be easy to strip the /tmp/tmp*/ part of the name before committing (or when generating the file).
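Something as small as this in the generator would do it (a sketch; the temp-directory pattern is taken from the diff above):

import re

# Matches the randomly named temporary directory in the disassembly comments,
# e.g. "// /tmp/tmpwhrw5lfl/_BEFORE_WITH.o: file format elf64-x86-64".
_TMP_DIR = re.compile(r"/tmp/tmp[0-9a-z]+/")

def normalize_stencil_line(line: str) -> str:
    """Strip the non-deterministic temp directory so regenerated files are stable."""
    return _TMP_DIR.sub("", line)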
Yeah, there are definitely optimizations we can explore to make the files smaller.
For the particular optimization you’ve pointed out, Brandt actually addressed this in this commit, which made the stencils more reproducible. The script I linked cherry-picks this commit out, so your estimate should be even smaller.
Savannah, thanks for the PEP and the reference implementation. I think the main issues have already been discussed, and there is a plan to address them.
As someone who works on the JIT, I want to stress the importance of regenerating these stencils: this should be streamlined in the normal flow of compilation (configure; make;…).
This means that even if the stencils present in the repo match my build type, I should be able to regenerate them anyway (maybe because I have a new version of LLVM underneath that needs to be tested). Also, the compilation output log needs to state explicitly whether or not the stencils were regenerated.
I agree with the assumption of “if you work on the JIT, you need to have LLVM installed,” but as @nas said, if executor_cases.c.h changes, the stencils need to be regenerated.
I think we need to take a step back and try to understand why we are “forced” to use LLVM instead of GCC or MSVC.
We rely heavily on features that are available only in LLVM (e.g. preserve_none) and that other compilers don’t have (yet). I’m sure @brandtbucher can say more about it.
I think it’s worth analysing what gaps other compilers have and seeing if we can make a case to get these features implemented. Ideally, when compilers have feature parity, JIT stencils can be generated using the same compiler used to compile CPython, hence no need anymore to host them in git.
As an example, preserve_none is not present in GCC, and because of @kj0’s tail-calling interpreter, we had a strong case to request it. In fact, there is already a POC implemented for AArch64: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118328
I know it’s a long game, but we should push in that direction, and we should treat these JIT stencils as object files, as suggested by @nas earlier.
In the meantime, I support the proposal to check in the JIT stencils, but in the future, when conditions are met, we should reconsider their removal from the repo and generate them at build time.
It’s a fair point that in the longer term, we should consider whether we can get all the compiler features into GCC, etc. It’d also be nice to figure out if we can get cross-compilation working for the stencils while we still use LLVM to generate them (that’d also streamline the CI workflow a bit). That might require restructuring some header files to be more conservative about what’s needed for the JIT, but that’s also something I briefly explored.
As with all of this, baby steps. I don’t want perfect to be the enemy of good here. There are many build improvements I’d like to see us make in the longer term.
Thanks for taking the time to explore the options here, @savannahostrowski! I support the conclusion you’ve arrived at. We want to make it as safe, easy, and attractive as possible for people to try the JIT out this year, and I think that this is a big step towards that goal.
My favorite parts of this proposal are:
- How simple it is to see the actual changes in the machine code during the PR review process, across all platforms. This benefit alone makes it extremely attractive, especially considering it invites higher-quality assembly code review from outside experts. Even just for seeing before-and-after when iterating on the JIT locally, this is a much nicer workflow than what we have now.
- The support for reproducible builds, which I imagine will make distributors and release managers more confident that the JIT they’re shipping to their users is the exact same one we’ve been testing in CI and reviewing, even if they want to rebuild the stencils themselves.
A few thoughts I had reading through this thread:
My personal opinion is that we may still explore superinstructions in the future. I haven’t looked at it in a while, but register allocation (stack caching) may help here, since all of the superinstruction approaches I’ve seen still read/write operands on the frame’s value stack between instructions. I suspect that really reduces Clang’s ability to optimize across uops. We have a pretty high-level IR compared to other languages, so giving the C compiler as much context as possible may pay off quite a bit.
But I don’t want to derail the discussion (ping me on a relevant issue if you disagree). I’ll just add that build times matter a lot less if the stencils are checked in.
Eh, they’re kinda two ways of doing the same thing. The original implementation I landed used to do just that, and it was about the same size. I reworked it almost a year ago into its current form, which I personally feel is quite a bit cleaner, easier to maintain, and probably a bit faster at runtime.
See, this is why I want more people reviewing these files!
Nice observation, thanks. I’ll have a PR up soon to skip these zero bytes.
We currently use LLVM for a few reasons:
- It has a C compiler (Clang) that supports __attribute__((musttail)) and __attribute__((preserve_none)), which we need for this approach to work.
- It ships with two nice utilities. llvm-objdump allows us to place the human-readable disassembly of the code alongside the raw bytes in the generated header. llvm-readobj allows us to convert any object file format to JSON(!), which is a godsend for parsing out the info we need in a more-or-less cross-platform way (see the sketch below).
- It supports all of the platforms that we care about.
- We’ve put quite a bit of time into figuring out the right compiler options to whisper to Clang to get what we want.
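For what it’s worth, the JSON route looks roughly like this (a sketch rather than the actual Tools/jit code; I’m assuming the ELF spelling of the output-style flag here, and the exact set of options differs per object format):

import json
import subprocess

def read_object_as_json(path: str) -> object:
    """Dump an ELF object's sections, relocations, and symbols as JSON."""
    out = subprocess.run(
        ["llvm-readobj",
         "--elf-output-style=JSON",   # assumption: ELF-specific spelling of the JSON style
         "--sections", "--section-data",
         "--section-relocations", "--section-symbols",
         path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)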
So while it’s necessary for any new toolchain to support the two attributes we use, that alone isn’t sufficient. In a hypothetical future where Clang, GCC, and MSVC all support these two attributes (which would be great for the new tail-calling interpreter), using them to build the JIT still blows up the complexity of an already-quite-complex build process, not to mention tripling the work required for issues like support for stack unwinding, etc.
And if we’re building them in CI and hosting them anyways, I really struggle to see the benefit of officially supporting more toolchains just because we can.
Thanks for the great discussion, everyone! Based on the feedback, I’ve opened a PR to update the PEP with details on how the repo growth factor was calculated, along with a link to a Gist containing the script.
Since it sounds like most folks are supportive of the PEP—and I was able to address concerns about the impact of the stencils on repo size—I plan to submit this for Steering Council review toward the end of the week. Please let me know if there’s anything else you’d like to see added before then.
The Python Steering Council has discussed “PEP 774 - Removing the LLVM requirement for JIT builds” and decided that we think this one should be Deferred for the time being.
We generally agree with the approach proposed, but our reasoning is that the JIT work is not bearing meaningful performance wins yet. This is the kind of build change we expect to be made when the JIT is expected to no longer be an experimental build feature. But it is too soon to consider the JIT non-experimental, so we feel it is too soon to do this.
Thus deferring it until the JIT is closer to some form of “graduation” in that regard.
It is good to have this written up, so thank you for that from all of us and from your future selves! If all goes as we expect the JIT authors are hoping for, this will be revisited in the future.
Thanks for the update and the SC’s consideration of this PEP. I think this is a totally reasonable verdict, and we can revisit this once the JIT has paid some more tangible dividends.
I will move the draft PR’s POC infrastructure into a separate repo to build stencils across platforms in CI, which will be helpful in testing.